3w_dataset by ricardovvargas

The first realistic and public dataset with rare undesirable real events in oil wells.

updated at May 8, 2024, 9:01 a.m.

Jupyter Notebook

13 +0

105 +0

57 +0

GitHub
American-Gut by biocore

American Gut open-access data and IPython notebooks

updated at May 9, 2024, 3:30 a.m.

Jupyter Notebook

32 +0

105 +0

81 +0

GitHub
TCPD by alan-turing-institute

The Turing Change Point Dataset - A collection of time series for the evaluation and development of change point detection algorithms

updated at May 10, 2024, 4:42 a.m.

Python

8 +0

131 +0

28 +0

GitHub
lemon-dataset by softwaremill

Lemons quality control dataset

updated at May 15, 2024, 8:50 a.m.

Unknown languages

5 +0

100 +0

12 +0

GitHub
usa-soccer by gavinr

USA soccer teams - location and metadata

updated at May 16, 2024, 10:19 a.m.

JavaScript

5 +0

15 +0

12 +0

GitHub
open-data by freeCodeCamp

updated at May 16, 2024, 11:03 p.m.

HTML

16 +0

156 +0

41 +0

GitHub
covid-19-data by yahoo

COVID-19 datasets are constructed entirely from primary (government and public agency) sources

updated at May 18, 2024, 9:55 a.m.

Unknown languages

31 +0

110 +0

25 +0

GitHub
38-Cloud-A-Cloud-Segmentation-Dataset by SorourMo

This data set includes Landsat 8 images and their manually extracted pixel-level ground truths for cloud detection.

updated at May 19, 2024, 9:21 a.m.

MATLAB

6 +0

139 +0

37 +0

GitHub
All-Age-Faces-Dataset by JingchunCheng

All-Age-Faces (AAF) Database.

updated at May 22, 2024, 2:07 a.m.

Unknown languages

4 +0

174 +0

16 +0

GitHub
medal by McGill-NLP

Large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain

updated at May 22, 2024, 2:40 p.m.

Python

11 +0

214 +0

36 +0

GitHub
caption-contest-data by nextml

Data from the caption contest.

updated at May 22, 2024, 5:15 p.m.

HTML

7 +0

5 +0

2 +0

GitHub
awesome-citygml by OloOcki

The ultimate list of open data semantic 3D city models

updated at May 23, 2024, 7:32 a.m.

Unknown languages

7 +0

187 +0

24 +1

GitHub
gun-violence-data by jamesqo

A comprehensive, accessible database that contains records of over 260k US gun violence incidents from January 2013 to March 2018.

updated at May 23, 2024, 3:48 p.m.

Python

0 +0

4 +0

2 +0

GitHub
SaudiNewsNet by ParallelMazen

This repo contains a set of Arabic newspaper articles along with metadata, extracted from various Saudi newspapers.

updated at May 26, 2024, 9:55 a.m.

Unknown languages

7 +0

65 +0

15 +0

GitHub
open-traffic-collection by graphhopper

Collection of open data resources for traffic information

updated at May 27, 2024, 6:47 p.m.

Unknown languages

27 +1

383 +1

49 +0

GitHub
tennis_wta by JeffSackmann

WTA Tennis Rankings, Results, and Stats

updated at May 27, 2024, 7:20 p.m.

Unknown languages

30 +0

212 +1

143 +1

GitHub
shopper-intent-prediction-nature-2020 by coveooss

🏟

updated at May 28, 2024, 9:57 a.m.

Unknown languages

6 +0

25 +1

5 +0

GitHub
pudl by catalyst-cooperative

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

updated at May 29, 2024, 8:01 a.m.

Python

18 +0

450 +1

106 +0

GitHub
covid-19-data by NYTimes

A repository of data on coronavirus cases and deaths in the U.S.

updated at May 30, 2024, 4:31 p.m.

Unknown languages

318 +0

6,993 -1

3,471 +0

GitHub
tennis_atp by JeffSackmann

ATP Tennis Rankings, Results, and Stats

updated at May 31, 2024, 3:46 p.m.

Unknown languages

97 +1

944 +1

594 +1

GitHub