38-Cloud-A-Cloud-Segmentation-Dataset by SorourMo

This data set includes Landsat 8 images and their manually extracted pixel-level ground truths for cloud detection.

updated at April 12, 2024, 8:25 a.m.

MATLAB

6 +0

138 +0

37 +0

GitHub
collection by tategallery

Tate Collection metadata

updated at April 13, 2024, 6:16 p.m.

Python

59 +0

505 +0

187 +0

GitHub
All-Age-Faces-Dataset by JingchunCheng

All-Age-Faces (AAF) Database.

updated at April 26, 2024, 6:52 a.m.

Unknown languages

4 +0

173 +0

16 +0

GitHub
data-2C-beyond-the-limit-usa by washingtonpost

The Washington Post's analysis of NOAA climate change data for the contiguous United States

updated at April 29, 2024, 8:08 a.m.

HTML

4 +0

60 +0

20 +0

GitHub
lemon-dataset by softwaremill

Lemons quality control dataset

updated at May 1, 2024, 6:10 a.m.

Unknown languages

5 +0

99 +0

12 +0

GitHub
twofishes by foursquare

MOVED - The project is still under development but this page is deprecated.

updated at May 3, 2024, 9:37 p.m.

Scala

203 +0

434 +0

64 +0

GitHub
medal by McGill-NLP

Large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain

updated at May 4, 2024, 4:36 a.m.

Python

11 +0

209 +0

36 +0

GitHub
congresstweets by alexlitel

Datasets of the daily Twitter output of Congress.

updated at May 5, 2024, 4:24 a.m.

SCSS

7 +0

100 +0

38 +0

GitHub
uber-tlc-foil-response by fivethirtyeight

Uber trip data from a freedom of information request to NYC's Taxi & Limousine Commission

updated at May 6, 2024, 1:50 a.m.

Unknown languages

70 +0

709 +1

374 +0

GitHub
geo-maps by simonepri

🗺 High Quality GeoJSON maps programmatically generated.

updated at May 6, 2024, 5:55 p.m.

JavaScript

25 +0

1,235 +1

65 +0

GitHub
domains by tb0hdan

World’s single largest Internet domains dataset

updated at May 8, 2024, 6:58 a.m.

HTML

30 +1

643 +2

103 +0

GitHub
3w_dataset by ricardovvargas

The first realistic and public dataset with rare undesirable real events in oil wells.

updated at May 8, 2024, 9:01 a.m.

Jupyter Notebook

13 +0

105 +1

56 +0

GitHub
countries by mledoze

World countries in JSON, CSV, XML and YAML. Any help is welcome!

updated at May 8, 2024, 9:16 a.m.

PHP

159 +0

5,895 +3

1,261 -1

GitHub
caption-contest-data by nextml

Data from the caption contest.

updated at May 8, 2024, 5:19 p.m.

HTML

7 +0

5 +0

2 +0

GitHub
American-Gut by biocore

American Gut open-access data and IPython notebooks

updated at May 9, 2024, 3:30 a.m.

Jupyter Notebook

32 +0

105 +1

81 +0

GitHub
country-list by umpirsky

🌐 List of all countries with names and ISO 3166-1 codes in all languages and data formats.

updated at May 9, 2024, 5:08 a.m.

HTML

155 +0

5,119 -1

1,545 +1

GitHub
pudl by catalyst-cooperative

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

updated at May 9, 2024, 4:59 p.m.

Python

18 +0

447 +1

104 +0

GitHub
List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words by LDNOOBW

List of Dirty, Naughty, Obscene, and Otherwise Bad Words

updated at May 9, 2024, 11:18 p.m.

Unknown languages

71 +0

2,776 +6

655 +0

GitHub
TCPD by alan-turing-institute

The Turing Change Point Dataset - A collection of time series for the evaluation and development of change point detection algorithms

updated at May 10, 2024, 4:42 a.m.

Python

8 +0

131 +1

28 +0

GitHub
awesome-citygml by OloOcki

The ultimate list of open data semantic 3D city models

updated at May 10, 2024, 8:20 a.m.

Unknown languages

7 +0

185 +2

24 +0

GitHub