domains by tb0hdan

World’s single largest Internet domains dataset

updated at May 8, 2024, 6:58 a.m.

HTML

30 +1

643 +2

103 +0

GitHub
geo-maps by simonepri

🗺 High Quality GeoJSON maps programmatically generated.

updated at May 6, 2024, 5:55 p.m.

JavaScript

25 +0

1,235 +1

65 +0

GitHub
uber-tlc-foil-response by fivethirtyeight

Uber trip data from a freedom of information request to NYC's Taxi & Limousine Commission

updated at May 6, 2024, 1:50 a.m.

Unknown languages

70 +0

709 +1

374 +0

GitHub
congresstweets by alexlitel

Datasets of the daily Twitter output of Congress.

updated at May 5, 2024, 4:24 a.m.

SCSS

7 +0

100 +0

38 +0

GitHub
medal by McGill-NLP

Large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain

updated at May 4, 2024, 4:36 a.m.

Python

11 +0

209 +0

36 +0

GitHub
twofishes by foursquare

MOVED - The project is still under development but this page is deprecated.

updated at May 3, 2024, 9:37 p.m.

Scala

203 +0

434 +0

64 +0

GitHub
lemon-dataset by softwaremill

Lemons quality control dataset

updated at May 1, 2024, 6:10 a.m.

Unknown languages

5 +0

99 +0

12 +0

GitHub
data-2C-beyond-the-limit-usa by washingtonpost

The Washington Post's analysis of NOAA climate change data for the contiguous United States

updated at April 29, 2024, 8:08 a.m.

HTML

4 +0

60 +0

20 +0

GitHub
All-Age-Faces-Dataset by JingchunCheng

All-Age-Faces (AAF) Database.

updated at April 26, 2024, 6:52 a.m.

Unknown languages

4 +0

173 +0

16 +0

GitHub
collection by tategallery

Tate Collection metadata

updated at April 13, 2024, 6:16 p.m.

Python

59 +0

505 +0

187 +0

GitHub
38-Cloud-A-Cloud-Segmentation-Dataset by SorourMo

This data set includes Landsat 8 images and their manually extracted pixel-level ground truths for cloud detection.

updated at April 12, 2024, 8:25 a.m.

MATLAB

6 +0

138 +0

37 +0

GitHub
SaudiNewsNet by ParallelMazen

This repo contains a set of Arabic newspaper articles along with metadata, extracted from various Saudi newspapers.

updated at April 10, 2024, 1:19 a.m.

Unknown languages

7 +0

66 +0

15 +0

GitHub
skytrax-reviews-dataset by quankiquanki

An air travel dataset consisting of user reviews from Skytrax (www.airlinequality.com)

updated at April 2, 2024, 5:40 p.m.

Python

1 +0

70 +0

39 +0

GitHub
tracebase by areinhardt

The tracebase appliance-level power consumption data set

updated at April 2, 2024, 5:40 p.m.

Unknown languages

2 +0

36 +0

15 +0

GitHub
dbfc-dataset by ECSIM

Single DBFC Dataset

updated at April 1, 2024, 8:02 a.m.

Jupyter Notebook

4 +0

21 +0

3 +0

GitHub
pem-dataset1 by ECSIM

Proton Exchange Membrane (PEM) Fuel Cell Dataset

updated at April 1, 2024, 8:02 a.m.

Jupyter Notebook

6 +0

76 +0

23 +0

GitHub
JsonOfCounties by evangambit

A repo containing various data (demographics, employment, etc.) in JSON form.

updated at March 29, 2024, 10 a.m.

Python

7 +0

55 +0

10 +0

GitHub
CubePlusPlus by Visillect

Cube++ is a novel dataset collected for the illumination estimation problem. It has 4890 raw 18-megapixel images, each containing a SpyderCube color target in its scene, manually labelled categories, and ground truth illumination chromaticities.

updated at March 15, 2024, 6:44 a.m.

Python

13 +0

49 +0

5 +0

GitHub
usa-soccer by gavinr

USA soccer teams - location and metadata

updated at Feb. 11, 2024, 4:12 p.m.

JavaScript

5 +0

14 +0

12 +0

GitHub
reversegeocode by kno10

Simple but fast reverse geocoding up to city granularity level

updated at Feb. 4, 2024, 7:35 a.m.

Java

7 +0

55 +0

7 +0

GitHub