igorbarinov/awesome-data-engineering

gockerize by redbooth

Package Golang services into minimal Docker containers.

created at Aug. 4, 2015, 2:02 a.m.

Shell

65 -1

667 +0

20 +0

GitHub

tidb by pingcap

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://tidbcloud.com/free-trial

created at Sept. 6, 2015, 4:01 a.m.

Go

1,270 -1

36,170 +30

5,715 -1

GitHub

snappydata by TIBCOSoftware

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

created at Sept. 16, 2015, 10:36 a.m.

Scala

84 +0

1,037 +0

203 +0

GitHub

gpdb by greenplum-db

Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.

created at Oct. 23, 2015, 12:25 a.m.

C

418 -1

6,203 +3

1,702 -1

GitHub

mysql_utils by pinterest

Pinterest MySQL Management Tools

created at Oct. 24, 2015, 5:33 p.m.

Python

72 +0

879 +0

141 +0

GitHub

Gaffer by gchq

A large-scale entity and relation database supporting aggregation of properties

created at Dec. 14, 2015, 12:12 p.m.

Java

142 +0

1,734 +1

354 +0

GitHub

timely by NationalSecurityAgency

Accumulo backed time series database

created at April 12, 2016, 9:33 p.m.

CSS

51 +0

374 +0

110 +0

GitHub

incubator-hivemall by apache

Mirror of Apache Hivemall (incubating)

created at Sept. 15, 2016, 7 a.m.

Java

32 +0

310 +0

119 +0

GitHub

bistro by asavinov

A general-purpose data analysis engine radically changing the way batch and stream data is processed

created at Nov. 9, 2017, 3:42 p.m.

Java

2 +0

7 +0

0 +0

GitHub

datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

created at March 23, 2018, 1:16 p.m.

Python

25 +0

389 +6

122 +2

GitHub

dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

created at April 30, 2018, 4:30 p.m.

Python

114 +1

10,282 +59

1,277 +13

GitHub

aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

created at Feb. 26, 2019, 1:39 a.m.

Python

61 +0

3,805 +3

668 +1

GitHub

ekuiper by lf-edge

Lightweight data stream processing engine for IoT edge

created at July 3, 2019, 7:37 a.m.

Go

41 +0

1,365 +1

387 +5

GitHub

rudder-server by rudderlabs

Privacy and Security focused Segment-alternative, in Golang and React

created at July 19, 2019, 9:24 a.m.

Go

61 +0

3,940 +8

289 +1

GitHub

lakeFS by treeverse

lakeFS - Data version control for your data lake | Git for data

created at Sept. 12, 2019, 11:46 a.m.

Go

40 +0

4,083 +17

329 +0

GitHub

nessie by projectnessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

created at April 9, 2020, 6:39 p.m.

Java

27 -1

841 +7

116 +1

GitHub

hstream by hstreamdb

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

created at Aug. 31, 2020, 9:42 a.m.

Haskell

23 +0

690 -2

56 +0

GitHub

faust by faust-streaming

Python Stream Processing. A Faust fork

created at Oct. 22, 2020, 3:32 p.m.

Python

28 +0

1,465 +15

171 +1

GitHub

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

335 +1

50 +0

GitHub

DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

created at Nov. 9, 2020, 3:20 p.m.

Python

21 +0

1,363 +1

154 +0

GitHub
