igorbarinov/awesome-data-engineering

kestra by kestra-io

Workflow Automation Platform. Orchestrate & Schedule code in any language, run anywhere, 500+ plugins. Alternative to Zapier, Rundeck, Camunda, Airflow...

created at Aug. 24, 2019, 1:56 p.m.

Java

164 +0

13,682 +496

1,177 +26

GitHub

druid by apache

Apache Druid: a high-performance real-time analytics database.

created at Oct. 23, 2012, 7:08 p.m.

Java

584 -1

13,526 +3

3,707 +2

GitHub

opentsdb by OpenTSDB

A scalable, distributed Time Series Database.

created at Aug. 27, 2010, 2:05 a.m.

Java

334 +0

5,006 +4

1,247 +0

GitHub

elasticsearch-jdbc by jprante

JDBC importer for Elasticsearch

created at June 2, 2012, 11:17 p.m.

Java

230 +0

2,838 +1

709 +0

GitHub

gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

created at Dec. 1, 2014, 6:10 p.m.

Java

165 +0

2,228 -4

750 -1

GitHub

secor by pinterest

Secor is a service implementing Kafka log persistence

created at April 15, 2014, 10:26 p.m.

Java

68 +0

1,845 -2

540 +0

GitHub

Gaffer by gchq

A large-scale entity and relation database supporting aggregation of properties

created at Dec. 14, 2015, 12:12 p.m.

Java

138 +0

1,772 +2

354 +0

GitHub

kairosdb by kairosdb

Fast scalable time series database

created at Feb. 5, 2013, 10:27 p.m.

Java

115 -1

1,738 -3

344 +0

GitHub

nessie by projectnessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

created at April 9, 2020, 6:39 p.m.

Java

31 +0

1,049 +5

133 +3

GitHub

heroic by spotify

The Heroic Time Series Database

created at May 29, 2015, 5:20 a.m.

Java

58 +0

848 +0

109 +0

GitHub

blueflood by rax-maas

A distributed system designed to ingest and process time series data

created at May 15, 2013, 2:50 p.m.

Java

96 +0

595 -1

102 +0

GitHub

zilla by aklivity

🦎 A multi-protocol edge & service proxy. Seamlessly interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.

created at Dec. 7, 2021, 10:10 p.m.

Java

8 +0

550 +2

50 +0

GitHub

incubator-hivemall by apache

Mirror of Apache Hivemall (incubating)

created at Sept. 15, 2016, 7 a.m.

Java

32 +0

311 +1

119 +0

GitHub

deep-spark by Stratio

Connecting Apache Spark with different data stores [DEPRECATED]

created at Feb. 18, 2014, 8:34 a.m.

Java

114 +0

197 +0

42 +0

GitHub

dqo by dqops

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.

created at March 8, 2022, 3:18 p.m.

Java

7 +0

114 +2

17 +0

GitHub

bistro by asavinov

A general-purpose data analysis engine radically changing the way batch and stream data is processed

created at Nov. 9, 2017, 3:42 p.m.

Java

2 +0

7 +0

0 +0

GitHub
