nessie by projectnessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

updated at May 18, 2024, 11:06 p.m.

Java

Watchers: 27 (+0) · Stars: 858 (+7) · Forks: 116 (+0)

dqo by dqops

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.

updated at May 18, 2024, 2:49 p.m.

Java

Watchers: 5 (+0) · Stars: 56 (+1) · Forks: 12 (+0)

gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at May 17, 2024, 9:19 p.m.

Java

Watchers: 166 (+0) · Stars: 2,193 (+1) · Forks: 743 (+0)

zilla by aklivity

🦎 A multi-protocol, event-native proxy. Securely interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.

updated at May 17, 2024, 8:22 p.m.

Java

Watchers: 9 (+0) · Stars: 491 (+3) · Forks: 47 (+0)

druid by apache

Apache Druid: a high performance real-time analytics database.

updated at May 17, 2024, 6:11 p.m.

Java

Watchers: 590 (-1) · Stars: 13,213 (-2) · Forks: 3,643 (+4)

elasticsearch-jdbc by jprante

JDBC importer for Elasticsearch

updated at May 17, 2024, 7:27 a.m.

Java

Watchers: 231 (+0) · Stars: 2,841 (+1) · Forks: 712 (+1)

Gaffer by gchq

A large-scale entity and relation database supporting aggregation of properties

updated at May 16, 2024, 10:05 a.m.

Java

Watchers: 140 (-1) · Stars: 1,737 (+1) · Forks: 354 (+0)

secor by pinterest

Secor is a service implementing Kafka log persistence

updated at May 16, 2024, 7:08 a.m.

Java

Watchers: 70 (+0) · Stars: 1,837 (+1) · Forks: 541 (+0)

heroic by spotify

The Heroic Time Series Database

updated at May 14, 2024, 8:06 p.m.

Java

Watchers: 58 (+0) · Stars: 845 (+2) · Forks: 109 (+0)

opentsdb by OpenTSDB

A scalable, distributed Time Series Database.

updated at May 11, 2024, 1:01 a.m.

Java

Watchers: 336 (+0) · Stars: 4,956 (+0) · Forks: 1,252 (-1)

kairosdb by kairosdb

Fast scalable time series database

updated at May 10, 2024, 1:17 p.m.

Java

Watchers: 118 (+0) · Stars: 1,728 (+0) · Forks: 344 (+0)

incubator-hivemall by apache

Mirror of Apache Hivemall (incubating)

updated at April 6, 2024, 6:43 a.m.

Java

Watchers: 32 (+0) · Stars: 310 (+0) · Forks: 119 (+0)

blueflood by rax-maas

A distributed system designed to ingest and process time series data

updated at April 3, 2024, 8:32 p.m.

Java

Watchers: 95 (+0) · Stars: 592 (+0) · Forks: 102 (+0)

bistro by asavinov

A general-purpose data analysis engine radically changing the way batch and stream data is processed

updated at Feb. 7, 2024, 7:30 p.m.

Java

Watchers: 2 (+0) · Stars: 7 (+0) · Forks: 0 (+0)

deep-spark by Stratio

Connecting Apache Spark with different data stores [DEPRECATED]

updated at Jan. 1, 2024, 6:17 p.m.

Java

Watchers: 115 (+0) · Stars: 197 (+0) · Forks: 42 (+0)
