nessie by projectnessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

updated at May 26, 2024, 10:48 p.m.

Java

27 +0

866 +8

116 +0

GitHub
flocker by ClusterHQ

Container data volume manager for your Dockerized application

updated at May 26, 2024, 7:17 p.m.

Python

168 +0

3,376 +1

286 +0

GitHub
dqo by dqops

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.

updated at May 26, 2024, 4:34 p.m.

Java

5 +0

62 +6

12 +0

GitHub
prometheus by prometheus

The Prometheus monitoring system and time series database.

updated at May 26, 2024, 4:07 p.m.

Go

1,130 -2

53,200 +114

8,813 +17

GitHub
faust by faust-streaming

Python Stream Processing. A Faust fork

updated at May 26, 2024, 11:22 a.m.

Python

28 +0

1,482 +12

173 +0

GitHub
lakeFS by treeverse

lakeFS - Data version control for your data lake | Git for data

updated at May 26, 2024, 10:25 a.m.

Go

40 +0

4,113 +17

330 +1

GitHub
rqlite by rqlite

The lightweight, distributed relational database built on SQLite.

updated at May 26, 2024, 9:27 a.m.

Go

227 +0

15,018 +47

688 +2

GitHub
kafka-docker by wurstmeister

Dockerfile for Apache Kafka

updated at May 26, 2024, 8:24 a.m.

Shell

162 +1

6,873 +6

2,734 -2

GitHub
DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

updated at May 26, 2024, 7:23 a.m.

Python

21 +0

1,369 +0

157 +1

GitHub
datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

updated at May 26, 2024, 7:20 a.m.

Python

25 +0

403 +6

124 +2

GitHub
librdkafka by confluentinc

The Apache Kafka C/C++ library

updated at May 26, 2024, 6:56 a.m.

C

413 +2

7,329 +6

3,110 +1

GitHub
scylladb by scylladb

NoSQL data store using the Seastar framework, compatible with Apache Cassandra

updated at May 26, 2024, 6:10 a.m.

C++

337 +0

12,705 +46

1,223 +8

GitHub
airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at May 26, 2024, 4:37 a.m.

Python

755 +1

34,811 +81

13,643 +20

GitHub
cayley by cayleygraph

An open-source graph database

updated at May 26, 2024, 4:34 a.m.

Go

576 +0

14,779 +6

1,251 +0

GitHub
seaweedfs by seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

updated at May 26, 2024, 4:12 a.m.

Go

535 +0

21,324 +74

2,187 +4

GitHub
tidb by pingcap

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://www.pingcap.com/tidb-serverless/

updated at May 26, 2024, 4:08 a.m.

Go

1,266 +1

36,298 +24

5,723 +3

GitHub
dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

updated at May 26, 2024, 3:57 a.m.

Python

116 +1

10,440 +51

1,304 +10

GitHub
weave by weaveworks

Simple, resilient multi-host container networking and more.

updated at May 26, 2024, 1:48 a.m.

Go

233 -1

6,596 +5

662 +0

GitHub
superset by apache

Apache Superset is a Data Visualization and Data Exploration Platform

updated at May 26, 2024, 1:34 a.m.

TypeScript

1,500 +2

59,593 +120

12,760 +51

GitHub
ekuiper by lf-edge

Lightweight data stream processing engine for IoT edge

updated at May 26, 2024, 1:08 a.m.

Go

41 +0

1,378 +5

393 +1

GitHub