snappy by google

A fast compressor/decompressor

created at March 3, 2014, 9:58 p.m.

C++

195 +0

5,995 +6

969 +1

GitHub
kryo by EsotericSoftware

Java binary serialization and cloning: fast, efficient, automatic

created at Nov. 6, 2013, 1:24 p.m.

HTML

296 -1

6,080 +6

817 +0

GitHub
gpdb by greenplum-db

Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.

created at Oct. 23, 2015, 12:25 a.m.

C

418 -1

6,203 +3

1,702 -1

GitHub
weave by weaveworks

Simple, resilient multi-host container networking and more.

created at Aug. 18, 2014, 5:19 a.m.

Go

237 +0

6,584 +4

662 -1

GitHub
kafka-docker by wurstmeister

Dockerfile for Apache Kafka

created at Dec. 23, 2013, 10:01 p.m.

Shell

160 +0

6,848 +7

2,717 -2

GitHub
librdkafka by confluentinc

The Apache Kafka C/C++ library

created at Sept. 19, 2012, 10:14 a.m.

C

409 +1

7,297 +4

3,109 +0

GitHub
dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

created at April 30, 2018, 4:30 p.m.

Python

114 +1

10,282 +59

1,277 +13

GitHub
CMAK by yahoo

CMAK is a tool for managing Apache Kafka clusters

created at Jan. 28, 2015, 6:33 p.m.

Scala

534 +0

11,676 +4

2,496 +0

GitHub
scylladb by scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

created at Dec. 24, 2014, 1:16 p.m.

C++

340 +0

12,591 +32

1,208 +5

GitHub
druid by apache

Apache Druid: a high performance real-time analytics database.

created at Oct. 23, 2012, 7:08 p.m.

Java

592 -1

13,204 +9

3,637 +3

GitHub
nomad by hashicorp

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.

created at June 1, 2015, 10:21 a.m.

Go

537 -1

14,450 +20

1,894 +6

GitHub
cayley by cayleygraph

An open-source graph database

created at June 5, 2014, 6:49 p.m.

Go

577 +0

14,775 +3

1,251 +0

GitHub
rqlite by rqlite

The lightweight, distributed relational database built on SQLite.

created at Aug. 23, 2014, 4:31 a.m.

Go

228 +0

14,909 +33

681 +1

GitHub
cadvisor by google

Analyzes resource usage and performance characteristics of running containers.

created at June 9, 2014, 4:36 p.m.

Go

387 -2

16,363 +28

2,276 +1

GitHub
luigi by spotify

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

created at Sept. 20, 2012, 3:06 p.m.

Python

474 -1

17,342 +24

2,373 +1

GitHub
dash by plotly

Data Apps & Dashboards for Python. No JavaScript Required.

created at April 10, 2015, 1:53 a.m.

Python

418 +1

20,535 +37

1,992 +5

GitHub
seaweedfs by seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

created at July 14, 2014, 4:41 p.m.

Go

537 +0

21,123 +47

2,172 +4

GitHub
influxdb by influxdata

Scalable datastore for metrics, events, and real-time analytics

created at Sept. 26, 2013, 2:31 p.m.

Rust

740 -1

27,808 +42

3,489 +5

GitHub
airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

created at April 13, 2015, 6:04 p.m.

Python

755 +2

34,583 +70

13,573 +23

GitHub
tidb by pingcap

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at: https://tidbcloud.com/free-trial

created at Sept. 6, 2015, 4:01 a.m.

Go

1,270 -1

36,170 +30

5,715 -1

GitHub