kafka-docker by wurstmeister

Dockerfile for Apache Kafka

created at Dec. 23, 2013, 10:01 p.m.

Shell

162 +0

6,940 +1

2,729 +1

weave by weaveworks

Simple, resilient multi-host containers networking and more.

created at Aug. 18, 2014, 5:19 a.m.

Go

228 +0

6,619 -1

671 +1

kryo by EsotericSoftware

Java binary serialization and cloning: fast, efficient, automatic

created at Nov. 6, 2013, 1:24 p.m.

HTML

289 +0

6,214 +7

828 +1

snappy by google

A fast compressor/decompressor

created at March 3, 2014, 9:58 p.m.

C++

194 +0

6,196 +11

985 +1

kcat by edenhill

Generic command line non-JVM Apache Kafka producer and consumer

created at March 30, 2014, 4:25 a.m.

C

77 +0

5,461 +6

484 +0

opentsdb by OpenTSDB

A scalable, distributed Time Series Database.

created at Aug. 27, 2010, 2:05 a.m.

Java

334 +0

5,006 +4

1,247 +0

zombodb by zombodb

Making Postgres and Elasticsearch work together like it's 2023

created at July 17, 2015, 4:53 p.m.

PLpgSQL

92 +0

4,685 +1

212 +0

lakeFS by treeverse

lakeFS: data version control for your data lake (Git for data)

created at Sept. 12, 2019, 11:46 a.m.

Go

44 +0

4,468 +6

359 +0

rudder-server by rudderlabs

Privacy- and security-focused Segment alternative, in Golang and React

created at July 19, 2019, 9:24 a.m.

Go

62 -1

4,103 +5

318 +1

aws-sdk-pandas by aws

pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and Excel).

created at Feb. 26, 2019, 1:39 a.m.

Python

60 +0

3,938 -1

702 +1

flocker by ClusterHQ

Container data volume manager for your Dockerized application

created at April 28, 2014, 6:02 p.m.

Python

169 +0

3,389 -1

290 +0

heka by mozilla-services

DEPRECATED: Data collection and processing made easy.

created at Oct. 16, 2012, 5:20 p.m.

Go

203 +0

3,389 +0

528 +0

flockdb by twitter-archive

A distributed, fault-tolerant graph database

created at April 12, 2010, 3:53 a.m.

Scala

278 +0

3,338 +1

258 +0

smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

created at Jan. 2, 2015, 1:05 p.m.

Python

47 +0

3,221 +3

382 +0

elasticsearch-jdbc by jprante

JDBC importer for Elasticsearch

created at June 2, 2012, 11:17 p.m.

Java

230 +0

2,838 +1

709 +0

kafka-node by SOHU-Co

Node.js client for Apache Kafka 0.8 and later.

created at Oct. 23, 2013, 3:34 a.m.

JavaScript

97 +0

2,664 -1

628 +0

pipelinedb by pipelinedb

High-performance time-series aggregation for PostgreSQL

created at Nov. 26, 2013, 12:11 a.m.

C

104 +0

2,637 +3

241 +0

pyxley by stitchfix

Python helpers for building dashboards using Flask and React

created at June 22, 2015, 10:23 p.m.

JavaScript

279 +0

2,271 +1

258 +0

gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

created at Dec. 1, 2014, 6:10 p.m.

Java

165 +0

2,228 -4

750 -1

hamilton by DAGWorks-Inc

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.

created at Feb. 23, 2023, 5:16 p.m.

Jupyter Notebook

17 +0

1,885 +7

126 +1
