datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

updated at May 3, 2024, 8:43 a.m.

Python

25 +0

389 +6

122 +2

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at May 2, 2024, 9:14 p.m.

Java

167 +0

2,190 +0

742 +0

GitHub
kafka-docker by wurstmeister

Dockerfile for Apache Kafka

updated at May 2, 2024, 8:13 p.m.

Shell

160 +0

6,848 +7

2,717 -2

GitHub
kafkat by airbnb

KafkaT-ool

updated at May 2, 2024, 4:43 p.m.

Ruby

243 +0

503 -1

86 +0

GitHub
zilla by aklivity

🦎 A multi-protocol, event-native proxy. Securely interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.

updated at May 2, 2024, 3:48 p.m.

Java

9 +0

486 +0

47 +0

GitHub
smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

updated at May 2, 2024, 12:46 p.m.

Python

49 +0

3,094 +1

378 +0

GitHub
DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

updated at May 2, 2024, 2:22 a.m.

Python

21 +0

1,363 +1

154 +0

GitHub
FiloDB by filodb

Distributed Prometheus time series database

updated at May 1, 2024, 4:06 p.m.

Scala

89 +0

1,413 +0

223 +0

GitHub
Gaffer by gchq

A large-scale entity and relation database supporting aggregation of properties

updated at May 1, 2024, 11:33 a.m.

Java

142 +0

1,734 +1

354 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at April 30, 2024, 9:48 p.m.

Scala

16 +0

335 +1

50 +0

GitHub
kafka-node by SOHU-Co

Node.js client for Apache Kafka 0.8 and later.

updated at April 30, 2024, 9:57 a.m.

JavaScript

99 +0

2,659 +1

630 +0

GitHub
opentsdb by OpenTSDB

A scalable, distributed Time Series Database.

updated at April 29, 2024, 1:49 p.m.

Java

337 +0

4,951 +2

1,253 +0

GitHub
ccm by riptano

A script to easily create and destroy an Apache Cassandra cluster on localhost

updated at April 29, 2024, 12:45 p.m.

Python

76 +0

1,212 +0

302 +0

GitHub
Akumuli by akumuli

Time-series database

updated at April 28, 2024, 8:05 a.m.

C++

44 +0

838 +1

86 +0

GitHub
flockdb by twitter-archive

A distributed, fault-tolerant graph database

updated at April 27, 2024, 5:35 p.m.

Scala

279 +0

3,330 +0

273 +0

GitHub
zodiac by CenturyLinkLabs

A lightweight tool for easy deployment and rollback of dockerized applications.

updated at April 25, 2024, 7:03 p.m.

Go

22 +0

194 +0

20 +0

GitHub
mysql_utils by pinterest

Pinterest MySQL Management Tools

updated at April 25, 2024, 6:37 a.m.

Python

72 +0

879 +0

141 +0

GitHub
elasticsearch-jdbc by jprante

JDBC importer for Elasticsearch

updated at April 23, 2024, 2:40 a.m.

Java

231 +0

2,838 +0

711 -1

GitHub
haproxy_exporter by prometheus

Simple server that scrapes HAProxy stats and exports them via HTTP for Prometheus consumption

updated at April 22, 2024, 5:30 p.m.

Go

30 +0

609 +0

219 +0

GitHub
secor by pinterest

Secor is a service implementing Kafka log persistence

updated at April 22, 2024, 8:31 a.m.

Java

70 +0

1,835 +0

541 +0

GitHub