rudder-server by rudderlabs

Privacy- and security-focused Segment alternative, in Golang and React

created at July 19, 2019, 9:24 a.m.

Go

61 +0

3,940 +8

289 +1

GitHub
lakeFS by treeverse

lakeFS - Data version control for your data lake | Git for data

created at Sept. 12, 2019, 11:46 a.m.

Go

40 +0

4,083 +17

329 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

335 +1

50 +0

GitHub
superset by apache

Apache Superset is a Data Visualization and Data Exploration Platform

created at July 21, 2015, 6:55 p.m.

TypeScript

1,498 +3

58,954 +104

12,607 +46

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

created at Dec. 1, 2014, 6:10 p.m.

Java

167 +0

2,190 +0

742 +0

GitHub
snappydata by TIBCOSoftware

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

created at Sept. 16, 2015, 10:36 a.m.

Scala

84 +0

1,037 +0

203 +0

GitHub
hstream by hstreamdb

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

created at Aug. 31, 2020, 9:42 a.m.

Haskell

23 +0

690 -2

56 +0

GitHub
ekuiper by lf-edge

Lightweight data stream processing engine for IoT edge

created at July 3, 2019, 7:37 a.m.

Go

41 +0

1,365 +1

387 +5

GitHub
kyoto by AlticeLabsProjects

Kyoto Tycoon key-value store (and the underlying Kyoto Cabinet library)

created at Dec. 24, 2014, 5:55 p.m.

C++

29 +0

271 +0

40 +0

GitHub
kcat by edenhill

Generic command line non-JVM Apache Kafka producer and consumer

created at March 30, 2014, 4:25 a.m.

C

79 +0

5,260 +14

473 +1

GitHub
blueflood by rax-maas

A distributed system designed to ingest and process time series data

created at May 15, 2013, 2:50 p.m.

Java

95 +0

592 +0

102 +0

GitHub
faust by faust-streaming

Python Stream Processing. A Faust fork

created at Oct. 22, 2020, 3:32 p.m.

Python

28 +0

1,465 +15

171 +1

GitHub
seaweedfs by seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

created at July 14, 2014, 4:41 p.m.

Go

537 +0

21,123 +47

2,172 +4

GitHub
scylladb by scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

created at Dec. 24, 2014, 1:16 p.m.

C++

340 +0

12,591 +32

1,208 +5

GitHub
aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

created at Feb. 26, 2019, 1:39 a.m.

Python

61 +0

3,805 +3

668 +1

GitHub
librdkafka by confluentinc

The Apache Kafka C/C++ library

created at Sept. 19, 2012, 10:14 a.m.

C

409 +1

7,297 +4

3,109 +0

GitHub
zilla by aklivity

🦎 A multi-protocol, event-native proxy. Securely interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.

created at Dec. 7, 2021, 10:10 p.m.

Java

9 +0

486 +0

47 +0

GitHub
smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

created at Jan. 2, 2015, 1:05 p.m.

Python

49 +0

3,094 +1

378 +0

GitHub
pace by getstrm

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

created at Oct. 18, 2023, 12:49 p.m.

Kotlin

3 +0

31 +0

0 +0

GitHub
DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

created at Nov. 9, 2020, 3:20 p.m.

Python

21 +0

1,363 +1

154 +0

GitHub