hstream by hstreamdb

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

updated at May 18, 2024, 4:01 p.m.

Haskell

23 +0

692 +0

56 +0

kafka-docker by wurstmeister

Dockerfile for Apache Kafka

updated at May 18, 2024, 3:33 p.m.

Shell

161 +0

6,867 +5

2,736 +2

DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

updated at May 18, 2024, 3:05 p.m.

Python

21 +0

1,369 +5

156 +0

dqo by dqops

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run the checks daily to detect data quality issues.

updated at May 18, 2024, 2:49 p.m.

Java

5 +0

56 +1

12 +0

aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and Excel).

updated at May 18, 2024, 12:51 p.m.

Python

60 -1

3,820 +11

669 +0

gpdb by greenplum-db

Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.

updated at May 18, 2024, 8:58 a.m.

C

418 +0

6,214 +3

1,708 +4

flocker by ClusterHQ

Container data volume manager for your Dockerized application

updated at May 18, 2024, 3:54 a.m.

Python

168 +0

3,375 -1

286 +1

CMAK by yahoo

CMAK is a tool for managing Apache Kafka clusters

updated at May 18, 2024, 1:25 a.m.

Scala

533 +0

11,679 +1

2,498 +0

FiloDB by filodb

Distributed Prometheus time series database

updated at May 18, 2024, 12:40 a.m.

Scala

89 +0

1,413 +0

224 +0

gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at May 17, 2024, 9:19 p.m.

Java

166 +0

2,193 +1

743 +0

zilla by aklivity

🦎 A multi-protocol, event-native proxy. Securely interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.

updated at May 17, 2024, 8:22 p.m.

Java

9 +0

491 +3

47 +0

rudder-server by rudderlabs

Privacy- and security-focused alternative to Segment, written in Go and React

updated at May 17, 2024, 7:54 p.m.

Go

61 +0

3,950 +5

293 +2

multiwoven by Multiwoven

🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack.

updated at May 17, 2024, 7:53 p.m.

Ruby

12 +0

652 +12

32 +1

druid by apache

Apache Druid: a high performance real-time analytics database.

updated at May 17, 2024, 6:11 p.m.

Java

590 -1

13,213 -2

3,643 +4

flockdb by twitter-archive

A distributed, fault-tolerant graph database

updated at May 17, 2024, 4:39 p.m.

Scala

279 +0

3,326 -3

263 +0

dalmatinerdb by dalmatinerdb

See GitLab: https://gitlab.com/Project-FiFo/DalmatinerDB/dalmatinerdb

updated at May 17, 2024, 3:54 p.m.

Erlang

37 +0

696 -1

44 +0

gockerize by redbooth

Package Go services into minimal Docker containers.

updated at May 17, 2024, 12:37 p.m.

Shell

65 +0

668 +1

20 +0

faust by faust-streaming

Python stream processing. A fork of Faust.

updated at May 17, 2024, 11:41 a.m.

Python

28 +0

1,470 +3

173 +1

kryo by EsotericSoftware

Java binary serialization and cloning: fast, efficient, automatic

updated at May 17, 2024, 7:37 a.m.

HTML

294 +0

6,094 +4

815 -2

elasticsearch-jdbc by jprante

JDBC importer for Elasticsearch

updated at May 17, 2024, 7:27 a.m.

Java

231 +0

2,841 +1

712 +1
