multiwoven by Multiwoven

🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to tools like Hightouch, Census, and RudderStack.

created at Oct. 20, 2023, 3:21 p.m.

Ruby

12 +0

617 +4

27 +3

GitHub
pace by getstrm

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain 'ol Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

created at Oct. 18, 2023, 12:49 p.m.

Kotlin

3 +0

31 +0

0 +0

GitHub
dqo by dqops

Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.

created at March 8, 2022, 3:18 p.m.

Java

5 +0

52 +0

11 +0

GitHub
zilla by aklivity

🦎 A multi-protocol, event-native proxy. Securely interface web apps, IoT clients, & microservices to Apache Kafka® via declaratively defined, stateless APIs.

created at Dec. 7, 2021, 10:10 p.m.

Java

9 +0

483 +1

47 +1

GitHub
DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

created at Nov. 9, 2020, 3:20 p.m.

Python

21 +0

1,359 +5

154 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

334 +0

50 +0

GitHub
faust by faust-streaming

Python Stream Processing. A Faust fork

created at Oct. 22, 2020, 3:32 p.m.

Python

28 +0

1,445 +6

170 -1

GitHub
hstream by hstreamdb

HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.

created at Aug. 31, 2020, 9:42 a.m.

Haskell

23 +0

691 +0

56 +0

GitHub
nessie by projectnessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

created at April 9, 2020, 6:39 p.m.

Java

28 +0

831 +10

115 -1

GitHub
lakeFS by treeverse

lakeFS - Data version control for your data lake | Git for data

created at Sept. 12, 2019, 11:46 a.m.

Go

40 +0

4,059 +6

329 +2

GitHub
rudder-server by rudderlabs

Privacy and Security focused Segment-alternative, in Golang and React

created at July 19, 2019, 9:24 a.m.

Go

61 +0

3,923 +11

289 +3

GitHub
ekuiper by lf-edge

Lightweight data stream processing engine for IoT edge

created at July 3, 2019, 7:37 a.m.

Go

41 +0

1,357 +5

381 +3

GitHub
aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

created at Feb. 26, 2019, 1:39 a.m.

Python

61 +0

3,799 +8

666 +2

GitHub
dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

created at April 30, 2018, 4:30 p.m.

Python

112 +0

10,180 +58

1,259 +6

GitHub
datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

created at March 23, 2018, 1:16 p.m.

Python

25 +0

379 +2

118 +0

GitHub
bistro by asavinov

A general-purpose data analysis engine radically changing the way batch and stream data is processed

created at Nov. 9, 2017, 3:42 p.m.

Java

2 +0

7 +0

0 +0

GitHub
incubator-hivemall by apache

Mirror of Apache Hivemall (incubating)

created at Sept. 15, 2016, 7 a.m.

Java

32 +0

310 +0

119 +0

GitHub
timely by NationalSecurityAgency

Accumulo backed time series database

created at April 12, 2016, 9:33 p.m.

CSS

51 +0

374 +0

110 +0

GitHub
Gaffer by gchq

A large-scale entity and relation database supporting aggregation of properties

created at Dec. 14, 2015, 12:12 p.m.

Java

143 +0

1,731 +1

353 +0

GitHub
mysql_utils by pinterest

Pinterest MySQL Management Tools

created at Oct. 24, 2015, 5:33 p.m.

Python

72 +0

878 +0

146 +0

GitHub