🔥 Open Source Reverse ETL and Customer Data Platform (CDP). An open-source alternative to tools like Hightouch, Census, and RudderStack.
created at Oct. 20, 2023, 3:21 p.m.
Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.
created at Oct. 18, 2023, 12:49 p.m.
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, then let DQOps run them daily to detect data quality issues.
created at March 8, 2022, 3:18 p.m.
What's in your data? Extract schema, statistics and entities from datasets
created at Nov. 9, 2020, 3:20 p.m.
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
created at Oct. 26, 2020, 1:56 p.m.
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
created at April 9, 2020, 6:39 p.m.
Privacy and Security focused Segment-alternative, in Golang and React
created at July 19, 2019, 9:24 a.m.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
created at Feb. 26, 2019, 1:39 a.m.
An orchestration platform for the development, production, and observation of data assets.
created at April 30, 2018, 4:30 p.m.
Pandas and Spark DataFrame comparison for humans and more!
created at March 23, 2018, 1:16 p.m.
Mirror of Apache Hivemall (incubating)
created at Sept. 15, 2016, 7 a.m.
Accumulo backed time series database
created at April 12, 2016, 9:33 p.m.