Project SnappyData - memory-optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster.
created at Sept. 16, 2015, 10:36 a.m.
Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
created at Oct. 23, 2015, 12:25 a.m.
Accumulo-backed time-series database
created at April 12, 2016, 9:33 p.m.
Mirror of Apache Hivemall (incubating)
created at Sept. 15, 2016, 7 a.m.
Pandas and Spark DataFrame comparison for humans and more!
created at March 23, 2018, 1:16 p.m.
An orchestration platform for the development, production, and observation of data assets.
created at April 30, 2018, 4:30 p.m.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
created at Feb. 26, 2019, 1:39 a.m.
Privacy- and security-focused Segment alternative, written in Golang and React
created at July 19, 2019, 9:24 a.m.
Nessie: Transactional Catalog for Data Lakes with Git-like semantics
created at April 9, 2020, 6:39 p.m.
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
created at Oct. 26, 2020, 1:56 p.m.
What's in your data? Extract schema, statistics and entities from datasets
created at Nov. 9, 2020, 3:20 p.m.