Nessie: Transactional Catalog for Data Lakes with Git-like semantics
updated at April 21, 2024, 7:09 p.m.
Pandas and Spark DataFrame comparison for humans and more!
updated at April 21, 2024, 3:19 p.m.
Scalable datastore for metrics, events, and real-time analytics
updated at April 21, 2024, 2:27 p.m.
Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
updated at April 21, 2024, 1:59 p.m.
What's in your data? Extract schema, statistics and entities from datasets
updated at April 21, 2024, 1:37 p.m.
Java binary serialization and cloning: fast, efficient, automatic
updated at April 21, 2024, 12:03 p.m.
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, scaling to billions of files. The blob store has O(1) disk seek and cloud tiering. The Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding.
updated at April 21, 2024, 1:55 a.m.
Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
updated at April 21, 2024, 1:38 a.m.
An orchestration platform for the development, production, and observation of data assets.
updated at April 21, 2024, 1:36 a.m.
Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, and let DQOps run them daily to detect data quality issues.
updated at April 20, 2024, 11:29 p.m.