nomad by hashicorp

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.

updated at May 19, 2024, 4:24 p.m.

Go

536 +0

14,471 +8

1,897 +0

GitHub
lakeFS by treeverse

lakeFS - Data version control for your data lake | Git for data

updated at May 19, 2024, 2:44 p.m.

Go

40 +0

4,096 +7

329 +0

GitHub
seaweedfs by seaweedfs

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.

updated at May 19, 2024, 1:51 p.m.

Go

535 +0

21,250 +63

2,183 +7

GitHub
dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

updated at May 19, 2024, 11:30 a.m.

Python

115 +0

10,389 +54

1,294 +9

GitHub
rqlite by rqlite

The lightweight, distributed relational database built on SQLite.

updated at May 19, 2024, 7:38 a.m.

Go

227 +0

14,971 +39

686 +4

GitHub
cadvisor by google

Analyzes resource usage and performance characteristics of running containers.

updated at May 19, 2024, 6:41 a.m.

Go

387 +1

16,434 +43

2,278 -1

GitHub
datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

updated at May 19, 2024, 5:17 a.m.

Python

25 +0

397 +3

122 +0

GitHub
airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at May 19, 2024, 5 a.m.

Python

754 -2

34,730 +86

13,623 +27

GitHub
superset by apache

Apache Superset is a Data Visualization and Data Exploration Platform

updated at May 19, 2024, 4:59 a.m.

TypeScript

1,498 +0

59,473 +405

12,709 +48

GitHub
protobuf by protocolbuffers

Protocol Buffers - Google's data interchange format

updated at May 19, 2024, 4:23 a.m.

C++

2,056 +0

63,854 +61

15,291 +13

GitHub
tidb by pingcap

TiDB is an open-source, cloud-native, distributed, MySQL-compatible database for elastic scale and real-time analytics. Try AI-powered Chat2Query free at https://tidbcloud.com/free-trial

updated at May 19, 2024, 4:06 a.m.

Go

1,265 -2

36,274 +49

5,720 +2

GitHub
dash by plotly

Data Apps & Dashboards for Python. No JavaScript Required.

updated at May 19, 2024, 2:10 a.m.

Python

419 +0

20,611 +29

1,993 +2

GitHub
luigi by spotify

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

updated at May 19, 2024, 1:11 a.m.

Python

473 +0

17,382 +19

2,374 +0

GitHub
prometheus by prometheus

The Prometheus monitoring system and time series database.

updated at May 19, 2024, 12:37 a.m.

Go

1,132 +8

53,086 +104

8,796 +24

GitHub
pace by getstrm

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

updated at May 19, 2024, 12:02 a.m.

Kotlin

3 +0

31 +0

0 +0

GitHub
nessie by projectnessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

updated at May 18, 2024, 11:06 p.m.

Java

27 +0

858 +7

116 +0

GitHub
influxdb by influxdata

Scalable datastore for metrics, events, and real-time analytics

updated at May 18, 2024, 10:21 p.m.

Rust

736 -2

27,881 +31

3,490 +0

GitHub
scylladb by scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra

updated at May 18, 2024, 10:19 p.m.

C++

337 -3

12,659 +25

1,215 +3

GitHub
metabase by metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company

updated at May 18, 2024, 7:53 p.m.

Clojure

643 +0

36,809 +121

4,881 +10

GitHub
kcat by edenhill

Generic command line non-JVM Apache Kafka producer and consumer

updated at May 18, 2024, 7:11 p.m.

C

78 +0

5,282 +4

473 +0

GitHub