luigi by spotify

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, and more. It also comes with Hadoop support built in.

updated at May 11, 2024, 11:52 p.m.

Python

Watchers: 473 (-1)

Stars: 17,363 (+21)

Forks: 2,374 (+1)

airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at May 11, 2024, 10:20 p.m.

Python

Watchers: 756 (+1)

Stars: 34,644 (+61)

Forks: 13,596 (+23)

dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

updated at May 11, 2024, 7:31 p.m.

Python

Watchers: 115 (+1)

Stars: 10,335 (+53)

Forks: 1,285 (+8)

dash by plotly

Data Apps & Dashboards for Python. No JavaScript Required.

updated at May 11, 2024, 6:26 p.m.

Python

Watchers: 419 (+1)

Stars: 20,582 (+47)

Forks: 1,991 (-1)

PyHive by dropbox

Python interface to Hive and Presto. 🐝

updated at May 11, 2024, 1:05 p.m.

Python

Watchers: 62 (+0)

Stars: 1,665 (+0)

Forks: 551 (-1)

ccm by riptano

A script to easily create and destroy an Apache Cassandra cluster on localhost

updated at May 11, 2024, 6:48 a.m.

Python

Watchers: 76 (+0)

Stars: 1,212 (+0)

Forks: 301 (-1)

faust by faust-streaming

Python Stream Processing. A Faust fork

updated at May 10, 2024, 11:15 p.m.

Python

Watchers: 28 (+0)

Stars: 1,467 (+2)

Forks: 172 (+1)

DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

updated at May 10, 2024, 10:33 p.m.

Python

Watchers: 21 (+0)

Stars: 1,364 (+1)

Forks: 156 (+2)

smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

updated at May 10, 2024, 4:57 p.m.

Python

Watchers: 49 (+0)

Stars: 3,096 (+2)

Forks: 378 (+0)

aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

updated at May 10, 2024, 3:44 p.m.

Python

Watchers: 61 (+0)

Stars: 3,809 (+4)

Forks: 669 (+1)

datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

updated at May 6, 2024, 11:14 p.m.

Python

Watchers: 25 (+0)

Stars: 394 (+5)

Forks: 122 (+0)

mysql_utils by pinterest

Pinterest MySQL Management Tools

updated at April 25, 2024, 6:37 a.m.

Python

Watchers: 72 (+0)

Stars: 879 (+0)

Forks: 141 (+0)

flocker by ClusterHQ

Container data volume manager for your Dockerized application

updated at April 8, 2024, 9:52 a.m.

Python

Watchers: 168 (+0)

Stars: 3,376 (+0)

Forks: 285 (+0)

snakebite by spotify

A pure Python HDFS client

updated at April 2, 2024, 5:40 p.m.

Python

Watchers: 128 (-1)

Stars: 858 (+0)

Forks: 216 (+0)
