airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

created at April 13, 2015, 6:04 p.m.

Python

Watchers: 760 (+0)

Stars: 37,422 (+121)

Forks: 14,357 (+19)

dash by plotly

Data Apps & Dashboards for Python. No JavaScript Required.

created at April 10, 2015, 1:53 a.m.

Python

Watchers: 427 (+2)

Stars: 21,583 (+49)

Forks: 2,081 (+7)

luigi by spotify

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

created at Sept. 20, 2012, 3:06 p.m.

Python

Watchers: 472 (+0)

Stars: 17,908 (+8)

Forks: 2,400 (+1)

dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

created at April 30, 2018, 4:30 p.m.

Python

Watchers: 124 (+1)

Stars: 11,964 (+110)

Forks: 1,497 (+10)

aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

created at Feb. 26, 2019, 1:39 a.m.

Python

Watchers: 60 (+0)

Stars: 3,938 (-1)

Forks: 702 (+1)

flocker by ClusterHQ

Container data volume manager for your Dockerized application

created at April 28, 2014, 6:02 p.m.

Python

Watchers: 169 (+0)

Stars: 3,389 (-1)

Forks: 290 (+0)

smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

created at Jan. 2, 2015, 1:05 p.m.

Python

Watchers: 47 (+0)

Stars: 3,221 (+3)

Forks: 382 (+0)

PyHive by dropbox

Python interface to Hive and Presto. 🐝

created at Feb. 1, 2014, 9:05 a.m.

Python

Watchers: 62 (+0)

Stars: 1,674 (+3)

Forks: 551 (+2)

faust by faust-streaming

Python Stream Processing. A Faust fork

created at Oct. 22, 2020, 3:32 p.m.

Python

Watchers: 32 (-1)

Stars: 1,668 (+7)

Forks: 183 (+0)

DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

created at Nov. 9, 2020, 3:20 p.m.

Python

Watchers: 21 (+0)

Stars: 1,437 (+3)

Forks: 163 (+1)

ccm by riptano

A script to easily create and destroy an Apache Cassandra cluster on localhost

created at March 1, 2011, 9:42 a.m.

Python

Watchers: 74 (+0)

Stars: 1,218 (+2)

Forks: 303 (+0)

mysql_utils by pinterest

Pinterest MySQL Management Tools

created at Oct. 24, 2015, 5:33 p.m.

Python

Watchers: 71 (+0)

Stars: 883 (+0)

Forks: 142 (+0)

snakebite by spotify

A pure python HDFS client

created at May 7, 2013, 9:44 a.m.

Python

Watchers: 129 (+1)

Stars: 854 (-1)

Forks: 216 (+0)

datacompy by capitalone

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

created at March 23, 2018, 1:16 p.m.

Python

Watchers: 23 (+0)

Stars: 485 (+0)

Forks: 130 (+1)
