airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

created at April 13, 2015, 6:04 p.m.

Python

754 +2

34,425 +83

13,523 +18

dash by plotly

Data Apps & Dashboards for Python. No JavaScript Required.

created at April 10, 2015, 1:53 a.m.

Python

416 -1

20,471 +39

1,985 +2

luigi by spotify

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

created at Sept. 20, 2012, 3:06 p.m.

Python

475 +0

17,300 +15

2,371 +1

dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

created at April 30, 2018, 4:30 p.m.

Python

112 +0

10,180 +58

1,259 +6

aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

created at Feb. 26, 2019, 1:39 a.m.

Python

61 +0

3,799 +8

666 +2

flocker by ClusterHQ

Container data volume manager for your Dockerized application

created at April 28, 2014, 6:02 p.m.

Python

168 +0

3,376 +0

286 +0

smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

created at Jan. 2, 2015, 1:05 p.m.

Python

49 +0

3,087 +3

378 +1

PyHive by dropbox

Python interface to Hive and Presto. 🐝

created at Feb. 1, 2014, 9:05 a.m.

Python

62 +0

1,665 +2

552 +0

faust by faust-streaming

Python Stream Processing. A Faust fork

created at Oct. 22, 2020, 3:32 p.m.

Python

28 +0

1,445 +6

170 -1

DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

created at Nov. 9, 2020, 3:20 p.m.

Python

21 +0

1,359 +5

154 +0

ccm by riptano

A script to easily create and destroy an Apache Cassandra cluster on localhost

created at March 1, 2011, 9:42 a.m.

Python

76 +0

1,212 +0

302 +0

mysql_utils by pinterest

Pinterest MySQL Management Tools

created at Oct. 24, 2015, 5:33 p.m.

Python

72 +0

878 +0

146 +0

snakebite by spotify

A pure Python HDFS client

created at May 7, 2013, 9:44 a.m.

Python

130 +0

858 +0

216 +0

datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

created at March 23, 2018, 1:16 p.m.

Python

25 +0

379 +2

118 +0
