faust by faust-streaming

Python Stream Processing. A Faust fork

updated at May 5, 2024, 11:03 a.m.

Python

28 +0

1,465 +15

171 +1

dagster by dagster-io

An orchestration platform for the development, production, and observation of data assets.

updated at May 5, 2024, 4:32 a.m.

Python

114 +1

10,282 +59

1,277 +13

airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at May 5, 2024, 4:18 a.m.

Python

755 +2

34,583 +70

13,573 +23

luigi by spotify

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.

updated at May 5, 2024, 1:07 a.m.

Python

474 -1

17,342 +24

2,373 +1

dash by plotly

Data Apps & Dashboards for Python. No JavaScript Required.

updated at May 4, 2024, 11:48 p.m.

Python

418 +1

20,535 +37

1,992 +5

aws-sdk-pandas by aws

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

updated at May 4, 2024, 9:05 p.m.

Python

61 +0

3,805 +3

668 +1

datacompy by capitalone

Pandas and Spark DataFrame comparison for humans and more!

updated at May 3, 2024, 8:43 a.m.

Python

25 +0

389 +6

122 +2

smart_open by piskvorky

Utils for streaming large files (S3, HDFS, gzip, bz2...)

updated at May 2, 2024, 12:46 p.m.

Python

49 +0

3,094 +1

378 +0

DataProfiler by capitalone

What's in your data? Extract schema, statistics and entities from datasets

updated at May 2, 2024, 2:22 a.m.

Python

21 +0

1,363 +1

154 +0

ccm by riptano

A script to easily create and destroy an Apache Cassandra cluster on localhost

updated at April 29, 2024, 12:45 p.m.

Python

76 +0

1,212 +0

302 +0

mysql_utils by pinterest

Pinterest MySQL Management Tools

updated at April 25, 2024, 6:37 a.m.

Python

72 +0

879 +0

141 +0

PyHive by dropbox

Python interface to Hive and Presto. 🐝

updated at April 17, 2024, 5:33 p.m.

Python

62 +0

1,665 +0

552 +0

flocker by ClusterHQ

Container data volume manager for your Dockerized application

updated at April 8, 2024, 9:52 a.m.

Python

168 +0

3,376 +0

285 -1

snakebite by spotify

A pure Python HDFS client

updated at April 2, 2024, 5:40 p.m.

Python

129 -1

858 +0

216 +0
