airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at May 5, 2024, 4:18 a.m.

Python

755 +2

34,583 +70

13,573 +23

GitHub
genie by Netflix

Distributed Big Data Orchestration Service

updated at May 4, 2024, 8:37 p.m.

Java

521 +1

1,681 -1

365 +0

GitHub
elasticsearch-hadoop by elastic

🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop

updated at May 4, 2024, 2:51 p.m.

Java

490 +0

1,924 -1

981 +0

GitHub
schema-registry by confluentinc

Confluent Schema Registry for Kafka

updated at May 4, 2024, 9:44 a.m.

Java

368 +0

2,139 +1

1,101 +2

GitHub
crunch by jondot

A fast-to-develop, fast-to-run, Go-based toolkit for ETL and feature extraction on Hadoop.

updated at May 4, 2024, 2:49 a.m.

Go

18 +0

213 +0

16 +0

GitHub
PigPen by Netflix

Map-Reduce for Clojure

updated at May 3, 2024, 7:54 a.m.

Clojure

466 +1

558 +1

55 +0

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at May 2, 2024, 9:14 p.m.

Java

167 +0

2,190 +0

742 +0

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time, large-scale machine learning

updated at May 2, 2024, 11:57 a.m.

Java

209 +0

1,788 +1

405 +0

GitHub
YCSB by brianfrankcooper

Yahoo! Cloud Serving Benchmark

updated at May 2, 2024, 8:18 a.m.

Java

215 +0

4,803 +4

2,194 +2

GitHub
hdfs by colinmarc

A native Go client for HDFS

updated at May 1, 2024, 11:41 p.m.

Go

39 +0

1,347 +2

341 +0

GitHub
HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

updated at May 1, 2024, 10:35 a.m.

Java

126 +0

1,433 +2

756 -1

GitHub
schema-registry-ui by Landoop

Web tool for Avro Schema Registry

updated at April 24, 2024, 12:15 p.m.

JavaScript

36 +0

415 +0

112 +0

GitHub
Lipstick by Netflix

Pig Visualization framework

updated at April 22, 2024, 8:35 p.m.

JavaScript

491 +1

465 +0

132 +0

GitHub
shib by tagomoris

WebUI for query engines: Hive and Presto

updated at April 19, 2024, 10:53 a.m.

JavaScript

28 +0

198 +0

56 +0

GitHub
HiveRunner by HiveRunner

An open source unit test framework for Hive queries based on JUnit 4 and 5

updated at April 18, 2024, 11:53 a.m.

Java

34 +0

252 +0

79 +0

GitHub
PyHive by dropbox

Python interface to Hive and Presto. 🐝

updated at April 17, 2024, 5:33 p.m.

Python

62 +0

1,665 +0

552 +0

GitHub
registry by hortonworks

Schema Registry

updated at April 10, 2024, 1:30 p.m.

Java

203 +0

12 +0

7 +0

GitHub
OdbcHive by recruitcojp

Hive ODBC driver for Windows

updated at April 10, 2024, 12:43 a.m.

C++

3 +0

8 +0

8 +0

GitHub
inviso by Netflix

(No description)

updated at April 9, 2024, 3:13 a.m.

JavaScript

456 +1

205 +0

72 +0

GitHub
hindex by Huawei-Hadoop

Secondary Index for HBase

updated at April 8, 2024, 2:38 a.m.

Java

134 +0

589 +0

289 +0

GitHub