airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at May 26, 2024, 4:37 a.m.

Python

755 +1

34,811 +81

13,643 +20

GitHub
schema-registry by confluentinc

Confluent Schema Registry for Kafka

updated at May 25, 2024, 10:17 p.m.

Java

372 +2

2,153 +11

1,101 +1

GitHub
genie by Netflix

Distributed Big Data Orchestration Service

updated at May 25, 2024, 1:26 p.m.

Java

524 +2

1,689 +3

364 +0

GitHub
shib by tagomoris

WebUI for query engines: Hive and Presto

updated at May 24, 2024, 10:19 p.m.

JavaScript

28 +0

199 +0

57 +0

GitHub
PigPen by Netflix

Map-Reduce for Clojure

updated at May 24, 2024, 6:23 a.m.

Clojure

469 +2

559 +2

55 +0

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at May 23, 2024, 8:40 p.m.

Java

166 +0

2,196 +3

743 +0

GitHub
YCSB by brianfrankcooper

Yahoo! Cloud Serving Benchmark

updated at May 23, 2024, 9:19 a.m.

Java

215 -1

4,816 +3

2,199 +1

GitHub
HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

updated at May 23, 2024, 12:25 a.m.

Java

126 +0

1,436 +1

756 +0

GitHub
hadoopy by bwhite

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.

updated at May 20, 2024, 4:24 p.m.

C

23 +0

244 +1

59 +0

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large-scale machine learning

updated at May 19, 2024, 10:14 a.m.

Java

209 +0

1,788 -1

405 +0

GitHub
suro by Netflix

Netflix's distributed Data Pipeline

updated at May 17, 2024, 9:24 p.m.

Java

511 +2

791 +0

168 +0

GitHub
HiveSwarm by livingsocial

Helpful user-defined functions / table-generating functions for Hive

updated at May 17, 2024, 4:39 p.m.

Java

66 +0

100 +0

46 +0

GitHub
hdfs by colinmarc

A native Go client for HDFS

updated at May 17, 2024, 1:08 p.m.

Go

38 +0

1,348 +0

339 +0

GitHub
elasticsearch-hadoop by elastic

Elasticsearch real-time search and analytics natively integrated with Hadoop

updated at May 16, 2024, 7:48 p.m.

Java

488 -1

1,926 +0

981 +0

GitHub
PyHive by dropbox

Python interface to Hive and Presto. 🐝

updated at May 15, 2024, 9:36 a.m.

Python

62 +0

1,664 +0

551 +0

GitHub
hive_test by edwardcapriolo

Unit test framework for Hive and hive-service

updated at May 12, 2024, 2:09 a.m.

Java

18 +0

64 +0

47 +0

GitHub
Beetest by kawaa

A super simple utility for testing Apache Hive scripts locally for non-Java developers.

updated at May 12, 2024, 2:09 a.m.

Java

8 +0

72 +0

23 +0

GitHub
hindex by Huawei-Hadoop

Secondary Index for HBase

updated at May 10, 2024, 8:20 a.m.

Java

134 +0

589 +0

289 +0

GitHub
HiveRunner by HiveRunner

An Open Source unit test framework for Hive queries based on JUnit 4 and 5

updated at May 8, 2024, 4:39 a.m.

Java

34 +0

253 +0

79 +0

GitHub
hive-solr by chimpler

Hive Storage Handler for SOLR

updated at May 6, 2024, 3:14 p.m.

Java

10 +0

16 +0

26 +0

GitHub