airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

created at April 13, 2015, 6:04 p.m.

Python

760 +2

37,126 +88

14,306 +25

GitHub
YCSB by brianfrankcooper

Yahoo! Cloud Serving Benchmark

created at April 19, 2010, 8:52 p.m.

Java

213 -1

4,955 +5

2,252 +4

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

created at Dec. 1, 2014, 6:10 p.m.

Java

165 +0

2,228 +0

751 +1

GitHub
schema-registry by confluentinc

Confluent Schema Registry for Kafka

created at Dec. 9, 2014, 10:38 p.m.

Java

379 +0

2,225 +1

1,114 +1

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large-scale machine learning

created at July 25, 2014, 8:08 p.m.

Java

208 +0

1,788 +0

405 +0

GitHub
genie by Netflix

Distributed Big Data Orchestration Service

created at June 20, 2013, 8:35 p.m.

Java

528 +1

1,716 +0

369 +0

GitHub
PyHive by dropbox

Python interface to Hive and Presto. 🐝

created at Feb. 1, 2014, 9:05 a.m.

Python

62 +0

1,671 +1

549 +0

GitHub
HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

created at June 12, 2012, 7:56 a.m.

Java

126 +0

1,459 +2

768 +2

GitHub
hdfs by colinmarc

A native Go client for HDFS

created at Oct. 8, 2014, 7:37 p.m.

Go

37 +0

1,370 +1

341 +0

GitHub
elephant-bird by twitter

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

created at March 25, 2010, 1:49 a.m.

Java

188 +0

1,138 +0

387 +0

GitHub
suro by Netflix

Netflix's distributed Data Pipeline

created at March 20, 2013, 9:02 p.m.

Java

514 +1

794 +0

171 +0

GitHub
banana by lucidworks

Banana for Solr - A Port of Kibana

created at Nov. 21, 2013, 5:30 p.m.

JavaScript

199 +0

668 +0

235 +0

GitHub
happybase by python-happybase

A developer-friendly Python library to interact with Apache HBase

created at May 20, 2012, 8:06 p.m.

Python

35 +0

612 +0

163 +0

GitHub
hindex by Huawei-Hadoop

Secondary Index for HBase

created at Aug. 8, 2013, 11:33 a.m.

Java

134 +0

591 +0

286 +0

GitHub
PigPen by Netflix

Map-Reduce for Clojure

created at Dec. 12, 2013, 10:56 p.m.

Clojure

475 +1

567 +0

55 +0

GitHub
Lipstick by Netflix

Pig Visualization framework

created at May 21, 2013, 6:56 p.m.

JavaScript

499 +1

464 +0

131 +0

GitHub
schema-registry-ui by Landoop

Web tool for Avro Schema Registry

created at June 12, 2016, 1:01 p.m.

JavaScript

36 +0

421 +0

112 +0

GitHub
packetpig by packetloop

Packetpig - Open Source Big Data Security Analytics

created at March 7, 2012, 4:03 a.m.

Python

57 +0

299 +1

85 +0

GitHub
HiveRunner by HiveRunner

An Open Source unit test framework for Hive queries based on JUnit 4 and 5

created at Nov. 22, 2013, 9:19 a.m.

Java

34 +0

255 +0

77 +0

GitHub
hadoopy by bwhite

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.

created at Oct. 18, 2009, 1:25 a.m.

C

23 +0

243 +0

59 +0

GitHub