genie by Netflix

Distributed Big Data Orchestration Service

created at June 20, 2013, 8:35 p.m.

Java

521 +0

1,685 +4

365 +0

GitHub
suro by Netflix

Netflix's distributed Data Pipeline

created at March 20, 2013, 9:02 p.m.

Java

508 +0

789 +0

168 +0

GitHub
elasticsearch-hadoop by elastic

Elasticsearch real-time search and analytics natively integrated with Hadoop

created at March 11, 2013, 6:57 p.m.

Java

488 -2

1,925 +1

981 +0

GitHub
schema-registry by confluentinc

Confluent Schema Registry for Kafka

created at Dec. 9, 2014, 10:38 p.m.

Java

370 +2

2,142 +3

1,101 +0

GitHub
Hive-Extensions-from-Think-Big-Analytics by ThinkBigAnalytics

Reusable code for Hive

created at April 6, 2011, 1:45 a.m.

Java

316 +0

16 +0

14 +0

GitHub
YCSB by brianfrankcooper

Yahoo! Cloud Serving Benchmark

created at April 19, 2010, 8:52 p.m.

Java

215 +0

4,812 +9

2,196 +2

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

created at July 25, 2014, 8:08 p.m.

Java

209 +0

1,789 +1

405 +0

GitHub
registry by hortonworks

Schema Registry

created at Oct. 26, 2016, 8:28 a.m.

Java

203 +0

12 +0

7 +0

GitHub
elephant-bird by twitter

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

created at March 25, 2010, 1:49 a.m.

Java

189 -1

1,137 +0

390 +0

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

created at Dec. 1, 2014, 6:10 p.m.

Java

166 -1

2,192 +2

743 +1

GitHub
hindex by Huawei-Hadoop

Secondary Index for HBase

created at Aug. 8, 2013, 11:33 a.m.

Java

134 +0

589 +0

289 +0

GitHub
HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

created at June 12, 2012, 7:56 a.m.

Java

126 +0

1,433 +0

756 +0

GitHub
white-elephant by LinkedInAttic

Hadoop log aggregator and dashboard

created at Jan. 24, 2013, 11:26 p.m.

Java

97 +0

190 +0

63 +0

GitHub
HiveSwarm by livingsocial

Helpful user-defined functions / table-generating functions for Hive

created at April 5, 2011, 5:46 p.m.

Java

66 +0

101 +0

46 +0

GitHub
HiveRunner by HiveRunner

An Open Source unit test framework for Hive queries based on JUnit 4 and 5

created at Nov. 22, 2013, 9:19 a.m.

Java

34 +0

253 +1

79 +0

GitHub
mpich2-yarn by alibaba

Running MPICH2 on Yarn

created at Aug. 23, 2012, 3:57 a.m.

Java

34 +0

114 +0

62 +0

GitHub
haeinsa by VCNC

Haeinsa is a linearly scalable multi-row, multi-table transaction library for HBase

created at Aug. 10, 2013, 3:43 p.m.

Java

30 +0

158 +0

47 +0

GitHub
akela by mozilla-metrics

A bunch of utility classes for Java, Hadoop, HBase, Pig, etc.

created at Dec. 11, 2010, 12:36 a.m.

Java

23 +0

76 +0

31 +0

GitHub
hive_test by edwardcapriolo

Unit test framework for Hive and hive-service

created at Sept. 16, 2011, 2:39 p.m.

Java

18 +0

64 +1

47 +0

GitHub
ls-hive by lovelysystems

Lovely Systems Hive Goodies

created at Jan. 24, 2012, 3:12 p.m.

Java

16 +0

5 +0

2 +0

GitHub