schema-registry by confluentinc

Confluent Schema Registry for Kafka

created at Dec. 9, 2014, 10:38 p.m.

Java

Watchers: 370 +2

Stars: 2,142 +3

Forks: 1,101 +0

mpich2-yarn by alibaba

Running MPICH2 on YARN

created at Aug. 23, 2012, 3:57 a.m.

Java

Watchers: 34 +0

Stars: 114 +0

Forks: 62 +0

crunch by jondot

A fast-to-develop, fast-to-run, Go-based toolkit for ETL and feature extraction on Hadoop.

created at Nov. 18, 2014, 7:17 p.m.

Go

Watchers: 18 +0

Stars: 213 +0

Forks: 16 +0

genie by Netflix

Distributed Big Data Orchestration Service

created at June 20, 2013, 8:35 p.m.

Java

Watchers: 521 +0

Stars: 1,685 +4

Forks: 365 +0

hadoopy by bwhite

Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.

created at Oct. 18, 2009, 1:25 a.m.

C

Watchers: 23 +0

Stars: 243 +0

Forks: 59 +0

elasticsearch-hadoop by elastic

Elasticsearch real-time search and analytics natively integrated with Hadoop

created at March 11, 2013, 6:57 p.m.

Java

Watchers: 488 -2

Stars: 1,925 +1

Forks: 981 +0

airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

created at April 13, 2015, 6:04 p.m.

Python

Watchers: 756 +1

Stars: 34,644 +61

Forks: 13,596 +23

HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

created at June 12, 2012, 7:56 a.m.

Java

Watchers: 126 +0

Stars: 1,433 +0

Forks: 756 +0

schema-registry-ui by Landoop

Web tool for Avro Schema Registry

created at June 12, 2016, 1:01 p.m.

JavaScript

Watchers: 36 +0

Stars: 415 +0

Forks: 112 +0

happybase by python-happybase

A developer-friendly Python library to interact with Apache HBase

created at May 20, 2012, 8:06 p.m.

Python

Watchers: 35 +0

Stars: 609 +0

Forks: 162 +0

varaha by thedatachef

Machine learning and natural language processing with Apache Pig

created at April 25, 2011, 3:39 a.m.

Java

Watchers: 9 +0

Stars: 53 +0

Forks: 15 +0

hdfs-du by twitter-archive

Visualize your HDFS cluster usage

created at Aug. 7, 2012, 5:52 p.m.

JavaScript

Watchers: 138 +0

Stars: 231 +0

Forks: 87 +0

gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

created at Dec. 1, 2014, 6:10 p.m.

Java

Watchers: 166 -1

Stars: 2,192 +2

Forks: 743 +1

white-elephant by LinkedInAttic

Hadoop log aggregator and dashboard

created at Jan. 24, 2013, 11:26 p.m.

Java

Watchers: 97 +0

Stars: 190 +0

Forks: 63 +0

HiveRunner by HiveRunner

An open-source unit test framework for Hive queries, based on JUnit 4 and 5

created at Nov. 22, 2013, 9:19 a.m.

Java

Watchers: 34 +0

Stars: 253 +1

Forks: 79 +0
