HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

updated at Nov. 16, 2024, 5:46 p.m.

Java

126 +0

1,459 +2

768 +2

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at Nov. 15, 2024, 10:11 p.m.

Java

165 +0

2,228 +0

751 +1

GitHub
schema-registry by confluentinc

Confluent Schema Registry for Kafka

updated at Nov. 15, 2024, 9:40 p.m.

Java

379 +0

2,225 +1

1,114 +1

GitHub
YCSB by brianfrankcooper

Yahoo! Cloud Serving Benchmark

updated at Nov. 14, 2024, 5:52 p.m.

Java

213 -1

4,955 +5

2,252 +4

GitHub
elasticsearch-hadoop by elastic

Elasticsearch real-time search and analytics natively integrated with Hadoop

updated at Nov. 13, 2024, 6:37 a.m.

Java

180 +1

9 +0

990 +0

GitHub
HiveRunner by HiveRunner

An open source unit test framework for Hive queries based on JUnit 4 and 5

updated at Nov. 11, 2024, 9:43 a.m.

Java

34 +0

255 +0

77 +0

GitHub
genie by Netflix

Distributed Big Data Orchestration Service

updated at Nov. 6, 2024, 12:51 p.m.

Java

528 +1

1,716 +0

369 +0

GitHub
elephant-bird by twitter

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

updated at Oct. 25, 2024, 8:30 a.m.

Java

188 +0

1,138 +0

387 +0

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

updated at Oct. 19, 2024, 8:54 a.m.

Java

208 +0

1,788 +0

405 +0

GitHub
white-elephant by LinkedInAttic

Hadoop log aggregator and dashboard

updated at Oct. 12, 2024, 2:55 a.m.

Java

97 +0

192 +0

62 +0

GitHub
flume-ng-rabbitmq by jcustenborder

Flume plugin for RabbitMQ

updated at Oct. 1, 2024, 7:27 a.m.

Java

10 +0

58 +0

46 +0

GitHub
registry by hortonworks

Schema Registry

updated at Sept. 18, 2024, 2:25 p.m.

Java

206 +0

15 +0

8 +0

GitHub
haeinsa by VCNC

Haeinsa is a linearly scalable multi-row, multi-table transaction library for HBase

updated at Sept. 3, 2024, 4:38 a.m.

Java

30 +0

158 +0

42 +0

GitHub
suro by Netflix

Netflix's distributed Data Pipeline

updated at Aug. 24, 2024, 12:46 p.m.

Java

514 +1

794 +0

171 +0

GitHub
hindex by Huawei-Hadoop

Secondary Index for HBase

updated at Aug. 19, 2024, 8:09 a.m.

Java

134 +0

591 +0

286 +0

GitHub
HiveSwarm by livingsocial

Helpful user-defined functions / table-generating functions for Hive

updated at June 20, 2024, 11:59 a.m.

Java

66 +0

101 +0

46 +0

GitHub
hive_test by edwardcapriolo

Unit test framework for Hive and hive-service

updated at May 12, 2024, 2:09 a.m.

Java

18 +0

64 +0

46 +0

GitHub
Beetest by kawaa

A super simple utility for testing Apache Hive scripts locally for non-Java developers.

updated at May 12, 2024, 2:09 a.m.

Java

8 +0

72 +0

23 +0

GitHub
hive-solr by chimpler

Hive Storage Handler for SOLR

updated at May 6, 2024, 3:14 p.m.

Java

10 +0

16 +0

26 +0

GitHub
Hive-mongo by yc-huang

Hive storage handler for connecting with MongoDB

updated at May 6, 2024, 3:14 p.m.

Java

10 +0

32 +0

33 +0

GitHub