oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time, large-scale machine learning

updated at Oct. 19, 2024, 8:54 a.m.

Java

208 +0

1,788 +0

405 +0

GitHub
elephant-bird by twitter

Twitter's collection of LZO and Protocol Buffer-related Hadoop, Pig, Hive, and HBase code.

updated at Oct. 25, 2024, 8:30 a.m.

Java

188 +0

1,138 +0

387 +0

GitHub
banana by lucidworks

Banana for Solr - A Port of Kibana

updated at Nov. 6, 2024, 10:41 a.m.

JavaScript

199 +0

668 +0

235 +0

GitHub
genie by Netflix

Distributed Big Data Orchestration Service

updated at Nov. 6, 2024, 12:51 p.m.

Java

528 +1

1,716 +0

369 +0

GitHub
shib by tagomoris

WebUI for query engines: Hive and Presto

updated at Nov. 6, 2024, 8:43 p.m.

JavaScript

27 +0

200 +0

59 +0

GitHub
HiveRunner by HiveRunner

An Open Source unit test framework for Hive queries based on JUnit 4 and 5

updated at Nov. 11, 2024, 9:43 a.m.

Java

34 +0

255 +0

77 +0

GitHub
elasticsearch-hadoop by elastic

🐘 Elasticsearch real-time search and analytics natively integrated with Hadoop

updated at Nov. 13, 2024, 6:37 a.m.

Java

180 +1

9 +0

990 +0

GitHub
PyHive by dropbox

Python interface to Hive and Presto. 🐝

updated at Nov. 14, 2024, 5:59 a.m.

Python

62 +0

1,671 +1

549 +0

GitHub
YCSB by brianfrankcooper

Yahoo! Cloud Serving Benchmark

updated at Nov. 14, 2024, 5:52 p.m.

Java

213 -1

4,955 +5

2,252 +4

GitHub
hdfs by colinmarc

A native Go client for HDFS

updated at Nov. 15, 2024, 7:35 a.m.

Go

37 +0

1,370 +1

341 +0

GitHub
schema-registry by confluentinc

Confluent Schema Registry for Kafka

updated at Nov. 15, 2024, 9:40 p.m.

Java

379 +0

2,225 +1

1,114 +1

GitHub
gobblin by apache

A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

updated at Nov. 15, 2024, 10:11 p.m.

Java

165 +0

2,228 +0

751 +1

GitHub
HiBench by Intel-bigdata

HiBench is a big data benchmark suite.

updated at Nov. 16, 2024, 5:46 p.m.

Java

126 +0

1,459 +2

768 +2

GitHub
packetpig by packetloop

Packetpig - Open Source Big Data Security Analytics

updated at Nov. 16, 2024, 8:56 p.m.

Python

57 +0

299 +1

85 +0

GitHub
airflow by apache

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

updated at Nov. 17, 2024, 1:21 a.m.

Python

760 +2

37,126 +88

14,306 +25

GitHub