delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

335 +0

51 +1

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

created at April 2, 2020, noon

Scala

5 +0

54 +0

4 +0

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

created at April 22, 2019, 6:56 p.m.

Scala

215 +0

6,935 +13

1,583 +3

deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

created at Aug. 7, 2018, 8:55 p.m.

Scala

80 +0

3,140 +6

514 +0

Clustering4Ever by Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

created at March 26, 2018, 7:58 p.m.

Scala

21 +0

128 +0

13 +0

kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

created at Dec. 18, 2017, 9:05 a.m.

Scala

62 -1

1,947 +6

860 +1

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

100 +0

3,708 +9

702 +1

aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

created at July 6, 2017, 10:13 a.m.

Scala

15 +0

133 +0

33 +0

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

created at June 25, 2017, 7 a.m.

Scala

57 +0

857 +1

594 +0

SynapseML by Microsoft

Simple and Distributed Machine Learning

created at June 5, 2017, 8:23 a.m.

Scala

146 +0

4,975 +3

815 +0

spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

created at April 6, 2017, 9:40 p.m.

Scala

15 +0

418 +0

73 +0

spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

created at Feb. 16, 2017, 3:41 p.m.

Scala

33 +0

743 +1

148 +0

spark-orientdb by orientechnologies

Apache Spark datasource for OrientDB

created at Oct. 31, 2016, 2:51 p.m.

Scala

15 +0

19 +0

11 +0

flint by twosigma

A Time Series Library for Apache Spark

created at Oct. 19, 2016, 5:44 p.m.

Scala

77 +0

992 +0

184 +0

silex by willb

something to help you spark

created at Oct. 5, 2016, 5:47 p.m.

Scala

3 +0

19 +0

0 +0

mleap by combust

MLeap: Deploy ML Pipelines to Production

created at Aug. 23, 2016, 3:51 a.m.

Scala

69 +0

1,496 +2

313 +0

neo4j-spark-connector by neo4j-contrib

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

created at March 3, 2016, 4:01 p.m.

Scala

35 +0

304 +1

114 +0

graphframes by graphframes

(No description provided.)

created at Jan. 20, 2016, 11:17 p.m.

Scala

58 +0

972 +1

232 +0

mist by Hydrospheredata

Serverless proxy for Spark cluster

created at Jan. 15, 2016, 7:22 a.m.

Scala

41 +0

326 +0

67 +0

incubator-toree by apache

Mirror of Apache Toree (Incubating)

created at Jan. 7, 2016, 8 a.m.

Scala

48 -1

731 +0

224 +0
