sparkle by tweag

Haskell on Apache Spark.

created at Nov. 9, 2015, 3:49 p.m.

Haskell

60 +0

445 +0

30 +0

GitHub
spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

created at Sept. 5, 2014, 7:35 p.m.

JavaScript

190 +0

3,148 +0

654 +0

GitHub
dplyr by tidyverse

dplyr: A grammar of data manipulation

created at Oct. 28, 2012, 1:39 p.m.

R

244 +0

4,677 +1

2,118 +0

GitHub
mleap by combust

MLeap: Deploy ML Pipelines to Production

created at Aug. 23, 2016, 3:51 a.m.

Scala

69 +0

1,498 +0

312 +0

GitHub
flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

created at June 4, 2015, 7:14 a.m.

Python

32 +0

633 +0

115 +0

GitHub
spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

99 -1

3,730 +6

704 +0

GitHub
sparkly by Tubular

Helpers & syntactic sugar for PySpark.

created at Oct. 7, 2016, 3:50 p.m.

Python

38 +0

60 +0

7 +0

GitHub
flambo by sorenmacbeth

A Clojure DSL for Apache Spark

created at Jan. 7, 2014, 7:42 p.m.

Clojure

78 +0

609 +0

84 +0

GitHub
aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

created at July 6, 2017, 10:13 a.m.

Scala

15 +0

134 +1

33 +0

GitHub
Clustering4Ever by Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

created at March 26, 2018, 7:58 p.m.

Scala

21 +0

128 +0

13 +0

GitHub
Mobius by Microsoft

C# and F# language binding and extensions to Apache Spark

created at Oct. 27, 2015, 7:21 p.m.

C#

145 +0

940 +0

212 +0

GitHub
koalas by databricks

Koalas: pandas API on Apache Spark

created at Jan. 3, 2019, 9:46 p.m.

Python

319 +0

3,319 +0

354 +0

GitHub
sparklyr by sparklyr

R interface for Apache Spark

created at May 20, 2016, 3:28 p.m.

R

73 +0

929 +0

302 +0

GitHub
deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

created at Aug. 7, 2018, 8:55 p.m.

Scala

80 +0

3,158 +5

513 +0

GitHub
incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

created at June 25, 2017, 7 a.m.

Scala

57 +0

860 +0

595 +1

GitHub
spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

created at April 22, 2019, 6:55 p.m.

C#

92 +0

2,004 +1

309 -1

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

339 +0

52 +1

GitHub
delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

created at April 22, 2019, 6:56 p.m.

Scala

217 +1

7,110 +127

1,608 +12

GitHub
itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

created at April 2, 2020, noon

Scala

5 +0

53 +0

4 +0

GitHub
spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

created at Feb. 16, 2017, 3:41 p.m.

Scala

33 +0

743 +0

149 +0

GitHub