flambo by sorenmacbeth

A Clojure DSL for Apache Spark

created at Jan. 7, 2014, 7:42 p.m.

Clojure

78 +0

608 +0

86 +0

mongo-spark by mongodb

The MongoDB Spark Connector

created at May 20, 2015, 5:59 p.m.

Java

79 +0

702 -1

307 +0

deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

created at Aug. 7, 2018, 8:55 p.m.

Scala

80 +0

3,140 +6

514 +0

photon-ml by linkedin

A scalable machine learning library on Apache Spark

created at Feb. 3, 2016, 1:12 a.m.

Terra

83 +0

790 +0

185 +0

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

created at Nov. 17, 2015, 6:55 a.m.

Scala

91 +0

1,005 +0

316 +0

spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

created at April 22, 2019, 6:55 p.m.

C#

91 +0

1,999 +0

308 +0

spark-sklearn by databricks

(Deprecated) Scikit-learn integration package for Apache Spark

created at Sept. 2, 2015, 6:44 p.m.

Python

94 +0

1,077 +0

231 +0

sedona by apache

A cluster computing framework for processing large-scale geospatial data

created at April 24, 2015, 6:01 p.m.

Java

96 +0

1,784 +4

646 +1

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

100 +0

3,708 +9

702 +1

adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

created at Nov. 19, 2013, 11:47 p.m.

Scala

100 +0

968 +1

304 -1

crossdata by Stratio

DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities

created at Feb. 6, 2014, 9:41 a.m.

Scala

101 +0

169 +0

51 +0

cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments

created at April 17, 2015, 7:39 p.m.

Scala

112 +0

959 -1

350 +0

spark-timeseries by sryza

A library for time series analysis on Apache Spark

created at March 11, 2015, 8:14 a.m.

Scala

134 +0

1,189 +0

427 +0

Mobius by Microsoft

C# and F# language binding and extensions to Apache Spark

created at Oct. 27, 2015, 7:21 p.m.

C#

145 +0

939 +2

212 +0

SynapseML by Microsoft

Simple and Distributed Machine Learning

created at June 5, 2017, 8:23 a.m.

Scala

146 +0

4,975 +3

815 +0

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

created at Nov. 8, 2014, 10:18 p.m.

Scala

148 +0

1,514 +0

1,032 +0

spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

created at June 27, 2014, 3:45 p.m.

Scala

162 +0

1,931 +1

913 -1

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

created at Oct. 13, 2014, 11:06 p.m.

Scala

179 +1

951 -1

363 +0

spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

created at Sept. 5, 2014, 7:35 p.m.

JavaScript

190 +0

3,148 +0

654 +0

blaze by blaze

NumPy and Pandas interface to Big Data

created at Oct. 26, 2012, 2:25 p.m.

Python

195 +0

3,179 -1

393 +0
