first-edition by spark-in-action

The book's repo

created at March 25, 2015, 2:54 a.m.

Scala

42 +0

272 +0

191 +0

cromwell by broadinstitute

Scientific workflow engine designed for simplicity and scalability. Transition trivially from one-off use cases to massive-scale production environments.

created at April 17, 2015, 7:39 p.m.

Scala

112 +0

959 -1

350 +0

sedona by apache

A cluster computing framework for processing large-scale geospatial data

created at April 24, 2015, 6:01 p.m.

Java

96 +0

1,784 +4

646 +1

spark-riak-connector by basho

The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

created at May 7, 2015, 7:22 p.m.

Scala

66 +0

60 +0

29 +0

mongo-spark by mongodb

The MongoDB Spark Connector

created at May 20, 2015, 5:59 p.m.

Java

79 +0

702 -1

307 +0

magellan by harsha2010

Geospatial data analytics on Spark

created at June 1, 2015, 1:06 a.m.

Scala

65 +0

534 +1

150 +0

flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

created at June 4, 2015, 7:14 a.m.

Python

33 +0

631 +0

114 +0

spark-corenlp by databricks

Stanford CoreNLP wrapper for Apache Spark

created at Aug. 21, 2015, 8:54 p.m.

Scala

52 +0

423 +0

120 +0

spark-sklearn by databricks

(Deprecated) Scikit-learn integration package for Apache Spark

created at Sept. 2, 2015, 6:44 p.m.

Python

94 +0

1,077 +0

231 +0

sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

created at Sept. 21, 2015, 3:35 p.m.

Python

49 +0

1,287 +0

438 +0

Mobius by Microsoft

C# and F# language bindings and extensions for Apache Spark

created at Oct. 27, 2015, 7:21 p.m.

C#

145 +0

939 +2

212 +0

hail by hail-is

Cloud-native genomic dataframes and batch computing

created at Oct. 27, 2015, 8:55 p.m.

Python

55 +0

938 +0

235 +0

sparkle by tweag

Haskell on Apache Spark.

created at Nov. 9, 2015, 3:49 p.m.

Haskell

59 +0

444 +0

30 +0

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

created at Nov. 17, 2015, 6:55 a.m.

Scala

91 +0

1,005 +0

316 +0

spark-xml by databricks

XML data source for Spark SQL and DataFrames

created at Nov. 26, 2015, 2:46 a.m.

Scala

40 +0

489 +1

223 +0

jpmml-evaluator-spark by jpmml

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

created at Nov. 29, 2015, 10:03 a.m.

Java

14 +0

94 +0

43 +0

incubator-toree by apache

Mirror of Apache Toree (Incubating)

created at Jan. 7, 2016, 8 a.m.

Scala

48 -1

731 +0

224 +0

mist by Hydrospheredata

Serverless proxy for Spark cluster

created at Jan. 15, 2016, 7:22 a.m.

Scala

41 +0

326 +0

67 +0

graphframes by graphframes

(No description provided.)

created at Jan. 20, 2016, 11:17 p.m.

Scala

58 +0

972 +1

232 +0

photon-ml by linkedin

A scalable machine learning library on Apache Spark

created at Feb. 3, 2016, 1:12 a.m.

Terra

83 +0

790 +0

185 +0
