awesome-spark/awesome-spark

spark-corenlp by databricks

Stanford CoreNLP wrapper for Apache Spark

created at Aug. 21, 2015, 8:54 p.m.

Scala

52 +0

423 +0

120 +0

GitHub

flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

created at June 4, 2015, 7:14 a.m.

Python

33 +0

631 +0

114 +0

GitHub

magellan by harsha2010

Geo Spatial Data Analytics on Spark

created at June 1, 2015, 1:06 a.m.

Scala

65 +0

534 +1

150 +0

GitHub

mongo-spark by mongodb

The MongoDB Spark Connector

created at May 20, 2015, 5:59 p.m.

Java

79 +0

702 -1

307 +0

GitHub

spark-riak-connector by basho

The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

created at May 7, 2015, 7:22 p.m.

Scala

66 +0

60 +0

29 +0

GitHub

sedona by apache

A cluster computing framework for processing large-scale geospatial data

created at April 24, 2015, 6:01 p.m.

Java

96 +0

1,784 +4

646 +1

GitHub

cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments

created at April 17, 2015, 7:39 p.m.

Scala

112 +0

959 -1

350 +0

GitHub

first-edition by spark-in-action

The book's repo

created at March 25, 2015, 2:54 a.m.

Scala

42 +0

272 +0

191 +0

GitHub

dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

created at March 15, 2015, 12:45 a.m.

Scala

19 +0

182 +0

58 +0

GitHub

spark-timeseries by sryza

A library for time series analysis on Apache Spark

created at March 11, 2015, 8:14 a.m.

Scala

134 +0

1,189 +0

427 +0

GitHub

spark-testing-base by holdenk

Base classes to use when writing tests with Spark

created at Jan. 30, 2015, 10:23 p.m.

Scala

78 +0

1,497 +4

358 +0

GitHub

spark-csv by databricks

CSV Data Source for Apache Spark 1.x

created at Dec. 3, 2014, 12:56 a.m.

Scala

418 +0

1,049 +1

446 +0

GitHub

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

created at Nov. 8, 2014, 10:18 p.m.

Scala

148 +0

1,514 +0

1,032 +0

GitHub

neo4j-mazerunner by neo4j-contrib

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

created at Oct. 28, 2014, 9:33 p.m.

Java

56 +0

377 +0

105 +0

GitHub

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

created at Oct. 13, 2014, 11:06 p.m.

Scala

179 +1

951 -1

363 +0

GitHub

spark-avro by databricks

Avro Data Source for Apache Spark

created at Sept. 30, 2014, 5:50 p.m.

Scala

71 +0

540 +0

310 +0

GitHub

spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

created at Sept. 5, 2014, 7:35 p.m.

JavaScript

190 +0

3,148 +0

654 +0

GitHub

spark-jobserver by spark-jobserver

REST job server for Apache Spark

created at Aug. 21, 2014, 11:07 p.m.

Scala

221 +0

2,841 +1

1,004 +0

GitHub

oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

created at July 25, 2014, 8:08 p.m.

Java

209 +0

1,789 +1

405 +0

GitHub

docker-spark by sequenceiq

(no description provided)

created at July 11, 2014, 3:45 p.m.

Shell

65 +0

764 +0

284 +0

GitHub