sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

created at Oct. 13, 2014, 11:06 p.m.

Scala

179 +1

951 -1

363 +0

GitHub
scikit-learn by scikit-learn

scikit-learn: machine learning in Python

created at Aug. 17, 2010, 9:43 a.m.

Python

2,141 +0

58,265 +63

25,004 +18

GitHub
dist-keras by cerndb

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

created at July 25, 2016, 9:47 a.m.

Python

49 +0

623 -1

170 +0

GitHub
spark-sklearn by databricks

(Deprecated) Scikit-learn integration package for Apache Spark

created at Sept. 2, 2015, 6:44 p.m.

Python

94 +0

1,077 +0

231 +0

GitHub
dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

created at March 15, 2015, 12:45 a.m.

Scala

19 +0

182 +0

58 +0

GitHub
graphframes by graphframes


created at Jan. 20, 2016, 11:17 p.m.

Scala

58 +0

972 +1

232 +0

GitHub
neo4j-mazerunner by neo4j-contrib

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

created at Oct. 28, 2014, 9:33 p.m.

Java

56 +0

377 +0

105 +0

GitHub
flint by twosigma

A Time Series Library for Apache Spark

created at Oct. 19, 2016, 5:44 p.m.

Scala

77 +0

992 +0

184 +0

GitHub
magellan by harsha2010

Geospatial Data Analytics on Spark

created at June 1, 2015, 1:06 a.m.

Scala

65 +0

534 +1

150 +0

GitHub
hail by hail-is

Cloud-native genomic dataframes and batch computing

created at Oct. 27, 2015, 8:55 p.m.

Python

55 +0

938 +0

235 +0

GitHub
spark-orientdb by orientechnologies

Apache Spark datasource for OrientDB

created at Oct. 31, 2016, 2:51 p.m.

Scala

15 +0

19 +0

11 +0

GitHub
mongo-spark by mongodb

The MongoDB Spark Connector

created at May 20, 2015, 5:59 p.m.

Java

79 +0

702 -1

307 +0

GitHub
spark-riak-connector by basho

The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

created at May 7, 2015, 7:22 p.m.

Scala

66 +0

60 +0

29 +0

GitHub
jpmml-evaluator-spark by jpmml

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

created at Nov. 29, 2015, 10:03 a.m.

Java

14 +0

94 +0

43 +0

GitHub
spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

created at June 27, 2014, 3:45 p.m.

Scala

162 +0

1,931 +1

913 -1

GitHub
spark-xml by databricks

XML data source for Spark SQL and DataFrames

created at Nov. 26, 2015, 2:46 a.m.

Scala

40 +0

489 +1

223 +0

GitHub
spark-avro by databricks

Avro Data Source for Apache Spark

created at Sept. 30, 2014, 5:50 p.m.

Scala

71 +0

540 +0

310 +0

GitHub
spark-csv by databricks

CSV Data Source for Apache Spark 1.x

created at Dec. 3, 2014, 12:56 a.m.

Scala

418 +0

1,049 +1

446 +0

GitHub
sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

created at Sept. 21, 2015, 3:35 p.m.

Python

49 +0

1,287 +0

438 +0

GitHub
spark-timeseries by sryza

A library for time series analysis on Apache Spark

created at March 11, 2015, 8:14 a.m.

Scala

134 +0

1,189 +0

427 +0

GitHub