sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

created at Sept. 21, 2015, 3:35 p.m.

Python

48 +0

1,328 -1

447 +1

GitHub
livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

created at Nov. 17, 2015, 6:55 a.m.

Scala

91 +0

1,009 +0

314 +0

GitHub
adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

created at Nov. 19, 2013, 11:47 p.m.

Scala

100 +0

1,003 +0

308 +0

GitHub
graphframes by graphframes

None

created at Jan. 20, 2016, 11:17 p.m.

Scala

59 +0

1,001 +2

237 +0

GitHub
cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments

created at April 17, 2015, 7:39 p.m.

Scala

110 +0

997 +1

361 +1

GitHub
hail by hail-is

Cloud-native genomic dataframes and batch computing

created at Oct. 27, 2015, 8:55 p.m.

Python

55 +0

984 +2

246 +0

GitHub
sparkling-water by h2oai

Sparkling Water provides H2O functionality inside Spark cluster

created at Oct. 13, 2014, 11:06 p.m.

Scala

180 +0

968 +1

360 +0

GitHub
sparklyr by sparklyr

R interface for Apache Spark

created at May 20, 2016, 3:28 p.m.

R

73 +0

957 +1

310 +0

GitHub
incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

created at June 25, 2017, 7 a.m.

Scala

60 +0

888 +2

602 +0

GitHub
photon-ml by linkedin

A scalable machine learning library on Apache Spark

created at Feb. 3, 2016, 1:12 a.m.

Terra

82 +0

792 +0

185 +0

GitHub
docker-spark by sequenceiq

None

created at July 11, 2014, 3:45 p.m.

Shell

65 +0

765 +0

282 +0

GitHub
spark-daria by mrpowers-io

Essential Spark extensions and helper methods ✨😲

created at Feb. 16, 2017, 3:41 p.m.

Scala

34 +0

754 +0

152 +0

GitHub
incubator-toree by apache

Mirror of Apache Toree (Incubating)

created at Jan. 7, 2016, 8 a.m.

Scala

48 +0

740 +0

225 +0

GitHub
python-deequ by awslabs

Python API for Deequ

created at Nov. 9, 2020, 9:28 p.m.

Jupyter Notebook

17 +0

730 +3

136 +1

GitHub
mongo-spark by mongodb

The MongoDB Spark Connector

created at May 20, 2015, 5:59 p.m.

Java

79 +0

712 +0

309 +0

GitHub
quinn by mrpowers-io

pyspark methods to enhance developer productivity 📣 👯 🎉

created at Sept. 15, 2017, 1:02 p.m.

Python

20 +0

642 +1

99 +0

GitHub
flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

created at June 4, 2015, 7:14 a.m.

Python

31 +0

638 +0

116 +0

GitHub
chispa by MrPowers

PySpark test helper methods with beautiful error messages

created at March 19, 2019, 3:52 p.m.

Python

5 +0

620 +3

68 +0

GitHub
spark-xml by databricks

XML data source for Spark SQL and DataFrames

created at Nov. 26, 2015, 2:46 a.m.

Scala

39 +0

505 +0

226 -1

GitHub
kotlin-spark-api by Kotlin

This projects gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

created at June 1, 2020, 11:07 a.m.

Kotlin

20 -1

461 +0

35 +0

GitHub