adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

updated at May 13, 2024, 11:56 a.m.

Scala

100 +0

967 +0

304 +0

neo4j-spark-connector by neo4j-contrib

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

updated at May 13, 2024, 8:43 a.m.

Scala

35 +0

304 +0

114 +0

spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

updated at May 12, 2024, 6:41 p.m.

Scala

33 +0

742 +0

148 +0

photon-ml by linkedin

A scalable machine learning library on Apache Spark

updated at May 12, 2024, 9:15 a.m.

Terra

83 +0

789 +0

185 +0

dist-keras by cerndb

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

updated at May 10, 2024, 5:12 a.m.

Python

49 +0

623 +0

170 +0

spark-jobserver by spark-jobserver

REST job server for Apache Spark

updated at May 9, 2024, 3:16 a.m.

Scala

221 +0

2,841 +0

1,004 +0

magellan by harsha2010

Geo Spatial Data Analytics on Spark

updated at May 8, 2024, 1:18 p.m.

Scala

65 +0

534 +0

150 +0

spark-csv by databricks

CSV Data Source for Apache Spark 1.x

updated at May 7, 2024, 12:54 p.m.

Scala

420 +2

1,049 +0

445 +0

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

updated at May 4, 2024, 5:57 p.m.

Scala

91 +0

1,005 +0

315 +0

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at May 2, 2024, 4:43 p.m.

Scala

148 +0

1,514 +0

1,032 +0

first-edition by spark-in-action

Code repository for the first edition of the book Spark in Action

updated at May 2, 2024, 11:57 a.m.

Scala

42 +0

272 +0

189 +0

aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

updated at May 1, 2024, 4:39 p.m.

Scala

15 +0

133 +0

33 +0

spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

updated at May 1, 2024, 3:08 p.m.

JavaScript

190 +0

3,148 +0

654 +0

spark-gotchas by awesome-spark

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

updated at April 30, 2024, 6:38 p.m.

Unknown languages

33 +0

355 +0

82 +0

spark-timeseries by sryza

A library for time series analysis on Apache Spark

updated at April 24, 2024, 9:39 a.m.

Scala

134 +0

1,189 +0

427 +0

spark-sklearn by databricks

(Deprecated) Scikit-learn integration package for Apache Spark

updated at April 17, 2024, 4:13 a.m.

Python

95 +1

1,077 +0

231 +0

sparkle by tweag

Haskell on Apache Spark.

updated at April 4, 2024, 9:31 p.m.

Haskell

59 +0

444 +0

30 +0

mist by Hydrospheredata

Serverless proxy for Spark cluster

updated at April 2, 2024, 5:42 p.m.

Scala

40 -1

326 +0

68 +0

docker-spark by sequenceiq

updated at April 2, 2024, 5:41 p.m.

Shell

65 +0

764 +0

283 +0

jpmml-evaluator-spark by jpmml

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

updated at March 31, 2024, 2:17 p.m.

Java

14 +0

94 +0

43 +0
