awesome-spark/awesome-spark

first-edition by spark-in-action

The book's repo

updated at May 2, 2024, 11:57 a.m.

Scala

42 +0

272 +0

191 +0

GitHub

aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

updated at May 1, 2024, 4:39 p.m.

Scala

15 +0

133 +0

33 +0

GitHub

spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

updated at May 1, 2024, 3:08 p.m.

JavaScript

190 +0

3,148 +0

654 +0

GitHub

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at April 30, 2024, 9:48 p.m.

Scala

16 +0

335 +0

51 +1

GitHub

spark-gotchas by awesome-spark

Spark Gotchas. A subjective compilation of Apache Spark tips and tricks

updated at April 30, 2024, 6:38 p.m.

Unknown languages

33 +0

355 +0

82 +0

GitHub

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

updated at April 29, 2024, 3:42 p.m.

Scala

5 +0

54 +0

4 +0

GitHub

incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at April 28, 2024, 11:16 p.m.

Scala

48 -1

731 +0

224 +0

GitHub

spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

updated at April 27, 2024, 2:45 p.m.

C#

91 +0

1,999 +0

308 +0

GitHub

spark-timeseries by sryza

A library for time series analysis on Apache Spark

updated at April 24, 2024, 9:39 a.m.

Scala

134 +0

1,189 +0

427 +0

GitHub

spark-sklearn by databricks

(Deprecated) Scikit-learn integration package for Apache Spark

updated at April 17, 2024, 4:13 a.m.

Python

94 +0

1,077 +0

231 +0

GitHub

spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

updated at April 6, 2024, 7:28 p.m.

Scala

15 +0

418 +0

73 +0

GitHub

sparkle by tweag

Haskell on Apache Spark.

updated at April 4, 2024, 9:31 p.m.

Haskell

59 +0

444 +0

30 +0

GitHub

mist by Hydrospheredata

Serverless proxy for Spark cluster

updated at April 2, 2024, 5:42 p.m.

Scala

41 +0

326 +0

67 +0

GitHub

docker-spark by sequenceiq

updated at April 2, 2024, 5:41 p.m.

Shell

65 +0

764 +0

284 +0

GitHub

jpmml-evaluator-spark by jpmml

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

updated at March 31, 2024, 2:17 p.m.

Java

14 +0

94 +0

43 +0

GitHub

neo4j-mazerunner by neo4j-contrib

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

updated at March 31, 2024, 2:15 p.m.

Java

56 +0

377 +0

105 +0

GitHub

dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

updated at March 17, 2024, 12:31 a.m.

Scala

19 +0

182 +0

58 +0

GitHub

Clustering4Ever by Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

updated at Feb. 29, 2024, 4:50 a.m.

Scala

21 +0

128 +0

13 +0

GitHub

photon-ml by linkedin

A scalable machine learning library on Apache Spark

updated at Feb. 29, 2024, 4:48 a.m.

Terra

83 +0

790 +0

185 +0

GitHub

flambo by sorenmacbeth

A Clojure DSL for Apache Spark

updated at Feb. 12, 2024, 2:53 p.m.

Clojure

78 +0

608 +0

86 +0

GitHub