silex by willb

something to help you spark

created at Oct. 5, 2016, 5:47 p.m.

Scala

3 +0

19 +0

0 +0

GitHub

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

created at April 2, 2020, noon

Scala

5 +0

53 +0

4 +0

GitHub

pyspark-stubs by zero323

Apache (Py)Spark type annotations (stub files).

created at Jan. 31, 2017, 1:13 a.m.

Python

6 +0

114 +0

37 +0

GitHub

joblib-spark by joblib

Joblib Apache Spark Backend

created at Nov. 20, 2019, 7:02 p.m.

Python

9 +0

237 +0

26 +0

GitHub

jpmml-evaluator-spark by jpmml

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

created at Nov. 29, 2015, 10:03 a.m.

Java

14 +0

94 +0

43 +0

GitHub

aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

created at July 6, 2017, 10:13 a.m.

Scala

15 +0

131 +0

33 +0

GitHub

spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

created at April 6, 2017, 9:40 p.m.

Scala

15 +0

418 +0

73 +0

GitHub

spark-orientdb by orientechnologies

Apache Spark datasource for OrientDB

created at Oct. 31, 2016, 2:51 p.m.

Scala

15 +0

19 +0

11 +0

GitHub

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

334 +0

50 +0

GitHub

kotlin-spark-api by Kotlin

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this become part of Apache Spark 3.x.

created at June 1, 2020, 11:07 a.m.

Kotlin

18 +0

440 +3

34 +0

GitHub

quinn by MrPowers

PySpark methods to enhance developer productivity 📣 👯 🎉

created at Sept. 15, 2017, 1:02 p.m.

Python

19 +0

578 +2

90 +0

GitHub

dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

created at March 15, 2015, 12:45 a.m.

Scala

19 +0

182 +0

58 +0

GitHub

Clustering4Ever by Clustering4Ever

C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.

created at March 26, 2018, 7:58 p.m.

Scala

21 +0

128 +0

13 +0

GitHub

flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

created at June 4, 2015, 7:14 a.m.

Python

33 +0

630 +0

114 +0

GitHub

spark-gotchas by awesome-spark

Spark Gotchas. A subjective compilation of Apache Spark tips and tricks

created at June 2, 2016, 10:21 p.m.

Unknown languages

33 +0

354 +0

82 +0

GitHub

spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

created at Feb. 16, 2017, 3:41 p.m.

Scala

33 +0

742 +0

147 +0

GitHub

neo4j-spark-connector by neo4j-contrib

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

created at March 3, 2016, 4:01 p.m.

Scala

35 +0

303 +0

114 +0

GitHub

sparkly by Tubular

Helpers & syntactic sugar for PySpark.

created at Oct. 7, 2016, 3:50 p.m.

Python

38 +0

60 +0

7 +0

GitHub

spark-xml by databricks

XML data source for Spark SQL and DataFrames

created at Nov. 26, 2015, 2:46 a.m.

Scala

40 +0

487 +0

222 +0

GitHub

mist by Hydrospheredata

Serverless proxy for Spark cluster

created at Jan. 15, 2016, 7:22 a.m.

Scala

41 +0

326 +0

67 +0

GitHub