spark-csv by databricks

CSV Data Source for Apache Spark 1.x

updated at May 7, 2024, 12:54 p.m.

Scala

420 +2

1,049 +0

445 +0

GitHub
magellan by harsha2010

Geo Spatial Data Analytics on Spark

updated at May 8, 2024, 1:18 p.m.

Scala

65 +0

534 +0

150 +0

GitHub
spark-jobserver by spark-jobserver

REST job server for Apache Spark

updated at May 9, 2024, 3:16 a.m.

Scala

221 +0

2,841 +0

1,004 +0

GitHub
dist-keras by cerndb

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

updated at May 10, 2024, 5:12 a.m.

Python

49 +0

623 +0

170 +0

GitHub
photon-ml by linkedin

A scalable machine learning library on Apache Spark

updated at May 12, 2024, 9:15 a.m.

Terra

83 +0

789 +0

185 +0

GitHub
spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

updated at May 12, 2024, 6:41 p.m.

Scala

33 +0

742 +0

148 +0

GitHub
neo4j-spark-connector by neo4j-contrib

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

updated at May 13, 2024, 8:43 a.m.

Scala

35 +0

304 +0

114 +0

GitHub
adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

updated at May 13, 2024, 11:56 a.m.

Scala

100 +0

967 +0

304 +0

GitHub
neo4j-mazerunner by neo4j-contrib

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

updated at May 14, 2024, 7:19 a.m.

Java

56 +0

378 +0

105 +0

GitHub
kotlin-spark-api by Kotlin

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

updated at May 14, 2024, 10:01 p.m.

Kotlin

19 +0

443 +0

34 +0

GitHub
flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

updated at May 16, 2024, 1:26 p.m.

Python

33 +0

632 +0

115 +0

GitHub
flambo by sorenmacbeth

A Clojure DSL for Apache Spark

updated at May 16, 2024, 8:28 p.m.

Clojure

78 +0

609 +0

86 +0

GitHub
spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

updated at May 17, 2024, 4:50 p.m.

Scala

15 +0

421 +0

73 +0

GitHub
dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

updated at May 18, 2024, 7:22 a.m.

Scala

19 +0

183 +0

58 +0

GitHub
mongo-spark by mongodb

The MongoDB Spark Connector

updated at May 18, 2024, 8:19 a.m.

Java

79 +0

703 +0

307 +0

GitHub
flint by twosigma

A Time Series Library for Apache Spark

updated at May 19, 2024, 9:51 a.m.

Scala

77 +0

992 +0

184 +0

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

updated at May 19, 2024, 10:14 a.m.

Java

209 +0

1,788 -1

405 +0

GitHub
mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at May 21, 2024, 10:11 p.m.

Scala

69 +0

1,499 +1

313 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at May 22, 2024, 4:13 p.m.

Scala

16 +0

337 +1

51 +0

GitHub
quinn by MrPowers

PySpark methods to enhance developer productivity 📣 👯 🎉

updated at May 22, 2024, 9:33 p.m.

Python

19 +0

583 +1

92 +0

GitHub