neo4j-spark-connector by neo4j-contrib

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

created at March 3, 2016, 4:01 p.m.

Scala

35 +0

304 +1

114 +0

sparklyr by sparklyr

R interface for Apache Spark

created at May 20, 2016, 3:28 p.m.

R

73 +0

926 +2

302 +0

spark-gotchas by awesome-spark

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

created at June 2, 2016, 10:21 p.m.

Unknown languages

33 +0

355 +0

82 +0

dist-keras by cerndb

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

created at July 25, 2016, 9:47 a.m.

Python

49 +0

623 -1

170 +0

mleap by combust

MLeap: Deploy ML Pipelines to Production

created at Aug. 23, 2016, 3:51 a.m.

Scala

69 +0

1,496 +2

313 +0

ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

created at Aug. 29, 2016, 7:59 a.m.

Python

242 +0

6,049 +48

1,204 +2

silex by willb

something to help you spark

created at Oct. 5, 2016, 5:47 p.m.

Scala

3 +0

19 +0

0 +0

sparkly by Tubular

Helpers & syntactic sugar for PySpark.

created at Oct. 7, 2016, 3:50 p.m.

Python

38 +0

60 +0

7 +0

flint by twosigma

A Time Series Library for Apache Spark

created at Oct. 19, 2016, 5:44 p.m.

Scala

77 +0

992 +0

184 +0

spark-orientdb by orientechnologies

Apache Spark datasource for OrientDB

created at Oct. 31, 2016, 2:51 p.m.

Scala

15 +0

19 +0

11 +0

pyspark-stubs by zero323

Apache (Py)Spark type annotations (stub files).

created at Jan. 31, 2017, 1:13 a.m.

Python

6 +0

114 +0

37 +0

spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

created at Feb. 16, 2017, 3:41 p.m.

Scala

33 +0

743 +1

148 +0

spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

created at April 6, 2017, 9:40 p.m.

Scala

15 +0

418 +0

73 +0

SynapseML by Microsoft

Simple and Distributed Machine Learning

created at June 5, 2017, 8:23 a.m.

Scala

146 +0

4,975 +3

815 +0

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

created at June 25, 2017, 7 a.m.

Scala

57 +0

857 +1

594 +0

aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

created at July 6, 2017, 10:13 a.m.

Scala

15 +0

133 +0

33 +0

quinn by MrPowers

pyspark methods to enhance developer productivity 📣 👯 🎉

created at Sept. 15, 2017, 1:02 p.m.

Python

19 +0

581 +1

91 +0

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

100 +0

3,708 +9

702 +1

kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

created at Dec. 18, 2017, 9:05 a.m.

Scala

62 -1

1,947 +6

860 +1

Clustering4Ever by Clustering4Ever

C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) Clustering.

created at March 26, 2018, 7:58 p.m.

Scala

21 +0

128 +0

13 +0
