awesome-spark/awesome-spark

oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

created at July 25, 2014, 8:08 p.m.

Java

209 +0

1,789 +1

405 +0

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

created at April 22, 2019, 6:56 p.m.

Scala

215 +0

6,935 +13

1,583 +3

spark-jobserver by spark-jobserver

REST job server for Apache Spark

created at Aug. 21, 2014, 11:07 p.m.

Scala

221 +0

2,841 +1

1,004 +0

ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

created at Aug. 29, 2016, 7:59 a.m.

Python

242 +0

6,049 +48

1,204 +2

dplyr by tidyverse

dplyr: A grammar of data manipulation

created at Oct. 28, 2012, 1:39 p.m.

R

247 +1

4,665 +6

2,119 +1

koalas by databricks

Koalas: pandas API on Apache Spark

created at Jan. 3, 2019, 9:46 p.m.

Python

316 +0

3,321 +0

355 +0

spark-csv by databricks

CSV Data Source for Apache Spark 1.x

created at Dec. 3, 2014, 12:56 a.m.

Scala

418 +0

1,049 +1

446 +0

scikit-learn by scikit-learn

scikit-learn: machine learning in Python

created at Aug. 17, 2010, 9:43 a.m.

Python

2,141 +0

58,265 +63

25,004 +18
