awesome-spark/awesome-spark

koalas by databricks

Koalas: pandas API on Apache Spark

created at Jan. 3, 2019, 9:46 p.m.

Python

316 +0

3,321 +0

355 +0

GitHub

joblib by joblib

Computing with Python functions.

created at May 7, 2010, 6:48 a.m.

Python

61 +0

3,679 +9

405 +3

GitHub

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

100 +0

3,708 +9

702 +1

GitHub

dplyr by tidyverse

dplyr: A grammar of data manipulation

created at Oct. 28, 2012, 1:39 p.m.

R

247 +1

4,665 +6

2,119 +1

GitHub

SynapseML by Microsoft

Simple and Distributed Machine Learning

created at June 5, 2017, 8:23 a.m.

Scala

146 +0

4,975 +3

815 +0

GitHub

ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

created at Aug. 29, 2016, 7:59 a.m.

Python

242 +0

6,049 +48

1,204 +2

GitHub

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

created at April 22, 2019, 6:56 p.m.

Scala

215 +0

6,935 +13

1,583 +3

GitHub

scikit-learn by scikit-learn

scikit-learn: machine learning in Python

created at Aug. 17, 2010, 9:43 a.m.

Python

2,141 +0

58,265 +63

25,004 +18

GitHub

koalas by databricks

joblib by joblib

spark-nlp by JohnSnowLabs

dplyr by tidyverse

SynapseML by Microsoft

ipex-llm by intel-analytics

delta by delta-io

scikit-learn by scikit-learn