oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

created at July 25, 2014, 8:08 p.m.

Java

209 +0

1,789 +1

405 +0

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

created at April 22, 2019, 6:56 p.m.

Scala

215 +0

6,935 +13

1,583 +3

spark-jobserver by spark-jobserver

REST job server for Apache Spark

created at Aug. 21, 2014, 11:07 p.m.

Scala

221 +0

2,841 +1

1,004 +0

ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

created at Aug. 29, 2016, 7:59 a.m.

Python

242 +0

6,049 +48

1,204 +2

dplyr by tidyverse

dplyr: A grammar of data manipulation

created at Oct. 28, 2012, 1:39 p.m.

R

247 +1

4,665 +6

2,119 +1

koalas by databricks

Koalas: pandas API on Apache Spark

created at Jan. 3, 2019, 9:46 p.m.

Python

316 +0

3,321 +0

355 +0

spark-csv by databricks

CSV Data Source for Apache Spark 1.x

created at Dec. 3, 2014, 12:56 a.m.

Scala

418 +0

1,049 +1

446 +0

scikit-learn by scikit-learn

scikit-learn: machine learning in Python

created at Aug. 17, 2010, 9:43 a.m.

Python

2,141 +0

58,265 +63

25,004 +18
