joblib by joblib

Computing with Python functions.

updated at Nov. 15, 2024, 11:01 p.m.

Python

64 +1

3,876 +11

416 +2

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

updated at Nov. 16, 2024, 9:40 a.m.

Scala

60 +0

888 +2

602 +0

koalas by databricks

Koalas: pandas API on Apache Spark

updated at Nov. 16, 2024, 10:21 a.m.

Python

326 +0

3,338 +3

358 +0

spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

updated at Nov. 16, 2024, 1:01 p.m.

C#

93 +0

2,024 +1

315 +0

spark-connect-go by apache

Apache Spark Connect Client for Golang

updated at Nov. 17, 2024, 12:26 a.m.

Go

25 +0

161 +2

32 +0

scikit-learn by scikit-learn

scikit-learn: machine learning in Python

updated at Nov. 17, 2024, 1:36 a.m.

Python

2,138 -1

60,149 +80

25,410 +17

ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.

updated at Nov. 17, 2024, 11:33 a.m.

Python

251 +0

6,718 +29

1,264 +3

deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

updated at Nov. 17, 2024, 2:14 p.m.

Scala

81 +0

3,308 +1

539 +1

spark-jobserver by spark-jobserver

REST job server for Apache Spark

updated at Nov. 17, 2024, 2:15 p.m.

Scala

221 +0

2,839 -1

998 +0

hudi by apache

Upserts, Deletes And Incremental Processing on Big Data.

updated at Nov. 17, 2024, 5:10 p.m.

Java

1,164 +1

5,436 +21

2,424 -1

sedona by apache

A cluster computing framework for processing large-scale geospatial data

updated at Nov. 17, 2024, 5:14 p.m.

Java

95 +0

1,956 +2

693 -2

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

updated at Nov. 17, 2024, 6:58 p.m.

Scala

217 +0

7,599 +18

1,707 +6

graphframes by graphframes

No description provided.

updated at Nov. 17, 2024, 8:23 p.m.

Scala

59 +0

1,001 +2

237 +0

iceberg by apache

Apache Iceberg

updated at Nov. 17, 2024, 8:29 p.m.

Java

160 +0

6,464 +20

2,235 +10
