awesome-spark/awesome-spark

joblib by joblib

Computing with Python functions.

updated at May 11, 2024, 9 a.m.

Python

61 +0

3,679 +9

405 +3

GitHub

hail by hail-is

Cloud-native genomic dataframes and batch computing

updated at May 11, 2024, 1:20 p.m.

Python

55 +0

938 +0

235 +0

GitHub

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

updated at May 11, 2024, 9:33 p.m.

Scala

100 +0

3,708 +9

702 +1

GitHub

deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

updated at May 11, 2024, 11:29 p.m.

Scala

80 +0

3,140 +6

514 +0

GitHub

mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at May 12, 2024, 1 a.m.

Scala

69 +0

1,496 +2

313 +0

GitHub

scikit-learn by scikit-learn

scikit-learn: machine learning in Python

updated at May 12, 2024, 2:16 a.m.

Python

2,141 +0

58,265 +63

25,004 +18

GitHub

ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

updated at May 12, 2024, 3:48 a.m.

Python

242 +0

6,049 +48

1,204 +2

GitHub

mongo-spark by mongodb

The MongoDB Spark Connector

updated at May 12, 2024, 6:15 a.m.

Java

79 +0

702 -1

307 +0

GitHub
