sedona by apache

A cluster computing framework for processing large-scale geospatial data

updated at April 28, 2024, 8:52 a.m.

Java

96 +0

1,776 +5

644 +2

GitHub
ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

updated at April 28, 2024, 8:48 a.m.

Python

243 +0

5,954 +43

1,200 +5

GitHub
incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

updated at April 28, 2024, 8:20 a.m.

Scala

57 +0

855 +4

594 +0

GitHub
delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

updated at April 28, 2024, 7:40 a.m.

Scala

215 +1

6,903 +27

1,573 +4

GitHub
spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

updated at April 28, 2024, 7:28 a.m.

Scala

100 +0

3,693 +22

699 +1

GitHub
scikit-learn by scikit-learn

scikit-learn: machine learning in Python

updated at April 28, 2024, 6:14 a.m.

Python

2,140 +1

58,132 +86

24,976 +6

GitHub
kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

updated at April 28, 2024, 5:30 a.m.

Scala

63 +0

1,937 +9

857 +0

GitHub
joblib by joblib

Computing with Python functions.

updated at April 28, 2024, 2:57 a.m.

Python

61 +0

3,662 +11

401 +1

GitHub
koalas by databricks

Koalas: pandas API on Apache Spark

updated at April 28, 2024, 2:27 a.m.

Python

316 +1

3,320 +2

353 +0

GitHub
SynapseML by Microsoft

Simple and Distributed Machine Learning

updated at April 28, 2024, 1:52 a.m.

Scala

146 +0

4,968 +1

813 +1

GitHub
kotlin-spark-api by Kotlin

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

updated at April 27, 2024, 8:33 p.m.

Kotlin

18 +0

440 +3

34 +0

GitHub
dplyr by tidyverse

dplyr: A grammar of data manipulation

updated at April 27, 2024, 7:18 p.m.

R

246 +0

4,656 +3

2,116 +0

GitHub
spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

updated at April 27, 2024, 2:45 p.m.

C#

91 +0

1,999 +2

308 +0

GitHub
aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at April 27, 2024, 11:44 a.m.

Scala

148 +0

1,515 +1

1,032 +0

GitHub
graphframes by graphframes

No description provided.

updated at April 27, 2024, 11:19 a.m.

Scala

58 +0

971 +1

232 +0

GitHub
deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

updated at April 27, 2024, 9:17 a.m.

Scala

81 +0

3,127 +6

513 +0

GitHub
hail by hail-is

Cloud-native genomic dataframes and batch computing

updated at April 26, 2024, 7:20 p.m.

Python

55 +0

934 -2

235 +1

GitHub
sparklyr by sparklyr

R interface for Apache Spark

updated at April 26, 2024, 12:55 p.m.

R

73 +0

923 +1

302 +0

GitHub
neo4j-spark-connector by neo4j-contrib

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

updated at April 26, 2024, 9:31 a.m.

Scala

35 +0

303 +0

114 +0

GitHub
sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

updated at April 26, 2024, 8:52 a.m.

Python

49 +0

1,286 +4

437 +0

GitHub