ipex-llm by intel-analytics

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

updated at May 26, 2024, 1:23 p.m.

Python

243 +1

6,099 +27

1,208 +0

kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

updated at May 26, 2024, 8:54 a.m.

Scala

64 +2

1,962 +11

863 +2

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

updated at May 26, 2024, 8:52 a.m.

Scala

5 +0

53 -1

4 +0

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

updated at May 26, 2024, 8:02 a.m.

Scala

100 +0

3,720 +4

704 +2

SynapseML by Microsoft

Simple and Distributed Machine Learning

updated at May 26, 2024, 6:17 a.m.

Scala

146 +0

4,991 +7

819 +1

koalas by databricks

Koalas: pandas API on Apache Spark

updated at May 26, 2024, 2:57 a.m.

Python

318 +2

3,320 -1

355 +0

scikit-learn by scikit-learn

scikit-learn: machine learning in Python

updated at May 25, 2024, 8:59 p.m.

Python

2,141 +0

58,415 +64

25,028 +10

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

updated at May 25, 2024, 7:50 p.m.

Scala

57 +0

858 +1

594 +0

spark-testing-base by holdenk

Base classes to use when writing tests with Spark

updated at May 25, 2024, 7:11 p.m.

Scala

78 +0

1,499 +1

358 +0

dplyr by tidyverse

dplyr: A grammar of data manipulation

updated at May 25, 2024, 3:41 p.m.

R

245 +0

4,672 +1

2,117 +0

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

updated at May 25, 2024, 3:05 p.m.

Scala

216 +1

6,972 +11

1,591 +7

sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

updated at May 25, 2024, 2:45 p.m.

Python

49 +0

1,288 +2

438 +0

joblib-spark by joblib

Joblib Apache Spark Backend

updated at May 25, 2024, 1:34 p.m.

Python

9 +0

239 +1

26 +0

joblib by joblib

Computing with Python functions.

updated at May 25, 2024, 1:34 p.m.

Python

63 +1

3,694 +9

408 +1

cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments

updated at May 25, 2024, 9:18 a.m.

Scala

112 +0

962 +1

351 +1

sedona by apache

A cluster computing framework for processing large-scale geospatial data

updated at May 25, 2024, 7:33 a.m.

Java

96 +1

1,791 +5

648 +2

sparklyr by sparklyr

R interface for Apache Spark

updated at May 25, 2024, 6 a.m.

R

73 +0

929 +3

302 +0

hail by hail-is

Cloud-native genomic dataframes and batch computing

updated at May 24, 2024, 7:38 p.m.

Python

55 +0

943 +2

238 +2

deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

updated at May 24, 2024, 4:08 p.m.

Scala

80 +0

3,145 +1

513 -1

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

updated at May 24, 2024, 2:50 p.m.

Scala

179 +0

952 +0

363 +0
