spark-daria by mrpowers-io

Essential Spark extensions and helper methods ✨😲

updated at Nov. 8, 2024, 2:27 a.m.

Scala

34 +0

754 +0

152 +0

GitHub
photon-ml by linkedin

A scalable machine learning library on Apache Spark

updated at Nov. 8, 2024, 6:04 a.m.

Terra

82 +0

792 +0

185 +0

GitHub
spark-fast-tests by mrpowers-io

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

updated at Nov. 8, 2024, 12:32 p.m.

Scala

16 +0

436 +0

77 +0

GitHub
incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at Nov. 8, 2024, 5:15 p.m.

Scala

48 +0

740 +0

225 +0

GitHub
kotlin-spark-api by Kotlin

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

updated at Nov. 10, 2024, 6:49 a.m.

Kotlin

20 -1

461 +0

35 +0

GitHub
quinn by mrpowers-io

PySpark methods to enhance developer productivity 📣 👯 🎉

updated at Nov. 11, 2024, 9:29 a.m.

Python

20 +0

642 +1

99 +0

GitHub
sparklyr by sparklyr

R interface for Apache Spark

updated at Nov. 12, 2024, 2:27 a.m.

R

73 +0

957 +1

310 +0

GitHub
mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at Nov. 12, 2024, 3:16 p.m.

Scala

66 +0

1,504 +0

313 +1

GitHub
sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

updated at Nov. 14, 2024, 5:19 a.m.

Python

48 +0

1,328 -1

447 +1

GitHub
neo4j-spark-connector by neo4j

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

updated at Nov. 14, 2024, 9:10 a.m.

Scala

34 +0

313 +0

112 +0

GitHub
SynapseML by Microsoft

Simple and Distributed Machine Learning

updated at Nov. 14, 2024, 10:15 a.m.

Scala

146 +0

5,068 +3

831 -1

GitHub
dplyr by tidyverse

dplyr: A grammar of data manipulation

updated at Nov. 14, 2024, 7:54 p.m.

R

244 +0

4,781 +6

2,122 +0

GitHub
kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

updated at Nov. 15, 2024, 8:22 a.m.

Scala

62 +0

2,105 +7

914 -2

GitHub
spark-testing-base by holdenk

Base classes to use when writing tests with Spark

updated at Nov. 15, 2024, 9:20 a.m.

Scala

77 +0

1,523 +1

358 +0

GitHub
cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments

updated at Nov. 15, 2024, 9:25 a.m.

Scala

110 +0

997 +1

361 +1

GitHub
hail by hail-is

Cloud-native genomic dataframes and batch computing

updated at Nov. 15, 2024, 10:30 a.m.

Python

55 +0

984 +2

246 +0

GitHub
spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

updated at Nov. 15, 2024, 2:29 p.m.

Scala

100 +0

3,871 +6

712 +2

GitHub
chispa by MrPowers

PySpark test helper methods with beautiful error messages

updated at Nov. 15, 2024, 5:38 p.m.

Python

5 +0

620 +3

68 +0

GitHub
python-deequ by awslabs

Python API for Deequ

updated at Nov. 15, 2024, 5:51 p.m.

Jupyter Notebook

17 +0

730 +3

136 +1

GitHub
sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

updated at Nov. 15, 2024, 8:11 p.m.

Scala

180 +0

968 +1

360 +0

GitHub