spark-testing-base by holdenk

Base classes to use when writing tests with Spark

updated at Nov. 15, 2024, 9:20 a.m.

Scala

77 +0

1,523 +1

358 +0

GitHub
kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

updated at Nov. 15, 2024, 8:22 a.m.

Scala

62 +0

2,105 +7

914 -2

GitHub
dplyr by tidyverse

dplyr: A grammar of data manipulation

updated at Nov. 14, 2024, 7:54 p.m.

R

244 +0

4,781 +6

2,122 +0

GitHub
SynapseML by Microsoft

Simple and Distributed Machine Learning

updated at Nov. 14, 2024, 10:15 a.m.

Scala

146 +0

5,068 +3

831 -1

GitHub
neo4j-spark-connector by neo4j

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

updated at Nov. 14, 2024, 9:10 a.m.

Scala

34 +0

313 +0

112 +0

GitHub
sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

updated at Nov. 14, 2024, 5:19 a.m.

Python

48 +0

1,328 -1

447 +1

GitHub
mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at Nov. 12, 2024, 3:16 p.m.

Scala

66 +0

1,504 +0

313 +1

GitHub
sparklyr by sparklyr

R interface for Apache Spark

updated at Nov. 12, 2024, 2:27 a.m.

R

73 +0

957 +1

310 +0

GitHub
quinn by mrpowers-io

pyspark methods to enhance developer productivity 📣 👯 🎉

updated at Nov. 11, 2024, 9:29 a.m.

Python

20 +0

642 +1

99 +0

GitHub
kotlin-spark-api by Kotlin

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

updated at Nov. 10, 2024, 6:49 a.m.

Kotlin

20 -1

461 +0

35 +0

GitHub
incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at Nov. 8, 2024, 5:15 p.m.

Scala

48 +0

740 +0

225 +0

GitHub
spark-fast-tests by mrpowers-io

Apache Spark testing helpers (dependency-free & works with Scalatest, uTest, and MUnit)

updated at Nov. 8, 2024, 12:32 p.m.

Scala

16 +0

436 +0

77 +0

GitHub
photon-ml by linkedin

A scalable machine learning library on Apache Spark

updated at Nov. 8, 2024, 6:04 a.m.

Terra

82 +0

792 +0

185 +0

GitHub
spark-daria by mrpowers-io

Essential Spark extensions and helper methods ✨😲

updated at Nov. 8, 2024, 2:27 a.m.

Scala

34 +0

754 +0

152 +0

GitHub
livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

updated at Nov. 7, 2024, 8:17 a.m.

Scala

91 +0

1,009 +0

314 +0

GitHub
spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

updated at Nov. 6, 2024, 1:04 a.m.

Scala

163 +0

1,943 +0

918 -1

GitHub
aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at Nov. 5, 2024, 9:15 a.m.

Scala

146 +0

1,520 +0

1,031 +0

GitHub
mongo-spark by mongodb

The MongoDB Spark Connector

updated at Nov. 5, 2024, 8:45 a.m.

Java

79 +0

712 +0

309 +0

GitHub
spark-connect-rs by sjrusso8

Apache Spark Connect Client for Rust

updated at Nov. 4, 2024, 1:07 p.m.

Rust

5 +0

90 +0

15 +0

GitHub
adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

updated at Nov. 4, 2024, 1:06 a.m.

Scala

100 +0

1,003 +0

308 +0

GitHub