deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

created at Aug. 7, 2018, 8:55 p.m.

Scala

81 +0

3,308 +1

539 +1

iceberg by apache

Apache Iceberg

created at Nov. 19, 2018, 4:26 p.m.

Java

160 +0

6,464 +20

2,235 +10

koalas by databricks

Koalas: pandas API on Apache Spark

created at Jan. 3, 2019, 9:46 p.m.

Python

326 +0

3,338 +3

358 +0

chispa by MrPowers

PySpark test helper methods with beautiful error messages

created at March 19, 2019, 3:52 p.m.

Python

5 +0

620 +3

68 +0

spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

created at April 22, 2019, 6:55 p.m.

C#

93 +0

2,024 +1

315 +0

delta by delta-io

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

created at April 22, 2019, 6:56 p.m.

Scala

217 +0

7,599 +18

1,707 +6

joblib-spark by joblib

Joblib Apache Spark Backend

created at Nov. 20, 2019, 7:02 p.m.

Python

9 +0

242 +0

26 +0

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

created at April 2, 2020, noon

Scala

5 +0

56 +0

4 +0

kotlin-spark-api by Kotlin

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.

created at June 1, 2020, 11:07 a.m.

Kotlin

20 -1

461 +0

35 +0

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

342 +0

53 +0

python-deequ by awslabs

Python API for Deequ

created at Nov. 9, 2020, 9:28 p.m.

Jupyter Notebook

17 +0

730 +3

136 +1

spark-connect-go by apache

Apache Spark Connect Client for Golang

created at May 30, 2023, 10:09 a.m.

Go

25 +0

161 +2

32 +0

spark-connect-rs by sjrusso8

Apache Spark Connect Client for Rust

created at Sept. 18, 2023, 1:32 p.m.

Rust

5 +0

90 +0

15 +0

spark-connect-csharp by mdrakiburrahman

Apache Spark Connect Client for C#

created at April 14, 2024, 11:40 p.m.

C#

2 +0

1 +0

0 +0
