crossdata by Stratio

DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities

updated at Feb. 12, 2023, 6:49 p.m.

Scala

101 +0

169 +0

51 +0

sparkly by Tubular

Helpers & syntactic sugar for PySpark.

updated at Dec. 22, 2023, 2:37 a.m.

Python

41 +0

60 +0

9 +0

jpmml-evaluator-spark by jpmml

PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)

updated at March 31, 2024, 2:17 p.m.

Java

14 +0

94 +0

43 +0

sparkle by tweag

Haskell on Apache Spark.

updated at Aug. 28, 2024, 6:08 p.m.

Haskell

61 +0

447 +0

30 +0

aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

updated at Aug. 29, 2024, 4:20 p.m.

Scala

15 +0

137 +0

33 +0

docker-spark by sequenceiq

No description provided.

updated at Aug. 30, 2024, 12:17 p.m.

Shell

65 +0

765 +0

282 +0

first-edition by spark-in-action

The book's repo

updated at Sept. 9, 2024, 8:21 a.m.

Scala

42 +0

273 +0

188 +0

spark-connect-csharp by mdrakiburrahman

Apache Spark Connect Client for C#

updated at Sept. 30, 2024, 3 p.m.

C#

2 +0

1 +0

0 +0

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at Oct. 8, 2024, 6:49 a.m.

Scala

16 +0

342 +0

53 +0

flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

updated at Oct. 12, 2024, 10:52 p.m.

Python

31 +0

638 +0

116 +0

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

updated at Oct. 14, 2024, 9:49 a.m.

Scala

5 +0

56 +0

4 +0

oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

updated at Oct. 19, 2024, 8:54 a.m.

Java

208 +0

1,788 +0

405 +0

joblib-spark by joblib

Joblib Apache Spark Backend

updated at Oct. 28, 2024, 8:33 p.m.

Python

9 +0

242 +0

26 +0

spark-xml by databricks

XML data source for Spark SQL and DataFrames

updated at Oct. 30, 2024, 7:02 a.m.

Scala

39 +0

505 +0

226 -1

adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

updated at Nov. 4, 2024, 1:06 a.m.

Scala

100 +0

1,003 +0

308 +0

spark-connect-rs by sjrusso8

Apache Spark Connect Client for Rust

updated at Nov. 4, 2024, 1:07 p.m.

Rust

5 +0

90 +0

15 +0

mongo-spark by mongodb

The MongoDB Spark Connector

updated at Nov. 5, 2024, 8:45 a.m.

Java

79 +0

712 +0

309 +0

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at Nov. 5, 2024, 9:15 a.m.

Scala

146 +0

1,520 +0

1,031 +0

spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

updated at Nov. 6, 2024, 1:04 a.m.

Scala

163 +0

1,943 +0

918 -1

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

updated at Nov. 7, 2024, 8:17 a.m.

Scala

91 +0

1,009 +0

314 +0
