crossdata by Stratio

DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities

updated at Feb. 12, 2023, 6:49 p.m.

Scala

101 +0

169 +0

51 +0

GitHub
aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

updated at Aug. 29, 2024, 4:20 p.m.

Scala

15 +0

137 +0

33 +0

first-edition by spark-in-action

The book's repo

updated at Sept. 9, 2024, 8:21 a.m.

Scala

42 +0

273 +0

188 +0

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at Oct. 8, 2024, 6:49 a.m.

Scala

16 +0

342 +0

53 +0

itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

updated at Oct. 14, 2024, 9:49 a.m.

Scala

5 +0

56 +0

4 +0

spark-xml by databricks

XML data source for Spark SQL and DataFrames

updated at Oct. 30, 2024, 7:02 a.m.

Scala

39 +0

505 +0

226 -1

adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

updated at Nov. 4, 2024, 1:06 a.m.

Scala

100 +0

1,003 +0

308 +0

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at Nov. 5, 2024, 9:15 a.m.

Scala

146 +0

1,520 +0

1,031 +0

spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

updated at Nov. 6, 2024, 1:04 a.m.

Scala

163 +0

1,943 +0

918 -1

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

updated at Nov. 7, 2024, 8:17 a.m.

Scala

91 +0

1,009 +0

314 +0

spark-daria by mrpowers-io

Essential Spark extensions and helper methods ✨😲

updated at Nov. 8, 2024, 2:27 a.m.

Scala

34 +0

754 +0

152 +0

spark-fast-tests by mrpowers-io

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

updated at Nov. 8, 2024, 12:32 p.m.

Scala

16 +0

436 +0

77 +0

incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at Nov. 8, 2024, 5:15 p.m.

Scala

48 +0

740 +0

225 +0

mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at Nov. 12, 2024, 3:16 p.m.

Scala

66 +0

1,504 +0

313 +1

neo4j-spark-connector by neo4j

Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs

updated at Nov. 14, 2024, 9:10 a.m.

Scala

34 +0

313 +0

112 +0

SynapseML by Microsoft

Simple and Distributed Machine Learning

updated at Nov. 14, 2024, 10:15 a.m.

Scala

146 +0

5,068 +3

831 -1

kyuubi by apache

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

updated at Nov. 15, 2024, 8:22 a.m.

Scala

62 +0

2,105 +7

914 -2

spark-testing-base by holdenk

Base classes to use when writing tests with Spark

updated at Nov. 15, 2024, 9:20 a.m.

Scala

77 +0

1,523 +1

358 +0

cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments.

updated at Nov. 15, 2024, 9:25 a.m.

Scala

110 +0

997 +1

361 +1

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

updated at Nov. 15, 2024, 2:29 p.m.

Scala

100 +0

3,871 +6

712 +2
