spark-orientdb by orientechnologies

Apache Spark datasource for OrientDB

updated at Aug. 3, 2022, 7:26 a.m.

Scala

15 +0

19 +0

11 +0

GitHub
crossdata by Stratio

DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities

updated at Feb. 12, 2023, 6:49 p.m.

Scala

101 +0

169 +0

51 +0

GitHub
silex by willb

something to help you spark

updated at June 8, 2023, 7:50 a.m.

Scala

3 +0

19 +0

0 +0

GitHub
spark-riak-connector by basho

The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

updated at Sept. 27, 2023, 10:28 a.m.

Scala

66 +0

60 +0

29 +0

GitHub
spark-avro by databricks

Avro Data Source for Apache Spark

updated at Jan. 6, 2024, 9:05 a.m.

Scala

71 +0

540 +0

310 +0

GitHub
spark-corenlp by databricks

Stanford CoreNLP wrapper for Apache Spark

updated at Jan. 21, 2024, 2:22 p.m.

Scala

52 +0

423 +0

120 +0

GitHub
Clustering4Ever by Clustering4Ever

C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.

updated at Feb. 29, 2024, 4:50 a.m.

Scala

21 +0

128 +0

13 +0

GitHub
dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

updated at March 17, 2024, 12:31 a.m.

Scala

19 +0

182 +0

58 +0

GitHub
mist by Hydrospheredata

Serverless proxy for Spark cluster

updated at April 2, 2024, 5:42 p.m.

Scala

41 +0

326 +0

67 +0

GitHub
spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

updated at April 6, 2024, 7:28 p.m.

Scala

15 +0

418 +0

73 +0

GitHub
spark-timeseries by sryza

A library for time series analysis on Apache Spark

updated at April 24, 2024, 9:39 a.m.

Scala

134 +0

1,189 +0

427 +0

GitHub
incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at April 28, 2024, 11:16 p.m.

Scala

48 -1

731 +0

224 +0

GitHub
itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

updated at April 29, 2024, 3:42 p.m.

Scala

5 +0

54 +0

4 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at April 30, 2024, 9:48 p.m.

Scala

16 +0

335 +0

51 +1

GitHub
aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

updated at May 1, 2024, 4:39 p.m.

Scala

15 +0

133 +0

33 +0

GitHub
first-edition by spark-in-action

The book's repo

updated at May 2, 2024, 11:57 a.m.

Scala

42 +0

272 +0

191 +0

GitHub
aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at May 2, 2024, 4:43 p.m.

Scala

148 +0

1,514 +0

1,032 +0

GitHub
livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

updated at May 4, 2024, 5:57 p.m.

Scala

91 +0

1,005 +0

316 +0

GitHub
spark-csv by databricks

CSV Data Source for Apache Spark 1.x

updated at May 7, 2024, 12:54 p.m.

Scala

418 +0

1,049 +1

446 +0

GitHub
magellan by harsha2010

Geo Spatial Data Analytics on Spark

updated at May 8, 2024, 1:18 p.m.

Scala

65 +0

534 +1

150 +0

GitHub