spark-orientdb by orientechnologies

Apache Spark datasource for OrientDB

updated at Aug. 3, 2022, 7:26 a.m.

Scala

15 +0

19 +0

11 +0

GitHub
crossdata by Stratio

DISCONTINUED - Easy access to big things. Library for Apache Spark extending and improving its capabilities

updated at Feb. 12, 2023, 6:49 p.m.

Scala

101 +0

169 +0

51 +0

GitHub
silex by willb

something to help you spark

updated at June 8, 2023, 7:50 a.m.

Scala

3 +0

19 +0

0 +0

GitHub
pyspark-stubs by zero323

Apache (Py)Spark type annotations (stub files).

updated at Sept. 16, 2023, 6:30 p.m.

Python

6 +0

114 +0

37 +0

GitHub
spark-riak-connector by basho

The official Riak Spark Connector for Apache Spark with Riak TS and Riak KV

updated at Sept. 27, 2023, 10:28 a.m.

Scala

66 +0

60 +0

29 +0

GitHub
sparkly by Tubular

Helpers & syntactic sugar for PySpark.

updated at Dec. 22, 2023, 2:37 a.m.

Python

38 +0

60 +0

7 +0

GitHub
spark-avro by databricks

Avro Data Source for Apache Spark

updated at Jan. 6, 2024, 9:05 a.m.

Scala

71 +0

540 +0

310 +0

GitHub
spark-corenlp by databricks

Stanford CoreNLP wrapper for Apache Spark

updated at Jan. 21, 2024, 2:22 p.m.

Scala

52 +0

423 +0

120 +0

GitHub
aut by archivesunleashed

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

updated at Feb. 8, 2024, 11:01 a.m.

Scala

15 +0

131 +0

33 +0

GitHub
flambo by sorenmacbeth

A Clojure DSL for Apache Spark

updated at Feb. 12, 2024, 2:53 p.m.

Clojure

78 +0

608 +0

86 +0

GitHub
spark-gotchas by awesome-spark

Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks

updated at Feb. 20, 2024, 9:34 a.m.

Unknown languages

33 +0

354 +0

82 +0

GitHub
dist-keras by cerndb

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

updated at Feb. 21, 2024, 3:07 p.m.

Python

49 +0

624 +0

171 +0

GitHub
photon-ml by linkedin

A scalable machine learning library on Apache Spark

updated at Feb. 29, 2024, 4:48 a.m.

Terra

83 +0

790 +0

185 +0

GitHub
Clustering4Ever by Clustering4Ever

C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.

updated at Feb. 29, 2024, 4:50 a.m.

Scala

21 +0

128 +0

13 +0

GitHub
magellan by harsha2010

Geospatial Data Analytics on Spark

updated at March 15, 2024, 4:45 a.m.

Scala

65 +0

533 +0

150 +0

GitHub
dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

updated at March 17, 2024, 12:31 a.m.

Scala

19 +0

182 +0

58 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at March 28, 2024, 5:47 a.m.

Scala

16 +0

334 +0

50 +0

GitHub
flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

updated at March 29, 2024, 3:35 p.m.

Python

33 +0

630 +0

114 +0

GitHub
itachi by yaooqinn

A library that brings useful functions from various modern database management systems to Apache Spark

updated at March 30, 2024, 5:36 p.m.

Scala

5 +0

53 +0

4 +0

GitHub
neo4j-mazerunner by neo4j-contrib

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

updated at March 31, 2024, 2:15 p.m.

Java

56 +0

377 +0

105 +0

GitHub