spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

updated at May 24, 2024, 12:26 p.m.

Scala

162 +0

1,932 +1

913 +0

GitHub
graphframes by graphframes

updated at May 24, 2024, 9:11 a.m.

Scala

58 +0

972 +1

232 +0

GitHub
incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at May 24, 2024, 8:39 a.m.

Scala

48 +0

733 +1

224 +0

GitHub
Mobius by Microsoft

C# and F# language binding and extensions to Apache Spark

updated at May 23, 2024, 7:21 p.m.

C#

145 +0

940 +1

212 +0

GitHub
spark-avro by databricks

Avro Data Source for Apache Spark

updated at May 23, 2024, 12:39 p.m.

Scala

70 -1

539 -1

310 +0

GitHub
spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

updated at May 23, 2024, 6:11 a.m.

C#

91 +0

2,002 +1

309 +0

GitHub
blaze by blaze

NumPy and Pandas interface to Big Data

updated at May 23, 2024, 2:20 a.m.

Python

195 +1

3,180 +1

388 -5

GitHub
spark-xml by databricks

XML data source for Spark SQL and DataFrames

updated at May 23, 2024, 1:15 a.m.

Scala

40 +0

487 -1

224 +0

GitHub
quinn by MrPowers

pyspark methods to enhance developer productivity 📣 👯 🎉

updated at May 22, 2024, 9:33 p.m.

Python

19 +0

583 +1

92 +0

GitHub
delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at May 22, 2024, 4:13 p.m.

Scala

16 +0

337 +1

51 +0

GitHub
mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at May 21, 2024, 10:11 p.m.

Scala

69 +0

1,499 +1

313 +0

GitHub
oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

updated at May 19, 2024, 10:14 a.m.

Java

209 +0

1,788 -1

405 +0

GitHub
flint by twosigma

A Time Series Library for Apache Spark

updated at May 19, 2024, 9:51 a.m.

Scala

77 +0

992 +0

184 +0

GitHub
mongo-spark by mongodb

The MongoDB Spark Connector

updated at May 18, 2024, 8:19 a.m.

Java

79 +0

703 +0

307 +0

GitHub
dbscan-on-spark by irvingc

An implementation of DBSCAN running on top of Apache Spark

updated at May 18, 2024, 7:22 a.m.

Scala

19 +0

183 +0

58 +0

GitHub
spark-fast-tests by MrPowers

Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit)

updated at May 17, 2024, 4:50 p.m.

Scala

15 +0

421 +0

73 +0

GitHub
flambo by sorenmacbeth

A Clojure DSL for Apache Spark

updated at May 16, 2024, 8:28 p.m.

Clojure

78 +0

609 +0

86 +0

GitHub
flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

updated at May 16, 2024, 1:26 p.m.

Python

33 +0

632 +0

115 +0

GitHub
kotlin-spark-api by Kotlin

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x

updated at May 14, 2024, 10:01 p.m.

Kotlin

19 +0

443 +0

34 +0

GitHub
neo4j-mazerunner by neo4j-contrib

Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.

updated at May 14, 2024, 7:19 a.m.

Java

56 +0

378 +0

105 +0

GitHub