dist-keras by cerndb

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

updated at May 10, 2024, 5:12 a.m.

Python

49 +0

623 -1

170 +0

spark-xml by databricks

XML data source for Spark SQL and DataFrames

updated at May 10, 2024, 3:38 a.m.

Scala

40 +0

489 +1

223 +0

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

updated at May 10, 2024, 3:34 a.m.

Scala

57 +0

857 +1

594 +0

sparklyr by sparklyr

R interface for Apache Spark

updated at May 9, 2024, 5:06 p.m.

R

73 +0

926 +2

302 +0

spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

updated at May 9, 2024, 4:48 p.m.

Scala

33 +0

743 +1

148 +0

flint by twosigma

A Time Series Library for Apache Spark

updated at May 9, 2024, 3:30 a.m.

Scala

77 +0

992 +0

184 +0

spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

updated at May 9, 2024, 3:23 a.m.

Scala

162 +0

1,931 +1

913 -1

spark-jobserver by spark-jobserver

REST job server for Apache Spark

updated at May 9, 2024, 3:16 a.m.

Scala

221 +0

2,841 +1

1,004 +0

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

updated at May 8, 2024, 4:42 p.m.

Scala

179 +1

951 -1

363 +0

magellan by harsha2010

Geospatial Data Analytics on Spark

updated at May 8, 2024, 1:18 p.m.

Scala

65 +0

534 +1

150 +0

joblib-spark by joblib

Joblib Apache Spark Backend

updated at May 8, 2024, 11:19 a.m.

Python

9 +0

238 +1

26 +0

quinn by MrPowers

PySpark methods to enhance developer productivity 📣 👯 🎉

updated at May 7, 2024, 3:46 p.m.

Python

19 +0

581 +1

91 +0

spark-csv by databricks

CSV Data Source for Apache Spark 1.x

updated at May 7, 2024, 12:54 p.m.

Scala

418 +0

1,049 +1

446 +0

oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

updated at May 6, 2024, 10:14 a.m.

Java

209 +0

1,789 +1

405 +0

kotlin-spark-api by Kotlin

This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to make this part of Apache Spark 3.x.

updated at May 5, 2024, 10:58 a.m.

Kotlin

18 +0

441 +0

34 +0

blaze by blaze

NumPy and Pandas interface to Big Data

updated at May 5, 2024, 3:19 a.m.

Python

195 +0

3,179 -1

393 +0

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

updated at May 4, 2024, 5:57 p.m.

Scala

91 +0

1,005 +0

316 +0

flintrock by nchammas

A command-line tool for launching Apache Spark clusters.

updated at May 4, 2024, 11:07 a.m.

Python

33 +0

631 +0

114 +0

sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

updated at May 3, 2024, 11:02 p.m.

Python

49 +0

1,287 +0

438 +0

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at May 2, 2024, 4:43 p.m.

Scala

148 +0

1,514 +0

1,032 +0
