spark-csv by databricks

CSV Data Source for Apache Spark 1.x

updated at June 5, 2024, 3:16 p.m.

Scala

421 +0

1,051 +1

445 +0

joblib-spark by joblib

Joblib Apache Spark Backend

updated at June 5, 2024, 12:48 p.m.

Python

9 +0

239 +0

26 +0

incubator-toree by apache

Mirror of Apache Toree (Incubating)

updated at June 5, 2024, 11:50 a.m.

Scala

48 +0

736 +1

224 +0

oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

updated at June 5, 2024, 8:40 a.m.

Java

209 +0

1,789 +1

405 +0

mist by Hydrospheredata

Serverless proxy for Spark cluster

updated at June 5, 2024, 8:37 a.m.

Scala

40 +0

327 +1

68 +0

spark-testing-base by holdenk

Base classes to use when writing tests with Spark

updated at June 4, 2024, 6:53 p.m.

Scala

78 +0

1,501 +1

358 +0

kotlin-spark-api by Kotlin

This project gives Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.

updated at June 4, 2024, 11:23 a.m.

Kotlin

20 +0

446 +0

34 +0

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside Spark cluster

updated at June 4, 2024, 1:38 a.m.

Scala

179 +0

953 +1

363 +0

graphframes by graphframes

updated at June 3, 2024, 2:36 p.m.

Scala

58 +0

975 +1

232 +0

spark-xml by databricks

XML data source for Spark SQL and DataFrames

updated at June 2, 2024, 10:47 p.m.

Scala

40 +0

488 +1

224 +0

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

updated at June 1, 2024, 5:45 p.m.

Scala

57 +0

860 +0

595 +1

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

updated at May 31, 2024, 2:15 p.m.

Scala

16 +0

339 +0

52 +1

aas by sryza

Code to accompany Advanced Analytics with Spark from O'Reilly Media

updated at May 31, 2024, 1:13 p.m.

Scala

148 +0

1,515 +0

1,032 +0

mleap by combust

MLeap: Deploy ML Pipelines to Production

updated at May 31, 2024, 8:43 a.m.

Scala

69 +0

1,498 +0

312 +0

koalas by databricks

Koalas: pandas API on Apache Spark

updated at May 30, 2024, 5:09 p.m.

Python

319 +0

3,319 +0

354 +0

spark-daria by MrPowers

Essential Spark extensions and helper methods ✨😲

updated at May 30, 2024, 2:58 p.m.

Scala

33 +0

743 +0

149 +0

sparkle by tweag

Haskell on Apache Spark.

updated at May 28, 2024, 10:45 p.m.

Haskell

60 +0

445 +0

30 +0

spark-cassandra-connector by datastax

DataStax Connector for Apache Spark to Apache Cassandra

updated at May 28, 2024, 4:10 p.m.

Scala

163 +1

1,933 +0

913 +0

spark-timeseries by sryza

A library for time series analysis on Apache Spark

updated at May 28, 2024, 3:01 a.m.

Scala

134 +0

1,190 +0

427 +0

spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

updated at May 28, 2024, 1:47 a.m.

JavaScript

190 +0

3,148 +0

654 +0
