spark by dotnet

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

created at April 22, 2019, 6:55 p.m.

C#

91 +0

2,002 +1

309 +0

spark-avro by databricks

Avro Data Source for Apache Spark

created at Sept. 30, 2014, 5:50 p.m.

Scala

70 -1

539 -1

310 +0

mleap by combust

MLeap: Deploy ML Pipelines to Production

created at Aug. 23, 2016, 3:51 a.m.

Scala

69 +0

1,499 +1

313 +0

livy by cloudera

Livy is an open source REST interface for interacting with Apache Spark from anywhere

created at Nov. 17, 2015, 6:55 a.m.

Scala

91 +0

1,005 +0

315 +0

cromwell by broadinstitute

Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments.

created at April 17, 2015, 7:39 p.m.

Scala

112 +0

962 +1

351 +1

koalas by databricks

Koalas: pandas API on Apache Spark

created at Jan. 3, 2019, 9:46 p.m.

Python

318 +2

3,320 -1

355 +0

spark-testing-base by holdenk

Base classes to use when writing tests with Spark

created at Jan. 30, 2015, 10:23 p.m.

Scala

78 +0

1,499 +1

358 +0

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

created at Oct. 13, 2014, 11:06 p.m.

Scala

179 +0

952 +0

363 +0

blaze by blaze

NumPy and Pandas interface to Big Data

created at Oct. 26, 2012, 2:25 p.m.

Python

195 +1

3,180 +1

388 -5

oryx by OryxProject

Oryx 2: Lambda architecture on Apache Spark and Apache Kafka for real-time, large-scale machine learning

created at July 25, 2014, 8:08 p.m.

Java

209 +0

1,788 -1

405 +0

joblib by joblib

Computing with Python functions.

created at May 7, 2010, 6:48 a.m.

Python

63 +1

3,694 +9

408 +1

spark-timeseries by sryza

A library for time series analysis on Apache Spark

created at March 11, 2015, 8:14 a.m.

Scala

134 +0

1,189 +0

427 +0

sparkmagic by jupyter-incubator

Jupyter magics and kernels for working with remote Spark clusters

created at Sept. 21, 2015, 3:35 p.m.

Python

49 +0

1,288 +2

438 +0

spark-csv by databricks

CSV Data Source for Apache Spark 1.x

created at Dec. 3, 2014, 12:56 a.m.

Scala

420 +2

1,049 +0

445 +0

deequ by awslabs

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

created at Aug. 7, 2018, 8:55 p.m.

Scala

80 +0

3,145 +1

513 -1

incubator-livy by apache

Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

created at June 25, 2017, 7 a.m.

Scala

57 +0

858 +1

594 +0

sedona by apache

A cluster computing framework for processing large-scale geospatial data

created at April 24, 2015, 6:01 p.m.

Java

96 +1

1,791 +5

648 +2

spark-notebook by spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

created at Sept. 5, 2014, 7:35 p.m.

JavaScript

190 +0

3,148 +0

654 +0

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

100 +0

3,720 +4

704 +2

SynapseML by Microsoft

Simple and Distributed Machine Learning

created at June 5, 2017, 8:23 a.m.

Scala

146 +0

4,991 +7

819 +1
