spark by apache

Apache Spark - A unified analytics engine for large-scale data processing

created at Feb. 25, 2014, 8 a.m.

Scala

2,019 -2

39,566 +52

28,274 +14

predictionio by apache

PredictionIO, a machine learning server for developers and ML engineers.

created at Jan. 25, 2013, 7:42 p.m.

Scala

756 +0

12,548 +0

1,928 +0

SynapseML by Microsoft

Simple and Distributed Machine Learning

created at June 5, 2017, 8:23 a.m.

Scala

145 -1

5,061 +2

832 +3

aerosolve by airbnb

A machine learning package built for humans.

created at May 12, 2015, 7:11 p.m.

Scala

352 +0

4,795 -2

563 +0

spark-nlp by JohnSnowLabs

State of the Art Natural Language Processing

created at Sept. 24, 2017, 7:36 p.m.

Scala

100 +1

3,858 +10

711 +0

scalding by twitter

A Scala API for Cascading

created at Jan. 10, 2012, 4:22 p.m.

Scala

321 +0

3,499 +3

706 +0

breeze by scalanlp

Breeze is/was a numerical processing library for Scala.

created at July 8, 2009, 11:22 p.m.

Scala

206 +0

3,449 +1

693 +0

algebird by twitter

Abstract Algebra for Scala

created at Aug. 2, 2012, 5:24 p.m.

Scala

232 +0

2,291 +3

345 +0

summingbird by twitter

Streaming MapReduce with Scalding and Storm

created at Sept. 25, 2012, 10:38 p.m.

Scala

291 +0

2,142 +1

267 +0

adam by bigdatagenomics

ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.

created at Nov. 19, 2013, 11:47 p.m.

Scala

100 +0

1,001 +0

308 +0

sparkling-water by h2oai

Sparkling Water provides H2O functionality inside a Spark cluster

created at Oct. 13, 2014, 11:06 p.m.

Scala

180 +0

966 +1

360 +0

tensorflow_scala by eaplatanios

TensorFlow API for the Scala Programming Language

created at April 1, 2017, 6 p.m.

Scala

65 +0

938 +0

95 +0

BIDMach by BIDData

CPU and GPU-accelerated Machine Learning Library

created at Oct. 22, 2012, 3:17 a.m.

Scala

87 +0

916 +0

168 +0

factorie by factorie

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.

created at June 25, 2013, 1:21 a.m.

Scala

69 +0

552 +0

144 +0

brushfire by stripe-archive

Distributed decision tree ensemble learning in Scala

created at Nov. 20, 2014, 6:47 p.m.

Scala

94 +0

392 +1

50 +0

delight by datamechanics

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

created at Oct. 26, 2020, 1:56 p.m.

Scala

16 +0

342 +0

53 +1

mist by Hydrospheredata

Serverless proxy for a Spark cluster

created at Jan. 15, 2016, 7:22 a.m.

Scala

39 +0

326 +0

68 +0

BIDMat by BIDData

A CPU and GPU-accelerated matrix library for data mining

created at Oct. 17, 2012, 11:19 p.m.

Scala

45 +0

265 +0

73 +0

chalk by scalanlp

Chalk is a natural language processing library.

created at Dec. 2, 2012, 5:45 a.m.

Scala

29 +0

258 +0

49 +0

DynaML by transcendent-ai-labs

Scala Library/REPL for Machine Learning Research

created at Feb. 16, 2015, 3:22 p.m.

Scala

18 +0

201 +0

51 +0
