The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
updated at May 1, 2024, 4:39 p.m.
Interactive and Reactive Data Science using Scala and Spark.
updated at May 1, 2024, 3:08 p.m.
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
updated at April 30, 2024, 9:48 p.m.
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
updated at April 30, 2024, 6:38 p.m.
A library for time series analysis on Apache Spark
updated at April 24, 2024, 9:39 a.m.
(Deprecated) Scikit-learn integration package for Apache Spark
updated at April 17, 2024, 4:13 a.m.
Apache Spark testing helpers (dependency-free & works with Scalatest, uTest, and MUnit)
updated at April 6, 2024, 7:28 p.m.
PMML evaluator library for the Apache Spark cluster computing system (http://spark.apache.org/)
updated at March 31, 2024, 2:17 p.m.
Mazerunner extends a Neo4j graph database to run scheduled big data graph compute algorithms at scale with HDFS and Apache Spark.
updated at March 31, 2024, 2:15 p.m.
An implementation of DBSCAN running on top of Apache Spark
updated at March 17, 2024, 12:31 a.m.
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
updated at Feb. 29, 2024, 4:50 a.m.