The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
updated at Aug. 29, 2024, 4:20 p.m.
A Spark UI and Spark History Server alternative with CPU and memory metrics! Delight is free, cross-platform, and open-source.
updated at Oct. 8, 2024, 6:49 a.m.
XML data source for Spark SQL and DataFrames
updated at Oct. 30, 2024, 7:02 a.m.
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
updated at Nov. 4, 2024, 1:06 a.m.
DataStax Connector for Apache Spark to Apache Cassandra
updated at Nov. 6, 2024, 1:04 a.m.
Essential Spark extensions and helper methods ✨😲
updated at Nov. 8, 2024, 2:27 a.m.
Apache Spark testing helpers (dependency-free & works with ScalaTest, uTest, and MUnit)
updated at Nov. 8, 2024, 12:32 p.m.
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
updated at Nov. 14, 2024, 9:10 a.m.
Base classes to use when writing tests with Spark
updated at Nov. 15, 2024, 9:20 a.m.
Scientific workflow engine designed for simplicity & scalability. Trivially transition from one-off use cases to massive-scale production environments
updated at Nov. 15, 2024, 9:25 a.m.
State of the Art Natural Language Processing
updated at Nov. 15, 2024, 2:29 p.m.