An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
updated at Aug. 10, 2024, 1:19 p.m.
Web application for distributed compute analysis of Archive-It web archive collections.
updated at Aug. 28, 2024, 7:31 p.m.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
updated at Aug. 29, 2024, 4:20 p.m.
Internet Archive's Sparkling Data Processing Library
updated at Sept. 12, 2024, 3:06 p.m.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
updated at Sept. 13, 2024, 6:53 a.m.