An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
created at Aug. 6, 2015, 7:42 p.m.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
created at July 6, 2017, 10:13 a.m.
Web application for distributed compute analysis of Archive-It web archive collections.
created at April 28, 2022, 3:18 p.m.
Internet Archive's Sparkling Data Processing Library
created at April 28, 2022, 2:28 p.m.
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
created at Nov. 29, 2019, 2:52 p.m.