An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
created at Aug. 6, 2015, 7:42 p.m.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
created at Jan. 29, 2016, 10:43 a.m.