A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
created at Aug. 8, 2016, 1:36 p.m.
Convert HTTP Archive (HAR) -> Web Archive (WARC) format
created at March 16, 2017, 12:14 a.m.
Example notebooks for working with web archives using the Archives Unleashed Toolkit and the derivatives it generates.
created at Nov. 6, 2019, 3:09 a.m.
Java application to download WARCs from WASAPI
created at April 28, 2017, 9:15 p.m.
A client for the Archive-It and Webrecorder WASAPI Data Transfer API
created at Aug. 10, 2017, 5:25 p.m.
Web application for distributed compute analysis of Archive-It web archive collections.
created at April 28, 2022, 3:18 p.m.
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
created at Sept. 20, 2022, 6:50 p.m.
A list of things related to software, literature, and other content for 🕣 Memento
created at Sept. 16, 2016, 1:33 a.m.
A robust web archive analytics toolkit
created at June 22, 2021, 9:03 a.m.
Various Jupyter notebooks about Common Crawl data
created at July 19, 2019, 11:38 a.m.