Convert HTTP Archive (HAR) files to Web Archive (WARC) format
updated at March 12, 2024, 12:41 p.m.
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
updated at March 26, 2024, 10:50 p.m.
WARC and ARC indexing and discovery tools.
updated at March 31, 2024, 2:13 p.m.
Internet Archive's Sparkling Data Processing Library
updated at April 4, 2024, 12:42 a.m.
Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)
updated at April 11, 2024, 9:06 a.m.
A dockerized, queued, high-fidelity web archiver based on Squidwarc
updated at April 23, 2024, 1:39 a.m.
Web application for distributed compute analysis of Archive-It web archive collections.
updated at April 24, 2024, 8:10 p.m.
A list of software, literature, and other content related to 🕣 Memento
updated at April 27, 2024, 8:55 a.m.
WarcDB: Web crawl data as SQLite databases.
updated at May 1, 2024, 4:03 p.m.
Various Jupyter notebooks about Common Crawl data
updated at May 1, 2024, 4:06 p.m.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
updated at May 1, 2024, 4:39 p.m.