A Tool To Push Web Resources Into Web Archives
created at Feb. 9, 2017, 12:29 p.m.
WarcDB: Web crawl data as SQLite databases.
created at May 29, 2022, 11:09 a.m.
Streaming WARC/ARC library for fast web archive IO
created at March 6, 2017, 6:17 p.m.
Zotero extension that combats link rot by archiving webpages and journal articles.
created at Aug. 29, 2019, 5:51 p.m.
Snapshots a web page to get it as a static, self-contained HTML document.
created at July 13, 2017, 11:31 p.m.
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
created at June 28, 2021, 10:46 p.m.
Command-line tools and libraries for handling and manipulating WARC files (and HTTP content)
created at March 22, 2013, 8:52 p.m.
Extract web archive data using Wayback Machine and Common Crawl
created at June 14, 2019, 7:02 p.m.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
created at Aug. 6, 2015, 7:42 p.m.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
created at July 6, 2017, 10:13 a.m.
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
created at Sept. 20, 2022, 6:50 p.m.
WARC and ARC indexing and discovery tools.
created at Dec. 20, 2012, 12:17 p.m.
A search interface and Wayback Machine for the UKWA Solr-based warc-indexer framework.
created at Feb. 8, 2017, 9:33 a.m.