🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
updated at Sept. 22, 2024, 12:52 p.m.
Powerful yet simple to use screenshot software 🖥️ 📸
updated at Sept. 22, 2024, 6:27 a.m.
Web Extension for saving a faithful copy of a complete web page in a single HTML file
updated at Sept. 22, 2024, 5:49 a.m.
Zotero extension that combats link rot by archiving webpages and journal articles.
updated at Sept. 22, 2024, 1:40 a.m.
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
updated at Sept. 21, 2024, 9:57 p.m.
Chrome Debugging Protocol interface for Node.js
updated at Sept. 21, 2024, 6:04 p.m.
A Python and Command-Line Interface to Archive.org
updated at Sept. 21, 2024, 10:32 a.m.
Run a high-fidelity browser-based web archiving crawler in a single Docker container
updated at Sept. 21, 2024, 6:18 a.m.
Fake keyboard/mouse input, window management, and more
updated at Sept. 20, 2024, 8:59 p.m.
Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!
updated at Sept. 20, 2024, 6:53 p.m.
A robust web archive analytics toolkit
updated at Sept. 20, 2024, 1:34 p.m.
A Tool To Push Web Resources Into Web Archives
updated at Sept. 20, 2024, 7:58 a.m.
Core Python Web Archiving Toolkit for replay and recording of web archives
updated at Sept. 20, 2024, 3:24 a.m.
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
updated at Sept. 20, 2024, 3:24 a.m.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
updated at Sept. 19, 2024, 11:56 a.m.