brozzler - distributed browser-based web crawler
updated at May 4, 2024, 4:59 a.m.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
updated at May 5, 2024, 4:14 a.m.
A client for the Archive-It and Webrecorder WASAPI Data Transfer API
updated at May 7, 2024, 3:08 a.m.
A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.
updated at May 7, 2024, 6:08 a.m.
Simple script to convert web resources into a single WARC file
updated at May 8, 2024, 5:21 a.m.
Java application to download WARCs from WASAPI
updated at May 13, 2024, 4:05 p.m.
🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.
updated at May 13, 2024, 7:35 p.m.
A collection of tools for archiving and analysing the internet.
updated at May 14, 2024, 5:12 p.m.
A curated list of awesome tools for website diffing and change monitoring.
updated at May 14, 2024, 9:53 p.m.
A tool to push web resources into web archives
updated at May 15, 2024, 1:18 a.m.
Snapshots a web page as a static, self-contained HTML document.
updated at May 15, 2024, 4:16 a.m.
A command-line tool and Python library for archiving data from Facebook using the Graph API.
updated at May 17, 2024, 4:57 a.m.
Streaming WARC/ARC library for fast web archive IO
updated at May 18, 2024, 11:21 a.m.