iipc/awesome-web-archiving

freeze-dry by WebMemex

Snapshots a web page to get it as a static, self-contained HTML document.

created at July 13, 2017, 11:31 p.m.

TypeScript

11 +0

268 +0

18 +0

GitHub

zotero-memento by leonkt

Zotero extension that combats link rot by archiving webpages and journal articles.

created at Aug. 29, 2019, 5:51 p.m.

JavaScript

7 +0

275 +1

14 +0

GitHub

wail by machawk1

Web Archiving Integration Layer: One-Click User Instigated Preservation

created at March 20, 2013, 2:42 p.m.

Roff

14 +0

345 +0

32 +0

GitHub

warcio by webrecorder

Streaming WARC/ARC library for fast web archive IO

created at March 6, 2017, 6:17 p.m.

Python

22 +0

349 +0

55 +1

GitHub

warcprox by internetarchive

WARC-writing MITM HTTP/S proxy

created at Oct. 25, 2013, 11:27 p.m.

Python

33 +0

366 +1

54 -1

GitHub

WarcDB by Florents-Tselai

WarcDB: Web crawl data as SQLite databases.

created at May 29, 2022, 11:09 a.m.

Python

10 +0

384 +0

11 +0

GitHub

archivenow by oduwsdl

A Tool To Push Web Resources Into Web Archives

created at Feb. 9, 2017, 12:29 p.m.

Python

21 +0

392 +0

41 +0

GitHub

waybackpy by akamhy

Wayback Machine API interface & a command-line tool

created at May 2, 2020, 9:19 a.m.

Python

10 +0

441 +3

33 +0

GitHub

auto-archiver by bellingcat

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

created at Jan. 15, 2021, 10:30 a.m.

Python

19 +0

478 +1

53 +0

GitHub

awesome-website-change-monitoring by edgi-govdata-archiving

A curated list of awesome tools for website diffing and change monitoring.

created at May 24, 2017, 5:33 a.m.

Unknown languages

31 +0

482 +0

31 +0

GitHub

wpull by ArchiveTeam

Wget-compatible web downloader and crawler.

created at Dec. 7, 2013, 1:03 p.m.

HTML

23 +0

538 +2

77 +0

GitHub

browsertrix-crawler by webrecorder

Run a high-fidelity browser-based crawler in a single Docker container

created at Nov. 2, 2020, 4:37 a.m.

TypeScript

24 +0

557 +5

72 +0

GitHub

ipwb by oduwsdl

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

created at March 4, 2016, 3:01 p.m.

Python

23 +0

591 +1

39 +0

GitHub

brozzler by internetarchive

brozzler - distributed browser-based web crawler

created at July 13, 2015, 11:48 p.m.

Python

36 +0

630 +0

93 +0

GitHub

wikiteam by WikiTeam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.

created at June 25, 2014, 10:18 a.m.

Python

40 +0

696 +3

145 +1

GitHub

grab-site by ArchiveTeam

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

created at Feb. 5, 2015, 5:01 a.m.

Python

40 +0

1,273 +1

125 +0

GitHub

pywb by webrecorder

Core Python Web Archiving Toolkit for replay and recording of web archives

created at Dec. 9, 2013, 3:30 a.m.

JavaScript

61 +0

1,317 +4

206 -1

GitHub

twarc by DocNow

A command line tool (and Python library) for archiving Twitter JSON

created at Jan. 14, 2013, 2:35 p.m.

Python

35 +0

1,355 +0

254 +1

GitHub

internetarchive by jjjake

A Python and Command-Line Interface to Archive.org

created at Aug. 15, 2012, 7:18 p.m.

Python

51 +0

1,539 +6

211 +1

GitHub

wayback by wabarc

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, IPFS, Telegraph, and file systems.

created at June 13, 2020, 10:08 a.m.

Go

9 +0

1,667 +5

61 +0

GitHub