playback by wabarc

Playback webpages from Wayback Machine

created at April 8, 2021, 2:21 p.m.

Go

4 +0

6 +0

1 +0

GitHub
browsertrix-crawler by webrecorder

Run a high-fidelity browser-based crawler in a single Docker container

created at Nov. 2, 2020, 4:37 a.m.

TypeScript

25 +1

572 +6

71 +0

GitHub
warcprox by internetarchive

WARC writing MITM HTTP/S proxy

created at Oct. 25, 2013, 11:27 p.m.

Python

33 +0

367 +1

54 +0

GitHub
Zotero-Robust-Links-Extension by lanl

Create Robust Links from within Zotero

created at June 28, 2021, 9:38 p.m.

JavaScript

3 +0

17 +0

2 +0

GitHub
zotero-memento by leonkt

Zotero extension that combats link rot by archiving webpages and journal articles.

created at Aug. 29, 2019, 5:51 p.m.

JavaScript

7 +0

278 +3

15 +0

GitHub
gowarcserver by nlnwa

No description provided.

created at Jan. 15, 2021, 10:42 a.m.

Go

7 +0

12 +0

1 +0

GitHub
wayback by wabarc

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, IPFS, Telegraph, and file systems.

created at June 13, 2020, 10:08 a.m.

Go

9 +0

1,685 +6

63 +0

GitHub
warc2html by iipc

Converts WARC files to static HTML

created at Nov. 8, 2021, 4:09 a.m.

Java

10 +0

38 +0

3 +0

GitHub
chatnoir-resiliparse by chatnoir-eu

A robust web archive analytics toolkit

created at June 22, 2021, 9:03 a.m.

Cython

9 +0

48 +2

8 +0

GitHub
unwarcit by emmadickson

No description provided.

created at Dec. 11, 2021, 7:19 p.m.

Python

5 +0

6 +0

0 +0

GitHub
waybackpy by akamhy

Wayback Machine API interface & a command-line tool

created at May 2, 2020, 9:19 a.m.

Python

10 +0

445 +1

32 +0

GitHub
web-archiving-course by vphill

Web Archiving Course

created at Feb. 22, 2022, 2:33 a.m.

Unknown language

1 +0

19 +0

6 +0

GitHub
Sparkling by internetarchive

Internet Archive's Sparkling Data Processing Library

created at April 28, 2022, 2:28 p.m.

Scala

17 +0

10 +0

2 +0

GitHub
arch by internetarchive

Web application for distributed compute analysis of Archive-It web archive collections.

created at April 28, 2022, 3:18 p.m.

Scala

19 +0

13 +0

4 +0

GitHub
auto-archiver by bellingcat

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

created at Jan. 15, 2021, 10:30 a.m.

Python

20 +0

483 +1

53 +0

GitHub
warcrefs by arcalex

Web archive deduplication tools

created at April 22, 2014, 8:02 a.m.

Java

5 +0

6 +0

1 +0

GitHub
scoop by harvard-lil

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

created at Sept. 20, 2022, 6:50 p.m.

JavaScript

7 +0

102 +0

5 +0

GitHub
crau by turicas

Easy-to-use Web archiver

created at Oct. 26, 2019, 7:21 p.m.

Python

4 +0

54 +0

9 +0

GitHub
gogetcrawl by karust

Extract web archive data using Wayback Machine and Common Crawl

created at June 14, 2019, 7:02 p.m.

Go

5 +0

135 +0

15 +0

GitHub
cc-notebooks by commoncrawl

Various Jupyter notebooks about Common Crawl data

created at July 19, 2019, 11:38 a.m.

Jupyter Notebook

16 +0

40 +0

8 +0

GitHub