SingleFile by gildas-lormeau

Web Extension for saving a faithful copy of a complete web page in a single HTML file

created at Sept. 12, 2010, 11:50 p.m.

JavaScript

114 +0

13,911 +141

921 +6

xdotool by jordansissel

fake keyboard/mouse input, window management, and more

created at Feb. 16, 2011, 2:41 a.m.

C

56 +0

3,047 +10

311 +1

internetarchive by jjjake

A Python and Command-Line Interface to Archive.org

created at Aug. 15, 2012, 7:18 p.m.

Python

51 +0

1,526 +8

209 +0

wget-lua by alard

Wget with Lua extension

created at Aug. 21, 2012, 8:39 p.m.

C

4 +0

22 +0

9 +0

webarchive-discovery by ukwa

WARC and ARC indexing and discovery tools.

created at Dec. 20, 2012, 12:17 p.m.

Java

24 +0

113 +0

24 +0

twarc by DocNow

A command line tool (and Python library) for archiving Twitter JSON

created at Jan. 14, 2013, 2:35 p.m.

Python

35 +0

1,354 +0

253 +0

wail by machawk1

Web Archiving Integration Layer: One-Click User Instigated Preservation

created at March 20, 2013, 2:42 p.m.

Roff

14 +1

344 +1

32 +0

warctools by internetarchive

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

created at March 22, 2013, 8:52 p.m.

Python

36 +0

141 +0

25 +0

warcat by chfoo

Tool and library for handling Web ARChive (WARC) files.

created at April 9, 2013, 4:23 p.m.

Python

11 +0

136 +1

21 +0

chrome-remote-interface by cyrus-and

Chrome Debugging Protocol interface for Node.js

created at April 17, 2013, 6 p.m.

JavaScript

81 +0

4,192 +3

300 +1

shine by ukwa

Prototype SOLR-powered web archive exploration UI.

created at July 3, 2013, 8:18 p.m.

JavaScript

17 +0

42 +0

7 +0

warcprox by internetarchive

WARC-writing MITM HTTP/S proxy

created at Oct. 25, 2013, 11:27 p.m.

Python

33 +1

363 +1

55 +0

wpull by ArchiveTeam

Wget-compatible web downloader and crawler.

created at Dec. 7, 2013, 1:03 p.m.

HTML

23 +0

536 +1

77 +1

pywb by webrecorder

Core Python Web Archiving Toolkit for replay and recording of web archives

created at Dec. 9, 2013, 3:30 a.m.

JavaScript

61 +1

1,309 +6

207 +1

Mink by machawk1

Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user access to the copy

created at Jan. 17, 2014, 6:25 p.m.

JavaScript

6 +0

45 +0

3 +0

warcrefs by arcalex

Web archive deduplication tools

created at April 22, 2014, 8:02 a.m.

Java

5 +0

6 +0

1 +0

wikiteam by WikiTeam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.

created at June 25, 2014, 10:18 a.m.

Python

40 +0

692 +2

144 +0

ArchiveTools by recrm

A collection of tools for archiving and analysing the internet.

created at Jan. 14, 2015, 6:53 p.m.

Python

6 +0

66 +0

15 +0

outbackcdx by nla

Web archive index server based on RocksDB

created at Jan. 15, 2015, 11:53 p.m.

Java

23 +0

29 +0

20 +0

grab-site by ArchiveTeam

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

created at Feb. 5, 2015, 5:01 a.m.

Python

40 +0

1,270 +6

125 +3
