iipc/awesome-web-archiving

SingleFile by gildas-lormeau

Web Extension for saving a faithful copy of a complete web page in a single HTML file

updated at May 12, 2024, 7:54 p.m.

JavaScript

114 +0

13,911 +141

921 +6

GitHub

ArchiveBox by ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

updated at May 12, 2024, 6:51 p.m.

Python

172 +0

19,905 +54

1,083 +6

GitHub

browsertrix-crawler by webrecorder

Run a high-fidelity browser-based crawler in a single Docker container

updated at May 12, 2024, 4:42 p.m.

TypeScript

23 +0

551 +4

69 +1

GitHub

pywb by webrecorder

Core Python Web Archiving Toolkit for replay and recording of web archives

updated at May 12, 2024, 1:47 p.m.

JavaScript

61 +1

1,309 +6

207 +1

GitHub

browsertrix by webrecorder

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

updated at May 12, 2024, 11:36 a.m.

TypeScript

10 +0

127 +3

26 +0

GitHub

badger by dgraph-io

Fast key-value DB in Go.

updated at May 12, 2024, 9:47 a.m.

Go

240 +0

13,427 +23

1,149 +2

GitHub

flameshot by flameshot-org

Powerful yet simple to use screenshot software 🖥️ 📸

updated at May 12, 2024, 8:15 a.m.

C++

206 -1

23,279 +51

1,499 +1

GitHub

grab-site by ArchiveTeam

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

updated at May 12, 2024, 5:50 a.m.

Python

40 +0

1,270 +6

125 +3

GitHub

wayback by wabarc

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, IPFS, Telegraph, and file systems.

updated at May 12, 2024, 5:44 a.m.

Go

9 -2

1,658 +8

59 -1

GitHub

monolith by Y2Z

⬛️ CLI tool for saving complete web pages as a single HTML file

updated at May 12, 2024, 4:49 a.m.

Rust

62 +0

9,991 +29

284 +0

GitHub

xdotool by jordansissel

fake keyboard/mouse input, window management, and more

updated at May 12, 2024, 12:59 a.m.

C

56 +0

3,047 +10

311 +1

GitHub

warcat by chfoo

Tool and library for handling Web ARChive (WARC) files.

updated at May 11, 2024, 9:25 p.m.

Python

11 +0

136 +1

21 +0

GitHub

gogetcrawl by karust

Extract web archive data using Wayback Machine and Common Crawl

updated at May 11, 2024, 12:11 p.m.

Go

5 +0

130 +3

15 +0

GitHub

wpull by ArchiveTeam

Wget-compatible web downloader and crawler.

updated at May 11, 2024, 11:54 a.m.

HTML

23 +0

536 +1

77 +1

GitHub

internetarchive by jjjake

A Python and Command-Line Interface to Archive.org

updated at May 11, 2024, 8:29 a.m.

Python

51 +0

1,526 +8

209 +0

GitHub

warcio by webrecorder

Streaming WARC/ARC library for fast web archive IO

updated at May 11, 2024, 6:43 a.m.

Python

22 +0

346 +1

54 +0

GitHub

chrome-remote-interface by cyrus-and

Chrome Debugging Protocol interface for Node.js

updated at May 11, 2024, 2:06 a.m.

JavaScript

81 +0

4,192 +3

300 +1

GitHub

DownloadNet by dosyago

💾 DownloadNet - All content you browse online available offline. Search through the full-text of all pages in your browser history. ⭐️ Star to support our work!

updated at May 10, 2024, 6:50 p.m.

JavaScript

42 +0

3,654 +4

137 +0

GitHub

auto-archiver by bellingcat

Automatically archive links to videos, images, and social media content from Google Sheets (and more).

updated at May 10, 2024, 4:42 a.m.

Python

19 +0

474 +4

53 +0

GitHub

MemGator by oduwsdl

A Memento Aggregator CLI and Server in Go

updated at May 10, 2024, 4:08 a.m.

Go

14 +1

54 +1

11 +0

GitHub
