scoop by harvard-lil

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

updated at Sept. 22, 2024, 12:52 p.m.

JavaScript

7 +0

114 +3

7 +0

GitHub
monolith by Y2Z

⬛️ CLI tool for saving complete web pages as a single HTML file

updated at Sept. 22, 2024, 12:03 p.m.

Rust

61 +0

10,918 +15

315 +2

GitHub
flameshot by flameshot-org

Powerful yet simple to use screenshot software 🖥️ 📸

updated at Sept. 22, 2024, 6:27 a.m.

C++

208 +0

24,612 +53

1,577 +4

GitHub
SingleFile by gildas-lormeau

Web Extension for saving a faithful copy of a complete web page in a single HTML file

updated at Sept. 22, 2024, 5:49 a.m.

JavaScript

117 +0

14,847 +50

972 +1

GitHub
zotero-memento by leonkt

Zotero extension that combats link rot by archiving webpages and journal articles.

updated at Sept. 22, 2024, 1:40 a.m.

JavaScript

7 +0

288 +2

14 +0

GitHub
badger by dgraph-io

Fast key-value DB in Go.

updated at Sept. 21, 2024, 11:30 p.m.

Go

231 +0

13,793 +29

1,173 +0

GitHub
ArchiveBox by ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

updated at Sept. 21, 2024, 9:57 p.m.

Python

177 +0

20,927 +136

1,116 +3

GitHub
wayback by wabarc

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, Ghostarchive, IPFS, Telegraph, and file systems.

updated at Sept. 21, 2024, 8:33 p.m.

Go

11 +0

1,758 +8

65 +0

GitHub
chrome-remote-interface by cyrus-and

Chrome Debugging Protocol interface for Node.js

updated at Sept. 21, 2024, 6:04 p.m.

JavaScript

79 -2

4,266 +6

305 +0

GitHub
dn by dosyago

💾 dn - offline full-text search and archiving for your Chromium-based browser.

updated at Sept. 21, 2024, 2:34 p.m.

JavaScript

42 +0

3,757 +1

142 +0

GitHub
Squidwarc by N0taN3rd

Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head

updated at Sept. 21, 2024, 12:51 p.m.

JavaScript

10 +0

167 +1

26 +0

GitHub
internetarchive by jjjake

A Python and Command-Line Interface to Archive.org

updated at Sept. 21, 2024, 10:32 a.m.

Python

56 +0

1,584 +3

217 +0

GitHub
browsertrix-crawler by webrecorder

Run a high-fidelity browser-based web archiving crawler in a single Docker container

updated at Sept. 21, 2024, 6:18 a.m.

TypeScript

24 +0

612 +2

79 +0

GitHub
xdotool by jordansissel

Fake keyboard/mouse input, window management, and more

updated at Sept. 20, 2024, 8:59 p.m.

C

58 +0

3,176 +8

316 +0

GitHub
browsertrix by webrecorder

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

updated at Sept. 20, 2024, 6:53 p.m.

TypeScript

12 +0

170 +4

32 +2

GitHub
chatnoir-resiliparse by chatnoir-eu

A robust web archive analytics toolkit

updated at Sept. 20, 2024, 1:34 p.m.

Cython

9 +0

79 +6

11 +0

GitHub
archivenow by oduwsdl

A Tool To Push Web Resources Into Web Archives

updated at Sept. 20, 2024, 7:58 a.m.

Python

21 +0

405 +1

42 +0

GitHub
pywb by webrecorder

Core Python Web Archiving Toolkit for replay and recording of web archives

updated at Sept. 20, 2024, 3:24 a.m.

JavaScript

61 +0

1,362 +1

212 +0

GitHub
grab-site by ArchiveTeam

The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns

updated at Sept. 20, 2024, 3:24 a.m.

Python

40 +0

1,339 +4

134 +2

GitHub
ArchiveSpark by helgeho

An Apache Spark framework for easy data processing, extraction as well as derivation for web archives and archival collections, developed at Internet Archive.

updated at Sept. 19, 2024, 11:56 a.m.

Scala

14 +0

143 +0

19 +0

GitHub