obelisk by go-shiori

Go package and CLI tool for saving a web page as a single HTML file

created at March 29, 2020, 12:53 a.m.

Go

11 +0

240 -1

15 +0

GitHub
warcat by chfoo

Tool and library for handling Web ARChive (WARC) files.

created at April 9, 2013, 4:23 p.m.

Python

11 +0

136 +0

21 +0

GitHub
WarcDB by Florents-Tselai

WarcDB: Web crawl data as SQLite databases.

created at May 29, 2022, 11:09 a.m.

Python

10 +0

384 +0

11 +0

GitHub
warc2html by iipc

Converts WARC files to static HTML

created at Nov. 8, 2021, 4:09 a.m.

Java

10 +0

38 +0

3 +0

GitHub
browsertrix by webrecorder

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

created at June 28, 2021, 10:46 p.m.

TypeScript

10 +0

131 +1

27 +0

GitHub
waybackpy by akamhy

Wayback Machine API interface & a command-line tool

created at May 2, 2020, 9:19 a.m.

Python

10 +0

441 +3

33 +0

GitHub
Squidwarc by N0taN3rd

Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium, with or without a head

created at July 20, 2017, 6:57 a.m.

JavaScript

10 +0

164 +0

26 +0

GitHub
node-warc by N0taN3rd

Parse and create Web ARChive (WARC) files with Node.js

created at May 21, 2017, 6 a.m.

JavaScript

9 +0

92 +0

20 +0

GitHub
webarchive-indexing by ikreymer

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

created at March 9, 2015, 8:32 p.m.

Python

9 +0

41 +0

9 +1

GitHub
wayback by wabarc

An archiving tool with an IM-style interface that prioritizes privacy and accessibility, integrated with various archival services including Internet Archive, archive.today, IPFS, Telegraph, and file systems.

created at June 13, 2020, 10:08 a.m.

Go

9 +0

1,667 +5

61 +0

GitHub
chatnoir-resiliparse by chatnoir-eu

A robust web archive analytics toolkit

created at June 22, 2021, 9:03 a.m.

Cython

9 +0

45 +1

8 +0

GitHub
awesome-memento by machawk1

A list of things related to software, literature, and other content for 🕣 Memento

created at Sept. 16, 2016, 1:33 a.m.

Unknown languages

8 +0

77 +0

8 +0

GitHub
jwat by netarchivesuite

Java Web Archive Toolkit

created at Aug. 30, 2018, 5:28 p.m.

Java

8 +0

3 +0

2 +0

GitHub
crocoite by PromyLOPh

Web archiving using Google Chrome

created at Nov. 17, 2017, 6:56 p.m.

Python

8 +0

42 +0

7 +0

GitHub
har2warc by webrecorder

Convert HTTP Archive (HAR) -> Web Archive (WARC) format

created at March 16, 2017, 12:14 a.m.

Python

7 +0

42 +0

3 +0

GitHub
webarchive by richardlehane

Go readers for the ARC and WARC web archive formats

created at Sept. 21, 2015, 6:38 a.m.

Go

7 +0

17 +0

2 +0

GitHub
scoop by harvard-lil

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

created at Sept. 20, 2022, 6:50 p.m.

JavaScript

7 +0

101 +0

5 +0

GitHub
zotero-memento by leonkt

Zotero extension that combats link rot by archiving webpages and journal articles.

created at Aug. 29, 2019, 5:51 p.m.

JavaScript

7 +0

275 +1

14 +0

GitHub
gowarcserver by nlnwa

No description provided.

created at Jan. 15, 2021, 10:42 a.m.

Go

7 +0

12 +0

1 +0

GitHub
jwat-tools by netarchivesuite

JWAT Tools

created at Aug. 30, 2018, 5:54 p.m.

Java

7 +0

5 +1

2 +0

GitHub