chronicler by CGamesPlay

Offline-first web browser

updated at May 1, 2024, 4:40 p.m.

JavaScript

6 +0

83 +0

5 +0

GitHub
brozzler by internetarchive

brozzler - distributed browser-based web crawler

updated at May 4, 2024, 4:59 a.m.

Python

36 +0

630 +0

93 +0

GitHub
outbackcdx by nla

Web archive index server based on RocksDB

updated at May 4, 2024, 5:05 a.m.

Java

23 +0

29 +0

20 +0

GitHub
ArchiveSpark by helgeho

An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at Internet Archive.

updated at May 5, 2024, 4:14 a.m.

Scala

14 +0

141 +0

19 +0

GitHub
py-wasapi-client by unt-libraries

A client for the Archive-It and Webrecorder WASAPI Data Transfer API

updated at May 7, 2024, 3:08 a.m.

Python

5 +0

14 +0

4 +0

GitHub
solrwayback by netarchivesuite

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.

updated at May 7, 2024, 6:08 a.m.

Java

24 +0

95 +0

18 +0

GitHub
html2warc by steffenfritz

Simple script to convert web resources to a single WARC file

updated at May 8, 2024, 5:21 a.m.

Python

4 +0

15 +0

2 +0

GitHub
wail by N0taN3rd

🐋 One-Click User Instigated Preservation

updated at May 10, 2024, 3:37 a.m.

JavaScript

13 +0

120 +0

9 +0

GitHub
warcat by chfoo

Tool and library for handling Web ARChive (WARC) files.

updated at May 11, 2024, 9:25 p.m.

Python

11 +0

136 +0

21 +0

GitHub
wasapi-downloader by sul-dlss

Java application to download WARCs from WASAPI

updated at May 13, 2024, 4:05 p.m.

Java

22 +0

6 +0

4 +0

GitHub
twarc by DocNow

A command line tool (and Python library) for archiving Twitter JSON

updated at May 13, 2024, 5:25 p.m.

Python

35 +0

1,355 +0

254 +1

GitHub
scoop by harvard-lil

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

updated at May 13, 2024, 7:35 p.m.

JavaScript

7 +0

101 +0

5 +0

GitHub
ArchiveTools by recrm

A collection of tools for archiving and analysing the internet.

updated at May 14, 2024, 5:12 p.m.

Python

6 +0

67 +0

15 +0

GitHub
awesome-website-change-monitoring by edgi-govdata-archiving

A curated list of awesome tools for website diffing and change monitoring.

updated at May 14, 2024, 9:53 p.m.

Unknown languages

31 +0

482 +0

31 +0

GitHub
archivenow by oduwsdl

A Tool To Push Web Resources Into Web Archives

updated at May 15, 2024, 1:18 a.m.

Python

21 +0

392 +0

41 +0

GitHub
freeze-dry by WebMemex

Snapshots a web page to get it as a static, self-contained HTML document.

updated at May 15, 2024, 4:16 a.m.

TypeScript

11 +0

268 +0

18 +0

GitHub
fbarc by justinlittman

A command-line tool and Python library for archiving data from Facebook using the Graph API.

updated at May 17, 2024, 4:57 a.m.

Python

16 +0

77 +0

11 +0

GitHub
jwarc by iipc

Java library for reading and writing WARC files with a typed API

updated at May 17, 2024, 4:43 p.m.

Java

5 +0

42 +0

8 +0

GitHub
wail by machawk1

🐋 Web Archiving Integration Layer: One-Click User Instigated Preservation

updated at May 18, 2024, 11:20 a.m.

Roff

14 +0

345 +0

32 +0

GitHub
warcio by webrecorder

Streaming WARC/ARC library for fast web archive IO

updated at May 18, 2024, 11:21 a.m.

Python

22 +0

349 +0

55 +1

GitHub