warc2html by iipc

Converts WARC files to static HTML

created at Nov. 8, 2021, 4:09 a.m.

Java

10 +0

38 +0

3 +0

unwarcit by emmadickson

No description provided.

created at Dec. 11, 2021, 7:19 p.m.

Python

5 +0

6 +0

0 +0

web-archiving-course by vphill

Web Archiving Course

created at Feb. 22, 2022, 2:33 a.m.

Unknown languages

1 +0

19 +0

6 +0

Sparkling by internetarchive

Internet Archive's Sparkling Data Processing Library

created at April 28, 2022, 2:28 p.m.

Scala

17 +0

10 +0

2 +0

arch by internetarchive

Web application for distributed compute analysis of Archive-It web archive collections.

created at April 28, 2022, 3:18 p.m.

Scala

19 +0

13 +0

4 +0

WarcDB by Florents-Tselai

WarcDB: Web crawl data as SQLite databases.

created at May 29, 2022, 11:09 a.m.

Python

10 +0

384 +0

11 +0

scoop by harvard-lil

🍨 High-fidelity, browser-based, single-page web archiving library and CLI for witnessing the web.

created at Sept. 20, 2022, 6:50 p.m.

JavaScript

7 +0

101 +0

5 +0

warc-safe by natliblux

A tool for detecting viruses and NSFW material in WARC files

created at May 3, 2024, 6:24 a.m.

Python

4 +0

7 +2

0 +0
