unwarcit by emmadickson

(no description provided)

updated at April 9, 2024, 9:06 p.m.

Python

5 +0

6 +0

0 +0

GitHub
Sparkling by internetarchive

Internet Archive's Sparkling Data Processing Library

updated at April 4, 2024, 12:42 a.m.

Scala

17 +0

10 +0

2 +0

GitHub
webarchive-discovery by ukwa

WARC and ARC indexing and discovery tools.

updated at March 31, 2024, 2:13 p.m.

Java

24 +0

113 +0

24 +0

GitHub
wasp by webis-de

(no description provided)

updated at March 30, 2024, 10:57 a.m.

Java

13 +0

25 +0

4 +0

GitHub
webarchive-indexing by ikreymer

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

updated at March 26, 2024, 10:50 p.m.

Python

9 +0

41 +0

9 +1

GitHub
playback by wabarc

Playback webpages from Wayback Machine

updated at March 21, 2024, 1:56 p.m.

Go

4 +0

6 +0

1 +0

GitHub
har2warc by webrecorder

Convert HTTP Archive (HAR) -> Web Archive (WARC) format

updated at March 12, 2024, 12:41 p.m.

Python

7 +0

42 +0

3 +0

GitHub
crau by turicas

Easy-to-use Web archiver

updated at March 11, 2024, 6:49 p.m.

Python

4 +0

53 +0

9 +0

GitHub
MementoMap by oduwsdl

A Tool to Summarize Web Archive Holdings

updated at March 6, 2024, 7:56 p.m.

Python

7 +0

9 +0

0 +0

GitHub
Zotero-Robust-Links-Extension by lanl

Create Robust Links from within Zotero

updated at Feb. 22, 2024, 6:58 p.m.

JavaScript

3 +0

17 +0

2 +0

GitHub
webarchive by richardlehane

golang readers for ARC and WARC webarchive formats

updated at Feb. 6, 2024, 11:28 p.m.

Go

7 +0

17 +0

2 +0

GitHub
httrack2warc by nla

Converts HTTrack crawls to WARC files

updated at Jan. 30, 2024, 12:40 p.m.

Java

20 +0

27 +0

6 +0

GitHub
shine by ukwa

Prototype SOLR-powered web archive exploration UI.

updated at Jan. 29, 2024, 1:03 a.m.

JavaScript

17 +0

42 +0

7 +0

GitHub
warcrefs by arcalex

Web archive deduplication tools

updated at Jan. 26, 2024, 12:55 a.m.

Java

5 +0

6 +0

1 +0

GitHub
notebooks by archivesunleashed

Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.

updated at Jan. 21, 2024, 10:04 a.m.

Jupyter Notebook

6 +0

21 +0

4 +0

GitHub
web-archiving-course by vphill

Web Archiving Course

updated at Dec. 19, 2023, 6:19 p.m.

Unknown languages

1 +0

19 +0

6 +0

GitHub
heritrix-walkthrough by web-archive-group

(no description provided)

updated at Dec. 9, 2023, 12:31 a.m.

Shell

6 +0

9 +0

1 +0

GitHub
linkstat by httpreserve

CLI implementation of httpreserve that can test links and retrieve internet archive replacements

updated at Nov. 18, 2023, 5:02 p.m.

Go

3 +0

7 +0

0 +0

GitHub
crocoite by PromyLOPh

Web archiving using Google Chrome

updated at Oct. 23, 2023, 11:32 a.m.

Python

8 +0

42 +0

7 +0

GitHub
Web2Warc by helgeho

An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)

updated at Oct. 22, 2023, 8:37 p.m.

Scala

3 +0

24 +0

4 +0

GitHub