webarchive-indexing by ikreymer

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR, or a local file system.

created at March 9, 2015, 8:32 p.m.

Python

9 +0

42 +0

10 +0

GitHub
crocoite by PromyLOPh

Web archiving using Google Chrome

created at Nov. 17, 2017, 6:56 p.m.

Python

8 +0

42 +0

7 +0

GitHub
html2warc by steffenfritz

Simple script to convert web resources to a single WARC file

created at Dec. 30, 2015, 2:29 p.m.

Python

4 +0

18 +0

2 +0

GitHub
py-wasapi-client by unt-libraries

A client for the Archive-It and Webrecorder WASAPI Data Transfer API

created at Aug. 10, 2017, 5:25 p.m.

Python

5 +0

14 +0

5 +0

GitHub
MementoMap by oduwsdl

A Tool to Summarize Web Archive Holdings

created at Jan. 20, 2019, 1:30 a.m.

Python

7 +0

10 +0

1 +0

GitHub
warc-safe by natliblux

A tool for detecting viruses and NSFW material in WARC files

created at May 3, 2024, 6:24 a.m.

Python

4 +0

10 +0

0 +0

GitHub
unwarcit by emmadickson

No description provided.

created at Dec. 11, 2021, 7:19 p.m.

Python

5 +0

8 +0

0 +0

GitHub