crocoite by PromyLOPh

Web archiving using Google Chrome

updated at Oct. 23, 2023, 11:32 a.m.

Python

8 +0

42 +0

7 +0

GitHub
MementoMap by oduwsdl

A Tool to Summarize Web Archive Holdings

updated at March 6, 2024, 7:56 p.m.

Python

7 +0

9 +0

1 +0

GitHub
webarchive-indexing by ikreymer

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

updated at March 26, 2024, 10:50 p.m.

Python

9 +0

41 +0

10 +0

GitHub
fbarc by justinlittman

A command-line tool and Python library for archiving data from Facebook using the Graph API.

updated at May 17, 2024, 4:57 a.m.

Python

16 +0

77 +0

11 +0

GitHub
ArchiveTools by recrm

A collection of tools for archiving and analysing the internet.

updated at June 17, 2024, 9:09 p.m.

Python

6 +0

68 +0

15 +0

GitHub
py-wasapi-client by unt-libraries

A client for the Archive-It and Webrecorder WASAPI Data Transfer API

updated at June 28, 2024, 7:33 p.m.

Python

5 +0

14 +0

5 +0

GitHub
html2warc by steffenfritz

Simple script to convert web resources to a single WARC file

updated at June 29, 2024, 9:24 a.m.

Python

4 +0

18 +0

2 +0

GitHub
crau by turicas

Easy-to-use Web archiver

updated at July 6, 2024, 7:32 p.m.

Python

4 +0

56 +0

10 +0

GitHub
warcworker by peterk

A dockerized, queued high fidelity web archiver based on Squidwarc

updated at July 23, 2024, 9:51 p.m.

Python

6 +0

54 +0

9 +0

GitHub
warctools by internetarchive

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

updated at Aug. 29, 2024, 5:43 p.m.

Python

37 +0

149 +0

27 +0

GitHub
warcio by webrecorder

Streaming WARC/ARC library for fast web archive IO

updated at Aug. 31, 2024, 6:12 a.m.

Python

22 +0

369 +0

58 +0

GitHub
warcat by chfoo

Tool and library for handling Web ARChive (WARC) files.

updated at Sept. 6, 2024, 1:40 p.m.

Python

11 +0

147 +0

21 +0

GitHub
warcprox by internetarchive

WARC-writing MITM HTTP/S proxy

updated at Sept. 8, 2024, noon

Python

34 +0

379 +0

54 +0

GitHub
WarcDB by Florents-Tselai

WarcDB: Web crawl data as SQLite databases.

updated at Sept. 10, 2024, 3:01 p.m.

Python

10 +0

390 +0

11 +0

GitHub
unwarcit by emmadickson

No description provided.

updated at Sept. 11, 2024, 2:07 a.m.

Python

5 +0

8 +0

0 +0

GitHub
warc-safe by natliblux

A tool for detecting viruses and NSFW material in WARC files

updated at Sept. 11, 2024, 2:07 a.m.

Python

4 +0

10 +0

0 +0

GitHub
waybackpy by akamhy

Wayback Machine API interface & a command-line tool

updated at Sept. 11, 2024, 1:56 p.m.

Python

10 +0

464 +0

34 +0

GitHub
ipwb by oduwsdl

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

updated at Sept. 14, 2024, 4:52 p.m.

Python

23 +0

606 +0

39 +0

GitHub
brozzler by internetarchive

brozzler - distributed browser-based web crawler

updated at Sept. 15, 2024, 12:07 p.m.

Python

36 +0

653 +0

96 +0

GitHub
har2warc by webrecorder

Convert HTTP Archive (HAR) -> Web Archive (WARC) format

updated at Sept. 18, 2024, 11:21 a.m.

Python

7 +0

44 +1

4 +0

GitHub