brozzler in iipc/awesome-web-archiving

brozzler - distributed browser-based web crawler

created at July 13, 2015, 11:48 p.m.

Python

36 +0

630 +0

93 +0

warcprox in iipc/awesome-web-archiving

WARC-writing MITM HTTP/S proxy

created at Oct. 25, 2013, 11:27 p.m.

Python

33 +1

363 +1

55 +0

warctools in iipc/awesome-web-archiving

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

created at March 22, 2013, 8:52 p.m.

Python

36 +0

141 +0

25 +0

arch in iipc/awesome-web-archiving

Web application for distributed compute analysis of Archive-It web archive collections.

created at April 28, 2022, 3:18 p.m.

Scala

19 +0

13 +0

4 +0

Sparkling in iipc/awesome-web-archiving

Internet Archive's Sparkling Data Processing Library

created at April 28, 2022, 2:28 p.m.

Scala

17 +0

10 +0

2 +0
