brozzler in iipc/awesome-web-archiving

brozzler - distributed browser-based web crawler

created at July 13, 2015, 11:48 p.m.

Python

40 +0

671 +2

97 +0

warcprox in iipc/awesome-web-archiving

WARC-writing MITM HTTP/S proxy

created at Oct. 25, 2013, 11:27 p.m.

Python

39 +0

381 +1

54 +0

warctools in iipc/awesome-web-archiving

Command line tools and libraries for handling and manipulating WARC files (and HTTP contents)

created at March 22, 2013, 8:52 p.m.

Python

44 +0

152 +0

27 +0

arch in iipc/awesome-web-archiving

Web application for distributed compute analysis of Archive-It web archive collections.

created at April 28, 2022, 3:18 p.m.

Scala

21 +0

15 +0

4 +0

Sparkling in iipc/awesome-web-archiving

Internet Archive's Sparkling Data Processing Library

created at April 28, 2022, 2:28 p.m.

Scala

20 +0

11 +0

2 +0
