webarchive-discovery by ukwa

WARC and ARC indexing and discovery tools.

created at Dec. 20, 2012, 12:17 p.m.

Java

24 +0

113 +0

24 +0

GitHub
solrwayback by netarchivesuite

A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.

created at Feb. 8, 2017, 9:33 a.m.

Java

24 +0

95 +1

18 +0

GitHub
jwarc by iipc

Java library for reading and writing WARC files with a typed API

created at Sept. 21, 2015, 3:07 a.m.

Java

5 +0

43 +1

8 +0

GitHub
warc2html by iipc

Converts WARC files to static HTML

created at Nov. 8, 2021, 4:09 a.m.

Java

10 +0

38 +0

3 +0

GitHub
outbackcdx by nla

Web archive index server based on RocksDB

created at Jan. 15, 2015, 11:53 p.m.

Java

23 +0

29 +0

20 +0

GitHub
httrack2warc by nla

Converts HTTrack crawls to WARC files

created at Oct. 23, 2017, 5:52 a.m.

Java

20 +0

27 +0

6 +0

GitHub
wasp by webis-de

No description provided.

created at March 25, 2018, 6:58 p.m.

Java

13 +0

25 +0

4 +0

GitHub
HadoopConcatGz by helgeho

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

created at Aug. 8, 2016, 1:36 p.m.

Java

2 +0

9 +0

3 +0

GitHub
warcrefs by arcalex

Web archive deduplication tools

created at April 22, 2014, 8:02 a.m.

Java

5 +0

6 +0

1 +0

GitHub
wasapi-downloader by sul-dlss

Java application to download WARCs from WASAPI

created at April 28, 2017, 9:15 p.m.

Java

22 +0

6 +0

4 +0

GitHub
jwat-tools by netarchivesuite

JWAT Tools

created at Aug. 30, 2018, 5:54 p.m.

Java

7 +0

4 +0

2 +0

GitHub
jwat by netarchivesuite

Java Web Archive Toolkit

created at Aug. 30, 2018, 5:28 p.m.

Java

8 +0

3 +0

2 +0

GitHub
WarcPartitioner by helgeho

Partition (W)ARC Files by MIME Type and Year

created at Feb. 13, 2017, 3:45 p.m.

Java

2 +0

1 +0

1 +0

GitHub