jwat-tools by netarchivesuite

JWAT Tools

updated at May 22, 2024, 5:55 a.m.

Java

7 +0

5 +1

2 +0

GitHub
jwarc by iipc

Java library for reading and writing WARC files with a typed API

updated at May 17, 2024, 4:43 p.m.

Java

5 +0

42 +0

8 +0

GitHub
wasapi-downloader by sul-dlss

Java application to download WARCs from WASAPI

updated at May 13, 2024, 4:05 p.m.

Java

22 +0

6 +0

4 +0

GitHub
solrwayback by netarchivesuite

A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.

updated at May 7, 2024, 6:08 a.m.

Java

24 +0

95 +0

18 +0

GitHub
outbackcdx by nla

Web archive index server based on RocksDB

updated at May 4, 2024, 5:05 a.m.

Java

23 +0

29 +0

20 +0

GitHub
warc2html by iipc

Converts WARC files to static HTML

updated at April 26, 2024, 4:02 p.m.

Java

10 +0

38 +0

3 +0

GitHub
webarchive-discovery by ukwa

WARC and ARC indexing and discovery tools.

updated at March 31, 2024, 2:13 p.m.

Java

24 +0

113 +0

24 +0

GitHub
wasp by webis-de

updated at March 30, 2024, 10:57 a.m.

Java

13 +0

25 +0

4 +0

GitHub
httrack2warc by nla

Converts HTTrack crawls to WARC files

updated at Jan. 30, 2024, 12:40 p.m.

Java

20 +0

27 +0

6 +0

GitHub
warcrefs by arcalex

Web archive deduplication tools

updated at Jan. 26, 2024, 12:55 a.m.

Java

5 +0

6 +0

1 +0

GitHub
jwat by netarchivesuite

Java Web Archive Toolkit

updated at April 17, 2023, 8:40 p.m.

Java

8 +0

3 +0

2 +0

GitHub
WarcPartitioner by helgeho

Partition (W)ARC Files by MIME Type and Year

updated at Jan. 29, 2022, 10:23 p.m.

Java

2 +0

1 +0

1 +0

GitHub
HadoopConcatGz by helgeho

A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

updated at April 7, 2021, 12:20 a.m.

Java

2 +0

9 +0

3 +0

GitHub