HadoopConcatGz in iipc/awesome-web-archiving

A Splitable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz

created at Aug. 8, 2016, 1:36 p.m.

Java

2 +0

9 +0

3 +0

GitHub
WarcPartitioner in iipc/awesome-web-archiving

Partition (W)ARC Files by MIME Type and Year

created at Feb. 13, 2017, 3:45 p.m.

Java

2 +0

1 +0

1 +0

GitHub