Example notebooks for working with web archives using the Archives Unleashed Toolkit, and with derivatives generated by the Archives Unleashed Toolkit.
created at Nov. 6, 2019, 3:09 a.m.
Go readers for the ARC and WARC web archive formats
created at Sept. 21, 2015, 6:38 a.m.
Simple script to convert web resources into a single WARC file
created at Dec. 30, 2015, 2:29 p.m.
Create Robust Links from within Zotero
created at June 28, 2021, 9:38 p.m.
Web application for distributed compute analysis of Archive-It web archive collections.
created at April 28, 2022, 3:18 p.m.
A client for the Archive-It and Webrecorder WASAPI Data Transfer API
created at Aug. 10, 2017, 5:25 p.m.
Internet Archive's Sparkling Data Processing Library
created at April 28, 2022, 2:28 p.m.
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
created at Aug. 8, 2016, 1:36 p.m.
CLI implementation of httpreserve that can test links and retrieve Internet Archive replacements
created at March 19, 2019, 9:23 p.m.
Tika based link (URL) extractor for httpreserve
created at April 3, 2017, 2:35 a.m.
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
created at Nov. 29, 2019, 2:52 p.m.