Simple script to convert web resources to a single WARC file
created at Dec. 30, 2015, 2:29 p.m.
An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.
created at Nov. 29, 2019, 2:52 p.m.
Example notebooks for working with web archives using the Archives Unleashed Toolkit and the derivatives it generates.
created at Nov. 6, 2019, 3:09 a.m.
Core Python Web Archiving Toolkit for replay and recording of web archives
created at Dec. 9, 2013, 3:30 a.m.
A client for the Archive-It and Webrecorder WASAPI Data Transfer API
created at Aug. 10, 2017, 5:25 p.m.
brozzler - distributed browser-based web crawler
created at July 13, 2015, 11:48 p.m.
Go readers for the ARC and WARC web archive formats
created at Sept. 21, 2015, 6:38 a.m.
Tika-based link (URL) extractor for httpreserve
created at April 3, 2017, 2:35 a.m.
Powerful yet simple-to-use screenshot software 🖥️ 📸
created at May 10, 2017, 7:44 p.m.
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
created at May 5, 2017, 8:50 a.m.