node-warc by N0taN3rd

Parse and create Web ARChive (WARC) files with Node.js

created at May 21, 2017, 6 a.m.

JavaScript

9 +0

92 +0

20 +0

GitHub
chronicler by CGamesPlay

Offline-first web browser

created at Dec. 27, 2018, 4:01 a.m.

JavaScript

6 +0

83 +0

5 +0

GitHub
fbarc by justinlittman

A command-line tool and Python library for archiving data from Facebook using the Graph API.

created at Feb. 14, 2017, 11:45 p.m.

Python

16 +0

78 +0

11 +0

GitHub
awesome-memento by machawk1

A list of things related to software, literature, and other content for 🕣 Memento

created at Sept. 16, 2016, 1:33 a.m.

Unknown languages

8 +0

77 +0

8 +0

GitHub
ArchiveTools by recrm

A collection of tools for archiving and analysing the internet.

created at Jan. 14, 2015, 6:53 p.m.

Python

6 +0

66 +0

15 +0

GitHub
MemGator by oduwsdl

A Memento Aggregator CLI and Server in Go

created at Sept. 8, 2015, 1:43 a.m.

Go

14 +1

54 +1

11 +0

GitHub
warcworker by peterk

A dockerized, queued, high-fidelity web archiver based on Squidwarc

created at July 21, 2018, 8:31 a.m.

Python

6 +0

53 +0

9 +0

GitHub
crau by turicas

Easy-to-use Web archiver

created at Oct. 26, 2019, 7:21 p.m.

Python

4 +0

53 +0

9 +1

GitHub
warclight by archivesunleashed

A Rails engine supporting the discovery of web archives.

created at Aug. 3, 2017, 5:45 p.m.

Ruby

5 +0

48 +0

10 +0

GitHub
Mink by machawk1

Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user access to the copy

created at Jan. 17, 2014, 6:25 p.m.

JavaScript

6 +0

45 +0

3 +0

GitHub
chatnoir-resiliparse by chatnoir-eu

A robust web archive analytics toolkit

created at June 22, 2021, 9:03 a.m.

Cython

9 +0

44 +2

8 +0

GitHub
jwarc by iipc

Java library for reading and writing WARC files with a typed API

created at Sept. 21, 2015, 3:07 a.m.

Java

5 +0

43 +1

8 +0

GitHub
shine by ukwa

Prototype Solr-powered web archive exploration UI.

created at July 3, 2013, 8:18 p.m.

JavaScript

17 +0

42 +0

7 +0

GitHub
crocoite by PromyLOPh

Web archiving using Google Chrome

created at Nov. 17, 2017, 6:56 p.m.

Python

8 +0

42 +0

7 +0

GitHub
har2warc by webrecorder

Convert HTTP Archive (HAR) to Web ARChive (WARC) format

created at March 16, 2017, 12:14 a.m.

Python

7 +0

42 +0

3 +0

GitHub
webarchive-indexing by ikreymer

Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.

created at March 9, 2015, 8:32 p.m.

Python

9 +0

41 +0

8 +0

GitHub
cc-notebooks by commoncrawl

Various Jupyter notebooks about Common Crawl data

created at July 19, 2019, 11:38 a.m.

Jupyter Notebook

16 +0

40 +0

8 +0

GitHub
warc2html by iipc

Converts WARC files to static HTML

created at Nov. 8, 2021, 4:09 a.m.

Java

10 +0

38 +0

3 +0

GitHub
cairn by wabarc

npm package and CLI tool for saving a web page as a single HTML file

created at Oct. 8, 2020, 7:18 a.m.

TypeScript

4 +0

37 +0

2 +0

GitHub
outbackcdx by nla

Web archive index server based on RocksDB

created at Jan. 15, 2015, 11:53 p.m.

Java

23 +0

29 +0

20 +0

GitHub