This is a list of data hoarding resources for getting started, helping archival teams, or simply backing up web content for your own personal use.
Tools
ArchiveBox
Description:
ArchiveBox is an open source tool that lets organizations and individuals archive public or private web content while retaining control over their data. It can be used to save copies of bookmarks, preserve evidence for legal cases, back up photos from FB/Insta/Flickr or media from YT/Soundcloud/etc., save research papers, and more.
- archivebox.io - Archival tools.
Tools
ArchivesSpace
Description:
ArchivesSpace is an open-source archives information management application for managing and providing access to archives, manuscripts and digital objects. It supports a range of archival functions.
- archivesspace.org - Home page.
- github.com/archivesspace/archivessp... - GitHub repo.
- youtube.com/@archivesspace5340 - Videos channel.
Tools
ArchiveTeam Warrior
Description:
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions and done their best to save history before it's lost forever. They provide the ArchiveTeam Warrior, a virtual archiving appliance to help with the ArchiveTeam archiving efforts, along with other tools.
- wiki.archiveteam.org - The ArchiveTeam wiki.
- warrior.archiveteam.org - The ArchiveTeam Warrior virtual archiving appliance.
- wiki.archiveteam.org/index.php/Deat... - A list of dead or dying web sites that should be archived.
- wiki.archiveteam.org/index.php/Wiki... - WikiTeam software is a set of tools for archiving wikis.
- wiki.archiveteam.org/index.php/Arch... - ArchiveBot is an IRC bot designed to automate the archival of smaller websites.
- archive.fart.website/archivebot/vie... - List of sites archived by ArchiveBot.
Knowledge
Automate the Boring Stuff
Description:
If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you? In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand, no prior programming experience required.
- automatetheboringstuff.com - Book content.
Tools
Awesome AI
Description:
Awesome AI is a curated list of awesome AI tools, frameworks, APIs, software and resources related to machine learning.
- github.com/openbestof/awesome-ai - List of tools 1.
- github.com/mahseema/awesome-ai-tool... - List of tools 2.
- github.com/re50urces/Awesome-AI - List of tools 3.
Tools
Awesome Datahoarding
Description:
These tools are aimed at those wishing to get started with data hoarding. The list includes applications that you can run locally to gather data, parse data and index it.
- github.com/simon987/awesome-datahoa... - List of tools.
Tools
Awesome Selfhosted
Description:
Self-hosting is the practice of hosting and managing applications on your own server(s) instead of consuming from SaaSS providers. This is a list of Free Software network services and web applications which can be hosted on your own server(s). Non-Free software is listed on the Non-Free page.
- awesome-selfhosted.net - List of self-hosted software.
- github.com/awesome-selfhosted/aweso... - Source repository.
- github.com/hotheadhacker/awesome-se... - Docker focused list.
- selfh.st/apps - Alternative list.
Tools
Browsertrix
Description:
Browsertrix is an open source web archiving system created by Webrecorder. It provides a web interface for starting crawl jobs on web sites, and is available both as a SaaS app and as a self-hosted deployment on Kubernetes. It also supports proxy servers.
- webrecorder.net/browsertrix - Browsertrix home page.
- docs.browsertrix.com - Documentation.
- youtube.com/@webrecorder - Videos channel.
Services
Canadian Technology Resources
Description:
Canadian alternatives for digital products. This site helps you find Canadian alternatives for digital services and products, like cloud services and SaaS products. These can be useful should you want to set up your own web site, email, or other cloud services but want to avoid big tech companies.
- canadian-tech.ca - Index of Canadian alternatives.
Knowledge
CensorTrace
Description:
Following the 2025 U.S. presidential inauguration, this automated tool monitors changes to major government websites by identifying and tracking removed pages, using publicly available data from the Internet Archive.
- censortrace.org - Home page.
Knowledge
Cybersecurity Mastery Roadmap
Description:
A comprehensive, step-by-step guide to mastering cybersecurity from beginner to expert level with curated resources, tools, and career guidance.
- github.com/Hamed233/Cybersecurity-M... - List of resources.
Knowledge
Digital Preservation
Description:
This site is hosted by the US Library of Congress and presents information about the National Digital Information Infrastructure and Preservation Program (NDIIPP) and its initiatives.
- digitalpreservation.gov - Home page.
- youtube.com/@loc - Videos channel.
Communities
Digital Preservation Coalition
Description:
The Digital Preservation Coalition (DPC) is a charity building a welcoming and inclusive global community, working together to bring about a sustainable future for our digital assets. It was established in 2002 as a collaboration between a number of agencies operating in the UK and Ireland.
- dpconline.org - Home page.
- youtube.com/@digitalpreservationcoa... - Videos channel.
Tools
DOSBox
Description:
DOSBox is a free and open-source emulator that runs MS-DOS applications and games on modern PCs, supporting thousands of programs.
- dosbox.com - Home page.
- dosbox.com/comp_list.php?letter=a - Software compatibility list.
- sourceforge.net/projects/dosbox - Download site.
- freedos.org - FreeDOS project.
Services
European Alternatives
Description:
European alternatives for digital products. This site helps you find European alternatives for digital services and products, like cloud services and SaaS products. These can be useful should you want to set up your own web site, email, or other cloud services but want to avoid big tech companies.
- european-alternatives.eu - Index of European alternatives.
Services
Filecoin
Description:
Filecoin is a peer-to-peer network that enables reliable, decentralized file storage through built-in economic incentives and cryptographic proofs. Users pay storage providers—computers that store and continuously prove file integrity—to securely store their files over time. Anyone can join Filecoin as a user seeking storage or as a provider offering storage services. Storage availability and pricing aren't controlled by any single entity; instead, Filecoin fosters an open market for file storage and retrieval accessible to all.
- filecoin.io - Filecoin web site.
- docs.filecoin.io/smart-contracts/fu... - Developers documentation.
- youtube.com/@FilecoinProject - Videos channel.
Tools
Gallery-DL
Description:
Gallery-DL is a program to download image galleries and collections from several image hosting sites, similar to how yt-dlp can download videos.
- github.com/mikf/gallery-dl - Gallery-DL source.
Services
Git-annex
Description:
Git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
- git-annex.branchable.com - Git-annex web page.
Communities
International Council on Archives
Description:
The International Council on Archives (ICA) promotes the efficient and effective management and use of records, archives and data in all their formats and their preservation as the cultural and evidentiary heritage of humanity.
- ica.org - Home page.
- youtube.com/@ICArchives - Videos channel.
Communities
International Internet Preservation Consortium
Description:
The International Internet Preservation Consortium (IIPC) identifies and develops best practices for selecting, harvesting, collecting, preserving and providing access to Internet content.
- netpreserve.org - Home page.
- youtube.com/@iipc8855 - Videos channel.
Tools
Internet Archive API
Description:
The Internet Archive is one of the largest online archival sources, and as such many data hoarders need to deal with its content programmatically. It offers a Python module allowing you to script and automate commands using its public API.
- archive.org/developers/internetarch... - Internet Archive developers documentation.
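As a small illustration of working with Internet Archive content programmatically, the sketch below asks the Wayback Machine availability endpoint for the snapshot closest to a given date. It uses only the Python standard library rather than the Python module mentioned above. The helper names are hypothetical, and the response layout is an assumption based on the public endpoint's JSON output.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return the closest snapshot URL from a decoded response, or None."""
    # Assumed layout: {"archived_snapshots": {"closest": {"available": ..., "url": ...}}}
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Perform the network request and return the closest snapshot URL, if any."""
    url = build_availability_url(page_url, timestamp)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com", "20060101")` needs network access; the two helper functions can be exercised offline.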
Tools
Interoperable Europe
Description:
The Interoperable Europe Portal is the European Union's platform for promoting and supporting interoperability, collaboration, and knowledge sharing across public administrations, businesses, and citizens. It acts as a one-stop shop for discovering, sharing, and reusing IT solutions and good practices.
- interoperable-europe.ec.europa.eu - Main web site.
Tools
IPFS
Description:
The InterPlanetary File System (IPFS) is a protocol, hypermedia and file sharing peer-to-peer network for storing and sharing data in a distributed hash table. This content delivery network is built around the innovation of content addressing: store, retrieve, and locate data based on the fingerprint of its actual content rather than its name or location.
- ipfs.tech - IPFS home page.
- ecosystem.ipfs.tech - List of IPFS applications.
- youtube.com/@IPFSbot - Videos channel.
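The idea of content addressing can be illustrated with a toy in-memory store in which the key of each block is the hash of its bytes. This is only a sketch of the concept: the `ContentStore` class is hypothetical, and a plain SHA-256 hex digest stands in for the richer identifier format that IPFS actually uses.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is derived from the data itself."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The key is the fingerprint of the content, not a name or a location.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Anyone holding the key can verify that the content was not altered.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored block does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello archive")
print(address, store.get(address))
```

Storing the same bytes twice yields the same address, which is why identical content is naturally deduplicated in this kind of design.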
Services
Knowledge Commons
Description:
Knowledge Commons is an open, adaptable collection of tools that support the human work of education and research and make that work more visible and impactful. Projects of the Commons are funded by the National Science Foundation and the National Endowment for the Humanities.
- hcommons.org - Home page.
- hcommons.org/sites - List of sites.
- works.hcommons.org - Featured collections.
- dahd.hcommons.org - Digital art history directory.
- youtube.com/@Knowledge_Commons - Videos channel.
Tools
Libre Self-hosted
Description:
This is a curated list of free (libre) self-hosted projects.
- libreselfhosted.com - List of software.
Tools
Memento Protocol
Description:
Memento is a project aimed at making Web-archived content more readily discoverable and accessible to the public. It's a protocol that allows clients to find archived web content at specific timestamps.
- cdlib.org/cdlinfo/2010/02/04/web-ar... - Initial project description.
- datatracker.ietf.org/doc/html/rfc70... - Technical paper of the protocol.
- timetravel.mementoweb.org - Web portal to find archived web content.
- github.com/oduwsdl/MemGator - A Memento Aggregator CLI and Server in Go.
- groups.google.com/g/memento-dev - Memento development group.
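In the Memento protocol, a client asks a TimeGate for a past version of a page by sending the desired time in an `Accept-Datetime` request header. The sketch below prepares such a request with the Python standard library. The function names are hypothetical, and the TimeGate URL prefix on the Time Travel portal is an assumption that should be checked against the portal's documentation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import urllib.request

# Assumed TimeGate prefix of the Time Travel portal; verify before relying on it.
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def accept_datetime(moment):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def build_timegate_request(page_url, moment):
    """Prepare (but do not send) a TimeGate request for page_url at the given time."""
    return urllib.request.Request(
        TIMEGATE_PREFIX + page_url,
        headers={"Accept-Datetime": accept_datetime(moment)},
    )


request = build_timegate_request(
    "http://example.com/", datetime(2010, 2, 4, 12, 0, tzinfo=timezone.utc)
)
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the lookup over the network; a TimeGate normally answers with a redirect to the archived copy closest to the requested time.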
Tools
Metadata Editor
Description:
The World Bank Metadata Editor is an open-source web-based application designed to assist data curators in documenting data of various types according to specialized metadata standards. It supports many standards, including DDI CodeBook 2.5, Dublin Core, ISO 19139, IPTC, etc.
- github.com/worldbank/metadata-edito... - Source code.
Services
ODCrawler
Description:
A search engine for open directories. Find millions of publicly available files.
- odcrawler.xyz - Search engine.
Services
Perma
Description:
Websites change, go away, and get taken down. When linked citations lead to broken, blank, altered, or even malicious pages, that’s called link rot. Perma.cc helps scholars, journals, courts, and others create permanent records of the web sources they cite. The site is developed and maintained by the Harvard Library Innovation Lab at the Harvard Law School Library and administered by a consortium of libraries, with each library assisting its local users.
- perma.cc - Perma web site.
Services
Permanent
Description:
The Permanent Legacy Foundation is a non-profit organization offering cloud storage services at a low cost for building a digital archive for families, organizations and historians. Their mission is to preserve and provide perpetual access to the digital legacy of all people for the historical and educational benefit of future generations.
- permanent.org - Home page.
- youtube.com/@permanentlegacyfoundat... - Videos channel.
Communities
Reddit
Description:
Reddit is an American social news aggregation, content rating, and forum social network. There are several useful subreddits around data hoarding, self-hosting and archiving. If you have questions about these subjects or just want to chat, those communities are very active.
- reddit.com/r/DataHoarder - r/DataHoarder
- reddit.com/r/selfhosted - r/selfhosted
- reddit.com/r/opendirectories - r/opendirectories
- reddit.com/r/usenet - r/usenet
Communities
Safeguarding Research
Description:
Safeguarding Research is a group of individuals organizing to safeguard as much publicly available research, GLAM (galleries, libraries, archives and museums) collections, etc. as possible.
- forum.safeguar.de - The discourse group.
Tools
Servarr
Description:
Servarr includes Lidarr, Prowlarr, Radarr, Readarr, Sonarr, and Whisparr. Collectively they are referred to as "*Arr", "*Arrs", "Starr", or "Starrs". Lidarr, Radarr, Readarr, Sonarr, and Whisparr are designed to automatically grab, sort, organize, and monitor your Music, Movie, E-Book, or TV Show collections; Prowlarr manages your indexers and keeps them in sync with the aforementioned apps.
- wiki.servarr.com - The Servarr wiki.
- github.com/Servarr/Wiki - The Servarr wiki source repository.
- github.com/Ravencentric/awesome-arr - List of *arr apps.
Knowledge
Sustainable Heritage Network
Description:
The Sustainable Heritage Network (SHN) is an answer to the pressing need for comprehensive workshops, online tutorials, and web resources dedicated to the lifecycle of digital stewardship. The SHN is a collaborative project that complements the work of Indigenous peoples globally to preserve, share, and manage cultural heritage and knowledge.
- sustainableheritagenetwork.org - Home page.
Communities
Video Game History Foundation
Description:
The Video Game History Foundation is one of several communities dedicated to the preservation of video games, both in physical and digital form. They provide a community forum and resources for gaming enthusiasts.
- gamehistory.org - Video Game History Foundation home page.
- hitsave.org - Hit Save! home page.
- gamepres.org/en - Game Preservation Society home page.
- stopkillinggames.com - Stop Killing Games.
- youtube.com/@GameHistoryOrg - Videos channel.
Tools
YT-DLP
Description:
yt-dlp is a feature-rich command-line audio/video downloader with support for thousands of sites. The project is a fork of youtube-dl based on the now inactive youtube-dlc. It's currently the most popular way to download videos from YouTube and many other sites, allowing you to download a single video, a full playlist or a complete channel with a single command.
- github.com/yt-dlp/yt-dlp - YT-DLP source.
- ostechnix.com/yt-dlp-tutorial - Complete YT-DLP tutorial.
- github.com/graham-walker/youtube-dl... - Web based interface for YT-DLP.
- stacher.io - Popular graphical front end.
- preservetube.com - Web app to preserve videos.
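For scripted use, the single-command workflow described above can be wrapped in a few lines of Python. This is a minimal sketch that assumes the `yt-dlp` executable is installed and on the PATH; the function names, the output template and the file names are illustrative choices, not project defaults.

```python
import subprocess


def build_ytdlp_command(url, output_dir=".", archive_file=None):
    """Build the argument list for a yt-dlp run (video, playlist or channel URL)."""
    # -o sets the output template; title, id and ext are yt-dlp template fields.
    command = ["yt-dlp", "-o", f"{output_dir}/%(title)s [%(id)s].%(ext)s"]
    if archive_file:
        # --download-archive records finished downloads so that reruns skip them.
        command += ["--download-archive", archive_file]
    command.append(url)
    return command


def download(url, **options):
    """Run yt-dlp and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_ytdlp_command(url, **options), check=True)


print(build_ytdlp_command("https://example.com/watch?v=abc", "videos", "seen.txt"))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log what will be executed before any download starts.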