This is a list of data hoarding resources for anyone who wants to get started helping archival teams, or simply back up web content for personal use.
Tools
ArchiveBox
Description:
ArchiveBox is an open source tool that lets organizations and individuals archive public or private web content while retaining control over their data. It can be used to save copies of bookmarks, preserve evidence for legal cases, back up photos from FB/Insta/Flickr or media from YT/Soundcloud/etc., save research papers, and more.
- archivebox.io - Archival tools.
Tools
ArchiveTeam Warrior
Description:
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions and done their best to save history before it's lost forever. They provide the ArchiveTeam Warrior, a virtual archiving appliance to help with the ArchiveTeam archiving efforts, along with other tools.
- archiveteam.org - The ArchiveTeam wiki.
- archiveteam.org - The ArchiveTeam Warrior virtual archiving appliance.
- archiveteam.org - A list of dead or dying web sites that should be archived.
- archiveteam.org - WikiTeam software is a set of tools for archiving wikis.
- archiveteam.org - ArchiveBot is an IRC bot designed to automate the archival of smaller websites.
Knowledge
Automate the Boring Stuff
Description:
If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you? In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand, no prior programming experience required.
- automatetheboringstuff.com - Book content.
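As a taste of the kind of task the book covers, here is a minimal sketch that renames files whose names contain American-style MM-DD-YYYY dates to the ISO-style YYYY-MM-DD. The function names `to_iso_name` and `rename_dates` are hypothetical (not taken from the book), and the date check is deliberately loose: it rejects month 13 or day 32 but not impossible dates such as 02-31.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches an American-style date such as 12-31-2024 or 3-7-2023 anywhere in a file name.
US_DATE = re.compile(r"^(.*?)(\d{1,2})-(\d{1,2})-(\d{4})(.*)$")


def to_iso_name(name: str) -> str | None:
    """Return `name` with its MM-DD-YYYY date rewritten as YYYY-MM-DD, or None if there is none."""
    match = US_DATE.match(name)
    if match is None:
        return None
    before, month, day, year, after = match.groups()
    # Loose sanity check only; impossible dates such as 02-31 still pass.
    if not (1 <= int(month) <= 12 and 1 <= int(day) <= 31):
        return None
    return f"{before}{year}-{int(month):02d}-{int(day):02d}{after}"


def rename_dates(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename matching files in `folder`; with dry_run=True, only report what would change."""
    changes = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        new_name = to_iso_name(path.name)
        if new_name is None:
            continue
        changes.append((path.name, new_name))
        if not dry_run:
            path.rename(path.with_name(new_name))
    return changes
```

Running it with dry_run=True first is a cheap way to preview a batch rename before any file is touched.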
Tools
Awesome AI
Description:
Awesome AI is a curated list of AI tools, frameworks, APIs, software, and resources related to machine learning.
- github.com - List of tools 1.
- github.com - List of tools 2.
- github.com - List of tools 3.
Tools
Awesome Datahoarding
Description:
These tools are aimed at those wishing to get started with data hoarding. The list includes applications that you can run locally to gather data, parse data and index it.
- github.com - List of tools.
Tools
Awesome Selfhosted
Description:
Self-hosting is the practice of hosting and managing applications on your own server(s) instead of consuming from SaaSS providers. This is a list of Free Software network services and web applications which can be hosted on your own server(s). Non-free software is listed on the project's separate Non-Free page.
- awesome-selfhosted.net - List of self-hosted software.
- github.com - Source repository.
- github.com - Docker focused list.
- selfh.st - Alternative list.
Services
Canadian Technology Resources
Description:
Canadian alternatives for digital products. This site helps you find Canadian alternatives for digital services and products, like cloud services and SaaS products. These can be useful if you want to set up your own web site, email, or other cloud services but want to avoid big tech companies.
- canadian-tech.ca - Index of Canadian alternatives.
Knowledge
CensorTrace
Description:
Following the 2025 U.S. presidential inauguration, this automated tool monitors changes to major government websites by identifying and tracking removed pages, using publicly available data from the Internet Archive.
- censortrace.org - Home page.
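CensorTrace's own implementation isn't described here, but the general idea of spotting removed pages from Internet Archive data can be sketched by comparing two time windows of captures from the Wayback Machine CDX server. This is a minimal sketch, assuming the CDX JSON output format (a header row followed by one row per capture); the helper names are hypothetical. A URL that was simply not re-crawled in the later window would also be flagged, so the results are candidates to check by hand, not proof of removal.

```python
from __future__ import annotations

from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site: str, start: str, end: str) -> str:
    """Build a CDX query listing captures under `site` between two timestamps (e.g. "2024", "20250120")."""
    params = {
        "url": site,
        "matchType": "prefix",        # every URL beginning with `site`
        "output": "json",
        "fl": "original,statuscode",  # only the columns used below
        "from": start,
        "to": end,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def live_urls(rows: list[list[str]]) -> set[str]:
    """URLs captured with HTTP 200 in a CDX JSON response (the first row is the header)."""
    if not rows:
        return set()
    header, *captures = rows
    url_col = header.index("original")
    status_col = header.index("statuscode")
    return {row[url_col] for row in captures if row[status_col] == "200"}


def removed_candidates(before: list[list[str]], after: list[list[str]]) -> set[str]:
    """Pages that were live in the earlier window but never seen live in the later one."""
    return live_urls(before) - live_urls(after)
```

In practice you would fetch each query URL (for example with urllib.request.urlopen), decode the JSON, and pass the two row lists to removed_candidates.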
Knowledge
Cybersecurity Mastery Roadmap
Description:
A comprehensive, step-by-step guide to mastering cybersecurity from beginner to expert level with curated resources, tools, and career guidance.
- github.com - List of resources.
Tools
DOSBox
Description:
DOSBox is a free and open-source emulator that runs MS-DOS applications and games on modern PCs, supporting thousands of programs.
- dosbox.com - Home page.
- dosbox.com - Software compatibility list.
- sourceforge.net - Download site.
Services
European Alternatives
Description:
European alternatives for digital products. This site helps you find European alternatives for digital services and products, like cloud services and SaaS products. These can be useful if you want to set up your own web site, email, or other cloud services but want to avoid big tech companies.
- european-alternatives.eu - Index of European alternatives.
Services
Filecoin
Description:
Filecoin is a peer-to-peer network that enables reliable, decentralized file storage through built-in economic incentives and cryptographic proofs. Users pay storage providers—computers that store and continuously prove file integrity—to securely store their files over time. Anyone can join Filecoin as a user seeking storage or as a provider offering storage services. Storage availability and pricing aren't controlled by any single entity; instead, Filecoin fosters an open market for file storage and retrieval accessible to all.
- filecoin.io - Filecoin web site.
- filecoin.io - Developer documentation.
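Filecoin's real proofs (Proof-of-Replication and Proof-of-Spacetime) rely on zk-SNARKs and are far more involved, but the core intuition, that a provider can prove it still holds your data while you keep only a short fingerprint, can be shown with a toy Merkle-tree audit. This is an illustrative sketch, not Filecoin's protocol or API; every function name is hypothetical.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split(data: bytes, size: int = 8) -> list:
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def build_levels(chunks: list) -> list:
    """Return every level of a Merkle tree, from leaf hashes up to the single root."""
    level = [sha256(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        # On odd-sized levels, the last node is paired with itself.
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [sha256(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def prove(levels: list, index: int) -> list:
    """Provider side: the sibling hashes needed to recompute the root from chunk `index`."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return proof


def verify(root: bytes, chunk: bytes, index: int, proof: list) -> bool:
    """Client side: recompute the root from the returned chunk and its proof."""
    node = sha256(chunk)
    for sibling in proof:
        node = sha256(node + sibling) if index % 2 == 0 else sha256(sibling + node)
        index //= 2
    return node == root


def audit(root: bytes, n_chunks: int, respond) -> bool:
    """Challenge a random chunk; `respond(index)` must return (chunk, proof)."""
    index = secrets.randbelow(n_chunks)
    chunk, proof = respond(index)
    return verify(root, chunk, index, proof)
```

After uploading, the client keeps only the root (levels[-1][0]); repeated audits at random times make it increasingly unlikely that a provider which discarded chunks keeps passing.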
Tools
Gallery-DL
Description:
Gallery-DL is a program to download image galleries and collections from several image hosting sites, similar to how yt-dlp can download videos.
- github.com - Gallery-DL source.
Services
Git-annex
Description:
Git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
- branchable.com - Git-annex web page.
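The checksum idea behind git-annex is that each file's content is identified by a key built from its size and hash, and git tracks only a small pointer (a symlink or pointer file) derived from that key. The sketch below approximates the default SHA256E key format (SHA256E-s<size>--<sha256><extension>); the helper names are hypothetical, and git-annex applies its own rules for choosing and limiting the extension, which this version does not reproduce: it simply keeps the last suffix.

```python
import hashlib
from pathlib import Path


def annex_style_key(path: Path, block_size: int = 1 << 20) -> str:
    """Build a SHA256E-style key from a file's size, SHA-256 digest and last suffix."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        # Read in blocks so large media files are never loaded into memory at once.
        while block := f.read(block_size):
            digest.update(block)
            size += len(block)
    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"


def still_intact(path: Path, key: str) -> bool:
    """Fixity check: recompute the key and compare it with the one recorded earlier."""
    return annex_style_key(path) == key
```

Recording keys once and re-running still_intact later is the same kind of check git-annex itself performs with git annex fsck.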
Tools
Internet Archive API
Description:
The Internet Archive is one of the largest online archival sources, and as such many data hoarders need to work with its content programmatically. It offers a Python module that lets you script and automate commands using its public API.
- archive.org - Internet Archive developer documentation.
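The official module is the internetarchive package on PyPI, which also installs the ia command-line tool. For a dependency-free illustration of the public API, the sketch below asks the Wayback Machine availability endpoint for the snapshot closest to a given date. The function names are hypothetical; the response fields it reads (archived_snapshots, closest, available, url) are the ones that endpoint returns.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(target: str, timestamp: str | None = None) -> str:
    """Build the query; `timestamp` is YYYYMMDDhhmmss or any prefix of it."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> str | None:
    """Extract the closest archived copy's URL from the JSON response, if there is one."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(target: str, timestamp: str | None = None) -> str | None:
    """Network call: return a web.archive.org URL, or None if nothing is archived."""
    with urlopen(availability_url(target, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

For example, lookup("example.com", "2006") should return the URL of the capture nearest to 2006, or None if the page was never archived.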
Tools
Interoperable Europe
Description:
The Interoperable Europe Portal is the European Union's platform for promoting and supporting interoperability, collaboration, and knowledge sharing across public administrations, businesses, and citizens. It acts as a one-stop shop for discovering, sharing, and reusing IT solutions and good practices.
- europa.eu - Main web site.
Tools
IPFS
Description:
The InterPlanetary File System (IPFS) is a protocol and peer-to-peer network for storing and sharing hypermedia and files, using a distributed hash table to locate data. This content delivery network is built around content addressing: data is stored, retrieved, and located by a fingerprint of its actual content rather than by its name or location.
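Content addressing can be illustrated with a toy in-memory store that keys each blob by the SHA-256 of its bytes. This is only a sketch of the idea: real IPFS identifiers (CIDs) wrap the hash in a self-describing multihash with codec and encoding information, and large files are split into chunks linked in a Merkle DAG, so these hex digests are not valid CIDs. ContentStore is a hypothetical name.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: every blob is keyed by the SHA-256 of its bytes."""

    def __init__(self):
        self._blobs = {}  # address (hex digest) -> bytes

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data  # identical content always lands on the same address
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hash on read: whoever served the bytes cannot substitute different content.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data
```

Because the address is derived from the bytes, storing the same file twice yields the same address, and any peer can serve the data since the reader re-verifies it.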
Tools
Libre Self-hosted
Description:
This is a curated list of free (libre) self-hosted projects.
- libreselfhosted.com - List of software.
Tools
Metadata Editor
Description:
The World Bank Metadata Editor is an open-source web-based application designed to assist data curators in documenting data of various types according to specialized metadata standards. It supports many standards, including DDI CodeBook 2.5, Dublin Core, ISO 19139, and IPTC.
- github.com - Source code.
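To give a feel for what documenting data according to a metadata standard means, the sketch below writes a simple Dublin Core record with Python's standard xml.etree.ElementTree. It only restricts field names to the 15 elements of the Dublin Core Metadata Element Set; the metadata wrapper element and the dublin_core_record helper are hypothetical and are not the Metadata Editor's own output format.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}
ET.register_namespace("dc", DC_NS)  # serialize as dc:title rather than ns0:title


def dublin_core_record(fields: dict[str, str]) -> str:
    """Serialize field/value pairs as <dc:...> elements inside a plain wrapper element."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```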
Services
ODCrawler
Description:
A search engine for open directories. Find millions of publicly available files.
- odcrawler.xyz - Search engine.
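An open directory is a web server's auto-generated file index, such as an Apache or nginx "Index of /" page. As a minimal sketch, assuming a plain HTML listing and a base URL ending in "/", the standard-library parser below separates file links from subdirectory links and ignores column-sort links and the parent-directory link. IndexLinkParser and list_open_directory are hypothetical names, and real listings vary enough that the filtering rule may need adjusting.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexLinkParser(HTMLParser):
    """Collect links that point below `base_url` in a directory listing page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.files = []
        self.subdirs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip missing hrefs plus column-sort (?C=N;O=D) and fragment links.
        if not href or href.startswith(("?", "#")):
            return
        absolute = urljoin(self.base_url, href)
        # Keep only entries inside this directory; this drops "Parent Directory" and off-site links.
        if not absolute.startswith(self.base_url) or absolute == self.base_url:
            return
        (self.subdirs if absolute.endswith("/") else self.files).append(absolute)


def list_open_directory(html: str, base_url: str):
    """Return (file_urls, subdirectory_urls) found in one listing page."""
    parser = IndexLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.files, parser.subdirs
```

To crawl a whole tree, you would fetch each page (for example with urllib.request.urlopen) and recurse into the returned subdirectories.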
Services
Perma
Description:
Websites change, go away, and get taken down. When linked citations lead to broken, blank, altered, or even malicious pages, that’s called link rot. Perma.cc helps scholars, journals, courts, and others create permanent records of the web sources they cite. The site is developed and maintained by the Harvard Library Innovation Lab at the Harvard Law School Library and administered by a consortium of libraries, with each library assisting its local users.
- perma.cc - Perma web site.
Communities
Reddit
Description:
Reddit is an American social news aggregation, content rating, and forum social network. Several useful subreddits cover data hoarding, self-hosting, and archiving. If you have questions about these subjects or just want to chat, these communities are very active.
- reddit.com - r/DataHoarder
- reddit.com - r/selfhosted
- reddit.com - r/opendirectories
- reddit.com - r/usenet
Communities
Safeguarding Research
Description:
Safeguarding Research is a group of individuals organizing to safeguard as much publicly available research, GLAM-collections, etc. as possible.
- safeguar.de - The discourse group.
Tools
Servarr
Description:
Servarr includes Lidarr, Prowlarr, Radarr, Readarr, Sonarr, and Whisparr. Collectively they are referred to as "*Arr", "*Arrs", "Starr", or "Starrs". Lidarr, Radarr, Readarr, Sonarr, and Whisparr are designed to automatically grab, sort, organize, and monitor your music, movie, e-book, or TV show collections, while Prowlarr manages your indexers and keeps them in sync with the other apps.
- servarr.com - The Servarr wiki.
- github.com - The Servarr source code.
Communities
Video Game History Foundation
Description:
The Video Game History Foundation is one of several communities dedicated to the preservation of video games, both in physical and digital form. They provide a community forum and resources for gaming enthusiasts.
- gamehistory.org - Video Game History Foundation home page.
- hitsave.org - Hit Save! home page.
- gamepres.org - Game Preservation Society home page.
Tools
YT-DLP
Description:
yt-dlp is a feature-rich command-line audio/video downloader with support for thousands of sites. The project is a fork of youtube-dl based on the now inactive youtube-dlc. It is currently one of the most popular ways to download videos from YouTube and many other sites, letting you download a single video, a full playlist, or a complete channel with a single command.
- github.com - YT-DLP source.
- ostechnix.com - Complete YT-DLP tutorial.
- github.com - Web based interface for YT-DLP.
- stacher.io - Popular graphical front end.
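yt-dlp can also be embedded in Python. The sketch below, assuming the yt_dlp package is installed, archives channels or playlists while recording finished video IDs in a download-archive file, so re-running it only fetches new uploads. The option keys outtmpl, download_archive and ignoreerrors roughly mirror the CLI flags -o, --download-archive and --ignore-errors; the helper names and the folder layout in the template are illustrative choices, not project defaults.

```python
import os


def build_options(out_dir: str, archive_file: str) -> dict:
    """Options for yt_dlp.YoutubeDL: one folder per uploader, video ID kept in the file name."""
    return {
        "outtmpl": os.path.join(out_dir, "%(uploader)s", "%(upload_date)s - %(title)s [%(id)s].%(ext)s"),
        "download_archive": archive_file,  # IDs listed here are skipped on later runs
        "ignoreerrors": True,              # keep going past unavailable or private videos
    }


def archive(urls: list, out_dir: str = "downloads", archive_file: str = "downloaded.txt") -> None:
    """Download every video behind `urls` (single videos, playlists or channels)."""
    import yt_dlp  # imported lazily so build_options works without yt-dlp installed

    with yt_dlp.YoutubeDL(build_options(out_dir, archive_file)) as ydl:
        ydl.download(urls)
```

Keeping the download-archive file alongside the media is what makes scheduled re-runs cheap: previously fetched IDs are skipped before any download starts.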