This is a list of data hoarding resources for anyone who wants to get started, help archival teams, or simply back up web content for personal use.
Tools
ArchiveTeam Warrior
Description:
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions and done their best to save history before it's lost forever. They provide the ArchiveTeam Warrior, a virtual archiving appliance to help with the ArchiveTeam archiving efforts, along with other tools.
- archiveteam.org - The ArchiveTeam wiki.
- warrior.archiveteam.org - The ArchiveTeam Warrior virtual archiving appliance.
- Deathwatch - A list of dead or dying web sites that should be archived.
- WikiTeam - WikiTeam software is a set of tools for archiving wikis.
- ArchiveBot - ArchiveBot is an IRC bot designed to automate the archival of smaller websites.
Knowledge
Automate the Boring Stuff
Description:
If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you? In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand, no prior programming experience required.
- automatetheboringstuff.com - Book content.
Tools
Awesome Selfhosted
Description:
Self-hosting is the practice of hosting and managing applications on your own server(s) instead of consuming from SaaSS providers. This is a list of Free Software network services and web applications which can be hosted on your own server(s). Non-Free software is listed on the Non-Free page.
- awesome-selfhosted.net - List of self-hosted software.
- github.com - Source repository.
Services
European Alternatives
Description:
European alternatives for digital products. This site helps you find European alternatives for digital services and products, such as cloud services and SaaS products. These can be useful if you want to set up your own web site, email, or other cloud services while avoiding big tech companies.
- european-alternatives.eu - Index of European alternatives.
Services
Filecoin
Description:
Filecoin is a peer-to-peer network that enables reliable, decentralized file storage through built-in economic incentives and cryptographic proofs. Users pay storage providers—computers that store and continuously prove file integrity—to securely store their files over time. Anyone can join Filecoin as a user seeking storage or as a provider offering storage services. Storage availability and pricing aren't controlled by any single entity; instead, Filecoin fosters an open market for file storage and retrieval accessible to all.
- filecoin.io - Filecoin web site.
- docs.filecoin.io - Developer documentation.
Services
Git-annex
Description:
Git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
- git-annex.branchable.com - Git-annex web page.
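The checksum idea in the description above can be sketched in a few lines of Python. This is not how git-annex itself is implemented; it is a standard-library illustration, and the function names `sha256_of`, `build_manifest`, and `find_corrupted` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map every file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest):
    """Return relative paths that are missing or whose checksum changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Recording a manifest when a backup is made and running `find_corrupted` later is one simple way to notice silent corruption or missing files in an archive.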
Tools
Internet Archive API
Description:
The Internet Archive is one of the largest online archival sources, so many data hoarders need to work with its content programmatically. It offers a Python module that lets you script and automate commands against its public API.
- archive.org - Internet Archive developer documentation.
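As a rough illustration of scripting against the public API, the sketch below uses only the Python standard library rather than the Python module mentioned above. It assumes the `/metadata/<identifier>` and `/download/<identifier>/<file>` endpoints and a `files` list with `name` entries in the metadata response; check the developer documentation before relying on these. The helper names and the identifier `example-item` are placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE = "https://archive.org"


def metadata_url(identifier):
    """URL of the JSON metadata record for an item (assumed endpoint)."""
    return f"{BASE}/metadata/{quote(identifier)}"


def download_url(identifier, filename):
    """URL of a single file inside an item (assumed endpoint)."""
    return f"{BASE}/download/{quote(identifier)}/{quote(filename)}"


def fetch_metadata(identifier, timeout=30):
    """Fetch and parse an item's metadata. Requires network access."""
    with urlopen(metadata_url(identifier), timeout=timeout) as response:
        return json.load(response)


def file_urls(identifier, metadata):
    """List download URLs for every file named in a metadata record."""
    return [
        download_url(identifier, entry["name"])
        for entry in metadata.get("files", [])
    ]


# Offline usage with a hand-written metadata record:
sample = {"files": [{"name": "scan.pdf"}, {"name": "notes.txt"}]}
for url in file_urls("example-item", sample):
    print(url)
```

Keeping URL construction separate from the network call makes the script easy to test offline and to adapt if you switch to the official module later.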
Tools
IPFS
Description:
The InterPlanetary File System (IPFS) is a protocol, hypermedia and file sharing peer-to-peer network for storing and sharing data in a distributed hash table. This content delivery network is built around the innovation of content addressing: store, retrieve, and locate data based on the fingerprint of its actual content rather than its name or location.
- ipfs.tech - IPFS home page.
- ecosystem.ipfs.tech - List of IPFS applications.
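Content addressing, as described above, can be illustrated with a toy in-memory store in Python. This is a simplified sketch, not the IPFS implementation: real IPFS uses its own content identifiers and splits data into blocks, while this example assumes a bare SHA-256 hex digest as the address. The class name `ContentStore` is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy in-memory store where the key is derived from the content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address (the SHA-256 of the bytes)."""
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Return the data for an address, verifying it on the way out."""
        data = self._blocks[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
first = store.put(b"hello archive")
second = store.put(b"hello archive")
print(first == second)  # True: identical content yields the same address
print(store.get(first))
```

Because the address is the fingerprint of the content, identical data is stored once, and anyone retrieving it can check that what they received matches what they asked for.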
Tools
Libre Self-hosted
Description:
This is a curated list of free (libre) self-hosted projects.
- libreselfhosted - List of software.
Services
ODCrawler
Description:
A search engine for open directories. Find millions of publicly available files.
- odcrawler.xyz - Search engine.
Services
Perma
Description:
Websites change, go away, and get taken down. When linked citations lead to broken, blank, altered, or even malicious pages, that’s called link rot. Perma.cc helps scholars, journals, courts, and others create permanent records of the web sources they cite. The site is developed and maintained by the Harvard Library Innovation Lab at the Harvard Law School Library and administered by a consortium of libraries, with each library assisting its local users.
- perma.cc - Perma web site.
Communities
Reddit
Description:
Reddit is an American social news aggregation, content rating, and forum social network. There are several useful subreddits about data hoarding, self-hosting, and archiving. If you have questions about these subjects or just want to chat, these communities are very active.
- r/DataHoarder - This is a sub that aims to bring data hoarders together to share their passion with like-minded people.
- r/selfhosted - A place to share, discuss, discover, assist with, gain assistance for, and critique self-hosted alternatives to popular services.
- r/opendirectories - Links to unprotected directories of pics, vids, music, software and otherwise interesting files.
- r/usenet - A community dedicated to discussing all things Usenet.
Communities
Safeguarding Research
Description:
Safeguarding Research is a group of individuals organizing to safeguard as much publicly available research, GLAM (galleries, libraries, archives, and museums) collections, and similar material as possible.
- safeguarding-research - The discourse group.
Tools
Servarr
Description:
Servarr includes Lidarr, Prowlarr, Radarr, Readarr, Sonarr, and Whisparr. Collectively they are referred to as "*Arr", "*Arrs", "Starr", or "Starrs". Lidarr, Radarr, Readarr, Sonarr, and Whisparr automatically grab, sort, organize, and monitor your music, movie, e-book, or TV show collections. Prowlarr manages your indexers and keeps them in sync with those apps.
- wiki.servarr.com - The Servarr wiki.
- github.com - The Servarr source code.
Tools
YT-DLP
Description:
yt-dlp is a feature-rich command-line audio/video downloader with support for thousands of sites. The project is a fork of youtube-dl based on the now inactive youtube-dlc. It is currently one of the most popular ways to download videos from YouTube and many other sites, and it can fetch a single video, a full playlist, or a complete channel with a single command.
- github.com - YT-DLP source.
- ostechnix.com - Complete YT-DLP tutorial.
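To show what "a single command" can look like when scripted, here is a small Python sketch that builds a yt-dlp command line and runs it with `subprocess`. It assumes `yt-dlp` is installed and on your PATH, and it uses the `--output` and `--download-archive` options as I understand them from the project README. The helper names, the `downloads` directory, and the example URL are placeholders.

```python
import subprocess


def build_command(url, output_dir="downloads", archive_file=None):
    """Build the argument list for one yt-dlp invocation."""
    # Output template: one folder per uploader, video ID kept in the name.
    template = f"{output_dir}/%(uploader)s/%(title)s [%(id)s].%(ext)s"
    command = ["yt-dlp", "--output", template]
    if archive_file:
        # Record downloaded IDs so repeated runs skip what is already saved.
        command += ["--download-archive", archive_file]
    command.append(url)
    return command


def run(url, **kwargs):
    """Run yt-dlp for a video, playlist, or channel URL."""
    return subprocess.run(build_command(url, **kwargs), check=True)


# Show the command without executing it:
print(build_command("https://example.com/playlist", archive_file="seen.txt"))
```

Passing the same `archive_file` on every run is a common way to mirror a playlist or channel incrementally, since only new entries are fetched.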