Data Hoarding - Resources

This is a list of data hoarding resources for anyone who wants to get started helping archival teams, or who simply wants to back up web content for personal use.

Tools

ArchiveTeam Warrior

Description:
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions and done their best to save history before it's lost forever. They provide the ArchiveTeam Warrior, a virtual archiving appliance to help with the ArchiveTeam archiving efforts, along with other tools.

Links:
  • archiveteam.org - The ArchiveTeam wiki.
  • warrior.archiveteam.org - The ArchiveTeam Warrior virtual archiving appliance.
  • Deathwatch - A list of dead or dying web sites that should be archived.
  • WikiTeam - WikiTeam software is a set of tools for archiving wikis.
  • ArchiveBot - ArchiveBot is an IRC bot designed to automate the archival of smaller websites.

Knowledge

Automate the Boring Stuff

Description:
If you've ever spent hours renaming files or updating hundreds of spreadsheet cells, you know how tedious tasks like these can be. But what if you could have your computer do them for you? In Automate the Boring Stuff with Python, you'll learn how to use Python to write programs that do in minutes what would take you hours to do by hand, no prior programming experience required.
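
As a small taste of the kind of task the book covers, here is a minimal Python sketch (not taken from the book) that renames files whose names contain American-style MM-DD-YYYY dates to the ISO YYYY-MM-DD form, which sorts chronologically. The names iso_name and rename_dates are illustrative, and the regular expression does not check that the month and day are actually valid.

```python
import re
from pathlib import Path

# An American-style date such as "12-31-2023" that is not part of a longer digit run.
US_DATE = re.compile(r"(?<!\d)(\d{2})-(\d{2})-(\d{4})(?!\d)")


def iso_name(name: str) -> str:
    """Rewrite every MM-DD-YYYY date in `name` as YYYY-MM-DD."""
    return US_DATE.sub(lambda m: f"{m.group(3)}-{m.group(1)}-{m.group(2)}", name)


def rename_dates(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename files in `folder`; with dry_run=True only print what would change."""
    changes = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        new_name = iso_name(path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():  # never overwrite an existing file
            print(f"skipping {path.name}: {new_name} already exists")
            continue
        changes.append((path.name, new_name))
        if dry_run:
            print(f"{path.name} -> {new_name}")
        else:
            path.rename(target)
    return changes
```

Run it with the default dry_run=True first to print the planned renames, then pass dry_run=False once the output looks right.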

Tools

Awesome Selfhosted

Description:
Self-hosting is the practice of hosting and managing applications on your own server(s) instead of relying on SaaSS (Service as a Software Substitute) providers. This is a list of Free Software network services and web applications that can be hosted on your own server(s). Non-Free software is listed on the separate Non-Free page.

Services

European Alternatives

Description:
European alternatives for digital products. This site helps you find European alternatives for digital services and products, such as cloud services and SaaS products. These can be useful if you want to set up your own website, email, or other cloud services while avoiding big tech companies.

Services

Filecoin

Description:
Filecoin is a peer-to-peer network that enables reliable, decentralized file storage through built-in economic incentives and cryptographic proofs. Users pay storage providers—computers that store and continuously prove file integrity—to securely store their files over time. Anyone can join Filecoin as a user seeking storage or as a provider offering storage services. Storage availability and pricing aren't controlled by any single entity; instead, Filecoin fosters an open market for file storage and retrieval accessible to all.
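
Filecoin's actual proofs (Proof-of-Replication and Proof-of-Spacetime) are far more involved than anything shown here. The Python sketch below only illustrates the general challenge-response idea behind proving that stored data is still intact: the client keeps a single Merkle root, asks for a randomly chosen chunk, and checks the provider's answer against that root. All names are hypothetical, and the tree omits refinements such as separating leaf hashes from interior-node hashes.

```python
import hashlib
import secrets

CHUNK_SIZE = 4  # tiny chunks keep the demo readable; real systems use much larger units


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """Return every level of the Merkle tree, leaf hashes first and the root last."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # an odd node out is paired with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Provider side: sibling hashes linking leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return path


def verify(root: bytes, chunk: bytes, index: int, path: list[bytes]) -> bool:
    """Client side: recompute the root from the returned chunk and its proof."""
    node = h(chunk)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


if __name__ == "__main__":
    data = b"the quick brown fox jumps over the lazy dog"
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    levels = build_levels(chunks)
    root = levels[-1][0]  # the only value the client has to keep
    challenge = secrets.randbelow(len(chunks))  # the provider cannot predict this
    proof = prove(levels, challenge)
    print(verify(root, chunks[challenge], challenge, proof))
```

Because the challenged index is random, a provider that has silently discarded part of the data will eventually fail a check, while the client never has to store more than the root hash.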

Services

Git-annex

Description:
Git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
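
In git-annex's default (locked) mode, git annex add moves a file's content under .git/annex/objects/ and commits only a symlink whose target is named after a key derived from the content's checksum. The Python sketch below builds a key in the shape used by the default SHA256E backend (SHA256E-s<size>--<sha256><extension>) and re-checks a file against it, loosely mirroring what git annex fsck does. It is an approximation: git-annex has its own rules about which extensions it keeps, so on a real repository compare against git annex lookupkey <file> rather than relying on this function.

```python
import hashlib
from pathlib import Path


def annex_style_key(path: Path) -> str:
    """Build a key shaped like git-annex's SHA256E keys:
    SHA256E-s<size in bytes>--<sha256 hex digest><extension>.
    Only the last suffix is kept here; git-annex's own extension rules differ."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        # Hash in 1 MiB blocks so large files never have to fit in memory.
        for block in iter(lambda: f.read(1 << 20), b""):
            digest.update(block)
            size += len(block)
    return f"SHA256E-s{size}--{digest.hexdigest()}{path.suffix}"


def still_matches(path: Path, key: str) -> bool:
    """Re-hash the file and compare it with the key recorded earlier."""
    return annex_style_key(path) == key
```

Since the key encodes both size and checksum, any bit rot or accidental edit to the content changes the key, which is how a mismatch gets detected.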

Tools

Internet Archive API

Description:
The Internet Archive is one of the largest online archival sources, so many data hoarders need to work with its content programmatically. It offers a Python module (internetarchive) that lets you script and automate commands using its public API.
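
For a quick look without extra dependencies, the public Metadata API returns a JSON record for an item at https://archive.org/metadata/<identifier>, including a files list, and each file can be fetched from https://archive.org/download/<identifier>/<name>. The Python sketch below uses only the standard library; the function names are hypothetical, and for real workloads the official Python module is the more robust choice.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(identifier: str) -> str:
    """Metadata API endpoint for one Internet Archive item."""
    return f"https://archive.org/metadata/{quote(identifier)}"


def fetch_metadata(identifier: str) -> dict:
    """Download and decode an item's metadata record (requires network access)."""
    with urlopen(metadata_url(identifier), timeout=30) as resp:
        return json.load(resp)


def download_urls(identifier: str, record: dict, suffix: str = "") -> list[str]:
    """Direct download URLs for the files listed in a metadata record, filtered by suffix."""
    return [
        f"https://archive.org/download/{quote(identifier)}/{quote(entry['name'])}"
        for entry in record.get("files", [])
        if entry["name"].endswith(suffix)
    ]
```

For example, download_urls("some-identifier", fetch_metadata("some-identifier"), ".pdf") would list the PDF files of an item; "some-identifier" is a placeholder for a real item identifier.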

Links:
  • archive.org - Internet Archive developer documentation.

Tools

IPFS

Description:
The InterPlanetary File System (IPFS) is a protocol, hypermedia and file sharing peer-to-peer network for storing and sharing data in a distributed hash table. This content delivery network is built around the innovation of content addressing: store, retrieve, and locate data based on the fingerprint of its actual content rather than its name or location.
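
Content addressing can be illustrated with a toy store in Python that files each block of data under its SHA-256 digest. This is a deliberate simplification: real IPFS wraps the hash in a versioned CID (a multihash plus a codec), splits large files into chunks linked as a Merkle DAG, and uses the DHT to find which peers hold a block. The ContentStore class is hypothetical and only shows the core property.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: each block is filed under the SHA-256 digest of its bytes."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data  # identical content always maps to the same address
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on read: a block from an untrusted peer can be verified against its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("block does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._blocks)
```

Because the address is derived from the bytes themselves, storing the same file twice costs nothing extra, and a block fetched from an untrusted peer can be checked before it is used.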

Tools

Libre Self-hosted

Description:
This is a curated list of free (libre) self-hosted projects.

Services

ODCrawler

Description:
A search engine for open directories. Find millions of publicly available files.
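
Open directories are usually auto-generated index pages (for example Apache or nginx autoindex listings) whose links point at files and subdirectories. The Python sketch below, using only the standard library, extracts those links from one listing page. It does not describe how ODCrawler itself works; it assumes entries use relative links, as typical autoindex pages do, and that the base URL ends with a slash. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexLinkParser(HTMLParser):
    """Collect every href from <a> tags on a directory index page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def list_entries(base_url: str, html: str) -> tuple[list[str], list[str]]:
    """Split an index page's links into (subdirectory URLs, file URLs)."""
    parser = IndexLinkParser()
    parser.feed(html)
    dirs, files = [], []
    for href in parser.hrefs:
        # Skip column-sort links ("?C=N;O=D"), parent links, anchors and anything off-site.
        if href.startswith(("?", "/", "../", "#")) or "://" in href:
            continue
        url = urljoin(base_url, href)
        if href.endswith("/"):
            dirs.append(url)
        else:
            files.append(url)
    return dirs, files
```

A crawler built on this would fetch each page, queue the returned subdirectories for further crawling and record the file URLs for indexing.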

Services

Perma

Description:
Websites change, go away, and get taken down. When linked citations lead to broken, blank, altered, or even malicious pages, that’s called link rot. Perma.cc helps scholars, journals, courts, and others create permanent records of the web sources they cite. The site is developed and maintained by the Harvard Library Innovation Lab at the Harvard Law School Library and administered by a consortium of libraries, with each library assisting its local users.

Communities

Reddit

Description:
Reddit is an American social news aggregation, content rating, and forum social network. Several useful subreddits cover data hoarding, self-hosting, and archiving. If you have questions about these subjects or just want to chat, those communities are very active.

Links:
  • r/DataHoarder - A subreddit that aims to bring data hoarders together to share their passion with like-minded people.
  • r/selfhosted - A place to share, discuss, discover, assist with, gain assistance for, and critique self-hosted alternatives to popular services.
  • r/opendirectories - Links to unprotected directories of pics, vids, music, software and otherwise interesting files.
  • r/usenet - A community dedicated to discussing all things Usenet.

Communities

Safeguarding Research

Description:
Safeguarding Research is a group of individuals organizing to safeguard as much publicly available research, GLAM (galleries, libraries, archives, and museums) collections, and similar material as possible.

Tools

Servarr

Description:
Servarr includes Lidarr, Prowlarr, Radarr, Readarr, Sonarr, and Whisparr. Collectively they are referred to as "*Arr", "*Arrs", "Starr", or "Starrs". Lidarr, Radarr, Readarr, Sonarr, and Whisparr are designed to automatically grab, sort, organize, and monitor your music, movie, e-book, or TV show collections, while Prowlarr manages your indexers and keeps them in sync with the other apps.

Tools

YT-DLP

Description:
Yt-dlp is a feature-rich command-line audio/video downloader with support for thousands of sites. The project is a fork of youtube-dl based on the now-inactive youtube-dlc. It is currently one of the most popular ways to download videos from YouTube and many other sites, letting you download a single video, a full playlist, or a complete channel with a single command.
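
yt-dlp is normally run from the shell, but a small Python wrapper makes it easy to archive many channels with the same settings. The sketch below assumes the yt-dlp executable is on your PATH; the function names and default paths are hypothetical, while --download-archive and the -o output template are standard yt-dlp options. Keeping a download archive file means re-running the same command later only fetches videos that are not yet recorded in it.

```python
import shlex
import subprocess


def ytdlp_command(url: str, out_dir: str = "downloads", archive_file: str = "archive.txt") -> list[str]:
    """Build a yt-dlp command line for a single video, a playlist or a channel URL."""
    return [
        "yt-dlp",
        # Remember the IDs of finished downloads so later runs skip them.
        "--download-archive", archive_file,
        # One folder per uploader; keep the video ID in the filename to avoid clashes.
        "-o", f"{out_dir}/%(uploader)s/%(title)s [%(id)s].%(ext)s",
        url,
    ]


def archive(url: str) -> None:
    cmd = ytdlp_command(url)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError if yt-dlp fails
```

For example, archive("https://www.youtube.com/@SomeChannel/videos") would mirror a channel into downloads/<uploader>/ (the URL is a placeholder). yt-dlp can also be embedded directly through its yt_dlp.YoutubeDL class if you prefer not to spawn a subprocess.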
