This page covers frequently asked questions.
What is this site about?
DataHoarding.org is an index of resources and archives related to data hoarding, web archival and digital preservation. It was inspired by the recent purge of online information by government agencies, corporations and others, and aims to provide easier access to tools and information. The goal is not only to hoard data, but also to curate and index it.
This site is run as a non-profit, volunteer effort located in Montreal, Canada. On the technical side, it runs on a custom 3-node Proxmox cluster with a 10 TB QNAP NAS appliance for storage, and uses Cloudflare for content delivery. It runs in high-availability mode with a custom off-site backup solution. We serve around 500 unique visitors each day.

Figure: Technical deployment pipeline
The site has 2 main indexes:
- Tools and resources - This is a list of data hoarding resources for anyone who wants to get started and help archival teams, or simply back up web content for personal use.
- Web archives - On this page you will find links to data archives from various countries. These archives contain data that was gathered and saved for the public good.
New archives are added on a weekly basis. You can follow us on Bluesky for updates.
Why is digital preservation so important?
Data archival matters because our history, culture, governance and science increasingly exist in digital form. Without intervention, this data disappears. A study from Pew Research showed that 38% of web pages from 2013 had disappeared within 10 years. Over 50% of Wikipedia articles have references that point to pages that no longer exist. Older data is often stranded on obsolete physical media, from Zip disks to tapes. And governments are increasingly rewriting the past, removing datasets and defunding institutions focused on topics they disagree with. It's up to individuals and organizations to pick up the slack.
How do I get started with digital archiving?
If your group or organization is interested in data preservation, the first thing to do is run an assessment of your current situation. Two popular frameworks exist for this:
- NDSA Levels of Digital Preservation - The Levels of Digital Preservation is a resource to help digital preservation practitioners build or assess their digital preservation program. It defines 4 levels: Know your content, Protect your content, Monitor your content and Sustain your content.
- DPC Rapid Assessment Model - The DPC's Rapid Assessment Model (DPC RAM) is a digital preservation maturity modelling tool that has been designed to enable rapid benchmarking of an organization's digital preservation capability and facilitate continuous improvement over time. It's a spreadsheet that you can use to identify goals, shortcomings and more.
Once your assessment is done, you can start prioritizing what you want to focus on, such as at-risk data, topics you care about or time-sensitive data. You can create a triage list or a list of sources that you want to use. This should follow the policy you set up during your assessment phase.
Once you start collecting data, make sure you include useful metadata such as creation date, source, checksum, size and document type. This will make your archive more usable in the future.
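As an illustration of the metadata fields listed above, here is a minimal Python sketch that builds one record per archived file. The function name describe_file, the field names and the choice of SHA-256 are assumptions made for this example, not a format used by this site. Most filesystems do not expose a reliable creation date, so the last-modified time is used as a stand-in.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path, source):
    """Build a metadata record for one archived file (hypothetical layout)."""
    path = Path(path)
    stat = path.stat()

    # Hash in 1 MiB chunks so large files do not have to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)

    # Guess the document type from the file extension (None if unknown).
    doc_type, _ = mimetypes.guess_type(path.name)

    return {
        "file": path.name,
        "source": source,
        # Last-modified time in UTC, used here in place of a creation date.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "size_bytes": stat.st_size,
        "sha256": digest.hexdigest(),
        "document_type": doc_type,
    }
```

The returned dictionary can be written next to the file with json.dump, or collected into a single manifest for the whole archive.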
Storage and redundancy solutions, including backups, should be adopted early on. You can use the resources page to find relevant software and services. Ongoing maintenance is crucial for the long-term preservation of your data.
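One common maintenance task is checking that backup copies still match the checksums recorded when the data was collected. The sketch below is one possible way to do that in Python. The manifest layout (file name mapped to an expected SHA-256 digest) and the names sha256_of and verify_copies are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(manifest, backup_dir):
    """Compare files in backup_dir against a manifest of expected digests.

    manifest maps relative file names to SHA-256 hex digests (assumed layout).
    Returns a dict of problem files: name -> "missing" or "corrupted".
    """
    problems = {}
    for name, expected in manifest.items():
        candidate = Path(backup_dir) / name
        if not candidate.is_file():
            problems[name] = "missing"
        elif sha256_of(candidate) != expected:
            problems[name] = "corrupted"
    return problems
```

An empty result means every file in the manifest is present and unchanged. Running a check like this on a schedule, against each backup location, helps catch silent corruption before the last good copy is lost.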
Finally, consider making your archives available online. This ensures accessibility and makes your work shine for everyone.

Figure: Stages of Data Preservation
What are the criteria for inclusion?
The archives listed on this site are curated using 2 criteria: the site must have a significant collection of items, and these items must be available to the public without significant hoops to jump through (e.g. a requirement to have a local library card) and without a subscription fee. We also use custom scripts that check that these sites are up and running, so links should generally work. Note that while we strive to provide safe and accurate information, we can't guarantee the safety of these sites, so use your own discretion.
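The site's own monitoring scripts are not published here. As a rough idea of what an uptime check can look like, here is a minimal Python sketch using only the standard library. The names fetch_status and find_dead_links, the User-Agent string and the 10 second timeout are all assumptions made for this example, and it expects well-formed http or https URLs.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the site is unreachable."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404 or 500.
        return error.code
    except OSError:
        # DNS failure, refused connection, timeout and similar network errors.
        return None


def find_dead_links(urls, fetch=fetch_status):
    """Return the URLs whose check did not come back with a 2xx or 3xx status."""
    dead = []
    for url in urls:
        status = fetch(url)
        if status is None or not 200 <= status < 400:
            dead.append(url)
    return dead
```

Some servers reject HEAD requests, so a real checker would probably retry with GET before flagging a site. The fetch parameter lets the check be swapped out, which also makes the logic testable without network access.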
How can I contribute?
If you know of any resources or archival sites, or if you have legal concerns about any existing data, contact us at contact@datahoarding.org. We do not run ads, accept sponsorships or donations, and are not looking for additional volunteers at this time.