On this page you will find links to data archives from various countries. These archives contain data that was gathered and saved for the public good.
Health
101 Cookbooks
Description:
101 Cookbooks is a food blog from California that archived thousands of healthy recipes, made available for free.
- 101cookbooks.com - List of recipes.
Science
Academic Torrents
Description:
Making over 127.15TB of research data available, this site provides a distributed system for sharing enormous datasets for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.
- academictorrents.com - List of torrents.
Books Science History
Anna's Archive
Description:
Described as the largest truly open library in human history. This site mirrors Sci-Hub and LibGen. They also scrape and open-source Z-Lib, DuXiu, and more. Currently hosting over 42 million books, 98 million papers, preserved forever. All their code and data are completely open source.
- annas-archive.org - Main web page.
- annas-archive.se - Mirror site.
- annas-archive.li - Mirror site.
Science
Archaeology Data Service
Description:
ADS is the leading accredited repository in the UK for archaeology and historic environment data, with over 25 years of experience supporting research, learning and teaching with free, high quality and dependable digital resources.
World
Archive.today
Description:
Archive.today is a time capsule for web pages! It takes a 'snapshot' of a webpage that will always be online even if the original page disappears. It saves a text and a graphical copy of the page for better accuracy and provides a short and reliable link to an unalterable record of any web page.
- archive.today - The web site archives.
Science
arXiv
Description:
arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. arXiv is a community of volunteer authors, readers, moderators, advisory board members, supporting members, donors, and third-party collaborators that are supported by the staff at Cornell University.
- arxiv.org - Browse articles.
World Technology
AWS Data Exchange
Description:
AWS Data Exchange makes it easy to find datasets made publicly available through AWS services. Browse available data and learn how to register your own datasets.
- amazon.com - List of all Data Exchange applications.
- amazon.com - A corpus of web crawl data composed of over 50 billion web pages.
- amazon.com - Registry of Earth related datasets.
- archives.gov - National Archives Catalog on the AWS Registry of Open Data.
Government Health Climate Science
CAFE
Description:
The Convene-Accelerate-Foster-Expand (CAFE) site is an open collection designed to support and enhance global research initiatives focused on understanding and mitigating the health impacts of climate change. It's hosted by Harvard University and Boston University and contains hundreds of datasets, mostly from US Gov web sites.
- harvard.edu - Index of CAFE datasets.
Government Law
Caselaw Access Project
Description:
The Caselaw Access Project (CAP) scanned the entirety of the Harvard Law School Library's physical collection of American case law and made it machine-readable in a consistent format available online. To facilitate that agreement, the Library Innovation Lab (LIL) maintained the case.law website as the primary access point for the data. CAP includes all official, book-published state and federal United States case law through 2020, every volume or case designated as an official report of decisions by a court within the United States.
- case.law - List of law volumes per state.
History
Chartlann Mhileata Military Archives
Description:
The Military Archives offers a diverse range of collections documenting Ireland's military history, including pensions and historical documents.
- militaryarchives.ie - Browse the collections.
Technology
CivitAI
Description:
CivitAI is an online platform and marketplace for generative AI content, primarily focused on AI-generated images and models.
- civitai.com - Home page.
- civitaiarchive.com - NSFW models archive site.
- diffusionarc.com - Alternative database of image models.
- civitasbay.org - List of CivitAI torrents.
- miyukiai.com - Backup site.
Government Climate
Climate Mirror Project
Description:
The Climate Mirror Project is trying to mirror and safely archive US Gov websites and datasets related to climate, climate change, and global warming. It provides mirrors of official NOAA and other government web sites.
- daknob.net - List of datasets and torrents.
World
Common Crawl
Description:
Common Crawl maintains a free, open repository of web crawl data that can be used by anyone. They believe that everyone should have the opportunity to indulge their curiosities, analyze the world, and pursue brilliant ideas. The latest crawl contains over 2.74 billion web pages.
- commoncrawl.org - Home page.
Government Technology
Common Vulnerabilities and Exposures (CVE)
Description:
The CVE program identifies, defines, and catalogs publicly disclosed cybersecurity vulnerabilities. There are currently over 274,000 CVE Records accessible through the program. While it depends on US Government funding, there are several alternative databases also available.
- cve.org - Main CVE website.
- github.com - Official list of all CVEs.
- kevintel.com - Feed of KEVs from multiple sources.
- circl.lu - Main vulnerability lookup site.
- vulnerability-lookup.org - Vulnerability lookup software.
- gcve.eu - Decentralized CVE alternative.
- europa.eu - EU vulnerability database.
- gc.ca - Canadian vulnerability database.
- seclists.org - Archive of popular cybersecurity mailing lists.
- infosec.exchange - Mastodon community surrounding InfoSec.
Gaming
Console Mods
Description:
This wiki contains information on game console modding and game dumping tools.
- consolemods.org - Console mods wiki.
- redump.org - Mod dumping wiki.
World
Cross-National Time-Series Data
Description:
CNTS provides more than 200 years of annual data from 1815 onward, including 196 demographic, political, legislative, economic and social science variables.
- cntsdata.com - List of databanks.
Government
Data Liberation Project
Description:
The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate US Gov datasets of public interest.
- data-liberation-project.org - List of published datasets.
Government
Data Lumos
Description:
DataLumos is an ICPSR archive for valuable government data resources. ICPSR has a long commitment to safekeeping and disseminating US government and other social science data. DataLumos accepts deposits of public data resources from the community and recommendations of public data resources that ICPSR itself might add to DataLumos. The site is hosted by the University of Michigan.
- datalumos.org - List of datasets.
Government
Data Rescue Project
Description:
The Data Rescue Project is a coordinated effort among a group of data organizations focusing on rescue-related efforts and data access points for public US governmental data that are currently at risk. It provides resources, collections of datasets and news updates.
- datarescueproject.org - Home page.
- datarescueproject.org - List of collections.
World History Books
Digital Public Library of America
Description:
The DPLA highlights millions of items from libraries, archives and museums across the United States, organized into easy-to-navigate topics through a single catalog.
Technology
Drivers Collection
Description:
Drivers Collection is one of the largest free web libraries of device drivers for computer hardware. It contains over 6 million drivers from various hardware vendors.
- driverscollection.com - Drivers home page.
- archive.org - Torrent links for driver packs.
Science
Dryad
Description:
Dryad is an open data publishing platform and a community committed to the open availability and routine re-use of all research data. Their multi-stakeholder community of academic and research institutions, research funders, scholarly societies and publishers is committed to leading in best practices for open data sharing and reuse.
- datadryad.org - Dryad datasets.
Government
End-of-Term web archive
Description:
The End of Term Web Archive captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, and 2020. The End of Term Web Archive contains federal government websites (.gov, .mil, etc.) in the Legislative, Executive, or Judicial branches of the government.
- eotarchive.org - EOT web archive.
Government
European Data
Description:
European Data is the official portal for European data, collected from governments around the EU and made available in one central place.
- europa.eu - Home page.
Technology
Files dot Dog
Description:
This site contains a large collection of Microsoft Developer Network (MSDN) files, along with random other files.
- files.dog - Files archive.
Science
Free GIS Data
Description:
This page contains a categorized list of links to over 500 sites providing freely available geographic datasets, all ready for loading into a Geographic Information System (GIS).
- rtwilson.com - Archives listing.
Gaming
Games Database
Description:
Games Database is one of the biggest sources for game manuals, videos, music and artwork. The site provides over 32k videos, 8k music files, 14k manuals, 5k game adverts, and 822 TV commercials for 126 systems.
- gamesdatabase.org - Main web site.
Science
Global Biodiversity Information Facility
Description:
GBIF (the Global Biodiversity Information Facility) is an international network and data infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. It provides access to over 110,000 datasets.
- gbif.org - Main web site.
Technology
Hugging Face
Description:
Hugging Face is the platform where the machine learning community collaborates on models, datasets, and applications. It contains the largest collection of open source AI models and focuses on machine learning tasks.
- huggingface.co - Main web page.
Technology
Ibiblio
Description:
Ibiblio (then called SunSITE) began mirroring open source software in 1992, and was one of only three such repositories available on the internet. Now, almost 30 years later, mirroring and open source software have evolved.
- ibiblio.org - Main web page.
- ibiblio.org - List of mirrored projects.
Government
ICPSR
Description:
ICPSR provides research data and resources on topics like social media, politics, economics, social sciences, government, GIS, and more. ICPSR is part of the Institute for Social Research at the University of Michigan.
Science
INSDC
Description:
The International Nucleotide Sequence Database Collaboration (INSDC) archives nucleotide sequence data, from raw to assembled and annotated sequences, from around the world.
- insdc.org - Link to archival sites.
Technology World
Internet Archive
Description:
The Internet Archive is an American non-profit organization founded in 1996 by Brewster Kahle that runs a digital library website, archive.org. It provides free access to collections of digitized media including websites, software applications, music, audiovisual, and print materials.
- archive.org - The main archive web site.
- archive.org - The Wayback Machine.
- bibalex.org - Bibalex mirror of the Wayback Machine.
Science Health World
IPUMS
Description:
IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation makes it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community contexts. Data and services available free of charge.
- ipums.org - List of IPUMS datasets.
Technology World Science
Kaggle
Description:
Kaggle is one of the largest collections of datasets, mostly focusing on statistics, science, world affairs and technology. It contains 430K high-quality public datasets, covering everything from avocado prices to video game sales.
- kaggle.com - List of Kaggle datasets.
Gaming
Keitai Game Preservation
Description:
This wiki is dedicated to cataloging games from Japanese feature phones (keitai), the pre-Android/iPhone mobile devices released in Japan (e.g. i-Mode, i-Appli, EZweb, and S!Appli games). They also provide information and support for preserving Japanese feature phone games.
- keitaiwiki.com - Keitai wiki.
- youtube.com - Documentary on Lessons from Keitai Game Preservation.
Technology World Science Books
Kiwix
Description:
Three billion people have little or no access to the internet. This can be because of costs, lack of infrastructure, or outright censorship. Kiwix provides offline versions of popular web sites like Wikipedia, Wikibooks and Project Gutenberg.
Science Books
LibreTexts
Description:
LibreTexts is the adaptable, user-friendly open education resource platform that educators trust for creating, customizing, and sharing accessible, interactive textbooks, adaptive homework, and ancillary materials. They collaborate with individuals and organizations to champion open education initiatives, support institutional publishing programs, drive curriculum development projects, and more. The LibreTexts Commons hosts curated Open Educational Resources from all 16 libraries in the LibreVerse in one convenient location.
- libretexts.org - Project home page.
- libretexts.org - Index of textbooks.
Books
MangaDex
Description:
MangaDex is one of many websites dedicated to archiving scanned manga and other Asian comic books. These sites provide thousands of titles to read for free, compiled by volunteers.
- mangadex.org - MangaDex web site.
- mangakatana.com - Manga Katana web site.
- bato.to - Bato web site.
- mangahere.cc - MangaHere web site.
- weebcentral.com - Weeb Central web site.
World
Mirror Service
Description:
The UK Mirror Service provides a collection of mirrors of FTP, web and rsync sites of interest to academic users. The service is provided by the University of Kent's School Of Computing.
- mirrorservice.org - List of mirrored sites.
Technology History
Museum of Obsolete Media
Description:
A unique online museum of physical media formats showcasing developments in audio, video, film and data storage, the Museum preserves the memory of those objects that held our memories, and every format listed in the Museum is represented by at least one example in the collection.
- obsoletemedia.org - Museum collection.
Gaming
My Abandonware
Description:
On My Abandonware you can download old video games from 1965 to 2012 for free. You can play Pacman, Arkanoid, Tetris, Galaxian, Alter Ego, Blackthorne, Civilization, Sim City, Prince of Persia, Xenon 2, King's Quest, Ultima, Kyrandia, The Incredible Machine, Another World, Test Drive, Flashback, Lemmings and more. For each game, they offer all related information, including publication year, publisher, developer, size of the game, language, a review of the game, instructions to play, the game manual and, of course, the game archive that you can download for free.
- myabandonware.com - Index of software.
World Government
National Archives
Description:
The National Archives is a common term to designate a government funded archival institution focused on cataloging and making available historically significant works from the country in question.
- archives.gov - US National Archives.
- canada.ca - Library and Archives Canada.
- gov.uk - UK National Archives.
- gov.au - National Archives of Australia.
- gouv.fr - Archives Nationales de France.
- go.jp - National Diet Library, Japan.
- nationaalarchief.nl - Nationaal Archief.
- rigsarkivet.dk - Rigsarkivet.
World
National Film Board of Canada
Description:
In addition to being a public producer and distributor of Canadian content, the National Film Board of Canada (NFB) is the caretaker of over 7,000 productions available for free for personal use.
- nfb.ca - Home page.
Government
National Security Archive
Description:
Founded in 1985 by journalists and scholars to check rising government secrecy, the National Security Archive combines a unique range of functions: investigative journalism center, research institute on international affairs, library and archive of declassified U.S. documents.
- gwu.edu - Home page.
Gaming
Nexus Mods
Description:
Nexus Mods is one of several gaming mods archives, hosting over 300,000 mods for over 3,500 PC games.
- nexusmods.com - Nexus Mods web site.
- moddb.com - Mod DB web site.
- curseforge.com - Curse Forge web site.
Gaming
Old Games
Description:
Old-Games.com provides 10,000+ old PC games free to download, along with screenshots and descriptions.
- old-games.com - Main web site.
Books
Open Library
Description:
Open Library is an initiative of the Internet Archive and provides access to thousands of books, out of print and otherwise. It provides an open, editable library catalog, building towards a web page for every book ever published.
- openlibrary.org - Open Library home page.
- openlibrary.org - Banned books collection.
Climate Science
OpenEI
Description:
The Open Energy Data Initiative (OEDI) enables research, collaboration, and transparency by providing open access to energy data and information. The OpenEI Data Lake is a centralized repository of datasets aggregated from the U.S. Department of Energy’s Programs, Offices, and National Laboratories. It provides links to over 4.19 PB of data.
- openei.org - Links to datasets.
Technology
OpenML
Description:
OpenML is an open platform for sharing datasets, algorithms, and experiments. It contains thousands of datasets and machine learning tasks running openly.
- openml.org - Home page.
World
OSINT Ukraine
Description:
This is a public repository of tools, resources and an archive of Telegram messages related to the war in Ukraine. Note that some of the media on the site are very graphic.
- osintukraine.com - OSINT Ukraine home page.
- osintukraine.com - Repository of Telegram messages.
World
Our World in Data
Description:
Our World in Data is a project of the Global Change Data Lab, a non-profit organization providing analysis from thousands of researchers around the world about poverty, disease, hunger, climate change, war, existential risks, and inequality.
- ourworldindata.org - Main page.
- ourworldindata.org - Data catalog.
Climate Science
PANGAEA
Description:
The information system PANGAEA is operated as an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. PANGAEA is open to any project, institution, or individual scientist to use or to archive and publish data.
- pangaea.de - List of datasets.
Books
Project Gutenberg
Description:
Project Gutenberg is a library of over 75,000 free eBooks. Everything from Project Gutenberg is gratis, libre, and completely without cost to readers. Michael Hart, founder of Project Gutenberg, invented eBooks in 1971 and his memory continues to inspire the creation of eBooks and related content today. The Project Gutenberg Literary Archive Foundation (PGLAF) is the non-profit corporation that oversees operation of the project.
- gutenberg.org - The main project web site.
- mirrorservice.org - Mirror site.
Climate Science Government
Public Environmental Data Project
Description:
The Public Environmental Data Project is committed to preserving and providing public access to federal environmental data. They are a volunteer coalition of several environmental, justice, and policy organizations, researchers across several universities, archivists, and students who rely on federal datasets and tools to support critical research, advocacy, policy, and litigation work. Several datasets are available on their site.
- screening-tools.com - The screening tools site.
Technology History
Radio Museum
Description:
The radio museum contains a vast library of data about radio devices. It contains over 350K radio models, 2.8M pictures including 1M schematics, and 79K tubes/semiconductors.
- radiomuseum.org - Collection of radio data.
Books Gaming
RetroMags
Description:
This site indexes and makes available for free download thousands of retro gaming magazines and strategy guides from 10 years ago and earlier.
- retromags.com - Home page.
Science Books
Sci-Hub
Description:
Sci-Hub started as a tool for providing quick access to articles from scientific journals - such articles are the main medium of communication of scientific knowledge today. Sci-Hub has since grown into a database of over 88 million research articles and books, freely accessible for anyone to read and download.
- sci-hub.st - Sci-Hub mirror site.
- sci-hub.se - Sci-Hub mirror site.
- sci-hub.ru - Sci-Hub mirror site.
Technology
Sigma AI
Description:
Sigma AI provides a list of open AI related datasets from various other sites.
- sigma.ai - List of datasets.
Technology
Software Heritage
Description:
The long term goal of the Software Heritage initiative is to collect all publicly available software in source code form together with its development history, replicate it massively to ensure its preservation, and share it with everyone who needs it. The Software Heritage archive is growing over time as they crawl new source code from software projects and development forges.
- softwareheritage.org - Source files archive.
Climate Science Government
Source COOP
Description:
Source Cooperative is a data publishing utility that allows trusted organizations and individuals to share data using standard HTTP methods. It contains large data collections and mirrors of various sites, mostly centered around science, government and climate.
- source.coop - List of projects.
- source.coop - Full mirror of data.gov.
- github.com - Scripts used to scrape data.gov.
Technology
TextFiles
Description:
TEXTFILES.COM has been online for nearly 25 years providing text files, focusing mostly on the years 1980-1995.
- textfiles.com - Collection of text files.
- textfiles.com - Collection of shareware files.
Gaming
The Cutting Room Floor
Description:
The Cutting Room Floor is a site dedicated to unearthing and researching unused and cut content from video games, from debug menus to unused music, graphics, enemies, and levels.
- tcrf.net - Home page.
Technology
The Eye
Description:
The Eye is a very large archive of files of all types covering decades. It provides archives of various sub-reddits, Telegram channels, AI models, books, website crawls, 3D models, images and more.
- the-eye.eu - Home page.
- the-eye.eu - Files directory.
Gaming
The Old Computer
Description:
Home to the largest collection of ROMs and emulators anywhere on the web, with over 500,000 ROMs and emulators for every major computer, console, arcade machine, pinball table and mobile device. It also offers box scans, manuals, magazines and a 179,000+ strong user community.
- theoldcomputer.com - Home page.
Technology
The Unix Heritage
Description:
The Unix Heritage Society's aims include the preservation and maintenance of historical and non-mainstream UNIX systems; the further development of existing UNIX systems; and the continual fostering of the Unix community spirit. They host historical Unix distributions and packages available for download.
World
Uppsala Conflict Data Program
Description:
The Uppsala Conflict Data Program (UCDP) is the world's largest collection of wartime and organized violence data, covering over 40 years of conflicts. It is based at Uppsala University in Sweden and works in collaboration with the Peace Research Institute in Oslo.
Climate
US Drought Monitor
Description:
The U.S. Drought Monitor has provided weekly climate maps since 1999. It's produced jointly by the NDMC, NOAA and USDA.
- unl.edu - Maps archive.
Technology
Web Design Museum
Description:
The Web Design Museum exhibits thousands of screenshots and videos of old websites, mobile apps and software from the 1990s to the mid-2000s.
- webdesignmuseum.org - Home page.
World
Wikipedia
Description:
Wikipedia is a free-content online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki. It is the largest and most-read reference work in history. Wikipedia is hosted by the Wikimedia Foundation, a non-profit organization that also hosts a range of other projects.
- wikipedia.org - English Wikipedia home page.
- wikipedia.org - Wikipedia in other languages.
Gaming Technology
WinWorld
Description:
WinWorld is an online museum created in 2003 dedicated to the preservation and sharing of vintage, abandoned, and pre-release software. It offers information, media and downloads for a wide variety of computers and operating systems. Get classic operating systems, applications, games and betas for every platform from PC to Mac to Amiga, right here from the software library on WinWorld.
- winworldpc.com - The WinWorld library.
World Government
World Bank Open Data
Description:
The World Bank Open Data portal provides free and open access to global development data, mostly focusing on economic datasets.
- worldbank.org - Data portal.
Technology
Your.Org
Description:
Your.Org is a hosting company that provides hundreds of terabytes of data for various sites. They also host mirrors of various open source software, including Linux distributions and FreeBSD, as well as Wikipedia database dumps and content from other websites such as Microsoft, Corel, IBM and many more.