On this page you will find links to data archives from various countries. These archives contain data that was gathered and saved for the public good.
Science
Academic Torrents
Description:
Making over 127.15TB of research data available, this site provides a distributed system for sharing enormous datasets for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.
- academictorrents.com - List of torrents.
Science
Anna's Archive
Description:
Described as the largest truly open library in human history, this site mirrors Sci-Hub and LibGen. They also scrape and open-source Z-Lib, DuXiu, and more. It currently hosts over 42 million books and 98 million papers, preserved forever. All their code and data are completely open source.
- annas-archive.org - Main web page.
- annas-archive.se - Mirror site.
- annas-archive.li - Mirror site.
World
Archive.today
Description:
Archive.today is a time capsule for web pages! It takes a 'snapshot' of a webpage that will always be online even if the original page disappears. It saves a text and a graphical copy of the page for better accuracy and provides a short and reliable link to an unalterable record of any web page.
- archive.today - The web site archive.
Science
arXiv
Description:
arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. arXiv is a community of volunteer authors, readers, moderators, advisory board members, supporting members, donors, and third-party collaborators that are supported by the staff at Cornell University.
- arxiv.org - Browse articles.
Technology, World
AWS Data Exchange
Description:
AWS Data Exchange makes it easy to find datasets made publicly available through AWS services. Browse available data and learn how to register your own datasets.
- aws.amazon.com - List of all Data Exchange applications.
- Common Crawl - A corpus of web crawl data composed of over 50 billion web pages.
- Earth on AWS - Registry of Earth related datasets.
- archives.gov - National Archives Catalog on the AWS Registry of Open Data.
Government, Health, Climate, Science
CAFE
Description:
The Convene-Accelerate-Foster-Expand (CAFE) site is an open collection designed to support and enhance global research initiatives focused on understanding and mitigating the health impacts of climate change. It is hosted by Harvard University and Boston University and contains hundreds of datasets, mostly from US Gov web sites.
- dataverse.harvard.edu - Index of CAFE datasets.
Government, Law
Caselaw Access Project
Description:
The Caselaw Access Project (CAP) scanned the entirety of the Harvard Law School Library's physical collection of American case law and made it available online in a consistent, machine-readable format. The Library Innovation Lab (LIL) maintained the case.law website as the primary access point for the data. CAP includes all official, book-published state and federal United States case law through 2020, that is, every volume or case designated as an official report of decisions by a court within the United States.
- case.law - List of law volumes per state.
Government, Climate
Climate Mirror Project
Description:
The Climate Mirror Project is trying to mirror and safely archive US Gov websites and datasets related to climate, climate change, and global warming. It provides mirrors of official NOAA and other government web sites.
- climate.daknob.net - List of datasets and torrents.
Government
Data Liberation Project
Description:
The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate US Gov datasets of public interest.
- data-liberation-project.org - List of published datasets.
Government
DataLumos
Description:
DataLumos is an ICPSR archive for valuable government data resources. ICPSR has a long commitment to safekeeping and disseminating US government and other social science data. DataLumos accepts deposits of public data resources from the community and recommendations of public data resources that ICPSR itself might add to DataLumos. The site is hosted by the University of Michigan.
- datalumos.org - List of datasets.
Science
Dryad
Description:
Dryad is an open data publishing platform and a community committed to the open availability and routine re-use of all research data. Their multi-stakeholder community of academic and research institutions, research funders, scholarly societies and publishers is committed to leading in best practices for open data sharing and reuse.
- datadryad.org - Dryad datasets.
Government
End of Term Web Archive
Description:
The End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, and 2020. It contains federal government websites (.gov, .mil, etc.) in the Legislative, Executive, or Judicial branches of the government.
- eotarchive.org - EOT web archive.
Technology
Files dot Dog
Description:
This site contains a large collection of Microsoft Developer Network (MSDN) files, along with random other files.
- files.dog - Files archive.
Technology
Games Database
Description:
Games Database is one of the biggest sources for manuals, videos, music and artwork. The site provides over 32k videos, 8k music files, 14k manuals, 5k game adverts, and 822 TV commercials for 126 systems.
- gamesdatabase.org - Main web site.
Technology
Hugging Face
Description:
Hugging Face is the platform where the machine learning community collaborates on models, datasets, and applications. It contains the largest collection of open source AI models and focuses on machine learning tasks.
- huggingface.co - Main web page.
Technology
Ibiblio
Description:
Ibiblio (then called SunSITE) began mirroring open source software in 1992, and was one of only three such repositories available on the internet. Now, almost 30 years later, mirroring and open source software have evolved.
- ibiblio.org - Main web page.
- software-mirrors - List of mirrored projects.
Government
ICPSR
Description:
ICPSR provides research data and resources on topics such as social media, politics, economics, social sciences, government, GIS, and more. ICPSR is part of the Institute for Social Research at the University of Michigan.
- icpsr.umich.edu - Main web page.
- Browse by Topic - List of datasets.
Science
INSDC
Description:
The International Nucleotide Sequence Database Collaboration (INSDC) archives nucleotide sequence data, from raw to assembled and annotated sequences, from around the world.
- insdc.org - Link to archival sites.
Technology, World
Internet Archive
Description:
The Internet Archive is an American non-profit organization founded in 1996 by Brewster Kahle that runs a digital library website, archive.org. It provides free access to collections of digitized media including websites, software applications, music, audiovisual, and print materials.
- archive.org - The main archive web site.
- web.archive.org - The Wayback Machine.
- bibalex.org - Bibalex mirror of the Wayback Machine.
Science, Health, World
IPUMS
Description:
IPUMS provides census and survey data from around the world integrated across time and space. IPUMS integration and documentation makes it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community contexts. Data and services available free of charge.
- ipums.org - List of IPUMS datasets.
Technology, Science, World
Kaggle
Description:
Kaggle is one of the largest collections of datasets, mostly focusing on statistics, science, world affairs and technology. It contains 430K high-quality public datasets, covering everything from avocado prices to video game sales.
- kaggle.com - List of Kaggle datasets.
Technology, Science, World
Kiwix
Description:
Three billion people have little or no access to the internet. This can be because of costs, lack of infrastructure, or outright censorship. Kiwix provides offline versions of popular web sites like Wikipedia, Wikibooks and Project Gutenberg.
- kiwix.org - Home page.
- applications - List of offline applications.
- library - Library of content for Kiwix.
Science
LibreTexts
Description:
LibreTexts is the adaptable, user-friendly open education resource platform that educators trust for creating, customizing, and sharing accessible, interactive textbooks, adaptive homework, and ancillary materials. They collaborate with individuals and organizations to champion open education initiatives, support institutional publishing programs, drive curriculum development projects, and more. The LibreTexts Commons hosts curated Open Educational Resources from all 16 libraries in the LibreVerse in one convenient location.
- libretexts.org - Project home page.
- commons.libretexts.org - Index of textbooks.
World
Mirror Service
Description:
The UK Mirror Service provides a collection of mirrors of FTP, web and rsync sites of interest to academic users. The service is provided by the University of Kent's School of Computing.
- mirrorservice.org - List of mirrored sites.
Technology
My Abandonware
Description:
On My Abandonware you can download old video games from 1965 to 2012 for free. You can play Pacman, Arkanoid, Tetris, Galaxian, Alter Ego, Blackthorne, Civilization, Sim City, Prince of Persia, Xenon 2, King's Quest, Ultima, Kyrandia, The Incredible Machine, Another World, Test Drive, Flashback, Lemmings and more. For each game, they offer all related information, including publication year, publisher, developer, size of the game, language, review of the game, instructions to play, the game manual and, of course, the game archive that you can download for free.
- myabandonware.com - Index of software.
Science, Climate
OpenEI
Description:
The Open Energy Data Initiative (OEDI) enables research, collaboration, and transparency by providing open access to energy data and information. The OpenEI Data Lake is a centralized repository of datasets aggregated from the U.S. Department of Energy’s Programs, Offices, and National Laboratories. It provides links to over 4.19 PB of data.
- data.openei.org - Links to datasets.
World
Project Gutenberg
Description:
Project Gutenberg is a library of over 75,000 free eBooks. Everything from Project Gutenberg is gratis, libre, and completely without cost to readers. Michael Hart, founder of Project Gutenberg, invented eBooks in 1971 and his memory continues to inspire the creation of eBooks and related content today. The Project Gutenberg Literary Archive Foundation (PGLAF) is the non-profit corporation that oversees operation of the project.
- gutenberg.org - The main project web site.
- mirrorservice.org - Mirror site.
Government, Science, Climate
Public Environmental Data Project
Description:
The Public Environmental Data Project is committed to preserving and providing public access to federal environmental data. They are a volunteer coalition of several environmental, justice, and policy organizations, researchers across several universities, archivists, and students who rely on federal datasets and tools to support critical research, advocacy, policy, and litigation work. Several datasets are available on their site.
- screening-tools.com - The screening tools site.
Science
Sci-Hub
Description:
Sci-Hub started as a tool for providing quick access to articles from scientific journals, which are the main medium of communication of scientific knowledge today. Sci-Hub has since grown into a database of over 88 million research articles and books, freely accessible for anyone to read and download.
- sci-hub.st - Sci-Hub mirror site.
- sci-hub.se - Sci-Hub mirror site.
- sci-hub.ru - Sci-Hub mirror site.
Technology
Software Heritage
Description:
The long term goal of the Software Heritage initiative is to collect all publicly available software in source code form together with its development history, replicate it massively to ensure its preservation, and share it with everyone who needs it. The Software Heritage archive is growing over time as they crawl new source code from software projects and development forges.
- archive.softwareheritage.org - Source files archive.
Science, Climate, Government
Source COOP
Description:
Source Cooperative is a data publishing utility that allows trusted organizations and individuals to share data using standard HTTP methods. It contains large data collections and mirrors of various sites, mostly centered around science, government and climate.
- source.coop - List of projects.
- gov-data - Full mirror of data.gov.
- data-vault - Scripts used to scrape data.gov.
Technology
TextFiles
Description:
TEXTFILES.COM has been online for nearly 25 years providing text files, focusing mostly on the years 1980-1995.
- textfiles.com - Collection of text files.
- cd.textfiles.com - Collection of shareware files.
Technology
The Old Computer
Description:
Home to the largest collection of ROMs and emulators anywhere on the web, with over 500,000 ROMs and emulators for every major computer, console, arcade machine, pinball table and mobile device. The site also offers box scans, manuals, magazines and a 179,000+ strong user community.
- theoldcomputer.com - Home page.
Technology
The Unix Heritage Society
Description:
The Unix Heritage Society's aims include the preservation and maintenance of historical and non-mainstream UNIX systems; the further development of existing UNIX systems; and the continual fostering of the Unix community spirit. They host historical Unix distributions and packages available for download.
- tuhs.org - Heritage home page.
- wiki.tuhs.org - Unix archive.
- utree - Unix source code.
World
Wikipedia
Description:
Wikipedia is a free-content online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki. It is the largest and most-read reference work in history. Wikipedia is hosted by the Wikimedia Foundation, a non-profit organization that also hosts a range of other projects.
- en.wikipedia.org - English Wikipedia home page.
- wikipedia.org - Wikipedia in other languages.
Technology
WinWorld
Description:
WinWorld is an online museum created in 2003 dedicated to the preservation and sharing of vintage, abandoned, and pre-release software. It offers information, media and downloads for a wide variety of computers and operating systems. Get classic operating systems, applications, games and betas for every platform from PC to Mac to Amiga, right here from the software library on WinWorld.
- winworldpc.com - The WinWorld library.