This post compiles a collection of sites that provide data, covering curated collections, data releases through Open Data movements and ‘Data-as-a-Service’ platforms. This collection previously lived elsewhere on my site under the ‘REFERENCES’ menu, but I have decided that I no longer intend to put energy into maintaining it as an ongoing, comprehensive and up-to-date list. (On that note, I can’t guarantee that all links are live, active and current.)
Rather than kill the page remorselessly, I thought it would at least be worth sharing here as a form of permanent archive and letting it gradually decay in value over time.
Government and NGOs
Data.gov: “As a priority Open Government Initiative for President Obama’s administration, Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the US Federal Government.”
Data.gov.uk: “The Government is releasing public data to help people understand how government works and how policies are made. Some of this data is already available, but data.gov.uk brings it together in one searchable website. Making this data easily available means it will be easier for people to make decisions and suggestions about government policies based on detailed information.”
Data.gov.au: “Data.gov.au provides an easy way to find, access and reuse public datasets from the Australian Government. We encourage you to use government data to analyse, mashup and develop tools and applications which benefit all Australians.”
US Census Bureau: “serves as the leading source of quality data about the United States’ people and economy… providing the best mix of timeliness, relevancy, quality, and cost for the data we collect and services we provide.”
UK Data Service: “We hold the UK’s largest collection of digital social research data providing a unified point of access to data from ESDS, Census Programme, Secure Data Service and others.”
Australian Bureau of Statistics: “The Australian Bureau of Statistics assists and encourages informed decision making, research and discussion within governments and the community, by leading a high quality, objective and responsive national statistical service.”
UK Office for National Statistics: “‘Trusted statistics – understanding the UK’. The ONS is the UK’s largest independent producer of official statistics and the recognised national statistical institute of the UK. Our people play a leading role in the development of national and international good practice in the production of official statistics.”
The World Bank: “The World Bank provides free and open access to a comprehensive set of data about development in countries around the globe, together with other datasets cited in the data catalog. The Data Catalog provides download access to over 8,000 indicators from World Bank data sets.”
Eurostat: “Eurostat’s mission: to be the leading provider of high quality statistics on Europe. Eurostat is the statistical office of the European Union situated in Luxembourg. Its task is to provide the European Union with statistics at European level that enable comparisons between countries and regions.”
UN: “A data access system to UN databases. The United Nations Statistics Division (UNSD) of the Department of Economic and Social Affairs (DESA) launched this new internet based data service for the global user community. It brings UN statistical databases within easy reach of users through a single entry point (http://data.un.org/). Users can now search and download a variety of statistical resources of the UN system.”
UN Office on Drugs and Crime: “UNODC regularly provides global statistical series on crime, criminal justice, drug trafficking and prices, drug production, and drug use.”
UNDP Human Development Reports: Provides a variety of portals and tools to explore and export ‘human development data’ utilized in the preparation of the Human Development Index (HDI)
UN Habitat: “This website provides UN-Habitat project information, statistical indicators from cities across the world and the Cities Prosperity Index as open data.”
OECD Data Lab: The OECD’s laboratory for data visualisations and data downloads
WHO: “The GHO data repository provides access to over 50 datasets on priority health topics including mortality and burden of diseases, the Millennium Development Goals, non communicable diseases and risk factors, epidemic-prone diseases, health systems, environmental health, violence and injuries, equity among others.”
UNICEF: Access to the latest statistics, data analysis and data dissemination about UNICEF’s work
Italian Senate: “The point for direct access to the data of the Senate. Daily updates of information easily and freely be used (open data) that affect every aspect of policy and institutional framework: the case of bills with their process, electronic voting the Classroom, Commissions, parliamentary groups and many other information. An information base made available to citizens, researchers, journalists and developers to analyze and share the knowledge of what is being proposed, discussed and voted on by the people’s representatives in the upper house of the Italian Parliament.”
Italian Chamber of Deputies: “A platform for publishing and sharing of Linked Open Data on the business of the Chamber and the Chamber bodies, free to download or query.”
Italian National Institute of Statistics (Istat): “I.Stat is the warehouse of statistics produced by Istat, a complete and homogeneous wealth of information unique for the Italian official statistics.”
Datos.gob.es: “Datos.gob.es is the Spanish national portal that organizes and manages the Catalog of Public Information Public Sector.”
Census India: “The Indian Census is the most credible source of information on Demography (Population characteristics), Economic Activity, Literacy and Education, Housing & Household Amenities, Urbanisation, Fertility and Mortality.”
‘Local’ Data
An ever-increasing number of cities, states and regions are opening up their local data around the world. This list will point to some of the most relevant and interesting (do inform me of additions):
NYC Open Data
Services and Marketplaces
Timetric: “We deliver forward-looking economic and industry intelligence and data services based on high-quality data and expert analysis.”
DataMarket: “Datamarket offers solutions to data publishing and other data-related issues and needs, also giving you access to a vast array of private and public data on, among many others, consumerism, finance and industry.”
Airsage: “Airsage provides access to the most accurate, up-to-the minute population, location and movement patterns based on 15 billion mobile locations every day.”
Infochimps: “We are a provider of data management and analysis solutions alongside access to a wealth of datasets including social media data, geographic data, various types of census and much, much more.”
Factual: “Global Location Data – Tap into definitive data on 65 million places, updated and improved in real-time by Factual’s data stack.”
Windows Azure data marketplace: “One-Stop Shop for Premium Data and Applications: Hundreds of Apps, Thousands of Subscriptions, Trillions of Data Points”
AggData: “AggData has grown to be a business partner by developing and updating data for over 2,000 businesses and organizations. The AggData team prides itself on developing and delivering data-driven solutions that assist you in your critical tasks.”
Import.io: “Extract data from a website and keep it live. Import.io allows you to identify data on a website and create a live connection to it.”
DelRay: “DelRay takes any data source and makes it usable. Simplify your complex data integration projects: More Data. Less time. Better results.”
Swirrl: “Swirrl is a small company based in Manchester and Stirling, creating beautiful and powerful data solutions for organisations who want to open up their data for both humans and machines.”
Junar: “Junar provides a cloud-based platform for opening data to drive innovation, collaboration, and to meet legislative goals.”
Social Data
Tweet Archivist: “Tweet Archivist Desktop is a Windows application that helps you archive tweets for later data-mining and analysis. Start a search with Tweet Archivist and it will get as many results as it can. Then, leave Tweet Archivist running and it will poll Twitter for that search as frequently as once every five minutes.”
DataSift: “Grow your business with social data. DataSift is the most sophisticated data platform used to filter insights from the world’s most popular social & news sources.”
Gnip: “Gnip is the Largest Provider of Social Media Data to the Enterprise – Never Miss a Tweet, Post, Comment or Like”
Dataminr: “Dataminr identifies hotspots of activity across Twitter, distilling the sea of real-time information down to the signals that are early on the awareness curve and relevant to financial markets.”
Twitter archive: Download your Twitter archive
Foursquare API: “The foursquare API gives you access to all of the data used by the foursquare mobile applications, and, in some cases, even more.”
Instagram API: “The first version of the Instagram API is an exciting step forward towards making it easier for users to have open access to their data. We created it so that you can surface the amazing content Instagram users share every second, in fun and innovative ways.”
Facebook API: “The Graph API is the primary way that data is retrieved or posted to Facebook.”
Portals and Platforms
Knoema: “Knoema is a knowledge platform. The basic idea is to connect data with analytical and presentation tools. As a result, we end with one uniformed platform for users to access, present and share data-driven content. Within Knoema, we capture most aspects of a typical data use cycle: accessing data from multiple sources, bringing relevant indicators into a common space, visualizing figures, applying analytical functions, creating a set of dashboards, and presenting the outcome.”
Freebase: “An open, Creative Commons licensed repository of structured data of almost 23 million entities. Explore a free, open knowledge graph of 39 million people, places, and things.”
Socrata OpenData: “Socrata is a company that provides social data discovery services for opening government data, targeting non-technical Internet users who want to view and share government, healthcare, energy, education, or environment data.”
CKAN: “CKAN is a powerful data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.”
PublicData.eu: “PublicData.eu is a Pan European data portal, providing access to open, freely reusable datasets from local, regional and national public bodies across Europe.”
Data Hub: “The free, powerful data management platform from the Open Knowledge Foundation. Search for data, and get updates from datasets and groups that you’re interested in. Publish or register datasets, create and manage groups and communities.”
Google’s PublicData: “The Google Public Data Explorer makes large, public-interest datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don’t have to be a data expert to navigate between different views, make your own comparisons, and share your findings.”
Info-vis Wiki: A broad collection of data libraries and references
Visualizing.org Data Channels: “Data Channels are a one-stop shop for data and related resources from leading NGOs, government agencies, and companies around the world. Engage with the scientists behind the data, download and comment on new sets, and explore what other designers have created.”
OpenCorporates API: “The Open Database Of The Corporate World: 50 million companies, 65 jurisdictions”
Open Data for Africa: “The African Development Bank Group (AfDB) is committed to supporting statistical development in Africa as a sound basis for designing and managing effective development policies for reducing poverty on the continent. Reliable and timely data is critical to setting goals and targets as well as evaluating project impact. Reliable data constitutes the single most convincing way of getting the people involved in what their leaders and institutions are doing.”
OpenDataSoft: “OpenDataSoft is a turnkey platform over the cloud designed to make simple to re-use data through API & apps models. It reinvents data management for data-driven innovation & smart services time-to-market.”
Newspapers and Media
Guardian Datastore: The Guardian Datastore publishes and shares datasets, statistics and visualisations relating to the latest topical issues from the news agenda.
NYT APIs: The Times Developer Network is the “API clearinghouse and community” which provides access to a range of APIs to explore and access New York Times content going back to 1851.
Associated Press APIs: “AP Content API allows you to search and download AP Images content using your own editorial tools, without visiting the AP Images website. AP Metadata Services is a new automated service that applies rich semantic metadata to news content.”
Thomson Reuters APIs: “Access the one stop shop for developers using Reuters Market Data System APIs. Support includes a comprehensive package of information, software (APIs) and global email- based support for members.”
Culture and Heritage
Digital Public Library of America: A portal that brings together digitized materials from libraries, archives and museums across the United States.
Europeana: “Explore millions of items from a range of Europe’s leading galleries, libraries, archives and museums. Books and manuscripts, photos and paintings, television and film, sculpture and crafts, diaries and maps, sheet music and recordings, they’re all here. No need to travel the continent, either physically or virtually!”
Data-Related Movements
Open Data Institute: “The Open Data Institute is catalysing the evolution of open data culture to create economic, environmental, and social value. It helps unlock supply, generates demand, creates and disseminates knowledge to address local and global issues.”
Open Knowledge Foundation: “Empowering through Open Knowledge – We are a global movement to open up knowledge around the world and see it used and useful.”
DataKind: “Using data in the service of humanity.”
Data Search Tools
WolframAlpha: ‘Computational knowledge engine’ that lets you enter what you want to calculate or know about and returns detailed statistical and data results
Google Custom Search Engine: Example of Google CSE that “targets 800+ academic, government agency, non-profit, and other web sites that provide high quality, downloadable statistical information and data sets. Emphasis is on data pertaining to the social sciences, health, developing countries, energy, natural resources, and the environment.”
Get the Data: Simply ask and answer data related questions (now only exists as an archive search)
Zanran: “Search the web for data and statistics. Zanran gets you more meaningful numerical results than any other search engine”
Quandl: “Quandl provides a database of millions of open and free global datasets from many international organisations, central banks, agencies and more.”
Data engine: “A resource for finding and manipulating data from the web. Explore over 50 thousand data sets containing over 1 million time series.”
US-Specific Collections of Data Sources
Databases of US-related statistical information, curated by UC Berkeley Graduate School of Journalism, providing a wealth of demographic and other data on a city, county, state and national level in the United States.
Mapping
Blue Marble: “Free World map data for your applications! Download a free dataset that includes country, capital and population information in either Esri shapefile or MapInfo TAB format.”
World maps: A collection of vector graphics (SVG) of blank maps from around the world.
Vector World Map: “Working towards a free, accurate world map, available in vector AI, EPS and CDR formats.”
ISO 3166-1: “ISO 3166-1 is part of the ISO 3166 standard published by the International Organization for Standardization (ISO), and defines codes for the names of countries, dependent territories, and special areas of geographical interest.”
ISO 3166-2: “ISO 3166-2 is part of the ISO 3166 standard published by the International Organization for Standardization (ISO), and defines codes for identifying the principal subdivisions (e.g., provinces or states) of all countries coded in ISO 3166-1.”
Natural Earth: “Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110 million scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.”
Ordnance Survey Open Data: “Mapping data and geographic information from Ordnance Survey.”
OpenGeography: “The Open Geography Portal allows users to discover, view and download geographical reference data to support National Statistics.”
U.S. Bureau of Reclamation: Data on US reservoirs
The U.S. national map: “The National Map is a collaborative effort among the USGS and other Federal, State, and local partners to improve and deliver topographic information for the Nation. It has many uses ranging from recreation to scientific analysis to emergency response.”
Protected Planet: “Discover and learn about the protected areas of our planet (national parks, etc.)”
OpenTopography: High-resolution topographic data from LIDAR
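As a small illustration of how the ISO 3166-1 and ISO 3166-2 codes listed above relate to each other, here is a minimal Python sketch. It assumes the commonly described shape of a subdivision code: a two-letter ISO 3166-1 country code, a hyphen, then one to three alphanumeric characters. The function name `split_subdivision_code` is hypothetical, and the check covers the shape only; it does not confirm that a code is actually assigned in the standard.

```python
import re

# Assumed shape of an ISO 3166-2 code: an ISO 3166-1 alpha-2 country code,
# a hyphen, then one to three alphanumeric characters for the subdivision.
SUBDIVISION_CODE = re.compile(r"([A-Z]{2})-([A-Z0-9]{1,3})")


def split_subdivision_code(code):
    """Return (country, subdivision) if the code has the expected shape, else None.

    This is a shape check only: it does not look the code up in the standard.
    """
    match = SUBDIVISION_CODE.fullmatch(code.strip().upper())
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    for sample in ("US-CA", "GB-ENG", "au-nsw", "California"):
        print(sample, "->", split_subdivision_code(sample))
```

Splitting the code this way is handy when joining subdivision-level data to a country-level table keyed on the two-letter code.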
Transport and Travel
Open Flights: “OpenFlights is a tool that lets you map your flights around the world, search and filter them in all sorts of interesting ways, calculate statistics automatically, and share your flights and trips with friends and the entire world (if you wish). It’s also the name of the open-source project to build the tool.”
Transport for London: Collection of data feeds relating to London’s transport infrastructure including live underground, roadside messages, cycle docking stations etc.
MTA: Collection of data feeds relating to New York City’s transport infrastructure including live metro, bus, railroad and bridges/tunnel data.
Earth, Space and beyond…
EarthExplorer: “Primarily Landsat and USGS aerial photos, but also some other datasets (GeoTIFF)”
EOSDIS: NASA’s Earth Observing System Data and Information System – the top-level for NASA’s Earth science data
Earthnet: European Space Agency Earth science data
NASA Planetary Data System: “The PDS archives and distributes scientific data from NASA planetary missions, astronomical observations, and laboratory measurements. The PDS is sponsored by NASA’s Science Mission Directorate. Its purpose is to ensure the long-term usability of NASA data and to stimulate advanced research.”
Solar Dynamics Observatory: “The Solar Dynamics Observatory (SDO) is the first mission to be launched for NASA’s Living With a Star (LWS) Program, a program designed to understand the causes of solar variability and its impacts on Earth. SDO is designed to help us understand the Sun’s influence on Earth and Near-Earth space by studying the solar atmosphere on small scales of space and time and in many wavelengths simultaneously.”
Astronaut Photography: The very cool NASA archive of astronaut photography.
Weather and Climate
National Climatic Data Center: “NCDC is the world’s largest provider of weather and climate data. Land-based, marine, model, radar, weather balloon, satellite, and paleoclimatic are just a few of the types of datasets available.”
WeatherBase: “Find travel weather, climate averages, forecasts, current conditions and normals for 29,252 cities worldwide.”
Met Office DataPoint: “DataPoint is a way of accessing freely available Met Office data feeds in a format that is suitable for application developers. It is aimed at professionals, the scientific community and student or amateur developers, in fact anyone looking to re-use Met Office data within their own innovative applications.”
Met Office Hadley Centre: “Researchers at the Met Office Hadley Centre produce and maintain a range of gridded datasets of meteorological variables for use in climate monitoring and climate modelling. This site provides access to these datasets for bona fide scientific research and personal usage only.”
Wunderground API: “a great source of free weather data, aggregating a multitude of independent weather stations all around the world. Reliable data, accurate forecast, & global coverage in 80 language.”
EUMETSAT: “EUMETSAT is an intergovernmental organisation and was founded in 1986. Our purpose is to supply weather and climate-related satellite data, images and products – 24 hours a day, 365 days a year – to the National Meteorological Services of our Member and Cooperating States in Europe, and other users worldwide.”
CLASS: “The Comprehensive Large Array-data Stewardship System (CLASS) is an electronic library of NOAA environmental data. This web site provides capabilities for finding and obtaining those data.”
GOES Project Science: “U.S. Geostationary weather satellite images (easier to use, but more limited than the link above)”
GLIMS: “GLIMS (Global Land Ice Measurements from Space) is a project designed to monitor the world’s glaciers primarily using data from optical satellite instruments, such as ASTER (Advanced Spaceborne Thermal Emission and reflection Radiometer).”
OneGeology: “OneGeology is an international initiative of the geological surveys of the world. This ground-breaking project was launched in 2007 and contributed to the ‘International Year of Planet Earth’, becoming one of their flagship projects.”
Goddard Institute for Space Studies: “Collection of Goddard Institute for Space Studies (GISS) climate data”
USGS streamflow data: “USGS Current Water Data for the Nation”
Movies
IMDb: Options for accessing IMDb locally by holding copies of the data directly on your system under a non-commercial license.
IMP Awards: Gallery/database of movie posters.
Movie Scripts: Full scripts from thousands of movies!
Misc
Google Ngram Viewer: Search tool for exploring and comparing the use of specified phrases or language over time in books
Transfermarkt: A family of databases and portals for player transfers across all major football leagues going back through history.
Wikimedia Commons: “A database of 18,032,822 freely usable media files to which anyone can contribute”
Sean Revell: Sean Revell’s very useful and detailed collection of data resources.
GeneratedData: Tool for generating customised but randomised test data.
NYC’s Open Data: Fabulous visual interface by Chris Whong providing a single view of the more than 1,100 open datasets made available by New York City.
Peter Skomoroch Delicious: Peter Skomoroch’s huge personal collection of bookmarked data markets, portals and services.
Data Remixed: Ben Jones’ invaluable Tableau tool that helps users find Open Data sites around the world.