2022-06-09
Discovering Hidden Gems - Popular and Lesser-Known Dataset Sharing Platforms
"Looking for the key to unlocking valuable datasets? Dive into the world of Kaggle, UCI, and more as we unveil the best platforms for data enthusiasts."
Researchers, data scientists, and machine learning practitioners can use several popular platforms to find and share datasets. Here are some of the best dataset-sharing platforms:
Kaggle
Kaggle is a well-known platform for data science competitions, but it also provides a dataset repository where users can discover and share datasets. It offers a wide range of datasets in various domains, along with tools for data exploration and collaboration.
UCI Machine Learning Repository
The University of California, Irvine (UCI) hosts a repository of datasets specifically designed for machine learning research. It provides a diverse collection of datasets, including text, image, and time series data, covering a wide range of domains.
Google Dataset Search
Google Dataset Search is a search engine dedicated to indexing datasets. It aggregates datasets from various sources on the web, making it easier to find publicly available datasets. For each dataset it shows information such as its description, author, and availability.
Data.gov
Data.gov is a U.S. government initiative that provides access to a wide range of datasets from different federal agencies. It offers datasets covering various domains such as health, climate, finance, transportation, and more. The platform aims to promote transparency and facilitate public access to government data.
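As a rough illustration of programmatic access, the sketch below assumes the Data.gov catalog exposes a CKAN-style `package_search` endpoint at `https://catalog.data.gov/api/3/action/package_search` and that responses follow the usual CKAN layout (`success`, `result.results`, `resources`). The function names are hypothetical, and the endpoint may change or require a key, so treat this as a starting point rather than a reference client.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Data.gov's catalog serves the standard CKAN action API here.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_catalog_url(query, rows=5):
    """Build a search URL for the given free-text query."""
    params = {"q": query, "rows": rows}
    return f"{CATALOG_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def summarize_results(payload):
    """Return (title, sorted unique resource formats) for each dataset.

    Expects a CKAN-style response dict; returns an empty list when the
    response does not report success.
    """
    if not payload.get("success"):
        return []
    summaries = []
    for dataset in payload["result"]["results"]:
        formats = sorted(
            {res["format"] for res in dataset.get("resources", []) if res.get("format")}
        )
        summaries.append((dataset.get("title", ""), formats))
    return summaries


def search_catalog(query, rows=5):
    """Query the catalog over the network and summarize the matches."""
    with urllib.request.urlopen(build_catalog_url(query, rows), timeout=10) as response:
        return summarize_results(json.load(response))
```

Calling `search_catalog("climate")` would perform the actual request; `summarize_results` can be tried offline on any saved response.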
OpenML
OpenML is an open-source platform that allows users to share, discover, and analyze datasets and machine learning experiments. It gives researchers and practitioners a shared environment in which to collaborate and contribute to the development of machine learning algorithms.
GitHub
Although GitHub is primarily a code hosting platform, it also serves as a repository for datasets. Many researchers and organizations share datasets on GitHub, making it a valuable resource for finding datasets across various domains. You can search for datasets using specific keywords or explore repositories dedicated to datasets.
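The keyword search mentioned above can also be scripted. The following is a minimal sketch using GitHub's public repository search endpoint (`/search/repositories`) and only the Python standard library. The `topic:dataset` qualifier, the `User-Agent` string, and the function names are illustrative assumptions, not a recommended convention.

```python
import json
import urllib.parse
import urllib.request

GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(keywords, topic="dataset", per_page=5):
    """Build a repository-search URL, most-starred results first."""
    # "topic:<name>" is a GitHub search qualifier; the default topic is an assumption.
    query = f"{keywords} topic:{topic}"
    params = {"q": query, "sort": "stars", "order": "desc", "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_dataset_repos(keywords, topic="dataset", per_page=5):
    """Return (full_name, html_url, stars) tuples for matching repositories."""
    request = urllib.request.Request(
        build_search_url(keywords, topic, per_page),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "dataset-search-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return [
        (item["full_name"], item["html_url"], item["stargazers_count"])
        for item in payload["items"]
    ]
```

Calling `search_dataset_repos("air quality")` would perform the request. Unauthenticated calls are rate-limited, so a script that searches often would likely need a token.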
Other platforms
Here are more than two dozen lesser-known dataset-sharing platforms and data sources that you can explore:
- DataHub: https://datahub.io/
- Figshare: https://figshare.com/
- Quandl (now Nasdaq Data Link): https://www.quandl.com/
- Zillow Prize: https://www.kaggle.com/c/zillow-prize-1
- Data.world: https://data.world/
- OpenSNP: https://opensnp.org/
- Dataverse: https://dataverse.org/
- DataCite: https://www.datacite.org/
- Open Data Network: https://www.opendatanetwork.com/
- Social Science Data Repository (SSDR): https://data.nber.org/
- Open Energy Data: https://open-power-system-data.org/
- OpenNeuro: https://openneuro.org/
- GeoNetwork: https://geonetwork-opensource.org/
- Zenodo: https://zenodo.org/
- Awesome Public Datasets: https://github.com/awesomedata/awesome-public-datasets
- Open Images: https://storage.googleapis.com/openimages/web/index.html
- PubMed: https://pubmed.ncbi.nlm.nih.gov/
- Earthdata: https://earthdata.nasa.gov/
- Humanitarian Data Exchange (HDX): https://data.humdata.org/
- Registry of Open Data on AWS: https://registry.opendata.aws/
- European Data Portal (now data.europa.eu): https://www.europeandataportal.eu/
- Global Database of Events, Language, and Tone (GDELT): https://www.gdeltproject.org/
- OpenMLCC: https://openml.github.io/openmlcc/
- Data.gov.uk: https://data.gov.uk/
- National Centers for Environmental Information (NCEI): https://www.ncei.noaa.gov/
- DataONE: https://www.dataone.org/
- International Monetary Fund (IMF) Data: https://www.imf.org/en/data
- Opendatasoft: https://www.opendatasoft.com/
Any comments or suggestions? Let me know.
To cite this article:
@article{Saf2022Discovering,
  author = {Krystian Safjan},
  title = {Discovering Hidden Gems - Popular and Lesser-Known Dataset Sharing Platforms},
  journal = {Krystian's Safjan Blog},
  year = {2022},
}
Tags:
machine-learning
python