21 Data Platforms You Can Search With a Single Query
From Kaggle and Hugging Face to NASA and Eurostat — here's every platform Mobus connects to and what each one is best for.
One of the biggest friction points in data work is platform fragmentation. The dataset you need might be on Kaggle, Zenodo, or the World Bank, or buried in a government open data portal. Mobus connects to 21 platforms, with more being added, so you can search them all from a single conversation.
Here's what each platform offers and when you'd want to pull from it.
Machine Learning & AI
Kaggle — The largest community-driven data platform. Strong for competition datasets, tabular data, and community notebooks. Licensing varies per dataset.
Hugging Face — The hub for ML-ready datasets with standardized loading (via the datasets library). Excellent for NLP, vision, and audio tasks. Most datasets include data cards with documentation.
Papers With Code — Links academic papers to their associated datasets and benchmarks. Useful for finding the exact data used in a specific paper.
Academic & Scientific
Zenodo — CERN's open repository for research data. Every upload gets a DOI. Strong for long-tail academic datasets that don't fit on other platforms.
arXiv — Preprint server with increasingly structured supplementary data. Good for finding datasets referenced in cutting-edge research.
Dryad — Focused on research data underlying peer-reviewed publications. Strong curation and metadata standards.
OpenAlex — Open catalog of scholarly works, authors, institutions, and concepts. Useful for bibliometric analysis and research landscape mapping.
Government & International
World Bank — Development indicators across 200+ countries. Time series data on economics, health, education, and infrastructure.
Eurostat — The EU's statistical office. Comprehensive economic, demographic, and trade data across member states.
data.gov — The U.S. government's open data portal. Covers everything from agriculture to transportation.
NASA Earthdata — Satellite imagery, climate models, and Earth science observations. Large-scale geospatial data.
NOAA — Weather, ocean, and atmospheric data. Historical climate records and real-time observations.
Domain-Specific
WHO — Global health statistics, disease surveillance, and health system data.
FRED — Federal Reserve Economic Data. Time series on interest rates, employment, GDP, and hundreds of other economic indicators.
UCI Machine Learning Repository — Classic ML benchmark datasets. Smaller but well-documented and widely cited.
OpenML — Collaborative ML platform with standardized dataset formats, tasks, and reproducible experiments.
Emerging & Specialized
Harvard Dataverse — Research data repository used across disciplines. Strong versioning and citation support.
Figshare — General-purpose research data hosting. Supports any file type with DOI assignment.
GBIF — Global Biodiversity Information Facility. Occurrence records for species worldwide.
Pangaea — Earth and environmental science data. Strong for oceanography, geology, and paleoclimate.
Crossref — DOI registration and metadata for scholarly publications. Useful for building citation networks.
Why cross-platform search matters
Each platform serves a different community and indexes data differently. A search for "air quality time series" returns different results on Kaggle (community uploads, often cleaned for competitions), Zenodo (raw research data with DOIs), and the WHO (official statistics by country). Searching all of them at once gives you the full picture and lets you pick the best source for your specific use case.
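To make the idea concrete, here is a minimal sketch of fanning one query out to several sources and grouping the hits by platform. It is not how Mobus is implemented. The `Hit` record, the `CATALOGS` dictionary, and the `search_platform` and `search_all` helpers are hypothetical stand-ins, and the catalog entries are invented sample titles rather than real datasets.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    platform: str
    title: str
    kind: str  # what sort of source this is, e.g. "official statistics"


# Hypothetical in-memory stand-ins for real platform search backends.
# The titles are invented for illustration only.
CATALOGS = {
    "Kaggle": [
        Hit("Kaggle", "City air quality time series (cleaned)", "community upload"),
    ],
    "Zenodo": [
        Hit("Zenodo", "Raw air quality sensor time series", "research data"),
        Hit("Zenodo", "Soil moisture field campaign", "research data"),
    ],
    "WHO": [
        Hit("WHO", "Ambient air quality by country", "official statistics"),
    ],
}


def search_platform(platform: str, query: str) -> list[Hit]:
    """Return hits whose title contains every query term as a substring."""
    terms = query.lower().split()
    return [
        hit
        for hit in CATALOGS[platform]
        if all(term in hit.title.lower() for term in terms)
    ]


def search_all(query: str) -> dict[str, list[Hit]]:
    """Send one query to every platform in parallel and group hits by source."""
    platforms = list(CATALOGS)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so results line up with platforms.
        results = list(pool.map(lambda p: search_platform(p, query), platforms))
    return dict(zip(platforms, results))


if __name__ == "__main__":
    for platform, hits in search_all("air quality").items():
        for hit in hits:
            print(f"{platform}: {hit.title} [{hit.kind}]")
```

Keeping the platform name and source type on every hit is the point of the design: the same query surfaces a cleaned community upload, a raw research dataset, and an official statistic side by side, and you can choose among them knowing where each came from.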