Copenhagen, Denmark · June 14, 2026

3 Billion Records.
One Open Infrastructure.
All of Life on Earth.

GBIF federates species occurrence data from thousands of institutions into an open, citable research infrastructure

3 billion+ occurrence records indexed from museums, surveys, and citizen science. Data from thousands of datasets published by institutions in 100+ countries. Underpins 6+ peer-reviewed papers per day. 25 years of global infrastructure.
Photo by Alexander Mass on Pexels

Biodiversity science depends on a deceptively simple question: what species was seen, where, and when?

For centuries, the answers lived in museum drawers, herbarium sheets, field notebooks, local databases, government surveys, environmental assessments, and now smartphone apps and automated sensors. Each record may be small — a plant specimen collected in 1890, a bird checklist from yesterday, a camera-trap image, a DNA-derived detection, a preserved insect, a coral observation — but together they form the evidence base for understanding life on Earth.

GBIF, the Global Biodiversity Information Facility, is one of the core pieces of global biodiversity infrastructure. It is not just a website and not merely a database. It is an international network, standards ecosystem, data publishing system, API platform, and citation framework designed to make biodiversity occurrence data discoverable, reusable, and attributable.

Its importance is easiest to understand this way: GBIF does for species occurrence data what public web infrastructure does for documents. It gives many independent data holders a shared way to publish, index, search, cite, and reuse records without forcing every institution into a single database or software stack.

Source note: This article was prepared from public GBIF documentation, GBIF.org pages, technical documentation, GBIF publishing guidance, data quality guidance, and related biodiversity informatics sources accessed on 14 June 2026. Live counters for occurrence records, datasets, publishers, papers, and participants change continuously, so exact operational totals should be checked on GBIF.org at the time of use.

Why GBIF Matters

Conservation decisions are only as good as the evidence behind them. A government deciding where to place a protected area, a researcher modelling species distributions under climate change, a conservation NGO assessing invasive species risk, or a company screening biodiversity impacts all need reliable information about the distribution of species.

The problem is that biodiversity knowledge is scattered. Natural history museums may hold millions of specimens. Universities maintain research datasets. Birders and naturalists contribute observations through citizen-science platforms. National agencies publish monitoring data. Environmental consultants generate survey records. Molecular labs produce DNA-derived detections. Without shared standards and open infrastructure, these datasets remain isolated and difficult to combine.

GBIF reduces that fragmentation by giving the biodiversity community a common publishing and discovery layer. Its value comes from three linked functions:

  • Mobilization: helping institutions publish biodiversity data using shared standards, tools, metadata, and licences.

  • Integration: indexing records from thousands of datasets so they can be searched by taxon, place, time, basis of record, institution, licence, coordinate quality, and other fields.

  • Reuse: supporting downloads, APIs, cloud access, DOI-based citation, and literature tracking so data can flow into research, conservation, policy, and decision-making.

This makes GBIF especially important for global-scale questions. No single research group can collect enough records to represent global biodiversity. GBIF works because it federates many sources: museums, herbaria, government agencies, NGOs, citizen-science platforms, research projects, and thematic networks.

A researcher examining wildflowers with a magnifying glass in a forest, representing the field observation and collection process that feeds biodiversity data into GBIF
Photo by KATRIN BOLOVTSOVA on Pexels

What GBIF Is

GBIF describes itself as an international network and data infrastructure funded by governments, with the goal of giving anyone, anywhere, open access to data about all types of life on Earth. It is coordinated by a Secretariat in Copenhagen and works through participating countries, organizations, and participant nodes.

The important phrase is "network and data infrastructure". GBIF does not own most of the data it serves. Data remain associated with their publishers. GBIF provides the shared infrastructure that lets those records be standardized, indexed, discovered, downloaded, cited, and reused.

| GBIF is | GBIF is not |
| --- | --- |
| A global network of participating countries and organizations. | A single research institute collecting all biodiversity records itself. |
| A publishing and indexing infrastructure for biodiversity datasets. | A guarantee that every record is correct, complete, unbiased, or fit for every analysis. |
| A standards-based platform using Darwin Core and related biodiversity data standards. | A replacement for taxonomic experts, collection managers, field ecologists, or local knowledge. |
| A discovery, download, API, cloud, and citation layer for biodiversity data reuse. | A closed commercial data product or paywalled repository. |

GBIF grew out of international recognition that biodiversity data were too fragmented to support science and sustainable development properly. The Megascience Forum of the Organisation for Economic Co-operation and Development (OECD) recommended a global mechanism in 1999, and GBIF was established in 2001. In 2026, GBIF marks 25 years of building global biodiversity informatics infrastructure.

What Kind of Data GBIF Handles

The core unit in GBIF is the species occurrence record: evidence that a taxon occurred at a particular place and time. A record may come from a museum specimen, a herbarium sheet, a human observation, a machine observation, a fossil, a living specimen, a material citation, an environmental sample, or a published dataset.

Occurrence records are powerful because they combine three basic facts: taxon, location, and date. When those facts are standardized and linked to metadata, they can support mapping, trend analysis, species distribution modelling, invasive species alerts, conservation prioritization, and environmental risk assessment.

| Source type | What it contributes | Typical strengths | Typical caveats |
| --- | --- | --- | --- |
| Museum and herbarium specimens | Physical evidence collected over centuries. | Verifiable material, historical depth, taxonomic value. | May have imprecise locality data, colonial collection bias, or outdated taxonomy. |
| Citizen-science observations | High-volume recent records from volunteers and naturalists. | Fresh, wide coverage, often image-supported. | Uneven sampling near roads, cities, parks, and popular species. |
| Monitoring and survey datasets | Structured observations from agencies, researchers, NGOs, and field projects. | Protocol-driven, repeatable, often designed for trend detection. | May be geographically limited or difficult to compare across protocols. |
| Machine observations | Records from sensors, camera traps, acoustic devices, or automated systems. | Scalable and increasingly important for near-real-time monitoring. | May require careful validation of automated identifications. |
| DNA-derived and molecular records | Detections from DNA barcodes, metabarcoding, eDNA, or sequence-based workflows. | Can reveal cryptic, microscopic, or hard-to-observe biodiversity. | Requires careful handling of sequence provenance, taxonomic assignment, and sampling context. |

GBIF also supports checklist, sampling-event, metadata-only, and other dataset classes. The long-term direction is toward a richer data model that can better represent surveys, interactions, material samples, DNA-derived data, and policy-ready biodiversity indicators.

A vibrant collection of diverse butterfly specimens on a white background, representing the museum and herbarium collections that form the historical backbone of GBIF occurrence records
Photo by Giulia Botan on Pexels

How Data Moves Through GBIF

GBIF works because it separates publishing from indexing. A museum, university, government department, citizen-science network, or research organization publishes its dataset. GBIF crawls and indexes it. Users then discover and download data through GBIF.org, the API, cloud snapshots, or other tools.

  1. Prepare: the data holder cleans its dataset, maps fields to standard terms, chooses a licence, and writes metadata.

  2. Publish: the organization publishes through the Integrated Publishing Toolkit, a hosted IPT, a Living Atlas installation, an API workflow, or another endorsed route.

  3. Index: GBIF processes the dataset, interprets taxonomy and geography, flags issues, and makes records searchable.

  4. Discover: users find records through GBIF.org, species pages, dataset pages, maps, filters, APIs, or literature links.

  5. Download: serious research use normally creates an occurrence download with a DOI, making the exact data extraction citable and reproducible.

  6. Cite and improve: users cite the DOI, data publishers receive credit, and errors or improvements can flow back to the original data holders.

This workflow is why GBIF is more than a search engine. It creates a feedback loop between data publishers, data users, standards, tools, and citation practices.

Darwin Core: The Shared Language

GBIF’s interoperability depends heavily on Darwin Core, a community-developed biodiversity data standard maintained through Biodiversity Information Standards (TDWG). Darwin Core provides a stable vocabulary for describing biodiversity records from many sources.

The standard matters because biodiversity data are messy. One database might store a species name as scientific_name, another as taxon, another as latinName. One dataset might call latitude lat; another might store it inside a locality string. Darwin Core gives publishers and aggregators a shared set of terms such as scientificName, eventDate, decimalLatitude, decimalLongitude, basisOfRecord, occurrenceID, institutionCode, and catalogNumber.
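As a rough illustration of that renaming step, the sketch below maps a few local column names onto Darwin Core terms. The mapping table, the extra local names (lon, date), and the sample rows are hypothetical; a real publisher would build the mapping from its own schema, typically inside a publishing tool rather than by hand.

```python
# Hypothetical mapping from local column names to Darwin Core terms.
# The local names on the left are illustrative, not taken from any real database.
LOCAL_TO_DWC = {
    "scientific_name": "scientificName",
    "taxon": "scientificName",
    "latinName": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "date": "eventDate",
}


def to_darwin_core(row):
    """Rename known local columns to Darwin Core terms; keep unknown columns as they are."""
    return {LOCAL_TO_DWC.get(key, key): value for key, value in row.items()}


# Two made-up rows from differently structured sources.
museum_row = {"latinName": "Panthera leo", "lat": -2.33, "lon": 34.83, "date": "1998-07-14"}
survey_row = {"taxon": "Panthera leo", "lat": -19.0, "lon": 23.5, "date": "2021-03-02"}

for row in (museum_row, survey_row):
    print(to_darwin_core(row))
```

After mapping, both rows share the same keys, which is what makes them combinable downstream.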

GBIF’s occurrence data quality requirements show how basic Darwin Core terms become practical publishing rules. For occurrence-only datasets, fields such as occurrenceID, basisOfRecord, scientificName, and eventDate are required, while country codes, coordinates, geodetic datum, coordinate uncertainty, kingdom, taxon rank, and abundance fields are strongly recommended.

| Darwin Core term | Why it matters |
| --- | --- |
| occurrenceID | Provides a persistent identifier for the record, helping avoid duplication and ambiguity. |
| scientificName | Links the record to a taxon and enables taxonomic interpretation. |
| eventDate | Places the record in time, enabling historical and trend analysis. |
| decimalLatitude / decimalLongitude | Supports mapping, modelling, and spatial analysis. |
| coordinateUncertaintyInMeters | Helps users decide whether a record is spatially precise enough for a given use case. |
| basisOfRecord | Indicates whether the evidence is a preserved specimen, human observation, machine observation, fossil, living specimen, or other source. |

For publishers, Darwin Core is a discipline: describe the data well enough that someone else can reuse it safely. For users, it is a map: know which fields to trust, filter, inspect, and cite.
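The required and recommended terms described above lend themselves to a simple pre-publication check. The following is a minimal sketch, assuming records are plain dictionaries keyed by Darwin Core term. The term lists follow the requirements as summarized in this article (abundance fields are left out for brevity), and the example record, including its identifier, is made up. It is not GBIF's own validator.

```python
# Terms as summarized in this article; check GBIF's current guidance before relying on them.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")
RECOMMENDED_TERMS = (
    "countryCode",
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "coordinateUncertaintyInMeters",
    "kingdom",
    "taxonRank",
)


def missing_terms(record, terms):
    """Return the terms that are absent or empty in the record."""
    return [term for term in terms if record.get(term) in (None, "")]


def check_record(record):
    """Summarize which required and recommended terms a record lacks."""
    return {
        "missing_required": missing_terms(record, REQUIRED_TERMS),
        "missing_recommended": missing_terms(record, RECOMMENDED_TERMS),
    }


# Hypothetical record with an empty eventDate and few recommended fields.
example = {
    "occurrenceID": "urn:example:occurrence:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Panthera leo",
    "eventDate": "",
    "countryCode": "TZ",
}

report = check_record(example)
print(report)
```

A publisher might block publication on anything in missing_required and treat missing_recommended as a to-do list.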

A collection of pinned butterflies arranged systematically in a natural history display, representing the museum specimens and Darwin Core standards that make biodiversity data interoperable through GBIF
Photo by Tamula Aura on Pexels

The Integrated Publishing Toolkit

The Integrated Publishing Toolkit, or IPT, is GBIF’s widely used open-source tool for publishing biodiversity datasets. An IPT installation lets organizations prepare metadata, map their data to Darwin Core, publish datasets, and register them with GBIF.

The IPT is important because many biodiversity data holders are not software companies. A herbarium, national park agency, university collection, or NGO may have valuable records but limited capacity to build custom publishing infrastructure. IPT gives them a common path to publish data in a standards-compliant way.

GBIF documentation describes several publishing routes: running an institutional IPT, using hosted IPT services through a national or thematic node, publishing through Living Atlases infrastructure, or using more customized programmatic publishing workflows. Individuals generally publish through affiliated organizations, citizen-science platforms, or data papers rather than directly as standalone publishers.

| Publishing route | Best fit |
| --- | --- |
| Institutional IPT | Organizations that can host and maintain their own publishing server. |
| Hosted IPT | Institutions that need a national, regional, or thematic node to host the publishing infrastructure. |
| Living Atlases | National or regional biodiversity portals aligned with GBIF-style data publishing and discovery. |
| API or custom workflows | Large, technically mature publishers with automated data pipelines. |
| Data papers | Researchers who want peer-reviewed recognition for curated datasets. |

How Researchers and Analysts Use GBIF Data

GBIF-mediated data support a wide range of scientific and policy uses. Common applications include species distribution modelling, climate-change impact studies, invasive species risk assessment, protected-area planning, pollinator research, crop wild relative mapping, disease vector studies, red-list assessments, environmental impact screening, and national biodiversity reporting.

GBIF reports that its data are used in peer-reviewed studies at a rate of more than six papers per day. The platform also maintains a literature-tracking programme and Science Review series that surface examples of how open biodiversity data are being used in research.

| Use case | What GBIF contributes | Important caution |
| --- | --- | --- |
| Species distribution modelling | Occurrence points across broad geography and time. | Sampling bias, duplicate records, coordinate uncertainty, and pseudo-absence design matter. |
| Climate impact research | Historical and recent species records for range-shift analysis. | Observation effort changes over time, so raw counts are not automatically population trends. |
| Invasive species early warning | New and historical occurrences of alien or expanding species. | Taxonomic misidentifications and reporting delays can affect confidence. |
| Protected-area planning | Known species occurrences inside and outside candidate areas. | Absence of records is not evidence of absence without sampling-effort context. |
| Corporate biodiversity screening | Open species occurrence data around sites and supply chains. | Screening outputs should be treated as risk signals, not complete ecological assessments. |
| National reporting and policy | Shared infrastructure for mobilizing and reusing country-level biodiversity records. | Countries differ in data mobilization capacity, digitization history, and publishing coverage. |

The best use of GBIF data is rarely a simple download-and-map exercise. Good analysis usually includes taxonomic cleaning, duplicate handling, coordinate filtering, uncertainty filtering, temporal filtering, licence checks, bias correction, and a clear citation trail.

The Developer View

For developers and data scientists, GBIF is unusually useful because it exposes biodiversity infrastructure through web services, documented APIs, download formats, and cloud-accessible data products.

The main API families include species services, occurrence services, occurrence image services, maps, literature, registry, and validation. The occurrence API supports real-time paged search, but serious research downloads should use GBIF’s asynchronous occurrence download system or cloud snapshots rather than attempting to page through massive search results.

GBIF’s technical documentation describes several download formats:

  • Simple CSV / tab-separated text: a practical table for spreadsheet and scripting workflows.
  • Darwin Core Archive: a richer zipped package with interpreted records, verbatim records, metadata, and optional extensions.
  • Species list: a summary export of distinct species names returned by a filter.
  • Occurrence cubes: aggregated occurrence outputs by taxonomic, temporal, and spatial dimensions.
  • Parquet / cloud access: formats suitable for large analytical workflows.

A simple API exploration might look like this:

```shell
# Match a scientific name to the GBIF Backbone Taxonomy
curl "https://api.gbif.org/v1/species/match?name=Panthera%20leo"

# Search occurrence records for a known taxon key
curl "https://api.gbif.org/v1/occurrence/search?taxonKey=5219404&limit=10"

# For serious research, use occurrence downloads rather than deep paging.
# Downloads generate a DOI and make the dataset extract citable.
```

R users commonly access GBIF through rgbif, while Python users often use pygbif or direct API calls. A good rule is: use small search calls for exploration, but create a download for reproducible research.
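To make the "create a download" half of that rule concrete, here is a hedged sketch of the JSON body an asynchronous download request might carry. It only builds and prints the payload and sends nothing over the network. The endpoint, field names, and predicate keys reflect the author's reading of GBIF's API documentation and should be checked against the current reference. The username and email are placeholders, and a real request also needs GBIF account credentials.

```python
import json

# Endpoint as understood from GBIF's API documentation; verify before use.
DOWNLOAD_REQUEST_URL = "https://api.gbif.org/v1/occurrence/download/request"


def build_download_request(creator, email, taxon_key, fmt="SIMPLE_CSV"):
    """Build a download request body: one taxon, only records that have coordinates."""
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "format": fmt,
        "predicate": {
            "type": "and",
            "predicates": [
                {"type": "equals", "key": "TAXON_KEY", "value": str(taxon_key)},
                {"type": "equals", "key": "HAS_COORDINATE", "value": "true"},
            ],
        },
    }


# Placeholder account details; 5219404 is the taxon key used in the curl example above.
payload = build_download_request("your_gbif_username", "you@example.org", 5219404)
print("POST", DOWNLOAD_REQUEST_URL)
print(json.dumps(payload, indent=2))
```

Keeping the payload in version control alongside the resulting DOI gives a complete record of what was requested and what was delivered.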

Citation and Credit

GBIF data are free to access, but reuse comes with responsibilities. GBIF’s citation guidance states that users who download individual datasets or search results and use them in research or policy agree to cite them using a DOI.

DOI citation is central to the whole system. It lets other researchers reproduce the exact download, gives credit to data-publishing institutions and individuals, and demonstrates the impact of open data sharing to funders, collection managers, and governments.

| Data access pattern | Responsible citation approach |
| --- | --- |
| Occurrence download from GBIF.org or API | Cite the download DOI generated by GBIF. |
| Individual dataset | Cite the dataset DOI or recommended citation from the dataset page. |
| Individual occurrence | Cite the occurrence, dataset, and media source as appropriate. |
| Third-party package such as rgbif or pygbif | Cite the GBIF-mediated data DOI and, where relevant, the software package. |
| Images and other media | Respect the media licence and credit the creator, dataset, and licence. |

This is one of GBIF’s most important design decisions. Open biodiversity data remain sustainable only if the people and institutions doing the hard work of collecting, curating, digitizing, publishing, and maintaining records receive visible credit.

Governance and the Node Model

GBIF is governed through its participating countries and organizations. The Governing Board is the main decision-making body, with one representative from each participant country and organization. Voting rights are reserved for voting participant countries that contribute financially to GBIF’s central fund, while associate participants and organizations can participate in discussion.

The node model is central. A GBIF participant node is usually a team or institution designated to coordinate biodiversity data mobilization and use within a country, organization, region, or thematic community. Nodes help publishers, support training, connect stakeholders, build capacity, promote standards, and align national or institutional priorities with the global infrastructure.

This federated model is important because biodiversity data are local before they are global. Countries, museums, universities, Indigenous communities, agencies, and NGOs hold different kinds of expertise and authority. GBIF works best when global infrastructure supports local stewardship rather than replacing it.

Data Quality: Powerful, Not Perfect

GBIF is often described as the world’s largest open biodiversity data infrastructure, but large does not mean complete or unbiased. The platform exposes what has been collected, digitized, standardized, licensed, and published. It does not represent an evenly sampled census of life on Earth.

GBIF itself emphasizes data quality requirements and recommendations for publishers. Scientific studies of GBIF-mediated data repeatedly show spatial, temporal, and taxonomic biases. Records are denser in places with stronger research infrastructure, richer collection histories, active citizen-science communities, and better digitization capacity. Charismatic and easy-to-observe species are often overrepresented compared with small, cryptic, marine, tropical, soil, microbial, or poorly studied groups.

| Issue | Why it matters | Good practice |
| --- | --- | --- |
| Spatial bias | Records cluster near roads, cities, research stations, protected areas, and wealthy regions. | Use sampling-bias correction, spatial thinning, background sampling strategy, or effort covariates. |
| Temporal bias | Historical specimens and recent citizen-science observations reflect different collection processes. | Filter by year, model time explicitly, and avoid treating raw record counts as abundance. |
| Taxonomic bias | Birds, mammals, plants, and charismatic groups are generally better represented than many invertebrates, fungi, microbes, and cryptic taxa. | Check taxon-specific coverage and avoid broad claims from unevenly sampled groups. |
| Coordinate error | Coordinates may be missing, rounded, generalized, transposed, centred on a country, or imprecise. | Use coordinate uncertainty fields, GBIF issue flags, country checks, and spatial validation. |
| Taxonomic interpretation | Names may be synonyms, misspellings, outdated combinations, homonyms, or uncertain identifications. | Use taxon keys, inspect the backbone match, and document taxonomic decisions. |
| Duplicates | The same observation or specimen may appear through multiple routes. | Deduplicate cautiously using occurrence IDs, institution/catalog numbers, coordinates, dates, and dataset context. |
| Sensitive species | Precise locations can increase risk of poaching, disturbance, or exploitation. | Use generalized data where appropriate and respect publisher restrictions and conservation ethics. |

The key point is not that GBIF data are unreliable. The key point is that GBIF data are evidence, and evidence needs interpretation. Used carefully, GBIF is extraordinarily valuable. Used naively, it can produce misleading maps, biased models, or false confidence.
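Some of the coordinate checks listed under "Coordinate error" are simple enough to sketch in a few lines. The example below assumes records are dictionaries with Darwin Core keys. The 10 km uncertainty threshold is an arbitrary illustration, not a GBIF recommendation, and the sample records are invented. It complements GBIF's own issue flags and does not replace them.

```python
def coordinate_problems(record, max_uncertainty_m=10_000):
    """Return human-readable coordinate problems for one record (empty list if none)."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return ["missing coordinates"]

    problems = []
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        problems.append("coordinates out of range")
    if lat == 0 and lon == 0:
        # A point at exactly 0, 0 is usually a placeholder, not a real locality.
        problems.append("zero coordinates")

    uncertainty = record.get("coordinateUncertaintyInMeters")
    if uncertainty is None:
        problems.append("no coordinate uncertainty")
    elif uncertainty > max_uncertainty_m:
        problems.append("coordinate uncertainty too large")
    return problems


# Invented records covering the main cases.
records = [
    {"occurrenceID": "a", "decimalLatitude": -2.33, "decimalLongitude": 34.83,
     "coordinateUncertaintyInMeters": 250},
    {"occurrenceID": "b", "decimalLatitude": 0.0, "decimalLongitude": 0.0,
     "coordinateUncertaintyInMeters": 100},
    {"occurrenceID": "c", "decimalLatitude": 95.0, "decimalLongitude": 34.0,
     "coordinateUncertaintyInMeters": 100},
    {"occurrenceID": "d", "decimalLatitude": 10.0, "decimalLongitude": 20.0},
    {"occurrenceID": "e"},
]

usable = [r for r in records if not coordinate_problems(r)]
for r in records:
    print(r["occurrenceID"], coordinate_problems(r))
```

Whether a missing uncertainty value should exclude a record depends on the analysis; the point is to make the decision explicit and report it.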

A Practical Workflow for Responsible Use

A responsible GBIF workflow should be reproducible, transparent, and explicit about uncertainty.

  1. Define the question: decide whether you need presence records, a checklist, a survey dataset, a time series, or a modelling input.
  2. Resolve taxonomy: use scientific names carefully, prefer GBIF taxon keys, and document synonym handling.
  3. Create a reproducible download: use occurrence downloads for research-scale work so GBIF generates a DOI.
  4. Inspect licences: confirm whether records are CC0, CC BY, CC BY-NC, or otherwise constrained.
  5. Filter quality issues: inspect coordinates, dates, basis of record, coordinate uncertainty, geospatial flags, and taxonomic issues.
  6. Deduplicate: remove repeated records only when you understand the source context.
  7. Account for sampling bias: do not treat record density as species abundance unless the dataset design supports that interpretation.
  8. Cite properly: include the GBIF download DOI, dataset citations, software citations, and date accessed.
  9. Report limitations: describe geographic, temporal, taxonomic, and methodological limitations clearly.
  10. Feed corrections back: when possible, notify publishers of clear errors rather than silently fixing only your local copy.
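Step 6 in the list above is the easiest to get wrong, so a cautious approach is to flag candidate duplicates for review rather than delete them automatically. The sketch below is one possible heuristic, assuming records are dictionaries with Darwin Core keys: it groups specimens by institution code and catalogue number, and other records by name, date, and rounded coordinates. The grouping rules, rounding precision, and sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def _rounded(value, places=4):
    """Round a coordinate if present; keep None so missing values stay distinct."""
    return None if value is None else round(value, places)


def dedup_key(record):
    """Prefer institutionCode + catalogNumber; otherwise fall back to name, date, and place."""
    institution = record.get("institutionCode")
    catalog = record.get("catalogNumber")
    if institution and catalog:
        return ("specimen", institution, catalog)
    return (
        "observation",
        record.get("scientificName"),
        record.get("eventDate"),
        _rounded(record.get("decimalLatitude")),
        _rounded(record.get("decimalLongitude")),
    )


def candidate_duplicates(records):
    """Group occurrenceIDs that share a key; return only groups with more than one record."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record["occurrenceID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Invented records: occ-1 and occ-2 are the same specimen published through two routes.
records = [
    {"occurrenceID": "occ-1", "institutionCode": "EXAMPLE", "catalogNumber": "12345"},
    {"occurrenceID": "occ-2", "institutionCode": "EXAMPLE", "catalogNumber": "12345"},
    {"occurrenceID": "occ-3", "scientificName": "Panthera leo", "eventDate": "2021-03-02",
     "decimalLatitude": -19.0, "decimalLongitude": 23.5},
    {"occurrenceID": "occ-4", "scientificName": "Panthera leo", "eventDate": "2021-03-03",
     "decimalLatitude": -19.0, "decimalLongitude": 23.5},
]

dupes = candidate_duplicates(records)
print(dupes)
```

Returning groups instead of a pruned list leaves the final decision to someone who can check the dataset context, in keeping with the advice to deduplicate only when the source is understood.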

GBIF should be treated as infrastructure for evidence-based biodiversity work, not as a magic answer machine. It gives users access to an enormous body of biodiversity evidence; the analysis still has to be scientifically careful.

Where GBIF Is Going

GBIF’s strategic direction for 2023–2027 focuses on science and research, policy and partnerships, community and capacity, technical infrastructure, and stronger data mobilization. Several trends stand out.

  • Richer data models: moving beyond simple occurrence records toward better support for surveys, samples, interactions, and monitoring datasets.
  • DNA-derived data: building pipelines for eDNA, metabarcoding, sequence-derived occurrence evidence, and reference libraries.
  • Cloud-scale analytics: making occurrence snapshots, Parquet outputs, SQL downloads, and data cubes easier to use for large analyses.
  • Policy-ready outputs: supporting biodiversity indicators, national reporting, invasive species work, protected-area planning, and global biodiversity frameworks.
  • Equity and capacity: strengthening nodes, training, and data mobilization in underrepresented regions and taxonomic communities.
  • Data governance: responding to debates around digital sequence information, Indigenous data governance, sensitive species, CARE principles, and equitable benefit sharing.

The next phase of GBIF is therefore not just “more records.” It is better context, better provenance, better uncertainty, better governance, and better translation of biodiversity data into decisions.

Quick Reference

| Question | Answer |
| --- | --- |
| What does GBIF stand for? | Global Biodiversity Information Facility. |
| What does GBIF mainly provide? | Open access to biodiversity data, especially species occurrence records, through a global network and shared infrastructure. |
| What is an occurrence record? | Evidence that a species or other taxon was recorded at a place and time. |
| What standard is central to GBIF publishing? | Darwin Core, usually packaged as Darwin Core Archive for many datasets. |
| What tool do many publishers use? | The Integrated Publishing Toolkit, a free open-source GBIF publishing tool. |
| Can I use GBIF data commercially? | It depends on the record and dataset licences. Always check licence terms, especially CC BY-NC restrictions. |
| Should I use the API or downloads? | Use API searches for exploration; use occurrence downloads for serious research so your data extract gets a DOI. |
| Does GBIF prove absence? | No. A lack of records usually means lack of published evidence, not confirmed absence. |
| Is GBIF data population abundance data? | Usually no. Occurrence density is influenced by sampling effort and should not be treated as abundance without appropriate survey design. |