SA-FARI — 11,609 Videos, 99 Species Categories, and a New Open Benchmark for Wildlife AI

For years, the bottleneck in AI for conservation has not been the algorithms. It has been the data. Training a computer vision model to detect, classify, segment, or track wildlife requires large numbers of verified examples — footage where animals are labelled, outlined, and followed across frames. Camera-trap projects generate this raw material every day, but annotation is slow, expensive, and difficult to standardise across species, continents, seasons, and field conditions.

SA-FARI is a major attempt to close that gap. The dataset, built by Conservation X Labs and Meta with a coalition of research and conservation partners, contains 11,609 camera-trap videos, 99 species categories, 16,224 masklet identities, and 942,702 individual bounding boxes, segmentation masks, and species labels. The videos span approximately ten years of field observations, from 2014 to 2024, across 741 independent sampling locations on four continents.

The full title is "Segment Anything in Footage of Animals for Recognition and Identification." The name is important: SA-FARI is not only a wildlife image collection. It is a video benchmark for multi-animal tracking: the task of finding every relevant animal, preserving its identity through time, and handling the messy conditions that real camera traps produce.

What Is SA-FARI?

SA-FARI is an open-source multi-animal tracking dataset for wild animals. Each video-species pair is annotated with a spatio-temporal segmentation of all animals belonging to that category. Instead of only drawing one box around an animal in a single frame, the dataset provides masks and tracklets that preserve animal identity across video frames.

The dataset was built using Meta’s Segment Anything Model 3 (SAM 3), a model designed to detect, segment, and track objects in images and videos from text or visual prompts. In the SA-FARI paper, SAM 3 is evaluated both with species-specific prompts and with generic animal prompts, then compared with wildlife-focused detector-plus-tracker baselines such as MegaDetector combined with ByteTrack, OCSort, and BoostSort++.

Metric	Training	Test	Total
Videos	10,776	833	11,609
Duration	2,545 min	202 min	2,747 min
Species categories	91	83	99
Masklets	15,141	1,083	16,224
Annotations	880,361	62,341	942,702
Video-species pairs	31,282	2,322	33,604
Independent sampling locations	650	91	741
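A few per-video and per-masklet averages follow directly from the table. The short Python sketch below copies the table's figures, checks that the additive rows sum correctly, and derives those averages. The species-category row is left out because categories overlap between splits, so it is not additive. The variable names are illustrative only.

```python
# Figures copied from the split table above: (training, test, total).
table = {
    "videos": (10_776, 833, 11_609),
    "duration_min": (2_545, 202, 2_747),
    "masklets": (15_141, 1_083, 16_224),
    "annotations": (880_361, 62_341, 942_702),
    "video_species_pairs": (31_282, 2_322, 33_604),
    "locations": (650, 91, 741),
}

# For these rows, training + test should equal the total.
for name, (train, test, total) in table.items():
    assert train + test == total, name

videos = table["videos"][2]
avg_clip_seconds = table["duration_min"][2] * 60 / videos
annotations_per_masklet = table["annotations"][2] / table["masklets"][2]
test_share_of_videos = table["videos"][1] / videos

print(f"average clip length: {avg_clip_seconds:.1f} s")
print(f"annotations per masklet: {annotations_per_masklet:.1f}")
print(f"test share of videos: {test_share_of_videos:.1%}")
```

On these numbers a typical clip is roughly 14 seconds long, each masklet carries on the order of 58 annotated frames, and the test split holds about 7% of the videos.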

The authors use the term species categories carefully. Labels are based on common names confirmed by local experts and mapped into taxonomy where possible. In aggregate, the dataset spans 4 classes, 23 orders, 53 families, and 82 genera. Like real field data, it is long-tailed: a few animals appear often, while many are rare. The paper reports that 29 species categories account for 90% of the data, with spider monkeys, collared peccaries, and agoutis among the most common.
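To make the long-tail claim concrete, here is a minimal Python sketch of how a "how many categories cover 90% of the data" figure can be computed from per-category counts. The function name and the toy counts are hypothetical and are not SA-FARI figures. The paper's own number may also be computed over a different unit (videos, masklets, or annotations).

```python
def categories_for_coverage(counts, target=0.90):
    """Return how many of the most frequent categories are needed
    to reach at least `target` share of all items."""
    total = sum(counts.values())
    if total == 0:
        return 0
    running = 0
    # Walk categories from most to least frequent, accumulating their share.
    for rank, n in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += n
        if running / total >= target:
            return rank
    return len(counts)

# Hypothetical toy counts for illustration only (not taken from the dataset).
toy_counts = {
    "spider monkey": 500,
    "collared peccary": 300,
    "agouti": 120,
    "ocelot": 50,
    "tapir": 20,
    "puma": 10,
}
print(categories_for_coverage(toy_counts))
```

In the toy data, three of the six categories already cover more than 90% of the items. The same shape, at larger scale, is what "long-tailed" means here.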

Camera-trap footage being prepared for AI training — the raw material behind SA-FARI — Photo by Ali Kazal on Pexels

Open, But Not Public Domain

SA-FARI is publicly listed on Hugging Face, but open access here comes with conditions. The dataset card lists the license as CC-BY-NC 4.0, which allows sharing and adaptation with attribution for non-commercial purposes. The Hugging Face page also requires users to log in and agree to share contact information before accessing the dataset content.

That still makes SA-FARI unusually open for conservation AI, where many camera-trap datasets remain private because of sensitive species locations, conservation risk, or partner restrictions. The published version includes anonymized camera-trap location identifiers rather than precise public coordinates, balancing reproducibility with field safety.

The Hugging Face repository contains the annotation files. The original videos and preprocessed 6 fps JPEG frames are hosted separately in a public Google Cloud Storage bucket referenced from the dataset card. The annotations follow a format similar to YouTube-VIS, with fields for videos, categories, annotations, and video-category pairs.
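As a rough illustration of working with a YouTube-VIS-style file, the Python sketch below groups masklet annotations by video and species. Every key name in it (`videos`, `categories`, `annotations`, `video_category_pairs`, `bboxes`, and so on) is an assumption modelled on the YouTube-VIS convention, and the sample record is invented. Check the dataset card for the actual schema before relying on any of it.

```python
import json
from collections import defaultdict

# Tiny in-memory stand-in for an annotation file. All key names are
# assumptions modelled on YouTube-VIS, not the confirmed SA-FARI schema.
sample = json.loads("""
{
  "videos": [{"id": 1, "file_names": ["vid1/00000.jpg", "vid1/00001.jpg"]}],
  "categories": [{"id": 7, "name": "agouti"}],
  "annotations": [
    {"id": 100, "video_id": 1, "category_id": 7,
     "bboxes": [[10, 20, 30, 40], null]}
  ],
  "video_category_pairs": [{"video_id": 1, "category_id": 7}]
}
""")

def index_masklets(data):
    """Group masklet annotations by (video_id, category name)."""
    names = {c["id"]: c["name"] for c in data["categories"]}
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[(ann["video_id"], names[ann["category_id"]])].append(ann)
    return dict(grouped)

def visible_frames(ann):
    """Frame indices where the animal has a box (None marks absent frames)."""
    return [i for i, box in enumerate(ann["bboxes"]) if box is not None]

masklets = index_masklets(sample)
for (video_id, species), anns in masklets.items():
    for ann in anns:
        print(video_id, species, ann["id"], visible_frames(ann))
```

With a real file, `sample` would come from `json.load` on the downloaded annotation JSON. The per-frame lists are what link each masklet to the separately hosted 6 fps frames.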

Who Built It

Conservation X Labs (CXL) is a Washington, DC-based nonprofit founded in 2015 by Dr. Alex Dehgan and Dr. Paul Bunje. Its model combines conservation fieldwork, prizes, technology development, and data platforms. Meta’s November 2025 profile of the SA-FARI collaboration describes CXL as having hosted 20 innovation challenges, provided more than $12 million in funding to breakthrough solutions, re-identified more than 299,000 animals, and supported the expansion of nearly 400,000 acres of protected areas.

SA-FARI was built by Conservation X Labs and Meta with footage and expertise from five additional partner organizations: Osa Conservation in Costa Rica, Los Amigos Biological Station in Peru, the Pan African Programme, the Institute for Game and Wildlife Research in Spain, and the Instituto Mixto de Investigación en Biodiversidad. The broader author list also includes researchers from the University of Bristol, Hasso Plattner Institute, University of Oviedo, Senckenberg Museum of Natural History, Max Planck Institute for Evolutionary Anthropology, Climate Corridors, CXL, and Meta.

This matters because a general wildlife model cannot be trained from one reserve, one country, or one charismatic species. SA-FARI’s value comes from ecological variety: tropical rainforest, savanna, temperate woodland, day and night footage, single animals and groups, small masks, occlusion, and the long-tail distribution that conservationists actually see in the field.

Wildlife in a natural landscape, representing the diversity of camera-trap environments included in SA-FARI — Photo by Thilina Alagiyawanna on Pexels

Why This Matters

Three things make SA-FARI significant beyond its headline size.

First, it is video. Camera-trap AI has historically been dominated by still-image detection and classification. Video captures movement, interactions, gait, occlusion, and behaviour. Multi-animal tracking is the bridge between knowing that an animal appeared and understanding what that animal did over time.

Second, it provides segmentation masks and tracklets. A bounding box can tell a model roughly where an animal is. A mask tells the model which pixels belong to the animal. A tracklet preserves that animal’s identity across frames. Together, these annotations support stronger benchmarks for detection, segmentation, tracking, behaviour analysis, and eventually individual re-identification.

Third, it reflects real-world difficulty. Camera-trap data is messy. Animals enter partly cropped, overlap each other, appear at night, move quickly, or occupy a tiny fraction of the frame. The SA-FARI paper reports that small masklets are substantially harder to detect and track than large ones, and that challenging motion or occlusion makes tracking harder even when detection remains possible.
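The difference between a box, a mask, and a tracklet described above can be shown with a small data structure. This Python sketch is illustrative only: the `Masklet` class, its field names, and the toy masks are hypothetical and do not reflect how SA-FARI stores its annotations.

```python
from dataclasses import dataclass, field

def mask_to_box(mask):
    """Tight (x_min, y_min, x_max, y_max) box around the nonzero pixels of a
    binary mask given as a list of rows, or None if the mask is empty."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs), max(ys))

def mask_area(mask):
    """Number of pixels that belong to the animal."""
    return sum(1 for row in mask for v in row if v)

@dataclass
class Masklet:
    """One animal followed through a video: the identity stays fixed,
    while the mask changes from frame to frame."""
    identity: int
    species: str
    masks: dict = field(default_factory=dict)  # frame index -> binary mask

    def boxes(self):
        return {f: mask_to_box(m) for f, m in sorted(self.masks.items())}

# Two toy 3x4 frames in which the same animal moves one pixel to the right.
frame0 = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0]]
frame1 = [[0, 0, 0, 0],
          [0, 0, 1, 1],
          [0, 0, 1, 0]]

track = Masklet(identity=1, species="agouti", masks={0: frame0, 1: frame1})
print(track.boxes())
print(mask_area(frame0))
```

A box can always be derived from a mask, but not the other way round: in the first toy frame the box spans four pixels while only three belong to the animal.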

For context on how AI fits into the broader camera-trap workflow — from deployment to analysis — see our camera trap software comparison. SA-FARI is the kind of benchmark that can improve tools such as MegaDetector, Wildlife Insights, Wildbook, and other conservation AI systems that rely on robust detection and tracking.

Wildlife camera footage scene, illustrating why video contains temporal information that still images miss — Photo by Ryan Beirne on Pexels

What the Benchmarks Show

The SA-FARI paper tests both species-specific and species-agnostic tracking. In the species-specific setting, SAM 3 trained or fine-tuned with SA-FARI substantially outperforms the baseline SAM 3 model. In the species-agnostic setting, SAM 3 trained on SA-FARI also outperforms MegaDetector paired with common tracking algorithms such as ByteTrack, OCSort, and BoostSort++.

The important takeaway is not that one model permanently wins. It is that wildlife video is a distinct domain. General-purpose computer vision models can be powerful, but they still need field-relevant, geographically diverse, species-rich training and evaluation data. SA-FARI gives researchers a common basis for measuring progress rather than comparing results across incompatible private datasets.

A wild animal in natural habitat, representing the field conditions that camera-trap AI must handle — Photo by Amar Preciado on Pexels

What This Enables

A dataset like SA-FARI is not an end product. It is infrastructure. Here is what it makes possible:

Better wildlife video models. Researchers can train and evaluate detection, segmentation, and tracking systems on a shared benchmark instead of relying only on private field datasets.

Behavioural analysis. Many conservation questions are temporal: feeding, social interaction, avoidance, predator-prey dynamics, disease signals, and animal responses to human disturbance. Video tracklets are a foundation for building those downstream models.

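Open-world recognition. Several species categories are deliberately present only in the test split: the split table lists 99 categories in total but only 91 in training, which leaves eight that a model never sees before evaluation. That forces models to face a real deployment problem: a camera trap will eventually record animals the model has not seen during training.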

Lower barriers for conservation organizations. Smaller NGOs and field teams may not have the budget to annotate thousands of videos from scratch. SA-FARI provides a starting point for building or fine-tuning systems that are better aligned with field conditions.
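The open-world point above can be checked with simple set arithmetic. The Python sketch below first shows the idea on hypothetical label lists, then applies inclusion-exclusion to the category counts from the split table (91 training, 83 test, 99 total). It assumes the total is the union of the two splits. The helper name `split_overlap` is illustrative.

```python
def split_overlap(train_categories, test_categories):
    """Partition category labels into shared, train-only, and test-only sets."""
    train, test = set(train_categories), set(test_categories)
    return {
        "shared": train & test,
        "train_only": train - test,
        "test_only": test - train,
    }

# Hypothetical toy label lists, not the real SA-FARI category names per split.
toy = split_overlap(
    ["agouti", "ocelot", "tapir"],
    ["agouti", "tapir", "giant anteater"],
)
print(sorted(toy["test_only"]))

# Counts from the split table, assuming "total" is the union of both splits.
train_n, test_n, total_n = 91, 83, 99
counts = {
    "shared": train_n + test_n - total_n,
    "train_only": total_n - test_n,
    "test_only": total_n - train_n,
}
print(counts)
```

Under that assumption, 75 categories appear in both splits, 16 only in training, and 8 only in the test split. Those 8 are the ones a model meets for the first time at evaluation.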

Caveats for Responsible Use

SA-FARI is a benchmark, not a finished conservation product. Models trained on it still need local validation before they are used for management decisions. Species distributions, camera hardware, habitat, season, weather, and field protocols can all shift model performance.

The non-commercial license also matters. SA-FARI can accelerate research and conservation prototyping, but commercial users need to review the CC-BY-NC 4.0 terms carefully. Sensitive species data should also remain protected: anonymized locations reduce risk, but downstream users should avoid publishing precise locations for threatened or trafficked species.

Finally, AI should reduce annotation burden, not remove ecological expertise. The best use of SA-FARI is as a shared foundation: models do the repetitive work faster, while local experts remain responsible for taxonomy, survey design, uncertainty, and conservation interpretation.

Get the Dataset

SA-FARI is listed on Hugging Face at huggingface.co/datasets/facebook/SA-FARI. The dataset card describes the license, access conditions, file layout, annotation format, and links to the public cloud storage location for videos and 6 fps JPEG frames.

The paper is available at arxiv.org/abs/2511.15622 and as a CVPR 2026 open-access paper. It documents the data collection, annotation pipeline, train/test split, taxonomy, benchmark settings, and evaluation results.

Conservation X Labs continues to build open infrastructure for biodiversity monitoring. SA-FARI sits naturally alongside its work on Wild Me and Wildbook for animal re-identification, Sentinel for protected area monitoring, and other conservation technology programs. If SA-FARI is the dataset layer, those tools are part of the application layer.