AerialMetric: Benchmarking and Adapting UAV Monocular Metric Depth Estimation in the Real World

1 Sun Yat-sen University, Shenzhen Campus; 2 FNii-Shenzhen; 3 SSE, CUHKSZ; 4 SIAS, USTC
ECCV 2026

Teaser

AerialMetric benchmark overview and qualitative metric depth comparison

Abstract. This paper addresses the problem of monocular metric depth estimation in aerial UAV imagery. Although recent data-driven methods have achieved remarkable progress in ground-level scenarios, models trained primarily on street-view and indoor datasets exhibit significant domain gaps when applied to aerial viewpoints.

To address this gap, we introduce AerialMetric, a benchmark dataset designed to evaluate and facilitate the adaptation of monocular metric depth estimation under UAV aerial viewpoints. The dataset consists of four complementary subsets collected from different sources, jointly covering real-world photogrammetry data, controlled aerial acquisition settings, photorealistic synthetic scenes, and in-the-wild Internet imagery. In total, AerialMetric provides 52K real-world and 16K synthetic image-depth pairs with reliable metric ground truth.

Based on this dataset, we conduct systematic evaluations of existing state-of-the-art models under aerial settings and investigate the impact of viewpoint, altitude, and camera parameters on metric depth prediction. In addition, by fine-tuning representative metric depth models on our dataset, we establish a comprehensive aerial benchmark and achieve state-of-the-art performance across diverse aerial imagery. Our datasets will be made publicly available.

Projected 3D Point Maps Reveal Scale Drift

We project GT, MoGe2, and MoGe2-Aerial depth into 3D, making their metric scale differences directly visible.

[Interactive 3D demo: input RGB scene with selectable points P1, P2, and P3; point-map panels for GT, MoGe2, and MoGe2-Aerial.]
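
The projection described above can be sketched as follows. This is a minimal illustration, not the code behind the demo: it assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), depth stored as z-depth in metres along the optical axis, and invalid pixels marked as None or non-positive values. The function names unproject and scale_ratio are hypothetical. Plain Python lists are used for clarity; a practical version would likely be vectorized.

```python
from statistics import median


def unproject(depth, fx, fy, cx, cy):
    """Back-project a z-depth map (list of rows, metres) into camera-frame 3D points.

    Assumes a pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Pixels with missing (None) or non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def scale_ratio(pred, gt):
    """Median of per-pixel pred/gt depth ratios over pixels valid in both maps.

    A value near 1.0 suggests the predicted metric scale matches the ground
    truth; values far from 1.0 indicate scale drift.
    """
    ratios = [
        p / g
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
        if p is not None and g is not None and p > 0 and g > 0
    ]
    if not ratios:
        raise ValueError("no pixels are valid in both depth maps")
    return median(ratios)


# Toy 2x2 example: the prediction is half the ground-truth depth everywhere.
gt_depth = [[10.0, 10.0], [10.0, 10.0]]
pred_depth = [[5.0, 5.0], [5.0, 5.0]]
gt_points = unproject(gt_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
pred_points = unproject(pred_depth, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
drift = scale_ratio(pred_depth, gt_depth)
```

Because X and Y scale linearly with Z, a global error in predicted depth shrinks or inflates the whole point cloud, which is why scale drift is easy to see once the depth maps are lifted into 3D.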

Data Collection Pipeline

A hybrid construction pipeline turns real captures, reconstructed scenes, and synthetic views into reliable metric depth supervision.

AerialMetric data collection pipeline

The AerialMetric Dataset

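To address the severe domain gap between ground-level training data and aerial perspectives, we introduce AerialMetric, built through a hybrid construction pipeline. It comprises four complementary components: real-world photogrammetry data, controlled aerial acquisitions, photorealistic synthetic scenes, and in-the-wild Internet imagery.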

Qualitative Results

Zero-shot transfer of state-of-the-art ground-domain models (e.g., ZoeDepth, DepthPro, UniDepthV2) to aerial imagery often results in severe scale ambiguity and geometric distortion. By adapting the MoGe2 foundation model on our dataset (denoted as MoGe2-Aerial), we recover more reliable metric scale in real-world aerial scenes.

Decoupled Flight Parameter Analysis & Robustness

Parameter robustness: Baseline models degrade severely and non-monotonically as camera pitch and altitude vary, with catastrophic failures at strictly nadir (-90°) and highly oblique (-45°) angles. Our fine-tuned MoGe2-Aerial remains stable across all evaluated pitch angles, altitudes, and FOVs.

Robustness analysis across camera pitch
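
One way to run a per-pitch analysis of this kind is to group per-image errors by the camera pitch recorded for each image. The sketch below is illustrative only: it assumes absolute relative error (AbsRel) as the metric, which this page does not specify, and it assumes each sample carries a pitch angle in degrees alongside flattened predicted and ground-truth depth lists. The names abs_rel and error_by_pitch are hypothetical.

```python
from collections import defaultdict


def abs_rel(pred, gt):
    """Mean absolute relative error |pred - gt| / gt over pixels with gt > 0."""
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)


def error_by_pitch(samples):
    """Average per-image AbsRel within each pitch value.

    samples: iterable of (pitch_deg, pred_depths, gt_depths) tuples.
    Returns a dict mapping pitch (degrees) to mean AbsRel, sorted by pitch.
    """
    buckets = defaultdict(list)
    for pitch_deg, pred, gt in samples:
        buckets[pitch_deg].append(abs_rel(pred, gt))
    return {
        pitch: sum(values) / len(values)
        for pitch, values in sorted(buckets.items())
    }


# Toy data: two nadir images and one oblique image.
samples = [
    (-90, [12.0, 8.0], [10.0, 10.0]),
    (-90, [10.0, 10.0], [10.0, 10.0]),
    (-45, [15.0], [10.0]),
]
per_pitch = error_by_pitch(samples)
```

The same grouping works for altitude or FOV by swapping the key. Holding the other parameters fixed while varying one keeps the analysis decoupled.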

BibTeX

@inproceedings{song2026aerialmetric,
  title     = {AerialMetric: Benchmarking and Adapting UAV Monocular Metric Depth Estimation in the Real World},
  author    = {Song, Zhongqiang and Chen, Guanying and Zhang, Yuqi and Zou, Yin and Fu, Chuanyu and Yuan, Zhiyuan and Huang, Chuan and Cui, Shuguang and Cao, Xiaochun},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2026}
}