TerrainMesh: Metric-Semantic Terrain Reconstruction
from Aerial Images Using Joint 2D-3D Learning

Qiaojun Feng
Nikolay Atanasov
Department of Electrical and Computer Engineering
Contextual Robotics Institute
University of California San Diego
IEEE T-RO and ICRA 2021

[Colab Demo]
[arXiv (T-RO)]
[arXiv (ICRA 2021)]

This paper develops a joint 2D-3D learning approach to reconstruct a local metric-semantic mesh in two stages: initialization and refinement. In the initialization stage, we estimate the mesh vertex elevations by solving a least-squares problem that relates the sparse depth measurements to the vertex elevations through barycentric coordinates. In the refinement stage, we associate 2D image and semantic features with the 3D mesh vertices using camera projection and apply graph convolution to refine the mesh vertex spatial coordinates and semantic features based on joint 2D and 3D supervision.
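The initialization stage can be sketched as a small least-squares solve. In this illustrative numpy snippet (the function name, the `reg` regularizer, and the data layout are assumptions for exposition, not the paper's actual implementation), each sparse depth measurement constrains the three vertices of the mesh face it falls in through its barycentric weights:

```python
import numpy as np

def initialize_elevations(bary_rows, depths, num_vertices, reg=1e-3):
    """Estimate mesh vertex elevations from sparse depth measurements.

    Each measurement k falls inside one mesh face; its depth is modeled
    as the barycentric combination of the three vertex elevations:
        d_k ~ w1*h_i + w2*h_j + w3*h_l
    Stacking one such row per measurement gives a linear system A h ~ d,
    solved in the least-squares sense. A small Tikhonov term keeps the
    system well-posed for vertices that receive no measurement.
    """
    A = np.zeros((len(depths), num_vertices))
    for k, (idx, w) in enumerate(bary_rows):  # idx: 3 vertex ids, w: weights
        A[k, idx] = w
    # Regularized normal equations: (A^T A + reg*I) h = A^T d
    h = np.linalg.solve(A.T @ A + reg * np.eye(num_vertices), A.T @ depths)
    return h
```

For example, if each measurement lands exactly on a distinct vertex (barycentric weight 1 on that vertex), the recovered elevations match the measured depths up to the small regularization bias.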

Video for IEEE T-RO (3 min)
(journal version)

Presentation for ICRA 2021 (conference version)


Inputs: RGB image + Sparse depth measurements
Outputs: 3D metric-semantic mesh


Initialization (only sparse depth) + Refinement (sparse depth + RGB image)

The initialization step uses only the sparse depth measurements.

Vertex-Image Alignment is the key step for joint 2D-3D learning.
Each 3D mesh vertex retrieves its feature from the multi-layer 2D image feature maps via camera projection.
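A minimal sketch of this projection-and-sampling step, assuming a pinhole intrinsic matrix scaled to the feature-map resolution (the function name and the numpy bilinear sampling are illustrative; the paper's implementation details may differ):

```python
import numpy as np

def vertex_image_align(vertices, feat_map, K):
    """Associate each 3D mesh vertex with a 2D image feature by projection.

    vertices: (N, 3) points in the camera frame (z > 0)
    feat_map: (C, H, W) feature map from one 2D CNN layer
    K:        (3, 3) camera intrinsics, scaled to feat_map resolution

    Each vertex is projected to pixel coordinates and a feature is read
    off by bilinear interpolation, so the lookup stays differentiable
    with respect to the vertex positions during refinement.
    """
    C, H, W = feat_map.shape
    uvw = (K @ vertices.T).T                 # perspective projection
    u = np.clip(uvw[:, 0] / uvw[:, 2], 0, W - 1 - 1e-6)
    v = np.clip(uvw[:, 1] / uvw[:, 2], 0, H - 1 - 1e-6)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    # Bilinear interpolation over the four surrounding feature cells
    f = ((1 - du) * (1 - dv) * feat_map[:, v0, u0]
         + du * (1 - dv) * feat_map[:, v0, u0 + 1]
         + (1 - du) * dv * feat_map[:, v0 + 1, u0]
         + du * dv * feat_map[:, v0 + 1, u0 + 1])
    return f.T                               # (N, C) per-vertex features
```

In practice this is run on several feature maps of different resolutions and the sampled features are concatenated per vertex.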

The same projection idea from Vertex-Image Alignment is used to initialize the mesh vertex semantic features.
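The refinement stage then applies graph convolution over the mesh connectivity to update the per-vertex features. A minimal single-layer sketch (the mean-neighbor aggregation, ReLU, and weight shapes here are generic graph-convolution choices, not necessarily the exact layer used in the paper):

```python
import numpy as np

def graph_conv(vertex_feats, edges, W_self, W_nbr):
    """One graph-convolution layer over the mesh vertex graph.

    vertex_feats: (N, F) per-vertex features (e.g. 3D coordinates
                  concatenated with image/semantic features gathered
                  by projection)
    edges:        (E, 2) vertex index pairs from the mesh connectivity
    W_self/W_nbr: (F, F_out) learned weight matrices

    Each vertex aggregates the mean of its neighbors' features; the
    transformed result refines the vertex coordinates and semantics.
    """
    N, _ = vertex_feats.shape
    agg = np.zeros_like(vertex_feats)
    deg = np.zeros(N)
    for i, j in edges:                    # symmetric neighbor aggregation
        agg[i] += vertex_feats[j]; deg[i] += 1
        agg[j] += vertex_feats[i]; deg[j] += 1
    agg /= np.maximum(deg, 1)[:, None]
    return np.maximum(vertex_feats @ W_self + agg @ W_nbr, 0.0)  # ReLU
```

Stacking a few such layers lets information propagate between vertices that are nearby on the mesh before the vertex positions and semantic logits are read out.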


Here we show the rendered depth maps from the reconstructed meshes. SD-tri performs 2D Delaunay triangulation on the sparse depth measurements. Initialized is the mesh after only the initialization step (RGB not used). Refined is the final mesh output of our algorithm.

Concatenating several local meshes to build a large terrain map.

Results with semantic information.

Metric-semantic reconstruction of a large-scale scene by combining local meshes.


We gratefully acknowledge support from NSF NRI CNS-1830399.
This webpage template was borrowed from Thai Duong.