This paper develops a joint 2D-3D learning approach to reconstruct a local metric-semantic mesh in two stages: initialization and refinement. In the initialization stage, we estimate the mesh vertex elevations by solving a least-squares problem that relates the vertex barycentric coordinates to the sparse depth measurements. In the refinement stage, we associate 2D image and semantic features with the 3D mesh vertices using camera projection and apply graph convolution to refine the mesh vertex spatial coordinates and semantic features based on joint 2D and 3D information.
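The refinement stage above applies graph convolution over the mesh vertices. A minimal sketch of one such step (not the paper's exact architecture; the adjacency, feature sizes, and mean-aggregation rule here are illustrative assumptions) looks like:

```python
import numpy as np

# Illustrative single graph-convolution step on mesh vertices:
# each vertex feature is updated by averaging the features of its
# mesh-edge neighbors (plus itself) and applying a linear map W
# that would normally be learned. Sizes are toy values.

edges = [(0, 1), (1, 2), (2, 3), (0, 2)]   # edges of a 4-vertex mesh
n = 4
A = np.eye(n)                               # adjacency with self-loops
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1, keepdims=True)          # per-vertex degree

H = np.random.randn(n, 8)                   # input vertex features
W = np.random.randn(8, 8) * 0.1             # layer weights (learned in practice)

# mean-neighbor aggregation followed by a ReLU nonlinearity
H_next = np.maximum((A / deg) @ H @ W, 0.0)
print(H_next.shape)                         # (4, 8)
```

Stacking a few such layers lets each vertex's spatial and semantic features incorporate information from progressively larger mesh neighborhoods.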
The initialization step uses only the sparse depth measurements.
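The elevation initialization can be sketched as a sparse least-squares problem: each depth measurement that falls inside a mesh triangle is modeled as the barycentric combination of that triangle's three unknown vertex elevations. The names below (`tri_idx`, `bary`, `n_vertices`) are illustrative, not from the paper:

```python
import numpy as np

# Each measurement d[i] lies in one triangle of the 2D mesh; its depth is
# the barycentric combination of that triangle's vertex elevations, giving
# a linear system A z = d solved in the least-squares sense.

n_vertices = 4                       # unknown vertex elevations z
tri_idx = np.array([[0, 1, 2],       # triangle containing measurement 0
                    [1, 2, 3]])      # triangle containing measurement 1
bary = np.array([[0.2, 0.3, 0.5],    # barycentric coords of each measurement
                 [0.1, 0.6, 0.3]])
d = np.array([1.0, 2.0])             # sparse depth measurements

# Measurement matrix: A[i, v] = barycentric weight of vertex v for sample i
A = np.zeros((len(d), n_vertices))
for i in range(len(d)):
    A[i, tri_idx[i]] = bary[i]

# Least-squares estimate of the vertex elevations (the toy system is
# underdetermined, so lstsq returns the minimum-norm solution)
z, *_ = np.linalg.lstsq(A, d, rcond=None)
print(A @ z)                         # reproduces the measurements [1. 2.]
```

In practice the system is large but very sparse, since each measurement constrains only the three vertices of its enclosing triangle.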
Vertex-Image Alignment is the key step for joint 2D-3D learning. Each 3D mesh vertex retrieves its feature from the multi-layer 2D image feature maps.
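A sketch of this alignment, under the usual pinhole-camera assumption (the intrinsics, feature-map sizes, and the `bilinear_sample` helper below are illustrative, not the paper's implementation): each vertex is projected into the image and the feature map is bilinearly sampled at the resulting subpixel location.

```python
import numpy as np

def bilinear_sample(fmap, u, v):
    """Bilinearly interpolate fmap of shape (H, W, C) at pixel (u, v)."""
    H, W, _ = fmap.shape
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = u - x0, v - y0
    return ((1 - wx) * (1 - wy) * fmap[y0, x0] +
            wx * (1 - wy) * fmap[y0, x1] +
            (1 - wx) * wy * fmap[y1, x0] +
            wx * wy * fmap[y1, x1])

# toy intrinsics and a 4x4 feature map with 2 channels
fx = fy = 2.0
cx = cy = 1.5
fmap = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)

vertex = np.array([0.25, -0.25, 1.0])    # 3D vertex in the camera frame
u = fx * vertex[0] / vertex[2] + cx      # pinhole projection -> u = 2.0
v = fy * vertex[1] / vertex[2] + cy      # -> v = 1.0
feat = bilinear_sample(fmap, u, v)
print(feat)                              # [12. 13.]
```

Repeating this over every vertex and every feature-map level attaches a multi-scale image descriptor to each 3D vertex.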
We use the same projection idea as in Vertex-Image Alignment to initialize the mesh vertex semantic features.
Here we show the rendered depth maps from the reconstructed meshes. SD-tri performs 2D Delaunay triangulation on the sparse depth measurements. Initialized is the mesh after the initialization step only (RGB not used). Refined is the final mesh output of our algorithm.
Concatenating several local meshes to build a large terrain map.
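The concatenation step can be sketched as follows (a minimal illustration with assumed names; merging duplicated boundary vertices between overlapping local meshes is omitted): vertex arrays are stacked and each mesh's face indices are offset by the number of vertices that precede it.

```python
import numpy as np

def concat_meshes(meshes):
    """Combine a list of (vertices (N,3), faces (M,3)) into one mesh."""
    all_v, all_f, offset = [], [], 0
    for v, f in meshes:
        all_v.append(v)
        all_f.append(f + offset)   # shift indices past earlier vertices
        offset += len(v)
    return np.vstack(all_v), np.vstack(all_f)

# two toy single-triangle local meshes
m1 = (np.zeros((3, 3)), np.array([[0, 1, 2]]))
m2 = (np.ones((3, 3)), np.array([[0, 1, 2]]))
V, F = concat_meshes([m1, m2])
print(V.shape, F.tolist())   # (6, 3) [[0, 1, 2], [3, 4, 5]]
```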
Results with semantic information.
Metric-semantic reconstruction of a large-scale scene by combining local meshes.
We gratefully acknowledge support from NSF NRI CNS-1830399.
This webpage template was borrowed from Thai