Perception systems in modern autonomous driving vehicles typically take inputs from complementary multi-modal sensors, e.g., LiDAR and cameras. However, in real-world applications, sensor corruptions and failures degrade performance and thus compromise the safety of autonomous driving.
In this paper, we propose a robust framework, called MetaBEV, to address extreme real-world environments, covering six types of sensor corruption and two extreme sensor-missing situations. In MetaBEV, signals from multiple sensors are first processed by modal-specific encoders. Subsequently, a set of dense BEV queries, termed meta-BEV, is initialized. These queries are then processed iteratively by a BEV-Evolving decoder, which selectively aggregates deep features from LiDAR, cameras, or both modalities. The updated BEV representations are further leveraged for multiple 3D prediction tasks. Additionally, we introduce a new M$^2$oE structure to alleviate the performance drop on distinct tasks in multi-task joint learning.
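To make this pipeline concrete, below is a minimal PyTorch sketch of the query-based fusion described above. It is a simplified illustration rather than the released implementation: the module names (`BEVEvolvingLayer`, `MetaBEVSketch`), the feature dimensions, and the use of plain multi-head attention over concatenated modality tokens (in place of the paper's cross-modal deformable attention) are assumptions, and the M$^2$oE blocks are omitted for brevity.

```python
import torch
import torch.nn as nn

class BEVEvolvingLayer(nn.Module):
    """One decoder layer: meta-BEV queries cross-attend to modality BEV tokens.
    Plain multi-head attention is used here for simplicity; the paper's decoder
    relies on cross-modal deformable attention."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, queries, modal_tokens):
        # modal_tokens: list of (B, H*W, C) tensors from the available modalities
        kv = torch.cat(modal_tokens, dim=1)          # aggregate LiDAR and/or camera tokens
        attn_out, _ = self.cross_attn(queries, kv, kv)
        queries = self.norm1(queries + attn_out)
        return self.norm2(queries + self.ffn(queries))

class MetaBEVSketch(nn.Module):
    """Illustrative MetaBEV-style forward pass (hypothetical, not the released model)."""
    def __init__(self, dim=256, bev_hw=(32, 32), num_layers=3):
        # Small BEV grid so dense attention stays cheap in this sketch; the real
        # model uses a much denser BEV grid with deformable attention.
        super().__init__()
        h, w = bev_hw
        self.meta_bev = nn.Parameter(torch.randn(1, h * w, dim))  # dense learnable BEV queries
        self.layers = nn.ModuleList(BEVEvolvingLayer(dim) for _ in range(num_layers))
        self.det_head = nn.Linear(dim, 10)   # placeholder 3D-detection head
        self.seg_head = nn.Linear(dim, 6)    # placeholder map-segmentation head

    def forward(self, bev_cam=None, bev_lidar=None):
        # bev_cam / bev_lidar: (B, H*W, C) BEV features from the modal-specific
        # encoders; either may be None when that sensor is corrupted or missing.
        modal_tokens = [b for b in (bev_cam, bev_lidar) if b is not None]
        batch = modal_tokens[0].shape[0]
        queries = self.meta_bev.expand(batch, -1, -1)
        for layer in self.layers:                    # iterative BEV evolving
            queries = layer(queries, modal_tokens)
        return self.det_head(queries), self.seg_head(queries)
```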
Finally, MetaBEV is evaluated on the nuScenes dataset for 3D object detection and BEV map segmentation. Experiments show that MetaBEV outperforms prior methods by a large margin on both full and corrupted modalities. For instance, when the LiDAR signal is missing, MetaBEV improves detection NDS by 35.5% and segmentation mIoU by 17.7% over the vanilla BEVFusion model; and when the camera signal is absent, MetaBEV still achieves 69.2% NDS and 53.7% mIoU, even higher than previous works evaluated with full modalities. Moreover, MetaBEV performs favorably against previous methods in both the canonical perception and multi-task learning settings, refreshing the state of the art on nuScenes BEV map segmentation with 70.4% mIoU.
Figure 1. An overview of the MetaBEV framework. The multi-modal inputs are separately processed by the camera encoder $\phi_c(\cdot)$ and the LiDAR encoder $\phi_l(\cdot)$ to produce the BEV representations $B_c$ and $B_l$. To generate the fused BEV features, a BEV-Evolving decoder takes the multi-modal BEV representations and an externally initialized meta-BEV feature (as the query feature) for correlation computation. Task-specific heads take the fused features for 3D detection.
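Zooming in on the correlation step in this figure, the sketch below shows one hypothetical way the meta-BEV query could attend to $B_c$ and $B_l$ separately and merge the results with a learned gate, so that features are aggregated from either modality alone or from both. The class name `CrossModalCorrelation`, the gating mechanism, and the dense attention are illustrative assumptions; the paper's decoder performs this correlation with cross-modal deformable attention.

```python
import torch
import torch.nn as nn

class CrossModalCorrelation(nn.Module):
    """Hypothetical view of the correlation computation in Figure 1: the meta-BEV
    query attends to each available modality separately, and the per-modality
    outputs are merged with a learned per-query gate."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn_cam = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Linear(dim, 2)  # per-query logits over {camera, LiDAR}

    def forward(self, meta_bev, bev_cam=None, bev_lidar=None):
        # meta_bev: (B, N, C) query feature; bev_cam / bev_lidar: (B, N, C) or None
        outs, idx = [], []
        if bev_cam is not None:
            outs.append(self.attn_cam(meta_bev, bev_cam, bev_cam)[0]); idx.append(0)
        if bev_lidar is not None:
            outs.append(self.attn_lidar(meta_bev, bev_lidar, bev_lidar)[0]); idx.append(1)
        stacked = torch.stack(outs, dim=-1)                       # (B, N, C, M)
        weights = self.gate(meta_bev)[..., idx].softmax(dim=-1)   # gate only over available modalities
        return (stacked * weights.unsqueeze(2)).sum(dim=-1)       # fused BEV feature (B, N, C)
```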
Table 1. Comparisons with SoTA methods on the nuScenes val set. We use -C and -T to denote equipping MetaBEV with the CenterPoint head and the TransFusion head, respectively. MTL stands for testing multiple tasks with the same model. $\dagger$ and $\ddagger$ denote separating and sharing the BEV feature encoder, respectively. MetaBEV outperforms the SoTA multi-modal fusion methods by +4.7\% mIoU on nuScenes (val) BEV map segmentation and achieves comparable 3D object detection performance. MetaBEV also performs best in multi-task learning.
Table 2. Experimental comparisons under extreme sensor missing. MetaBEV can entirely drop the features of the missing modality at inference, while other methods cannot. For those methods, we replace the missing features with zeros so that they can still produce results; these entries are colored blue. MetaBEV still consistently outperforms prior works when facing extreme sensor absence.
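The distinction drawn in this caption, dropping a missing modality versus zero-filling it, can be illustrated with the earlier `MetaBEVSketch` class. This usage snippet is purely hypothetical and reuses that sketch's defaults; it only contrasts the two inference modes, not the actual evaluation protocol.

```python
import torch
# Reuses the MetaBEVSketch class from the earlier sketch (hypothetical).
model = MetaBEVSketch().eval()
bev_lidar = torch.randn(1, 32 * 32, 256)   # LiDAR BEV features; the camera stream is absent

# MetaBEV-style inference: simply omit the missing modality's tokens.
with torch.no_grad():
    det, seg = model(bev_cam=None, bev_lidar=bev_lidar)

# Zero-filling workaround applied to the baselines in Table 2: the fusion module
# still receives a tensor for the missing sensor, just filled with zeros.
zero_cam = torch.zeros_like(bev_lidar)
with torch.no_grad():
    det_zero, seg_zero = model(bev_cam=zero_cam, bev_lidar=bev_lidar)
```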
Table 3. Experimental comparisons on sensor corruptions of various degrees. Text in blue denotes the specific corruption degree. MetaBEV consistently outperforms BEVFusion under various sensor corruptions in both zero-shot and in-domain tests.