Figure: Qualitative comparison of ego-only ground truth, collaborative ground truth, predicted Gaussians, predicted occupancy, and the
zero-shot variant. Red boxes highlight occluded object structure captured by Gaussian primitives in collaborative setting. The
zero-shot variant can look plausible but often shows clustered redundancy and noise; black boxes mark cases where the neighborhood-based fusion suppresses redundancy and improves consistency. An opacity threshold is applied for display, so the predicted Gaussians are not exhaustive.
We also include a demo video, which demonstrates the continuous driving scene semantic occupancy prediction achieved by our method.
Comment: Proposes lightweight adapter-reverter pairs to transform Bird’s Eye View (BEV) features between agent-specific domains and a shared protocol domain