Abstract: Multimodal learning has advanced rapidly over the past decade across many domains, with computer vision seeing particularly significant developments.