In the age of virtual cocreation of value by consumers, the role of the content modality in the development of social capital has been largely overlooked. Given that different modalities lead to ...
[ICCV 2025][Object Detection][Visual Prompt] This paper proposes ModPrompt, an encoder-decoder-based visual prompting strategy that adapts vision-language object detectors (e.g., ...
Audio-visual learning has been a major pillar of multi-modal machine learning, where the community has mostly focused on its modality-aligned setting, i.e., the audio and visual modalities are both assumed ...
Sign languages (SLs), as natural human languages, operate within the visual-gestural modality, setting them apart from the oral-auditory systems of spoken languages. While SLs share universal ...
This paper tackles the domain of multimodal prompting for visual recognition, specifically when dealing with missing modalities through multimodal Transformers. It presents two main contributions: (i) ...
Abstract: Partially Relevant Video Retrieval (PRVR) aims to retrieve videos that match a given textual query only partially. This task is inherently challenging due to the modality gap between text ...
Average decoding scores for modality-agnostic decoders (green), compared to modality-specific decoders trained on data from subjects viewing images (orange) or on data from subjects viewing captions ...