Abstract: Recent advancements in pre-trained language-image models have ushered in a new era of visual comprehension. Leveraging the power of these models, this article tackles two issues within the ...