AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering
arXiv:2603.09689v2
Abstract: Visual Question Answering (VQA) is a fundamental multimodal task that requires models to jointly understand…
