Category: ICCV

  • Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering

    Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière, Joost van de Weijer
    Continual Learning in Visual Question Answering (VQACL) requires models to acquire new visual-linguistic skills (plasticity) while preserving previously learned knowledge (stability). The inherent multimodality of VQACL exacerbates this challenge, as models must balance stability across visual and textual domains while adapting to novel […]

  • Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

    Taihang Hu, Linxuan Li, Kai Wang, Yaxing Wang, Jian Yang, Ming-Ming Cheng
    Text-to-image generation has seen groundbreaking advancements with diffusion models, enabling high-fidelity synthesis and precise image editing through cross-attention manipulation. Recently, autoregressive (AR) models have re-emerged as powerful alternatives, leveraging next-token generation to match diffusion models. However, existing editing techniques designed for diffusion models fail […]

  • ICICLE: Interpretable Class Incremental Continual Learning

    Dawid Rymarczyk, Joost van de Weijer, Bartosz Zieliński, Bartłomiej Twardowski
    Class-incremental learning is becoming more popular as it helps models widen their applicability without forgetting what they already know. A trend in this area is to use a mixture-of-experts technique, where different models work together to solve the task. However, the experts are […]

  • Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection

    Yuyang Liu, Yang Cong, Dipam Goswami, Xialei Liu, Joost van de Weijer
    In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this […]

  • Generalized Source-free Domain Adaptation

    Shiqi Yang, Yaxing Wang, Joost van de Weijer, Luis Herranz, Shangling Jui
    Domain adaptation (DA) aims to transfer the knowledge learned from a source domain to an unlabeled target domain. Some recent works tackle source-free domain adaptation (SFDA) where only a source pre-trained model is available for adaptation to the target domain. However, those methods […]

  • TransferI2I: Transfer Learning for Image-to-Image Translation from Small Datasets

    Yaxing Wang, Hector Laria Mantecon, Joost van de Weijer, Laura Lopez-Fuentes, Bogdan Raducanu
    Image-to-image (I2I) translation has matured in recent years and is able to generate high-quality realistic images. However, despite current success, it still faces important challenges when applied to small domains. Existing methods use transfer learning for I2I translation, but they still require […]

  • Multi-Modal Fusion for End-to-End RGB-T Tracking

    Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan
    We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness […]

  • SID4VAM: A Benchmark Dataset With Synthetic Images for Visual Attention Modeling

    David Berga, Xose R. Fdez-Vidal, Xavier Otazu, Xose M. Pardo
    A benchmark of saliency model performance on a synthetic image dataset is provided. Model performance is evaluated through saliency metrics as well as the influence of model inspiration and consistency with human psychophysics. SID4VAM is composed of 230 synthetic images, with […]

  • Active Learning for Deep Detection Neural Networks

    Hamed H. Aghdam, Abel Gonzalez-Garcia, Joost van de Weijer, Antonio M. López
    The cost of drawing object bounding boxes (i.e. labeling) for millions of images is prohibitively high. For instance, labeling pedestrians in a regular urban image could take 35 seconds on average. Active learning aims to reduce the cost of labeling by selecting […]

  • Learning the Model Update for Siamese Trackers

    Lichao Zhang, Abel Gonzalez-Garcia, Joost van de Weijer, Martin Danelljan, Fahad Shahbaz Khan
    Siamese approaches address the visual tracking problem by extracting an appearance template from the current frame, which is used to localize the target in the next frame. In general, this template is linearly combined with the accumulated template from the previous frame, […]