Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation

Abstract

We propose Anomagic, a zero-shot anomaly generation method that produces semantically coherent anomalies without requiring any exemplar anomalies. By unifying visual and textual cues through a crossmodal prompt encoding scheme, Anomagic leverages rich contextual information to steer an inpainting-based generation pipeline. A subsequent contrastive refinement strategy enforces precise alignment between synthesized anomalies and their masks, thereby improving downstream anomaly detection accuracy.

To facilitate training, we introduce AnomVerse, a collection of 12,987 anomaly–mask–caption triplets assembled from 13 publicly available datasets, where captions are automatically generated by multimodal large language models using structured visual prompts and template-based textual hints. Extensive experiments demonstrate that Anomagic trained on AnomVerse synthesizes more realistic and varied anomalies than prior methods, yielding larger gains in downstream anomaly detection. Furthermore, Anomagic can generate anomalies for any normal-category image using user-defined prompts, establishing a versatile foundation model for anomaly generation.
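To make the anomaly–mask–caption triplet concrete, here is a minimal sketch of how one record and one template-based textual hint might be represented. The class name, field names, file paths, and template wording are all illustrative assumptions; the source does not specify the actual schema or the hint templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnomalyTriplet:
    """One anomaly-mask-caption record (hypothetical field names)."""
    image_path: str      # anomalous image
    mask_path: str       # binary mask localizing the anomaly
    caption: str         # description produced by a multimodal LLM
    source_dataset: str  # which public dataset the sample came from


def build_text_hint(category: str, anomaly_type: str) -> str:
    """Fill a hypothetical template that would accompany the visual prompt
    (e.g. an outlined anomaly region) when asking an MLLM for a caption."""
    return (
        f"The image shows a {category} with a {anomaly_type} anomaly "
        "inside the outlined region. Describe its appearance and location."
    )


# Illustrative values only; these paths and this caption are made up.
example = AnomalyTriplet(
    image_path="images/capsule_0001.png",
    mask_path="masks/capsule_0001.png",
    caption="A thin crack runs across the upper half of the capsule.",
    source_dataset="MVTecAD",
)
hint = build_text_hint("capsule", "crack")
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch: a triplet is a fixed dataset entry, so accidental mutation during training would be a bug.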

Method

Overall Framework of Anomagic

The crossmodal prompt-driven zero-shot anomaly generation pipeline with contrastive refinement.
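One property any inpainting-based pipeline relies on is that synthesized content is confined to the mask while the rest of the normal image is left intact. The sketch below illustrates only that mask-constrained compositing step on tiny grayscale grids. It is not the authors' implementation: `generated` stands in for whatever the generative model outputs, and both function names are hypothetical.

```python
def composite(normal, generated, mask):
    """Take `generated` pixels where mask is 1 and `normal` pixels where it is 0."""
    return [
        [g if m else n for n, g, m in zip(n_row, g_row, m_row)]
        for n_row, g_row, m_row in zip(normal, generated, mask)
    ]


def changed_outside_mask(normal, output, mask):
    """Count pixels outside the mask that differ from the normal image.
    A well-aligned anomaly-mask pair should give 0."""
    return sum(
        1
        for n_row, o_row, m_row in zip(normal, output, mask)
        for n, o, m in zip(n_row, o_row, m_row)
        if not m and n != o
    )


# 3x3 toy example: the anomaly is confined to the center pixel.
normal = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
generated = [[200, 200, 200], [200, 200, 200], [200, 200, 200]]  # placeholder model output
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

output = composite(normal, generated, mask)
leak = changed_outside_mask(normal, output, mask)
```

A check like `changed_outside_mask` is one simple way to quantify how well a synthesized anomaly matches its mask. The contrastive refinement described here addresses that alignment during generation, by means the page does not detail.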

Zero-shot Generation Results under Prompts in AnomVerse

Industrial Datasets (VisA & MVTecAD)

Synthesized anomalies on objects like PCBs and capsules, with precise defect localization.

Medical Imaging

Synthesized anomalies on brain MRI (BraTS) and retinal OCT scans, preserving anatomical fidelity.

BraTS and OCT [deVerdier et al. 2024, Kermany et al. 2018]: brain MRI and retinal OCT with enhanced tumor region.

Web-Crawled Real-World Objects

Diverse everyday items with realistic anomalies, sourced from web images.

Zero-shot Generation Results under User-Defined Prompts

Industrial Datasets (VisA & MVTecAD)

Anomaly generation results on industrial objects with user-defined prompts.

Medical Imaging

Medical anomaly generation with custom prompt guidance.

Web-Crawled Real-World Objects

Generation results on everyday items with user-specified anomaly descriptions.

BibTeX

@article{anomagic2025,
  title={Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation},
  author={Jiang, Yuxin and Luo, Wei and Zhang, Hui and Chen, Qiyu and Yao, Haiming and Shen, Weiming and Cao, Yunkang},
  journal={arXiv preprint arXiv:2511.10020},
  year={2025}
}

[AAAI 2026] Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation
