Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives
A Resource-Efficient Text-to-Image Prior for Image Generations
Weight Modulation for User Attribution and Fingerprinting in T2I Models