Hello Tomer, Shay Yosef and I (Itay Mohabati) would like to present ConsiStory: Training-Free Consistent Text-to-Image Generation.
TL;DR:
ConsiStory is a training-free approach that enables consistent subject generation in pretrained text-to-image models. It requires no fine-tuning or personalization, so it takes ~10 seconds per generated image on an H100 (20x faster than previous state-of-the-art methods).
Article: https://arxiv.org/pdf/2402.03286 - published in May 2024
Code: https://github.com/NVlabs/consistory
Research Page: https://research.nvidia.com/labs/par/consistory/