Isamu Isozaki
Mar 25, 2024

--

Thanks for the response! And yeah, fully agree. I usually trained with around 10-20 tokens, and it's very good at adapting to concepts, though maybe with less fidelity than Dreambooth. Great point on using custom prompts for each image. I did notice that doing this can increase text alignment a bit, but I still saw the cross attention being destroyed even then. For example, whenever I try combining multiple textual inversion concepts in one prompt, they destructively interfere with each other and the output becomes a nonsense image (though it's entirely possible I trained wrong). A sketch of what I mean by combining concepts is below.
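For reference, here's a minimal sketch of the kind of combination I mean, assuming the diffusers library, a standard SD 1.x checkpoint, and two hypothetical separately trained embeddings (learned_embeds_cat.bin / learned_embeds_dog.bin with placeholder tokens <cat-concept> and <dog-concept>); the exact paths and token names are just illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Base model; any SD 1.x checkpoint compatible with the learned embeddings should work.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical files/tokens for two independently trained textual inversion concepts.
pipe.load_textual_inversion("./learned_embeds_cat.bin", token="<cat-concept>")
pipe.load_textual_inversion("./learned_embeds_dog.bin", token="<dog-concept>")

# Using both placeholder tokens in a single prompt is where I see the concepts
# destructively interfere and the output turn into a nonsense image.
image = pipe(
    "a photo of <cat-concept> sitting next to <dog-concept> on a sofa",
    num_inference_steps=30,
).images[0]
image.save("combined_concepts.png")
```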
