Isamu Isozaki
Mar 25, 2024

--

Thanks for the response! And yeah, fully agree. I usually trained with around 10-20 tokens, and it's very good at adapting to concepts, though maybe with less fidelity than Dreambooth. Great point on using custom prompts for each image. I did notice that doing this can increase text alignment a bit, but I still saw the cross attention being destroyed even then. For example, whenever I try combining multiple textual inversion concepts in one prompt, they destructively interfere with each other and the output becomes a nonsense image (though it's entirely possible I trained wrong). A sketch of what I mean by combining concepts is below.
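For reference, here's a minimal sketch of the kind of combination I mean, assuming the diffusers library, a standard SD 1.x checkpoint, and two hypothetical separately trained embeddings (learned_embeds_cat.bin / learned_embeds_dog.bin with placeholder tokens <cat-concept> and <dog-concept>); the exact paths and token names are just illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Base model; any SD 1.x checkpoint compatible with the learned embeddings should work.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical files/tokens for two independently trained textual inversion concepts.
pipe.load_textual_inversion("./learned_embeds_cat.bin", token="<cat-concept>")
pipe.load_textual_inversion("./learned_embeds_dog.bin", token="<dog-concept>")

# Using both placeholder tokens in a single prompt is where I see the concepts
# destructively interfere and the output turn into a nonsense image.
image = pipe(
    "a photo of <cat-concept> sitting next to <dog-concept> on a sofa",
    num_inference_steps=30,
).images[0]
image.save("combined_concepts.png")
```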
