I am trying to learn the fine art of training models and figure out WTF I am supposed to do with the squiggly lines on TensorBoard.
But I am doing this in the context of training Stable Diffusion LoRAs, so my sources are all highly suspicious text dumps from angry channers. Consequently, there is a lot of trial and error.
Important and possibly correct learnings:
LoRAs are apparently really meant to be used at weight 1. But everybody overfits them and then tells you to back them off by turning the weight down.
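A minimal sketch of what that weight slider actually does, assuming a plain linear layer and made-up rank/alpha numbers (not any real checkpoint): the learned LoRA delta gets scaled before it is added to the base weights, so downweighting an overcooked LoRA just shrinks the whole learned update.

```python
import torch

# Hypothetical shapes and numbers, purely to illustrate the scaling.
d_out, d_in, rank, alpha = 320, 320, 16, 16
W_base = torch.randn(d_out, d_in)      # frozen base model weight
A = torch.randn(rank, d_in) * 0.01     # LoRA "down" matrix
B = torch.randn(d_out, rank) * 0.01    # LoRA "up" matrix

def apply_lora(weight_multiplier: float) -> torch.Tensor:
    # Standard LoRA merge: W' = W + multiplier * (alpha / rank) * (B @ A)
    return W_base + weight_multiplier * (alpha / rank) * (B @ A)

W_at_1   = apply_lora(1.0)   # how the LoRA is nominally meant to be used
W_at_0p6 = apply_lora(0.6)   # the usual "it's overfit, run it at 0.6" workaround
```

Backing the weight off is just linearly interpolating toward the base model, which is why it papers over overfitting but also dilutes whatever the LoRA was supposed to learn.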
More training is not better: with something like 30 to 50 training images, throwing more than about half an hour of compute at it can overfit badly.
You probably want a small LoRA, and you want to give it more repeats of the images but only a few epochs.
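Rough arithmetic for the repeats-vs-epochs tradeoff (all numbers here are made up for illustration): what actually matters is the total number of optimizer steps, and repeats and epochs both multiply into it.

```python
# Hypothetical dataset/config numbers, just to show how the knobs multiply together.
num_images = 40
repeats    = 10   # how many times each image appears per epoch
epochs     = 4
batch_size = 2

steps_per_epoch = (num_images * repeats) // batch_size
total_steps     = steps_per_epoch * epochs
print(total_steps)  # 800 steps; the same total could come from 40 epochs with 1 repeat
```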
The loss is indeed supposed to go down over time; if it is not doing that, you have broken something.
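The raw per-step loss on a diffusion LoRA is extremely noisy (every step samples a different timestep), so the downward trend only shows up after smoothing. A small sketch of the exponential-moving-average smoothing that TensorBoard's smoothing slider applies, in case you want to sanity-check the trend yourself:

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average, roughly what TensorBoard's smoothing slider does."""
    smoothed, last = [], values[0]
    for v in values:
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

# e.g. feed in per-step loss values exported from TensorBoard:
# smoothed = ema_smooth(raw_losses, weight=0.95)
# If the smoothed curve is not trending down over the run, something is misconfigured.
```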
The learning rate scheduler is important, but I have yet to really get it right. Starting fast and then slowing down seems more or less required, but what counts as “fast” and “slow” appears to vary by data set.
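One common way to get that "start fast, then slow down" shape is a short linear warmup into a cosine decay. A sketch with plain PyTorch schedulers; the model, learning rate, and step counts here are placeholders, not recommendations:

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the trainable LoRA parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

total_steps  = 800
warmup_steps = 50

# Ramp up quickly, then decay along a cosine toward a small floor.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, end_factor=1.0, total_iters=warmup_steps)
decay = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps, eta_min=1e-6)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[warmup_steps])

for step in range(total_steps):
    # ... forward pass, loss, backward pass would go here ...
    optimizer.step()
    scheduler.step()
```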
The model learns to rely on prompt structure to produce image structure. If the training prompts always say someone is making some kind of facial expression, but you don’t add any facial expression to the prompt when you use the model, it will magically lose its ability to draw a face. This is presumably overfitting, but I don’t know how to fix it.
The tools for captioning data sets are terrible. Everything wants to make tag soup with no real sentences, which is not how people prompt the models.
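Absent better tooling, one workaround is to post-process the tag soup into something closer to a natural-language prompt. A hypothetical sketch of that post-processing; the tags, template, and subject mapping are all made up:

```python
# Hypothetical tagger output for one training image.
tags = ["1girl", "red hair", "smiling", "outdoors", "looking at viewer"]

def tags_to_sentence(tags: list[str]) -> str:
    """Turn comma-separated tag soup into a rough sentence-style caption."""
    subject_map = {"1girl": "a woman", "1boy": "a man"}
    subject = next((subject_map[t] for t in tags if t in subject_map), "a person")
    rest = [t for t in tags if t not in subject_map]
    return f"a photo of {subject} with {', '.join(rest)}"

print(tags_to_sentence(tags))
# -> "a photo of a woman with red hair, smiling, outdoors, looking at viewer"
```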
Things I am still trying to learn:
How to take all this and make a LoRA that actually teaches the model something useful at weight 1.
Very relatable, some days I feel like my work is just staring at loss graphs going “wtf does it all mean” lol. Training models feels more like an art than a science to me most of the time (but I’m sure the meta-learning people are working on that).