• 31337
      3 months ago

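      Likely transformers now. I think SD3 uses transformer text encoders (CLIP and T5) with a diffusion transformer on the image side, rather than a ViT for text encoding. ViTs are image models, and they are currently among the best architectures for image classification.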