Explainer of Diffusion LLMs from Andrej Karpathy: “Most of the LLMs you’ve been seeing are ~clones as far as the core modeling approach goes. They’re all trained ‘autoregressively’, i.e. predicting tokens from left to right. Diffusion is different - it doesn’t go left to right, but all at once. You start with noise and gradually denoise into a token stream.”
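To make the contrast concrete, here is a toy sketch of the two decoding styles Karpathy describes. This is not any real model: `TARGET`, `toy_model`, and the confidence scores are made-up stand-ins for a trained network, and real diffusion LLMs use learned denoisers over many steps. The point is just the control flow: autoregressive decoding commits one token at a time left to right, while diffusion-style decoding starts with every position masked (“noise”) and fills in the whole sequence over a few parallel refinement rounds.

```python
import random

TARGET = ["the", "cat", "sat", "on", "the", "mat"]
MASK = "<mask>"

def toy_model(seq):
    """Stand-in for a trained denoiser: for each masked position,
    propose the target token along with a random confidence score."""
    return {i: (TARGET[i], random.random())
            for i, tok in enumerate(seq) if tok == MASK}

def autoregressive_decode():
    """Left to right: one token per step, conditioned on the prefix.
    (A real model would sample from p(x_i | x_<i) here.)"""
    out = []
    for i in range(len(TARGET)):
        out.append(TARGET[i])
    return out

def diffusion_decode(steps=3):
    """Start fully masked ('noise') and, at each step, commit the
    most confident proposals, refining all positions in parallel."""
    seq = [MASK] * len(TARGET)
    per_step = len(TARGET) // steps
    for _ in range(steps):
        proposals = toy_model(seq)
        # unmask the highest-confidence positions this round
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in best[:per_step]:
            seq[i] = proposals[i][0]
    return seq

print(autoregressive_decode())
print(diffusion_decode())
```

Both loops end at the same sequence here by construction; the difference that matters is the order of commitment, which is what lets diffusion models revise globally rather than being locked into a left-to-right prefix.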

  • mindbleach · 3 days ago

    The premise is sort of hilarious. “Everybody’s just blindly copying this one kind of network. We made the bold decision to copy the other one.”