• noneabove1182OPM
    1 year ago

    I think the implication is that this dataset is even more useful if you don't jam the whole thing into your training run, but instead filter it further down to a reasonable number of tokens, around 5T, and train on that subset.

    I could be wrong, since they do explicitly say deduplicating, but it's phrased oddly either way.