The NYT might win some money based on what Microsoft published, but only to the same extent as if a human wrote that and Microsoft published it. Copyright will never be an issue for training data because training is just scanning text and guessing the next word. Consuming an entire library to make up anything you ask for is pretty goddamn transformative.
Oh, does the model know the names of characters in a popular book? So do Google and Wikipedia. Try framing a law that’s cool with Google having a whole searchable plain-text copy of a book, so it can go ‘this book?’ when you search for a quote, but forbids OpenAI from having the essence of that book distilled somewhere in its terabyte of inscrutable numbers.
Never gonna happen.
This fight is over.