‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

V H@lemmy.stad.social · 1 year ago

‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says

Dr. Dabbles@lemmy.world · 1 year ago

Sounds like they’re desperate to convince people that copyright law shouldn’t apply to only them. Sorry, but that’s not going to work. License the content you’re making money from, or don’t use it.

V H@lemmy.stad.social · 1 year ago

Possibly. On the other hand, OpenAI’s market cap is bigger than the ten largest publishers combined; despite their whining, they can afford to. It’s not OpenAI that will be prevented from getting training data. The biggest impact will be that it might stop smaller competitors and prevent open-source models.

TootSweet@lemmy.world · 1 year ago

OpenAI’s market cap is bigger than the ten largest publishers combined

Only until the AI bubble bursts, I expect.

V H@lemmy.stad.social · 1 year ago

Why do you think anything will “burst”? If anything, if licensing requirements for content make training expensive, that’s likely to make the biggest existing players far more valuable.

TootSweet@lemmy.world · 1 year ago

If the courts decide that copyright already required licensing before LLMs started being much of a thing, then that will probably hit existing big players harder than newer, smaller players.

But that’s not why I think AI is a bubble that will soon burst. I don’t think I can put it more eloquently than Cory Doctorow does in the opening paragraph of this article:

Of course AI is a bubble. It has all the hallmarks of a classic tech bubble. Pick up a rental car at SFO and drive in either direction on the 101 – north to San Francisco, south to Palo Alto – and every single billboard is advertising some kind of AI company. Every business plan has the word “AI” in it, even if the business itself has no AI in it. Even as two major, terrifying wars rage around the world, every newspaper has an above-the-fold AI headline and half the stories on Google News as I write this are about AI. I’ve had to make a rule for my events: The first person to mention AI owes everyone else a drink.

AI (or at least some “AI” algorithms) isn’t completely useless. In the right hands and not misused, it’s been used to great effect for quite a long time. But AI is currently overvalued in the market and underdelivering on the recently ubiquitous fantastical claims about it. OpenAI’s market cap is artificially inflated by hype. And hype is a finite resource.

V H@lemmy.stad.social · 1 year ago

Bubble in the sense that “many companies will fail” we can agree on. Companies like OpenAI will survive, lawsuits or not. Even if they were to fail due to the lawsuits, the algorithms are known, and e.g. Microsoft, which has a license to the tech, would just hire the team, start over, and let the corporate entity go bankrupt.

But all of the “ChatGPT for field X” companies that are just razor-thin layers on top of OpenAI’s API, sure, they will almost all fail, and the only ones that won’t will be those that leverage initial investment into an opportunity to quickly pivot into something more substantial.

A lot of people talk about AI as a bubble in the sense of believing the tech will go away, though, and that will never happen, because it’s useful enough.

Regarding OpenAI’s market cap, I don’t agree. I think it’ll increase far more unless they massively misstep. Even though it’s riding high on hype, they also still have a big lead that isn’t down to hype but to actually being significantly ahead of even competitors like Google. And given the high P/E ratios in tech, they don’t need to be the backend for all that many big deployments at big companies, even just fielding really simple uses that don’t need GPT’s capabilities, before that valuation is justified.

TootSweet@lemmy.world · 1 year ago

it’s useful enough.

To whom in what endeavor, though?

You can’t really trust anything an LLM says because of hallucinations. What’s the use case for an algorithm that gives you convincingly-worded but very likely false answers to your questions? Or writes professional-sounding documents filled with lies?

And if you’ve got people fact checking your LLM’s output, is the LLM really benefitting anybody?

We haven’t found an algorithm yet that a) is general purpose, b) produces trustworthy output, and c) doesn’t require specialized skills or babysitting to operate. And the current algorithms can’t really be retrofitted to make them fit these criteria.

ChatGPT is a cool parlor trick. But the first actually useful “AI” chat bot won’t run on the same algorithms or principles as ChatGPT.

V H@lemmy.stad.social · 1 year ago

You can’t really trust anything a human says either, because we’re frequently wrong yet convinced we’re right, or not nearly as competent as we think. Yet we manage, because in a whole lot of endeavours, being right often enough and being able to verify answers is sufficient.

There are plenty of situations where they are “right enough” and/or where checking the output is trivial enough. Software development is one example: I can easily tell if the output is “right enough”, humans are often wrong too, and we rely on tests to verify correctness anyway.

Having to cross-check results is a nuisance, but it’s worth it when I can run things past it on subjects I know well enough to tell if the answers are bullshit, and where it can often produce better answers than a lot of actual software developers. E.g. I recently had it give me a refresher on the algorithm to convert a nondeterministic finite automaton (NFA) to a deterministic finite automaton (DFA), and it explained it perfectly (which is not a surprise; there is plenty of material on that subject). But unlike if I’d just looked it up on Google, I could also construct examples to test that I remembered it right and have it produce the expected output (which, yes, I verified was correct).
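For reference, the textbook NFA-to-DFA conversion is the subset (powerset) construction: each DFA state stands for the set of NFA states the automaton could be in. Below is a minimal Python sketch of it, not the output ChatGPT produced in the exchange above. The transition encoding (a dict from `(state, symbol)` to a set of states), the use of `None` as the epsilon label, the function names, and the example NFA are all illustrative assumptions.

```python
from collections import deque

EPS = None  # assumption: epsilon transitions are labelled with None


def epsilon_closure(states, delta):
    """Return every NFA state reachable from `states` using epsilon moves alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_to_dfa(start, accepting, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states.

    Only subsets reachable from the start state are built; the empty
    frozenset acts as the dead state if it is reached.
    """
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    dfa_accepting = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A DFA state accepts if it contains at least one accepting NFA state.
        if current & accepting:
            dfa_accepting.add(current)
        for symbol in alphabet:
            # Follow `symbol` from every NFA state in the subset, then close over epsilon.
            moved = set()
            for s in current:
                moved.update(delta.get((s, symbol), ()))
            target = epsilon_closure(moved, delta)
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_accepting, dfa_delta


def accepts(dfa, word):
    """Run the DFA deterministically over `word`."""
    state, accepting, delta = dfa
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


if __name__ == "__main__":
    # Hypothetical example NFA: strings over {a, b} that end in "ab".
    nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
    dfa = nfa_to_dfa(0, {2}, nfa_delta, "ab")
    for word in ["ab", "aab", "ba", "abb", ""]:
        print(repr(word), accepts(dfa, word))
```

This is also the kind of example you can construct to check an explanation: the three-state NFA above turns into a DFA with three reachable subsets, {0}, {0, 1} and {0, 2}, and only {0, 2} accepts. In the worst case, an n-state NFA can produce up to 2^n DFA states.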

I also regularly have it write full functions. I have a web application where it has written about 80% of the code without intervention from me. Plenty of my libraries now have functions it has written.

I use it regularly. It’s saving me more than enough time to justify both the subscription to ChatGPT and API fees for other use.

As such, it is “actually useful” for me, and for many others.

Dr. Dabbles@lemmy.world · 1 year ago

I couldn’t care less what their market cap is; it’s a scam. Ponzi schemes are incredibly valuable until they aren’t.

This BS is an obvious attempt to astroturf Lemmy for the benefit of a corporation, and anybody falling for it is an easy mark.

V H@lemmy.stad.social · 1 year ago

Lol, what. OpenAI shares aren’t available; there’d be no benefit to anyone trying to pump them.

Dr. Dabbles@lemmy.world · 1 year ago

…except if they IPO or someone sells their shares on the secondary market. You can sell shares without being on a public exchange. Not doing much to dissuade me from my opinion that this is all a shitty effort at OpenAI astroturfing.

V H@lemmy.stad.social · 1 year ago

Except if they were, it’d be well known. And startups at this kind of early stage typically have contracts that require approval for secondary sales, because increasing the number of people on the cap table enough triggers nearly the same reporting requirements as being public, which is a massive burden. It just doesn’t work that way.

It’s also hilarious that you think posting an article on Lemmy that is at best neutral, with a message of doom and gloom about risks to their business, is something OpenAI would have any interest in. If I wanted to pump OpenAI, there are better places to do it, and more positive spins to put on it.

humorlessrepost@lemmy.world · 1 year ago

What human author hasn’t read and been inspired by existing copyrighted works?

It’s not even that uncommon for humans to accidentally copy them too closely later on.

polyploy@lemmy.dbzer0.com · 1 year ago

Machines don’t have inspiration; they are not people. They do not make decisions based upon artistic choice or aesthetic preference or half-remembered moments. They are plagiarism machines, trained on millions of protected works and designed for the explicit purpose of putting all those who created what they copy out of work.

In a vacuum, AI tools are as harmless and benign as you want them to be, but in reality they are disastrously harmful to the environment to train, and they are already ruining the livelihoods of human creators who actually make art.

V H@lemmy.stad.social · 1 year ago

Whenever I see them described as “plagiarism machines”, odds are about 99% that the person using the term has no idea how these models work. Like humans, they can overfit, but most of what they output will have far less in common with any individual work than the level of imitation people engage in all the time without being accused of plagiarism.

As for the environmental effects, it’s a totally ridiculous claim: the GPUs used to train even the top-of-the-line ChatGPT models add up to a tiny rounding error of the power use of even middling online games, and training has only gotten more efficient since.

E.g. researchers at Oak Ridge National Laboratory published a paper in December after training a GPT-4-scale model with only 3k GPUs on the Frontier supercomputer using Megatron-DeepSpeed. 3k GPUs is about 8% of Frontier’s capacity, and while Frontier is currently the fastest, there are hundreds of publicly known supercomputers at that kind of scale, and many more that are not publicly known. Never mind the many millions of GPUs that aren’t part of any supercomputer.

neoinvin@lemm.ee · 10 months ago

Nothing wrong with humans doing it. It’s yet to be determined whether machines should be able to.

linearchaos@lemmy.world · 1 year ago

I fully agree with you. I mean, even search engines are fully reliant on the ingest and storage of copyrighted material.

Of course, the elephant in the room is how we stop multi-billion-dollar companies from advancing the technology far enough to put artists, programmers, writers and the like out of business.

V H@lemmy.stad.social · 1 year ago

You can’t. The cat is out of the bag. The algorithms are well understood, and new papers on ways to improve output of far smaller models come out every day. It’s just a question of time before training competitive models will be doable for companies in a whole range of jurisdictions entirely unlikely to care.

CriticalMiss@lemmy.world · 1 year ago

good