Tyler Perry Puts $800M Studio Expansion On Hold After Seeing OpenAI’s Sora: “Jobs Are Going to Be Lost”::Tyler Perry is raising the alarm about the impact of OpenAI’s Sora on Hollywood.
Sora can sometimes do one-minute clips that mostly look OK as long as you don’t pay too close attention. We are incredibly far away from coherent, feature-length narratives, and even those aren’t likely to be thematically interesting or engaging.
Yep. I watched their demo clips, and the “good” ones are full of errors, have lots of thematically incoherent content, and - this is the biggie - can’t be fixed.
Say you’re a 3D animator and build an animation with thousands of different assets and individual, alterable elements. Your editor comes to you and says, “This furry guy over here is looking in the wrong direction, he should be looking at the kangaroo king over there, but it looks like he’s just glaring at his own hand.”
So you just fix it. You go in, tweak the furry guy’s animation, and now he’s looking in the right direction.
Now say you made that animation with Sora. You have no manipulatable assets, just a set of generated frames that made the furry guy look in the wrong direction.
So you fire up Sora and try to fine-tune its instructions, and it generates a completely new animation that shares none of the elements of the previous one, and has all sorts of new, similarly unfixable errors.
If I use an AI assistant while coding, I can correct its coding errors. But you can’t just “correct” frames of video it has created. If you try, you’re looking at painstakingly hand-painting every frame where there’s an error. You’ll spend more time trying to fix an AI-generated animation that’s 90% good and 10% wrong than you will just doing the animation with 3D assets from scratch.
“Sora, regenerate $Scene153 with $Character looking at $OtherCharacter. Same Style.”
Or “Sora, regenerate $Scene153 from time mark X to time mark Y with $Character looking at $OtherCharacter. Same Style.”
It’s a new model: you won’t work with frames anymore, you’ll work with scenes, and when the tools get a bit smarter you’ll be working with scene layers.
“Sora, regenerate $Scene153 with $Character in Layer1 looking at $OtherCharacter in Layer2. Same Style, both layers.”
I give it 36 months or less before that’s the norm.
I agree, I don’t think people realise how early into this tech we are at the moment. There are going to be huge leaps over the next few years.
Or just “take the frame and replace the head with the same face pointed a different way”.
This seems like a fundamental misunderstanding of how generative AI works. To accomplish what you’re describing you’d need:
The whole system would need to be able to rewind to specific trouble spots, correct them, and still generate everything that comes after unchanged. We’re talking orders of magnitude more complexity and difficulty.
And in the meantime, artists creating 3D assets the regular way would suddenly look a lot less expensive and a lot less difficult.
If all you have is a hammer, everything looks like a nail. Right now, generative AI is everyone’s really attractive hammer. But I don’t see it working here in 36 months. Or 48. Or even 60.
The first 90% is easy. The last 10% is really fucking hard.
I’d imagine eventually we’re gonna get something like inpainting.
Yeah you can.
Same way you can correct parts of a generated image, and have the generator go back and smooth it over again. Denoiser-based networks identify parts that don’t look right and nudge them toward the expectations of the model. Sora clearly has decent expectations for how things look and move. I would bet anything that pasting a static image of a guy’s head, facing the desired direction, will result in an equally-plausible shot with that guy facing the right way.
There have been image-generator demos where elements can be moved in real-time. Yeah, it has wider effects on the whole image, because this technology is a pile of hacks - but it’s not gonna turn red wallpaper into a green forest, or shift the whole camera angle. You’re just negotiating with a model that expects, if this guy’s facing this way now, his hands must go over here. Goofy? Yes. Ruinous? Nope.
And at the end of the day you can still have human artists modify the output, as surely as they can modify actual film of actual people. That process is not quick or cheap. But if your video was spat out by a robot, requiring no actors, sets, or animators, manual visual effects might be your entire budget.
Really - the studios that do paint-overs for Hollywood could be the first to make this tech work. They’d only need a few extra people to start from the first 90% of a movie.
And ironically when we do get to the point where an AI can string together a semi-coherent narrative, the first things it’ll start to produce will probably be exactly the sort of mid-level dross that Tyler Perry likes to make.
This won’t get used for key narrative content. This will be used for a lot of b-roll and the quick cuts that audiences don’t examine closely. A lot of a movie is content like that, and since the dawn of the effects industry, editors and effects artists have known that they can get away with janky stuff in certain places. The audience won’t know it’s there because they’re not watching the film frame by frame.
It seems pretty good with backgrounds though, and it’s only going to get better. I think the threats of job losses are a lot more imminent than people are ready to admit.
If you expect to type in “comedy movie oscar bait five stars” and have it spit out a finished MP4, then sure, that’s not happening any time soon.
But movies are composed of shots. Most shots are shorter than one minute. Narratives are constructed in the edit. Actors talking to one another don’t have to be in the same room… or alive at the same time. One composite wide shot and a bunch of jump cuts will stop the audience from even thinking about it.
This is going to be used for short films before summer, the same way image generators were used for comics. Both generally terrible - but mostly because the people leaping into it are boring, impatient, and just want to go ‘look what I made!’ while pointing at the parts they absolutely did not make. It’s the fancy version of saying ‘my characters look so cool!’ when your webcomic is made from stolen Mega Man sprites.
But considering we’re about eighteen months removed from 256x256 blobs that vaguely resemble an avocado chair, and Sora slaps down a variety of pessimistic timelines, it seems incomprehensible to bet against using this for worthwhile storytelling. Sora spits out half-decent shots from text alone. Video-to-video style transfer has been in research papers for like five years now - and unless this is a completely novel form of generative network, that means you can probably insert your own footage halfway into the process.
Some of these networks are denoisers. They remove the parts of the input that don’t look like the prompt. Starting from random noise is only the laziest way to get a finished output. Any blurry approximation of what you want, any blob-colored animatics, any 1 FPS storyboard, should guide the network to produce matching results.
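To make the "start from a rough guide instead of noise" idea concrete, here is a toy sketch in Python. It is not how Sora or any real diffusion model is implemented. The "model" here is just a hypothetical list of three plausible outputs (`modes`), and `denoise_step` is an assumed stand-in that nudges the input toward whichever plausible output is closest. All names and numbers are made up for illustration.

```python
import numpy as np

def denoise_step(x, modes, rate=0.5):
    """Toy denoiser: nudge x toward the closest output the 'model' finds plausible."""
    distances = np.linalg.norm(modes - x, axis=1)
    nearest = modes[np.argmin(distances)]
    return x + rate * (nearest - x)

def generate(start, modes, steps=20):
    """Repeatedly denoise, starting from whatever input we were handed."""
    x = np.asarray(start, dtype=float).copy()
    for _ in range(steps):
        x = denoise_step(x, modes, rate=0.5)
    return x

# Hypothetical "plausible outputs" the toy model knows about (4-pixel images).
modes = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

# A blurry approximation of the third output, like a rough storyboard frame.
rough_guide = np.array([0.1, 0.2, 0.7, 0.6])
guided_result = generate(rough_guide, modes)

# Starting from random noise still lands on *some* plausible output,
# but which one is down to the noise, not to anything you asked for.
rng = np.random.default_rng(0)
noise_result = generate(rng.normal(size=4), modes)
```

In this toy, the guided run settles on the output the rough guide resembled, while the noise run ends up wherever the noise happened to be closest. Real networks are conditioned on a prompt and work in a far higher-dimensional space, so treat this only as an intuition pump for why a blurry animatic can steer the result.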
What that does for Tyler Perry, I have no friggin’ idea. I was under the impression most of his movies could have been filmed at his house. (Alright damn, A Jazzman’s Blues must have taken some money.) We are not decades away from twenty-minute OVAs of sci-fi bullshit that would otherwise cost a fortune. It will be a matter of months.
Narrative will come first and foremost, because this technology frees writers and directors from needing studios… not vice-versa.