Was video proof? Used to be... look at Google's new AI - Critical summary review - 12min Originals


Was video proof? Used to be... look at Google's new AI - critical summary review



Publisher: 12min

Critical summary review

Picture this. You record a short clip in your living room... ten seconds, nothing more. Then you open an app, type a sentence, and within moments your living room has become a desert at sunset, your shirt has become a military uniform, and you, on screen, make a gesture you have never made in real life. No one opened an editing program. No one learned how to use a timeline. No one paid for a post-production course. It was just... a conversation with an artificial intelligence.

That scene, which looked like science fiction twelve months ago, became a commercial product on Tuesday, May nineteenth, twenty twenty-six. During Google I/O in Mountain View, California, Google introduced Gemini Omni... a new family of artificial intelligence models capable of generating and editing videos from natural language commands. The first member to be released is called Gemini Omni Flash, and it begins reaching subscribers of the Google AI Plus, Pro, and Ultra plans today, inside the Gemini app and on the Google Flow platform. On YouTube Shorts and YouTube Create, access will be free starting later this week.

Demis Hassabis, head of DeepMind, took the stage with a philosophical promise before a technical one. The goal of the Omni project, he said... is to generate any kind of output from any kind of input. Text becomes video. A photo becomes animation. Audio becomes a character. A video becomes another video. Everything in the same model, talking to everything, with no borders between the media.

Google already had a video generator, Veo, launched last year. But Koray Kavukcuoglu, chief technology officer at Google DeepMind, made a point of explaining the difference to reporters. Veo, he said, works on the traditional text-to-video model... you write, it renders. Omni is natively multimodal. It was built from the ground up on the Gemini architecture, which means it understands context, reasons about what it has seen before, and keeps consistency across scenes. You can ask it to swap a character's outfit while keeping the face, change the camera angle without losing the lighting, transform a daytime scene into a nighttime one without making the dog in the background vanish along the way. In a demonstration shown to the press, the model generated a stop-motion video explaining protein folding, with coherent narration and believable physics.

Up to that point, it is one more chapter in the race between the giants. But Omni introduces a feature that opens a different door. It lets the user create a digital avatar with their own voice and their own face. To activate the function, you have to go through an onboarding... record yourself in front of the camera, speak a sequence of numbers, authenticate your own face and your own voice. After that, the avatar is saved. You can appear in videos without ever stepping in front of a camera again. You can speak languages you have never spoken. You can be in places you have never been. All of it with a face and a voice indistinguishable from yours.

The company itself acknowledges the weight of this feature. In its official statement, Google said it is committed to developing artificial intelligence responsibly and that it has clear policies in place to protect users from harm. For that reason, it has decided to keep disabled, for now, the ability to edit audio and speech in existing videos. The company says it is still testing that capability and studying how to release it responsibly. In other words... the machinery that would allow someone to take a real video, swap out the speaker's words, and deliver the result as if nothing had happened... exists. It just has not been turned on yet.

And there is a safety mechanism being sold as the main antidote. Every video generated by Omni carries SynthID, an invisible digital watermark, imperceptible to the human eye but detectable by machines, which signals that the content was generated by artificial intelligence. Google promises that verification will be available inside the Gemini app, in Chrome, and in Search. The problem is that this promise has already found its limits. In April of this year, a developer published a free, open source tool on GitHub capable of partially bypassing SynthID on images generated by Gemini. The tool does not erase the watermark... it just confuses Google's own decoder enough that it no longer detects the signal. Within weeks, the project had collected more than sixteen hundred stars on the platform. The watermark is still a useful layer of defense. But it has stopped being a wall.

There is a bright side, of course. For independent creators, advertisers, small filmmakers, teachers, and journalists in lean newsrooms, Omni dramatically shortens the path between an idea and a presentable video. Nicole Brichtova, director of product management at DeepMind, reminded reporters that the current limit of ten seconds per generation is not a technical boundary but a product decision... most users, according to her, are not yet asking for long videos. Advertisers gain the ability to render text correctly inside a scene, something that has always been the Achilles' heel of previous models. Educators gain a pocket-sized multimodal studio. Small businesses gain audiovisual production without an audiovisual production budget.

There is also a dark side... and no one needs much imagination to picture it. Convincing voice avatars in an election year. Phone scams using the voice and face of a real family member. Non-consensual pornography produced in seconds. Contestable courtroom evidence. Videos of public figures saying things they never said, circulating faster than any fact-check can reach. Omni does not invent any of these problems. But it shortens the distance between intention and execution to a typed sentence.

To understand where Google is positioning itself, it helps to remember the terrain. OpenAI shut down the Sora app in March, leaving a gap in mass-market consumption of artificial intelligence video. Last month, OpenAI also launched Images two point zero inside ChatGPT, expanding conversational editing to images. Adobe Firefly is advancing in the professional niche. The startup Luma AI promises entire advertising campaigns from a short brief. The consulting firm Statista projects the global market for artificial intelligence video tools at more than twelve billion dollars by twenty twenty-seven. Google now enters with a structural advantage that is hard to match... distribution. YouTube, Android, Chrome, Search, Workspace. Billions of people, one button away from the tool. Omni Pro, the more powerful version, was announced without a release date. And the developer interface should open in a few weeks.

What to do with this information.

First... build into your routine the idea that video, on its own, has stopped being proof of anything. Treat viral clips of public figures as suspect by default, until they arrive through an official channel or are confirmed by serious outlets. Second... if you produce content, it is worth getting to know Omni Flash this week. Not to abandon what you do today... but to map, calmly, where the tool shortens your workflow and where it still cannot replace a human eye. Third... think twice before offering your voice and your face to any digital avatar. The onboarding is simple. The reversal, much less so. And fourth... learn how to verify SynthID in the Gemini app, in Chrome, or in Search. The layer is not perfect, but having it is better than not having it.

Video, which for more than a century was the most reliable form of testimony humanity ever produced, is going through its own May nineteenth. This is not the end of the moving image. It is the beginning of a new relationship with it... one in which seeing still matters. It is just that believing... will take a little more work.


Who wrote the book?

Original content curated by 12...
