Google quietly rolled out a powerful new version of Gemini last week that lets anyone edit photos using plain English commands instead of technical skills. The experimental version of Gemini 2.0 Flash with native image generation is now available to all users, after being limited to testers since last year.
Unlike most current AI image tools, this is not just about generating new images from scratch. Google has built a system that understands existing photos well enough to change them through natural conversation, preserving most of the original content while making specific edits.
This is possible because Gemini 2.0 is natively multimodal, meaning it can understand both text and images at the same time. The model converts images into tokens, the same basic units it uses to process text, so visual content can be manipulated through the same neural pathways it uses to understand language. This unified approach means the system does not have to call on separate specialized models to handle different media types.
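In practice, an edit is a single request that pairs the source photo with a plain-language instruction. Here is a minimal sketch using Google's google-genai Python SDK; the model ID ("gemini-2.0-flash-exp") and config fields reflect the experimental release at the time of writing and may change, so check the current documentation before relying on them.

```python
# pip install google-genai pillow
# Minimal sketch of a conversational photo edit. The model ID and the
# response_modalities field reflect the experimental release and may change.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

source = Image.open("portrait.png")  # hypothetical local photo

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[source, "Add muscles to the subject; keep everything else unchanged."],
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"],  # ask for an image back
    ),
)

# Responses interleave text and image parts; save the first returned image.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("edited.png")
        break
```

Note that the whole edit is one call: the image and the instruction travel together through the same model, which is exactly the unified architecture described above.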
“Gemini 2.0 Flash combines multimodal input, enhanced reasoning, and natural language understanding to create images,” Google said in the official announcement. “Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent. Give it feedback and the model will retell the story or change the style of its drawings.”
Google’s approach differs considerably from that of competitors such as OpenAI, whose ChatGPT can generate images with the help of DALL-E 3 and iterate on those creations through natural language, but requires a separate AI model to do so. In other words, ChatGPT coordinates between GPT-4V for vision, GPT-4o for language, and DALL-E 3 for image generation, rather than having a single model understand everything, something OpenAI aims to achieve with GPT-5.
A similar concept exists in the open-source world through OmniGen, developed by researchers at the Beijing Academy of Artificial Intelligence. Its creators say it can “generate various images directly through arbitrary multimodal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.”
OmniGen is also able to edit objects, merge elements into a single scene, and handle aesthetic adjustments. For example, when we tested the model in 2024, it could generate an image of Decrypt co-founder Josh Quittner hanging out with Ethereum co-founder Vitalik Buterin.

However, OmniGen is far less user-friendly, works at lower resolutions, requires more complex prompting, and is not as powerful as the new Gemini. Still, it is a solid open-source alternative that may appeal to some users.
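For the curious, OmniGen's public repository shows roughly the following usage. This is a hedged sketch based on the project's README; the pipeline class, the model ID ("Shitao/OmniGen-v1"), and the prompt placeholder syntax may have changed since, and the file paths are hypothetical.

```python
# Install from the VectorSpaceLab/OmniGen repository.
# Sketch based on the project's README; names and defaults may change.
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

# Multimodal instruction: the <img> placeholder binds the attached image.
images = pipe(
    prompt="The man in <img><|image_1|></img> waves at the camera.",
    input_images=["./man.png"],  # hypothetical input file
    height=1024,
    width=1024,
    guidance_scale=2.5,
    img_guidance_scale=1.6,
)
images[0].save("edited.png")
```

The placeholder-in-prompt design is what lets a single instruction mix text and images, but it is also part of why prompting OmniGen takes more effort than chatting with Gemini.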
Here is what we found with Google’s Gemini 2.0 Flash.
Testing the model
We put Gemini 2.0 Flash through its paces to see how it performs across different editing scenarios. The results reveal both impressive capabilities and some notable limitations.
Realistic subjects

The model maintains surprising coherence when editing realistic subjects. In my tests, I uploaded a self-portrait and asked it to add muscles. The AI delivered as requested, and although my face changed somewhat, it remained recognizable.
Other elements in the photo stayed largely unchanged, with the AI focusing only on the specific adjustment requested. This targeted editing capability stands out compared to typical generative approaches, which often regenerate entire images.
The model is also censored: it often refuses to edit photos of children and, naturally, refuses to handle nudity. It is, after all, a Google model. If you want to get naughty with photos of adults, OmniGen is your friend.
Style transformations

Gemini 2.0 Flash shows a strong aptitude for style conversions. When asked to transform a photo of Donald Trump into a Japanese manga style, it successfully reimagined the image after a few attempts.
The model handles a wide range of style transfers, converting photos into drawings, oil paintings, or almost any art style you can describe. You can refine the results by adjusting the temperature setting and toggling filters, although higher temperatures tend to produce transformations that are less recognizable as the original.
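When driving the model through the API instead of the Gemini app, the temperature knob sits in the generation config. A short sketch, reusing the client, source image, and types import from the earlier example (same assumptions apply):

```python
# Lower temperature keeps edits closer to the source photo; higher values
# produce looser, more stylized results (exact behavior may vary by release).
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[source, "Redraw this photo as a Japanese manga panel."],
    config=types.GenerateContentConfig(
        response_modalities=["Text", "Image"],
        temperature=0.4,  # try values near 1.0 or above for aggressive restyling
    ),
)
```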
However, one limitation shows up when requesting artist-specific styles. Tests asking the model to apply the styles of Leonardo da Vinci, Michelangelo, Botticelli, or Van Gogh resulted in the AI reproducing actual paintings by those artists instead of applying their techniques to the source image.
After a few prompt tweaks and a couple of iterations, we managed to get a mediocre but usable result. Ideally, it is better to name the art style rather than the artist.

Element manipulation

The model really shines at practical editing tasks. It handles additions and object manipulation expertly, removing specific objects on request or adding new elements to a composition. In one test, we prompted the AI to replace a basketball with a gigantic rubber chicken for some reason, and it produced a funny but contextually appropriate result.
Sometimes specific details of the subjects shift, but that is a problem easily fixed with digital editing tools in a few seconds.
To be honest, we don’t know what we expected after asking it to make basketball players fight over a rubber chicken.
Perhaps most controversially, the model is very good at removing copyright protection, a capability that drew plenty of discussion on X. When we uploaded a watermarked image and asked it to remove all the lettering, logos, and watermarks, Gemini produced a clean image closely resembling the unwatermarked original.

Perspective changes
One of the most technically impressive feats is Gemini’s ability to change perspective, something regular diffusion models cannot do. The AI can render a scene from different angles, although the results are essentially new creations rather than precise transformations.
Although perspective shifts do not produce perfect results (the model is, after all, reconceptualizing the entire image when rendering a new point of view), they mark important progress in AI’s understanding of three-dimensional space from two-dimensional inputs.

Good phrasing also matters when asking the model to deal with backgrounds. It usually tends to adjust the whole picture, making the composition look completely different.
In one test, for example, we asked Gemini to change the background of a photo so that a sitting robot would appear in Egypt instead of its original location, explicitly telling it not to change the subject. The model could not handle this specific task correctly and instead offered a brand-new composition featuring the pyramids, with the robot no longer the primary subject.

Another flaw we found is that while the model can iterate on a single image multiple times, the quality of the details degrades with each pass. Keep in mind that there can be a noticeable drop in quality if you go overboard with edits.

The experimental model is now available to developers via Google AI Studio and the Gemini API in all supported regions. It is also available on Hugging Face for users uncomfortable with sending their information to Google.
Overall, this seems to be one of those hidden Google gems, much like NotebookLM. It does something other models cannot, and does it well enough, yet few people are talking about it. It is certainly worth trying for users who want to have fun and see the potential of generative AI in image editing.