Google outdoes itself with new AI: Imagen can now generate a specified object and change its style at will

30 Aug, 2022

How powerful can Imagen become once it can generate exactly the object you point to? Simply upload 3-5 photos of the object and describe in text the background, action or expression you want. The object then appears in the requested scene, with the action and expression rendered vividly.


This text-to-image generation model, called DreamBooth, is Google's latest research. It is built by adjusting Imagen and has been a hot topic on Twitter since its release.

The researchers compared it with other large-scale text-to-image models such as DALL-E 2 and Imagen, and found that only the DreamBooth method faithfully reproduces the subject of the input images.

This is DreamBooth's most important feature: personalized representation. Given 3-5 casually taken images of an object, it can produce novel renditions of that object in different contexts while preserving its key features.

The authors also say that the approach is not limited to a particular model, and that DALL-E 2 could achieve the same functionality with some adjustments. Specifically, DreamBooth works by attaching a "special identifier" to the object.

In other words, the image generation model originally received only a general object class, such as [cat] or [dog]. DreamBooth adds a special identifier to that class, so the input becomes [V][object class].
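As a rough illustration of the [V][object class] pattern, the sketch below builds prompts that pair an identifier with a class name. The helper name `personalized_prompt`, the literal token "[V]", and the exact wording and spacing of the prompt are assumptions made for this example; the article does not specify them.

```python
def personalized_prompt(identifier: str, object_class: str, context: str = "") -> str:
    """Combine an identifier and an object class, plus an optional scene description.

    Hypothetical helper for illustration only; not part of any released DreamBooth code.
    """
    subject = f"a {identifier} {object_class}"
    # strip() removes the trailing space left when no context is given
    return f"{subject} {context}".strip()


# "[V]" stands in for the special identifier (an assumed placeholder token).
IDENTIFIER = "[V]"

# Prompt paired with the user's 3-5 photos of the object.
training_prompt = personalized_prompt(IDENTIFIER, "dog")

# Prompt asking for the same object in a new context.
generation_prompt = personalized_prompt(IDENTIFIER, "dog", "on the beach")

print(training_prompt)    # a [V] dog
print(generation_prompt)  # a [V] dog on the beach
```

Keeping the class name next to the identifier lets a prompt refer to one specific object while still stating what kind of object it is.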

The authors say that, because only a few input photos are available, the model cannot learn the overall features of the object well and may instead overfit to those photos.
