Apple develops GAUDI, an "AI architect" that generates ultra-realistic 3D scenes based on text

5 Aug, 2022

New text-to-image models are now released at a steady pace, each producing results that impress, and the field is on a roll.


However, AI systems such as OpenAI's DALL-E 2 or Google's Imagen generate only two-dimensional images. If text could be turned into three-dimensional scenes, the visual experience would be considerably richer.


Now, the AI team from Apple has introduced GAUDI, the latest neural architecture for 3D scene generation.




It captures complex and realistic 3D scene distributions, renders them immersively from a moving camera, and can also create 3D scenes from text prompts. The model is named after Antoni Gaudí, the famous Spanish architect.


NeRF-based 3D rendering

Neural rendering, which combines computer graphics with artificial intelligence, has led to a number of systems that generate 3D models from 2D images. For example, 3D MoMa, recently developed by Nvidia, can create a 3D model from fewer than 100 photos in less than an hour. Google also relies on NeRFs (Neural Radiance Fields) to combine 2D satellite and street view imagery into 3D scenes in Google Maps, enabling immersive views. Google's HumanNeRF can also render 3D human bodies from video.

NeRFs are still mainly used as a neural storage medium for 3D models and 3D scenes, which can then be rendered from different camera views. They have also started to be used for virtual reality experiences.
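
To make "rendered from different camera views" concrete, here is a minimal sketch of the volume-rendering step a NeRF relies on: sample points along a camera ray, query the field for density and colour at each point, and blend the samples front to back. The function toy_field is a hypothetical stand-in for a trained network (a solid, coloured unit sphere), and all names and default values are assumptions made for illustration, not taken from any particular NeRF implementation.

```python
import math

def toy_field(point, view_dir):
    """Hypothetical stand-in for a trained NeRF: returns (density, rgb) at a point."""
    x, y, z = point
    inside = x * x + y * y + z * z < 1.0   # a solid unit sphere at the origin
    density = 5.0 if inside else 0.0
    # Colour depends on position only; a real NeRF can also use view_dir.
    rgb = (0.5 + 0.5 * x, 0.5 + 0.5 * y, 0.5 + 0.5 * z)
    return density, rgb

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64, field=toy_field):
    """Blend samples along one camera ray, front to back."""
    delta = (far - near) / n_samples        # spacing between samples
    transmittance = 1.0                     # fraction of light not yet absorbed
    color = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta        # midpoint of the i-th interval
        point = tuple(o + t * d for o, d in zip(origin, direction))
        sigma, rgb = field(point, direction)
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this interval
        weight = transmittance * alpha
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(color), 1.0 - transmittance     # (rgb, accumulated opacity)

# The same stored scene, seen from two opposite camera positions.
front_rgb, front_opacity = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
back_rgb, back_opacity = render_ray((0.0, 0.0, 2.0), (0.0, 0.0, -1.0))
```

Because the scene lives entirely inside the field function, a new view only needs new ray origins and directions. In this toy scene the two opposite cameras return different colours for the same sphere.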


So can the ability of NeRFs to realistically render images from different camera angles be used for generative AI? It can, and research teams are already experimenting with 3D scene generation. One example is Google's Dream Fields, an AI system first introduced last year, which combines the ability of NeRFs to generate 3D views with the ability of OpenAI's CLIP to evaluate image content, resulting in NeRFs that match text descriptions.
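
The idea of pairing a renderer with a text-image scorer can be sketched as a loop: render the scene, score the render against the prompt, and nudge the scene towards a higher score. The sketch below is a toy under loud assumptions. The "scene" is a single RGB colour rather than a NeRF, embed_text is a hypothetical keyword lookup rather than CLIP, and the update uses finite-difference gradient ascent instead of backpropagation. None of these names come from Dream Fields itself.

```python
import math

# Hypothetical stand-in for a text encoder: known colour words map to vectors.
TEXT_VECTORS = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}

def embed_text(prompt):
    """Toy text embedding: sum the vectors of the colour words in the prompt."""
    vec = [0.0, 0.0, 0.0]
    for word in prompt.lower().split():
        for k, v in enumerate(TEXT_VECTORS.get(word, (0.0, 0.0, 0.0))):
            vec[k] += v
    return vec

def render(scene, brightness):
    """Toy renderer: the scene is one RGB colour, the 'view' only dims it."""
    return [brightness * c for c in scene]

def cosine(a, b):
    """Cosine similarity, the kind of score used to compare embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0

def score(scene, prompt, views=(0.6, 0.8, 1.0)):
    """Average text-image similarity over several rendered views."""
    text = embed_text(prompt)
    return sum(cosine(render(scene, v), text) for v in views) / len(views)

def optimise(prompt, steps=200, lr=0.1, eps=1e-4):
    """Adjust the scene so that its renders score higher against the prompt."""
    scene = [0.5, 0.5, 0.5]                      # start from neutral grey
    for _ in range(steps):
        grad = []
        for k in range(3):
            up, down = scene[:], scene[:]
            up[k] += eps
            down[k] -= eps
            # Central finite difference approximates d(score)/d(scene[k]).
            grad.append((score(up, prompt) - score(down, prompt)) / (2 * eps))
        # Gradient-ascent step, clamped to the valid colour range.
        scene = [min(1.0, max(0.0, c + lr * g)) for c, g in zip(scene, grad)]
    return scene

start_score = score([0.5, 0.5, 0.5], "a red scene")
final_scene = optimise("a red scene")
final_score = score(final_scene, "a red scene")
```

In a real system the renderer would be a NeRF and the scorer a pretrained text-image model, but the division of labour is the same: the renderer proposes views, and the scorer judges how well they match the text.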

