NVIDIA research method ‘3D MoMa’ lets content creators improvise with 3D objects

NVIDIA paid tribute to jazz music with AI research that could one day enable graphics creators to improvise with 3D objects created in the time it takes to hold a jam session. The method, NVIDIA 3D MoMa, could empower architects, designers, concept artists and game developers to quickly import an object into a graphics engine and start working with it, modifying its scale, changing its material or experimenting with different lighting effects.

According to a report on the NVIDIA website, the researchers showcased the technology in a video celebrating jazz and its birthplace, New Orleans, where the paper behind 3D MoMa will be presented this week at the Conference on Computer Vision and Pattern Recognition (CVPR).

NVIDIA graphics research VP David Luebke said that inverse rendering, a technique that reconstructs a series of still photos into a 3D model of an object or scene, “has long been a holy grail unifying computer vision and computer graphics.”

“By formulating every piece of the inverse rendering problem as a GPU-accelerated differentiable component, the NVIDIA 3D MoMa rendering pipeline uses the machinery of modern AI and the raw computational horsepower of NVIDIA GPUs to quickly produce 3D objects that creators can import, edit and extend without limitation in existing tools,” he said.
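
Luebke’s description can be made concrete with a toy example. The sketch below is not NVIDIA’s pipeline; it is a minimal PyTorch illustration of the differentiable inverse rendering idea: because every step from scene parameters to pixels is differentiable, gradient descent can recover an unknown scene parameter (here a single Lambertian albedo, standing in for the full mesh, material and lighting model) from target images.

```python
# A minimal sketch of differentiable inverse rendering, NOT NVIDIA's actual
# pipeline: a toy Lambertian "renderer" with one unknown albedo standing in
# for the full mesh/material/lighting model.
import torch

def render(albedo, light_dir, normals):
    # Lambertian shading: per-pixel max(0, n·l), tinted by the albedo.
    shading = (normals @ light_dir).clamp(min=0.0)    # (N,)
    return albedo * shading.unsqueeze(-1)             # (N, 3) RGB

torch.manual_seed(0)
normals = torch.nn.functional.normalize(torch.randn(256, 3), dim=-1)
light_dir = torch.nn.functional.normalize(torch.tensor([0.3, 0.8, 0.5]), dim=0)

true_albedo = torch.tensor([0.8, 0.5, 0.2])           # "ground-truth" material
target = render(true_albedo, light_dir, normals)      # stands in for input photos

albedo = torch.rand(3, requires_grad=True)            # unknown parameter to recover
opt = torch.optim.Adam([albedo], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.mean((render(albedo, light_dir, normals) - target) ** 2)
    loss.backward()                                   # gradients flow through the renderer
    opt.step()

print(albedo.detach())                                # converges toward true_albedo
```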

To be most useful to an artist or engineer, a 3D object should be in a form that can be dropped straight into widely used tools such as game engines, 3D modelers and film renderers. That form is a triangle mesh with textured materials, the common language of such 3D tools.
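
A rough illustration of that form, with generic field names rather than any particular engine’s schema: a textured triangle mesh boils down to vertex positions, triangle indices, UV texture coordinates and a material reference.

```python
# Illustrative-only schema for a textured triangle mesh; field names are
# generic, not a specific engine's format.
from dataclasses import dataclass

@dataclass
class Material:
    name: str
    base_color_texture: str      # path to an image file
    roughness: float = 0.5
    metallic: float = 0.0

@dataclass
class TriangleMesh:
    vertices: list               # [(x, y, z), ...]
    triangles: list              # [(i0, i1, i2), ...] indices into vertices
    uvs: list                    # [(u, v), ...] one texture coordinate per vertex
    material: Material = None

# A single textured triangle:
mesh = TriangleMesh(
    vertices=[(0, 0, 0), (1, 0, 0), (0, 1, 0)],
    triangles=[(0, 1, 2)],
    uvs=[(0, 0), (1, 0), (0, 1)],
    material=Material("brass", "textures/brass_basecolor.png"),
)
```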

Game studios and other creators have traditionally built 3D objects like these with complex photogrammetry techniques that require significant time and manual effort. Recent work in neural radiance fields can rapidly generate a 3D representation of an object or scene, but not in a triangle mesh format that can be easily edited.

NVIDIA 3D MoMa generates triangle mesh models within an hour on a single NVIDIA Tensor Core GPU. The pipeline’s output is directly compatible with the 3D graphics engines and modeling tools that creators already use.

The pipeline’s reconstruction comprises three components: a 3D mesh model, materials and lighting. The mesh is like a papier-mâché model of a 3D shape built from triangles; using it, developers can modify an object to fit their creative vision. Materials are 2D textures overlaid on the 3D meshes like a skin. And NVIDIA 3D MoMa’s estimate of how the scene is lit allows creators to later modify the lighting on the objects.
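
One common way to represent the lighting component, assumed here for illustration rather than taken from the paper, is an environment map: an image recording the incoming light from every direction. The toy lookup below shows why estimating lighting separately is useful; swapping that one array relights an object without touching its mesh or materials.

```python
# Toy lighting lookup in a latitude-longitude environment map (an assumed
# representation for illustration, not a confirmed detail of 3D MoMa).
import numpy as np

def sample_env(env, direction):
    """Return the incoming light along a unit direction from a lat-long map."""
    x, y, z = direction
    u = (np.arctan2(x, -z) / (2 * np.pi) + 0.5) * (env.shape[1] - 1)
    v = (np.arccos(np.clip(y, -1, 1)) / np.pi) * (env.shape[0] - 1)
    return env[int(v), int(u)]

env = np.full((64, 128, 3), 0.1)        # dim grey surroundings
env[:16, :, :] = [5.0, 4.5, 4.0]        # bright warm light overhead

up = np.array([0.0, 1.0, 0.0])          # a surface normal facing straight up
print(sample_env(env, up))              # sees the bright overhead light

env[:16, :, :] = [0.2, 0.3, 1.5]        # swap to cool blue light: object relit,
print(sample_env(env, up))              # mesh and materials untouched
```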

To showcase the capabilities of NVIDIA 3D MoMa, NVIDIA’s research and creative teams started by collecting around 100 images each of five jazz band instruments — a trumpet, trombone, saxophone, drum set and clarinet — from different angles. NVIDIA 3D MoMa reconstructed these 2D images into a 3D mesh representation of each instrument. The team then took the instruments out of their original scenes and imported them into the NVIDIA Omniverse 3D simulation platform for editing.

In any traditional graphics engine, creators can easily swap out the material of a shape generated by NVIDIA 3D MoMa, as if dressing the mesh in different outfits. The team did this with the trumpet model, for example, instantly converting its original plastic to gold, marble, wood or cork.
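
Omniverse scenes are described in USD, so a swap like the trumpet example can also be scripted. The sketch below uses the standard pxr USD Python API; the file, prim and material paths are hypothetical stand-ins, not NVIDIA’s actual assets.

```python
# Hypothetical paths throughout; the API calls are standard pxr USD.
from pxr import Usd, UsdShade

stage = Usd.Stage.Open("trumpet_scene.usd")                # hypothetical file
trumpet = stage.GetPrimAtPath("/World/Trumpet")            # the reconstructed mesh
gold = UsdShade.Material.Get(stage, "/World/Looks/Gold")   # replacement material

# Rebind the mesh to the new material; the geometry itself is untouched.
UsdShade.MaterialBindingAPI(trumpet).Bind(gold)
stage.GetRootLayer().Save()
```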

Creators can then place the newly edited objects into any virtual scene. The NVIDIA team dropped the instruments into a Cornell box, a classic graphics test for rendering quality. They demonstrated that the virtual instruments react to light just as they would in the physical world, with the shiny brass instruments reflecting brightly, and the matte drum skins absorbing light.

These new objects, generated through inverse rendering, can be used as building blocks for a complex animated scene — showcased in the video’s finale as a virtual jazz band.