NVIDIA Fugatto

The name is short for Foundational Generative Audio Transformer Opus 1. No one knows whether it is an acronym or a backronym, but that is beside the point.

So far, all we have is a paper, some sample clips, and a YouTube video, but boy does it look great.

Fugatto ticks all of the following boxes out of the box:

Emergent properties
Large-scale data
Supports numerous tasks
Free-form Instructions
Open-ended generation
Compositionality (ComposableART)
Multi-Modal Inputs

Other popular AI models tick one or two of these boxes. Fugatto is impressive because it ticks all of them.

One feature that sets it apart from everything else is what NVIDIA calls the avocado chair: objects that traditionally make one sound make the sound of something else instead! So when you say someone makes the guitar speak, this time you might actually mean it literally!

The main competitors in audio are:

AudioBox
NExT-GPT
UniAudio
Audit
VoiceLDM

Services

Suno AI

NVIDIA says it trained the model (2.5 billion parameters) on DGX systems with 32 H100 Tensor Core GPUs.
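
To get a feel for what 2.5 billion parameters means, here is a rough back-of-the-envelope sketch of how much storage the weights alone would need. The precisions below are assumptions for illustration; NVIDIA has not said (as far as this post knows) which numeric format Fugatto uses, and training needs far more memory than the weights themselves.

```python
# Rough estimate of weight storage for a model of a given size.
# The dtype table is an illustrative assumption, not a Fugatto detail.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return weight storage in gigabytes (10**9 bytes) for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# 2.5 billion parameters, the figure NVIDIA gives for the full model.
FUGATTO_PARAMS = 2_500_000_000

if __name__ == "__main__":
    for dtype in ("fp32", "fp16"):
        print(f"{dtype}: {weight_memory_gb(FUGATTO_PARAMS, dtype):.1f} GB")
```

So at 16-bit precision the weights would come to about 5 GB, and about 10 GB at 32-bit. That counts only the parameters, not activations, optimizer state, or audio buffers.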

More about Fugatto

https://research.nvidia.com/publication/2024-11_fugatto-1-foundational-generative-audio-transformer-opus-1

https://blogs.nvidia.com/blog/fugatto-gen-ai-sound-model
