Thursday, December 5, 2024

Introducing Amazon Nova, our new generation of foundation models

We will introduce two additional Amazon Nova models in 2025: a speech-to-speech model and a native multimodal-to-multimodal, or "any-to-any," model. The speech-to-speech model will understand streaming speech input in natural language, interpret verbal and nonverbal cues (such as tone and cadence), and deliver natural, humanlike interactions. The any-to-any model will process text, images, audio, and video as both input and output, simplifying the development of applications in which a single model can perform a wide variety of tasks, such as translating content from one modality to another, editing content, and powering AI agents that can understand and generate all modalities.