About a year ago, I saw a LinkedIn post that said AI music generators like Suno or Udio would become open-source in a few years. Well, it seems like it's already happening, because Chinese AI is entering the music scene as well. Following DeepSeek, which shook the entire AI world last week, a new player has appeared—YuE (乐), an open-source system capable of generating full songs offline, developed by Multimodal Art Projection (M-A-P) in collaboration with the Hong Kong University of Science and Technology.
YuE, which means "music" and "happiness" in Chinese, creates complete tracks from text prompts, generating vocals and instrumentation; it handles multiple genres, languages, and vocal styles, though currently only in mono, unlike the stereo output of Udio and Suno.
"YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs (lyrics2song). It can generate a complete song, lasting several minutes, that includes both a catchy vocal track and accompaniment track. YuE is capable of modelling diverse genres/languages/vocal techniques," the GitHub page says.
Read also: "6-12 months ahead of Suno, in AI time"—Udio Users Make an Average of 864K New Songs Every Day
What sets YuE apart is its ability to function entirely offline, but that comes with high hardware demands. According to the tool's GitHub page, generating 30 seconds of audio takes about 150 seconds on an Nvidia H800 and around 360 seconds on a GeForce RTX 4090. For full-length songs, at least 80GB of video memory is recommended, making the process viable only on high-end GPUs like the Hopper H800, the A100, or multiple RTX 4090s. Shorter clips can be rendered with 24GB of VRAM.
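To put those figures in perspective, here is a back-of-the-envelope sketch of what they imply for a full-length song. It assumes generation time scales roughly linearly with clip length, which is a simplification; the 150 s and 360 s per 30 s of audio come from the project's GitHub page, everything else is illustrative.

```python
def realtime_factor(gen_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute needed per second of generated audio."""
    return gen_seconds / audio_seconds

def estimate_generation_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time for a clip, assuming linear scaling."""
    return audio_seconds * rtf

# Reported figures: ~150 s per 30 s clip on an H800, ~360 s on an RTX 4090.
h800_rtf = realtime_factor(150, 30)      # 5x slower than real time
rtx4090_rtf = realtime_factor(360, 30)   # 12x slower than real time

# A hypothetical 3-minute (180 s) song at those rates:
print(estimate_generation_time(180, h800_rtf))     # 900.0 s, i.e. ~15 minutes
print(estimate_generation_time(180, rtx4090_rtf))  # 2160.0 s, i.e. ~36 minutes
```

In other words, even on a data-center GPU, a single song is a coffee-break affair rather than the near-instant turnaround of hosted services like Suno or Udio.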
This is a song sample made with YuE, shared by the devs on GitHub.
A recent update allows the AI to mimic the style of reference tracks. Future updates will bring tempo control (BPM), an improved interface, as well as reduced memory requirements through a transition to GGML. The team is now looking for collaborators to help with dataset creation and refinement.
YuE is built on Meta’s Llama architecture and has undergone a three-stage training process to ensure scalability, musicality, and text-based control. M-A-P has released models with 1 billion and 7 billion parameters, supporting English, Mandarin, Cantonese, Japanese, and Korean, alongside a separate model for upscaling audio to CD quality (44.1 kHz).
Licensed under Apache 2.0, YuE is free to use in commercial projects, provided M-A-P is credited. The developers even encourage musicians to reuse and monetise AI-generated material.
Read also: Copyright Challenges in AI-Generated Music: Who Really Owns the Melody?