The current CPU-based version on HuggingFace has slow inference; a GPU-based mirror is available on ModelScope.

Cite

@misc{zhou2025emelodygenemotionconditionedmelodygeneration,
    title         = {EMelodyGen: Emotion-Conditioned Melody Generation in ABC Notation with the Musical Feature Template},
    author        = {Monan Zhou and Xiaobing Li and Feng Yu and Wei Li},
    year          = {2025},
    eprint        = {2309.13259},
    archiveprefix = {arXiv},
    primaryclass  = {cs.IR},
    url           = {https://arxiv.org/abs/2309.13259}
}
Dataset
Valence: reflects how negative or positive the emotion is
Arousal: reflects how calm or intense the emotion is
The emotion you believe the generated result should belong to
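
The two controls above can be read as coordinates in a two-dimensional emotion space. As a minimal sketch, assuming each control is reduced to a binary low/high level and the result is named with the common Russell-style quadrant convention (Q1 to Q4), the mapping might look like the following. The function name `emotion_quadrant` and the Q1 to Q4 labels are illustrative assumptions, not identifiers taken from this demo.

```python
def emotion_quadrant(valence_high: bool, arousal_high: bool) -> str:
    """Map binary valence/arousal levels to a quadrant label.

    Hypothetical helper: the Q1-Q4 naming follows the common
    Russell-style convention and is assumed here, not taken from the demo.
    """
    if valence_high and arousal_high:
        return "Q1"  # positive and intense
    if arousal_high:
        return "Q2"  # negative and intense
    if not valence_high:
        return "Q3"  # negative and calm
    return "Q4"      # positive and calm


if __name__ == "__main__":
    for v in (True, False):
        for a in (True, False):
            print(f"valence_high={v}, arousal_high={a} -> {emotion_quadrant(v, a)}")
```

Under this assumed convention, a positive but calm setting falls in Q4, while a negative and intense one falls in Q2.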