Few-shot LoRA tuning for genre-specific music generation with semantic prompt matching


Abstract

This paper presents an efficient method for genre-specific music generation that applies Low-Rank Adaptation (LoRA) to the text encoder of MusicGen, a large-scale text-to-music generation model. Full fine-tuning of such models is computationally expensive and resource-intensive, making it impractical for lightweight applications or small research groups. To address this, we fine-tune only a small number of parameters using LoRA, substantially reducing training cost while preserving the base model's capabilities. We further propose a mechanism that automatically selects the most suitable genre-specific LoRA adapter based on the cosine similarity between the user's prompt and predefined genre labels in the text embedding space, enabling effective music generation even when the prompt does not explicitly mention a genre. Experiments on the jazz and hip-hop genres of the FMA dataset show that the proposed method improves the alignment between prompts and generated audio, measured with CLAP-based text-audio similarity, and yields consistent gains over the baseline model, validating both the LoRA genre adaptation and the adapter selection strategy. On average, our method increased CLAP-based text-audio similarity from 0.35 to 0.38 for jazz prompts and from 0.31 to 0.34 for hip-hop prompts, indicating that genre-adapted LoRA tuning produces more semantically aligned and stylistically appropriate music. Our approach enables flexible and efficient customization of music generation models across diverse genres and applications with minimal resources.
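The adapter-selection mechanism summarized above can be illustrated with a minimal sketch: embed the prompt and each predefined genre label with the text encoder, compare them by cosine similarity, and pick the genre whose label scores highest. The checkpoint name "t5-base", the mean-pooling strategy, and the label phrasings below are illustrative assumptions, not details specified in the paper.

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, T5EncoderModel

# Assumption: MusicGen's text encoder is a T5 encoder; "t5-base" and mean
# pooling are illustrative choices, not details taken from the paper.
tokenizer = AutoTokenizer.from_pretrained("t5-base")
text_encoder = T5EncoderModel.from_pretrained("t5-base").eval()

# Hypothetical genre -> label-text mapping; one LoRA adapter per genre.
GENRE_LABELS = {"jazz": "jazz", "hip-hop": "hip hop"}

def embed(text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden state into one text embedding."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = text_encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)          # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # (1, dim)

def select_adapter(prompt: str) -> str:
    """Return the genre whose label embedding is closest to the prompt."""
    prompt_emb = embed(prompt)
    scores = {
        genre: F.cosine_similarity(prompt_emb, embed(label)).item()
        for genre, label in GENRE_LABELS.items()
    }
    return max(scores, key=scores.get)

# A prompt with no explicit genre mention can still be routed to an adapter.
print(select_adapter("a smooth late-night saxophone trio with brushed drums"))
```

In a full pipeline, the returned genre key would then be used to activate the corresponding LoRA adapter on MusicGen's text encoder (for example via PEFT's `set_adapter`) before generation; that wiring is omitted here.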