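Generating Music for Free with AI

Overview

AI-based music generation models have become explosively popular lately.

In particular, an OO song uploaded to YouTube, made with an AI model, has racked up several million views.

So I tried to generate music on an AI music generation site, but, wouldn't you know it, it asked me to pay.

I know the people who built these models worked hard, but I started this project out of curiosity about whether the service is really worth paying for, and with the thought that I might as well try building one myself.

Since my knowledge of AI is limited, I built it from existing pretrained models.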

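Structure

I split my approach into three stages.

1. A model that generates the music itself

2. A voice generated with TTS

3. A diffusion model that combines the music and the TTS voice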

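Models

The models I used are as follows.

1. Music generation: Facebook's MusicGen (facebook/musicgen-medium)

2. Voice: Google's gTTS, which is free

3. Combining: diffusion models exist for this, but for this test I simply merged the two tracks with pydub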
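To make step 3 concrete, the sketch below illustrates what overlaying one track on another amounts to for mono 16-bit PCM: the samples are added, and the result keeps the length of the base track. It is an illustration in plain NumPy, not pydub's actual implementation; the function name overlay_mono and the sample values are hypothetical.

```python
import numpy as np

def overlay_mono(base: np.ndarray, top: np.ndarray) -> np.ndarray:
    """Mix `top` onto `base` (both mono int16); the result keeps the length of `base`."""
    # Work in int32 so that adding two int16 samples cannot wrap around.
    mixed = base.astype(np.int32)
    n = min(len(base), len(top))
    mixed[:n] += top[:n].astype(np.int32)
    # Clip back into the int16 range instead of overflowing.
    return np.clip(mixed, -32768, 32767).astype(np.int16)

# Made-up sample values: the "vocals" are longer than the "instrumental".
instrumental = np.array([1000, -2000, 30000, 500], dtype=np.int16)
vocals = np.array([500, 500, 10000, 0, 999, 999], dtype=np.int16)

mixed = overlay_mono(instrumental, vocals)
print(mixed.tolist())  # [1500, -1500, 32767, 500]
```

Because the result has the base track's length, vocals that run longer than the instrumental are cut off at the end. pydub's overlay documents the same rule, so long lyrics would likely be truncated by a 10-second backing track.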

Code

Dockerfile

# Python 3.9 base image
FROM python:3.9

# Update apt and install system packages (a few of these can be dropped)
RUN apt-get update && apt-get install -y \
    build-essential \
    python3-dev \
    gcc \
    libsndfile1 \
    ffmpeg \
    git \
    && apt-get clean

# Set the working directory
WORKDIR /app

# Copy the requirements (dependency) file
COPY requirements.txt .

# Install the requirements
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

# Download and install AudioCraft
RUN git clone https://github.com/facebookresearch/audiocraft.git /audiocraft \
    && cd /audiocraft \
    && git checkout main \
    && pip install -e .

# Copy the app files
COPY . .

# Run Flask on port 5000 inside the container
CMD ["flask", "run", "--host=0.0.0.0", "--port=5000"]

App: app.py

from flask import Flask, request, jsonify, send_file, render_template_string
from pydub import AudioSegment
import os
import traceback
from audiocraft.models import MusicGen
import warnings
from scipy.io.wavfile import write
import numpy as np
from gtts import gTTS  # Google TTS service

# Ignore PyTorch warning messages
warnings.filterwarnings("ignore", category=UserWarning, module="torch.nn.utils")

app = Flask(__name__)

# Base directory for the generated files
BASE_DIR = os.path.abspath(os.path.dirname(__file__))

# Generate the instrumental track
def generate_instrumental(style_prompt):
    try:
        # Load the pretrained model
        model = MusicGen.get_pretrained("facebook/musicgen-medium")

        # Set the generation parameters
        model.set_generation_params(duration=10)  # clip length: 10 seconds

        # Generate audio from the style prompt
        wav_output = model.generate([style_prompt])

        # Convert the first item of the batch to a NumPy array
        wav_tensor = wav_output[0].detach().cpu()
        wav_data = wav_tensor.numpy()

        if len(wav_data.shape) > 1:
            wav_data = wav_data[0]  # use the first channel (mono audio)

        # Convert to 16-bit PCM
        sample_rate = 32000  # 32 kHz sample rate
        wav_data = (wav_data * 32767).clip(-32768, 32767).astype(np.int16)  # scale, clip, and cast

        # Save to disk
        output_path = os.path.join(BASE_DIR, "instrumental.wav")
        write(output_path, sample_rate, wav_data)

        return output_path
    except Exception as e:
        raise Exception(f"MusicGen error: {str(e)}")

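# Generate the vocals with gTTS
def generate_vocals(lyrics):
    try:
        # Keep both files under BASE_DIR so the result does not depend on the working directory
        mp3_path = os.path.join(BASE_DIR, "vocals.mp3")
        output_path = os.path.join(BASE_DIR, "vocals.wav")

        # Synthesize the lyrics text and save it as MP3
        tts = gTTS(text=lyrics, lang="en")
        tts.save(mp3_path)

        # Convert MP3 to WAV
        tts_audio = AudioSegment.from_mp3(mp3_path)
        tts_audio.export(output_path, format="wav")

        return output_path
    except Exception as e:
        raise Exception(f"TTS error: {str(e)}")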

# Combine the instrumental and the vocals
def combine_tracks(instrumental_path, vocal_path):
    try:
        if not os.path.exists(instrumental_path):
            raise FileNotFoundError(f"{instrumental_path} not found.")
        if not os.path.exists(vocal_path):
            raise FileNotFoundError(f"{vocal_path} not found.")
        
        instrumental = AudioSegment.from_file(instrumental_path)
        vocals = AudioSegment.from_file(vocal_path)

        # Overlay the vocals on the instrumental
        combined = instrumental.overlay(vocals)

        # Export the final output
        output_path = os.path.join(BASE_DIR, "final_song.wav")
        combined.export(output_path, format="wav")
        return output_path
    except Exception as e:
        raise Exception(f"Error combining tracks: {str(e)}")

# HTML for the GET page
HTML_TEMPLATE = """
<!DOCTYPE html>
<html>
<head>
    <title>Song Generator</title>
</head>
<body>
    <h1>Generate a New Song</h1>
    <form action="/generate_song" method="post">
        <label for="lyrics">Lyrics:</label><br>
        <textarea id="lyrics" name="lyrics" rows="4" cols="50" required></textarea><br><br>
        <label for="style">Style:</label><br>
        <input type="text" id="style" name="style" required><br><br>
        <button type="submit">Generate Song</button>
    </form>
</body>
</html>
"""

@app.route('/')
def index():
    return render_template_string(HTML_TEMPLATE)

@app.route('/generate_song', methods=['POST'])
def generate_song():
    content = request.form
    lyrics = content.get('lyrics')
    style = content.get('style')

    try:
        # Generate the instrumental
        instrumental = generate_instrumental(style)

        # Generate the vocals
        vocals = generate_vocals(lyrics)

        # Overlay the vocals on the instrumental
        final_song = combine_tracks(instrumental, vocals)

        # Return the song as a file download
        return send_file(final_song, as_attachment=True)
    except Exception as e:
        traceback.print_exc() 
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(debug=True, port=5000)
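
The 16-bit PCM conversion inside generate_instrumental can be tried in isolation. This is a minimal sketch, assuming the input is an array of float samples nominally in the range -1.0 to 1.0; float_to_pcm16 is a hypothetical helper name and the input values are made up for illustration.

```python
import numpy as np

def float_to_pcm16(samples: np.ndarray) -> np.ndarray:
    """Scale float samples to the int16 range, clipping anything outside it."""
    scaled = samples * 32767
    # Clip first so out-of-range values saturate instead of wrapping on the cast.
    return scaled.clip(-32768, 32767).astype(np.int16)

# Includes two out-of-range values (1.5 and -2.0) to show the clipping.
samples = np.array([0.0, 0.5, -1.0, 1.0, 1.5, -2.0], dtype=np.float32)
pcm = float_to_pcm16(samples)
print(pcm.tolist())  # [0, 16383, -32767, 32767, 32767, -32768]
```

The clip step matters because a model can emit samples slightly outside the nominal range; without it, the cast to int16 could wrap around and produce loud clicks.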

requirements.txt

Flask==3.1.0
transformers==4.46.2
diffusers==0.31.0
torch==2.5.0
torchaudio==2.5.0
pydub==0.25.1
numpy==1.21.0
gtts==2.2.3

Result

final_song.wav (0.61 MB)

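How to Run

1. Install Docker, then put all the files (Dockerfile, app.py, requirements.txt) in a single directory.

2. From that directory, build the image with docker build -t <image-name> .

3. Run it with docker run -d -p 5000:5000 <image-name>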
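
Once the container is running, the form at http://localhost:5000/ can be used from a browser, or the /generate_song endpoint can be called from a script. Below is a minimal sketch using only the standard library, assuming the server is reachable at localhost:5000 as in the docker run command above. The names build_request and download_song, the lyrics, the style prompt, and the output filename are illustrative choices, not part of the app.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5000"  # assumption: port mapping from `docker run -p 5000:5000`

def build_request(lyrics: str, style: str) -> Request:
    """Build the same form POST that the HTML page submits."""
    body = urlencode({"lyrics": lyrics, "style": style}).encode("utf-8")
    return Request(
        f"{BASE_URL}/generate_song",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def download_song(lyrics: str, style: str, out_path: str = "song.wav") -> str:
    """Send the request and save the returned WAV file (generation can take minutes)."""
    with urlopen(build_request(lyrics, style), timeout=600) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
    return out_path

req = build_request("hello world", "lo-fi piano")
print(req.get_method(), req.full_url)
# To actually call a running server: download_song("hello world", "lo-fi piano")
```

If generation fails, the app answers with HTTP 500 and a JSON error body, which urlopen surfaces as an HTTPError.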

Conclusion

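The resulting song was terrible. On top of that, generation took more than three minutes (for a 10-second song).

There are surely better approaches, but I realized that offering a real service with the method I came up with would be very difficult.
Beyond the quality of the song itself, using the Facebook model commercially could also cause problems, because the MusicGen weights are released under a non-commercial license (CC-BY-NC 4.0).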
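
One thing that may contribute to the long generation time is that generate_instrumental loads the model on every request. How much of the three minutes is spent there was not measured, so this is an assumption rather than a finding. The sketch below shows the general pattern of loading once and reusing the object; load_model is a stand-in that returns a placeholder so the example runs without AudioCraft, and get_model is a hypothetical helper name.

```python
from functools import lru_cache

load_count = 0  # counts how often the stand-in loader actually runs

def load_model(name: str):
    """Stand-in for an expensive load such as MusicGen.get_pretrained(name)."""
    global load_count
    load_count += 1
    return {"name": name}  # placeholder object, not a real model

@lru_cache(maxsize=1)
def get_model(name: str = "facebook/musicgen-medium"):
    """Load the model on first use and return the cached object afterwards."""
    return load_model(name)

first = get_model()
second = get_model()
print(first is second, load_count)  # True 1
```

In app.py the body of load_model would be the real MusicGen.get_pretrained call, and generate_instrumental would call get_model() instead of loading the model itself.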

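If you are smart, I recommend reading the papers and building your own model.

If you would rather not put in the effort, I recommend paying for a site that offers a good model.

Good luck, everyone!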

License: Attribution, NonCommercial, NoDerivatives
