Surviving Dependency Hell: A Field Guide to Installing Facebook AudioCraft on Ubuntu 20.04

A comprehensive log of installing Facebook’s AudioCraft locally on Ubuntu 20.04 using Miniconda. This guide covers solving the infamous MKL iJIT_NotifyEvent error, handling pkg-config for the av library compilation, and a deep dive into why Conda and Pip sometimes need to coexist redundantly.

The Motivation: Why Local?

I recently started working on a 30-second short film titled “A Lamp” (《第28次加班》). The script demands a very specific emotional shift: transitioning from melancholic isolation to warm healing at the exact 19-second mark.

While cloud-based solutions like Google Colab are great for quick tests, they lack the flexibility and persistence I need for an iterative creative workflow. I decided to deploy Facebook’s AudioCraft (specifically the MusicGen model) locally on my Linux server (Ubuntu 20.04) to generate modular BGM stems that I can stitch together in DaVinci Resolve.

However, installing it wasn’t as simple as pip install. I encountered several undocumented pitfalls involving PyTorch versions, Intel MKL libraries, and compilation dependencies. This post documents the exact steps to get it running, serving as a reference for my future self and anyone else stuck in dependency hell.

Prerequisites

OS: Ubuntu 20.04 LTS
GPU: NVIDIA GPU with CUDA support
Environment Manager: Miniconda (Highly recommended over system Python)

The Installation Journey

1. Environment Isolation

First, we create a clean environment. AudioCraft is sensitive to Python versions; the documentation recommends Python 3.9.

# Create a fresh environment named 'audiocraft'  
conda create -n audiocraft python=3.9

# Activate the environment  
conda activate audiocraft
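
Before installing anything heavy, it can help to confirm from inside Python that the new environment is really the one in use. The following is a minimal sketch, not part of the original setup: the helper name `check_python` is my own invention, and the 3.9 target simply mirrors the version chosen above.

```python
import os
import sys


def check_python(version_info=sys.version_info, required=(3, 9)):
    """Return True when the interpreter's major.minor matches `required`."""
    return tuple(version_info[:2]) == tuple(required)


# CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside Conda.
print("Active Conda env:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))
print("Interpreter     :", sys.executable)
print("Python 3.9?     :", check_python())
```

If the interpreter path does not point inside the `audiocraft` environment, the activation step probably did not take effect in the current shell.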

2. The Foundation: FFmpeg

Audio processing requires FFmpeg. Installing it via Conda is safer because the binaries and shared libraries stay inside the environment and leave the OS package manager untouched. The "<5" pin matches the command given in the AudioCraft README.

conda install "ffmpeg<5" -c conda-forge
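
To double-check which FFmpeg the environment actually picks up, a small Python probe can parse the first line of `ffmpeg -version`. This is only a sketch: the helper names are hypothetical, and the regular expression assumes the usual "ffmpeg version X.Y" banner, so custom or git builds may report `None`.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_major(version_output):
    """Extract the major version from the banner of `ffmpeg -version`.

    Returns None when no numeric version is present (e.g. git snapshot builds).
    """
    match = re.search(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major():
    """Return the major version of the ffmpeg found on PATH, or None."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    return parse_ffmpeg_major(result.stdout)


major = installed_ffmpeg_major()
print("ffmpeg on PATH:", shutil.which("ffmpeg"))
print("major version :", major)
```

With the pin above, the reported major version should be below 5 and the path should sit inside the Conda environment.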

3. The “Boss Fight”: PyTorch & The MKL Error

This is where things usually break.

If you simply follow standard tutorials to install the latest PyTorch, you might encounter this runtime error when importing the library:

OSError: .../libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

The Root Cause: This is caused by a version mismatch in the Intel MKL (Math Kernel Library). PyTorch is often built against an older MKL version, but Conda defaults to pulling the latest MKL (2024.1+), which removed the iJIT_NotifyEvent symbol.

The Fix: We must explicitly pin MKL to version 2023.2.0 and ensure compatibility between Conda and Pip.

# 1. Install PyTorch 2.1.0 with CUDA support (Adjust cuda version if needed)  
conda install pytorch==2.1.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia

# 2. Downgrade MKL in Conda to a compatible version  
conda install mkl=2023.2.0 -c conda-forge

# 3. [CRITICAL] Re-install MKL via Pip to ensure Python finds it  
pip install mkl==2023.2.0
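
After these three commands, a quick import test shows whether the MKL symbol problem is gone. The snippet below is a hedged sketch: `classify_import_error`, `try_import_torch`, and `installed_version` are names I made up for illustration, and the classification is a plain substring match on the error text quoted above.

```python
import importlib
from importlib import metadata


def classify_import_error(message):
    """Map an error message to a short hint; purely string-based."""
    if "iJIT_NotifyEvent" in message:
        return "mkl-mismatch"
    return "unknown"


def try_import_torch():
    """Attempt to import torch and return (ok, detail)."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError) as exc:
        # The undefined-symbol failure surfaces while the shared library loads.
        return False, classify_import_error(str(exc))
    detail = f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"
    return True, detail


def installed_version(dist_name):
    """Return the version Python's package metadata reports, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("torch import:", try_import_torch())
print("mkl (as seen by Python metadata):", installed_version("mkl"))
```

A result of `(False, 'mkl-mismatch')` means the pin did not take effect; in that case `conda list mkl` and `pip show mkl` are worth comparing.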

4. The Hidden Compilation Dependency: pkg-config

When pip installs audiocraft in the next step, it attempts to compile the av library from source. If the build cannot locate the FFmpeg development files through pkg-config, you will see errors like:

Package libavformat was not found in the pkg-config search path.

The Fix: We need pkg-config, pointed at the Conda environment, so the build knows where the FFmpeg libraries are located.

# Install pkg-config tool  
conda install pkg-config -c conda-forge

# Export the path so the compiler can see the libraries installed in Step 2  
export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH
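
The same check can be scripted so the failure shows up before the long compile starts. This is a sketch under assumptions: the helper names are hypothetical, and the three libraries listed are only an illustrative subset of what the av build looks for. It relies on the standard `pkg-config --exists` flag.

```python
import os
import shutil
import subprocess
import sys


def with_conda_pkgconfig(env, prefix):
    """Return a copy of `env` with <prefix>/lib/pkgconfig prepended to PKG_CONFIG_PATH."""
    new_env = dict(env)
    conda_dir = os.path.join(prefix, "lib", "pkgconfig")
    existing = env.get("PKG_CONFIG_PATH", "")
    new_env["PKG_CONFIG_PATH"] = conda_dir + (os.pathsep + existing if existing else "")
    return new_env


def missing_ffmpeg_libs(env, libs=("libavformat", "libavcodec", "libavutil")):
    """Return the libraries pkg-config cannot find, or None if pkg-config is absent."""
    exe = shutil.which("pkg-config")
    if exe is None:
        return None
    return [lib for lib in libs
            if subprocess.run([exe, "--exists", lib], env=env).returncode != 0]


# Fall back to the interpreter's prefix when CONDA_PREFIX is not set.
prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
env = with_conda_pkgconfig(os.environ, prefix)
print("PKG_CONFIG_PATH:", env["PKG_CONFIG_PATH"])
print("Not found by pkg-config:", missing_ffmpeg_libs(env))
```

An empty list means the build in the next step should be able to find the FFmpeg libraries from Step 2.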

5. Installing AudioCraft

Now we are ready to install the main package.

python -m pip install -U audiocraft

Note on CPU Usage: During this step, if you run top, you might see a process named cc1 consuming 100% CPU. Do not panic. This is the GCC compiler building the C extensions for the av library. It is not a virus or a freeze; just let it finish.

Deep Dive: The “Two MKLs” Paradox

During the installation, you might wonder: “Why did I have to install MKL via Conda AND Pip? Don’t they conflict?”

I struggled with this. I installed MKL in Conda, but AudioCraft couldn’t “see” it. I had to install it again via Pip.

The Explanation: It comes down to how Conda and Pip manage environments:

Conda (The Contractor): Installs libraries (like .so files) into $CONDA_PREFIX/lib. These are shared libraries for the whole environment, not for the whole system.
Pip (The Interior Designer): Installs Python packages into site-packages.

Many Python wheels (pre-compiled packages) expect libraries to be present directly in site-packages or have specific rpath settings that ignore the Conda lib folder. As far as I can tell, installing MKL via Pip after Conda puts a copy of the pinned library where those wheels expect to find it, which sidesteps the path resolution issues.

It’s messy, but it’s currently the most reliable workaround for this specific PyTorch/MKL compatibility issue.
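
To see where the MKL shared libraries actually ended up, the two locations can be listed side by side. This is a rough diagnostic sketch rather than a verified explanation of the loader's behavior: `find_mkl_libs` is a hypothetical helper, and the `libmkl_*.so*` pattern assumes the usual Linux file naming.

```python
import glob
import os
import sys
import sysconfig


def find_mkl_libs(directory, recursive=False):
    """Return sorted paths (relative to `directory`) of MKL shared libraries."""
    parts = [directory, "libmkl_*.so*"]
    if recursive:
        parts.insert(1, "**")  # also descend into sub-directories
    paths = glob.glob(os.path.join(*parts), recursive=recursive)
    return sorted(os.path.relpath(p, directory) for p in paths)


env_lib = os.path.join(sys.prefix, "lib")           # Conda's shared-library folder
site_packages = sysconfig.get_paths()["purelib"]    # where pip puts Python packages

print("In", env_lib)
for name in find_mkl_libs(env_lib):
    print("  ", name)
print("In", site_packages)
for name in find_mkl_libs(site_packages, recursive=True):
    print("  ", name)
```

Comparing the output before and after the `pip install mkl==2023.2.0` step shows which files the second install added or replaced.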

The Payoff: Generating Emotional BGM

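With the environment set up, I can now generate the music for my film. My goal is to create two distinct stems and stitch them together at the “notification sound” moment in the video.
Here is the script generate_bgm.py I used. It loads MusicGen and the T5 text encoder from local folders, then writes four 8-second test clips to the current directory: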

import gc
import os

import torch
from audiocraft.data.audio import audio_write
from audiocraft.models import MusicGen
from transformers import T5EncoderModel, T5Tokenizer

# ================= CONFIGURATION =================
# 1. Main MusicGen Model (Full folder)
MUSICGEN_PATH = '/mnt/EvoStorage/audiocraft_models/musicgen-large'
# 2. T5 Text Encoder (Full folder)
T5_PATH = '/mnt/EvoStorage/audiocraft_models/t5-base'
# ===============================================

print(">>> Applying offline monkey patches...")

# --- Patch: redirect 't5-base' lookups to the local T5 folder ---
# Keep references to the original loaders so other model names still work.
original_t5_tok_loader = T5Tokenizer.from_pretrained
original_t5_model_loader = T5EncoderModel.from_pretrained

def patched_t5_tok_loader(pretrained_model_name_or_path, *args, **kwargs):
    if pretrained_model_name_or_path == 't5-base':
        print(f"  [Patch] Redirecting T5 Tokenizer to: {T5_PATH}")
        return original_t5_tok_loader(T5_PATH, *args, **kwargs)
    return original_t5_tok_loader(pretrained_model_name_or_path, *args, **kwargs)

def patched_t5_model_loader(pretrained_model_name_or_path, *args, **kwargs):
    if pretrained_model_name_or_path == 't5-base':
        print(f"  [Patch] Redirecting T5 Model to: {T5_PATH}")
        return original_t5_model_loader(T5_PATH, *args, **kwargs)
    return original_t5_model_loader(pretrained_model_name_or_path, *args, **kwargs)

T5Tokenizer.from_pretrained = patched_t5_tok_loader
T5EncoderModel.from_pretrained = patched_t5_model_loader

def main():
    model = None  # Initialize up front so the finally block can always reference it
    try:
        print(f"\n>>> Loading MusicGen from local path: {MUSICGEN_PATH}")
        model = MusicGen.get_pretrained(MUSICGEN_PATH)

        current_dir = os.getcwd()
        print(f">>> Output directory: {current_dir}")

        print(">>> Model loaded successfully. Generating music...")
        model.set_generation_params(duration=8)
        wav = model.generate_unconditional(4)

        for idx, one_wav in enumerate(wav):
            filename_stem = f'bgm_output_{idx}'

            # Build the full absolute path so the file always lands in the current directory
            full_path = os.path.join(current_dir, filename_stem)

            # Note: call .cpu() before audio_write so the data has left GPU memory
            audio_write(full_path, one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
            print(f"Saved: {full_path}.wav")  # audio_write adds the suffix; append it here for the log

        print(">>> All tasks complete!")

    except Exception as e:
        print(f"!!! An error occurred: {e}")
    
    finally:
        # ================= GPU MEMORY CLEANUP =================
        print(">>> Cleaning up GPU memory...")
        if model is not None:
            del model  # 1. Drop the reference to the model object
        
        gc.collect()   # 2. Force Python garbage collection
        torch.cuda.empty_cache() # 3. Release PyTorch's cached GPU memory
        print(">>> GPU memory cleared.")
        # ===================================================

if __name__ == "__main__":
    main()
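
The script above produces unconditional samples. For the two-stem plan described earlier, a text-conditioned variant could look roughly like the sketch below. Everything here is an assumption for illustration: the prompts are placeholders, `stem_plan` and `generate_stems` are hypothetical helpers, and the 30-second length and 19-second cut come from the film description. The only model methods used are `set_generation_params(duration=...)` and `generate(list_of_prompts)`.

```python
def stem_plan(total_seconds=30, cut_seconds=19):
    """Split the film into two stems around the cut point.

    Returns (name, prompt, duration_in_seconds) tuples; prompts are placeholders.
    """
    if not 0 < cut_seconds < total_seconds:
        raise ValueError("cut_seconds must fall inside the film")
    return [
        ("stem_a_melancholy",
         "sparse melancholic solo piano, slow, lonely late-night office",
         cut_seconds),
        ("stem_b_warmth",
         "warm gentle strings and soft piano, hopeful, healing",
         total_seconds - cut_seconds),
    ]


def generate_stems(model, plan):
    """Generate one clip per plan entry with a MusicGen-like `model`."""
    stems = {}
    for name, prompt, seconds in plan:
        model.set_generation_params(duration=seconds)
        wav = model.generate([prompt])  # a batch containing a single prompt
        stems[name] = wav[0]            # keep the only item of the batch
    return stems
```

Inside the `try` block of generate_bgm.py, `stems = generate_stems(model, stem_plan())` would take the place of the `generate_unconditional` call, and each entry can then be passed through `audio_write` exactly as before.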

Summary of Commands (TL;DR)

For those who just want to copy-paste:

conda create -n audiocraft python=3.9 -y  
conda activate audiocraft  
conda install "ffmpeg<5" -c conda-forge -y  
conda install pytorch==2.1.0 torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y  
conda install mkl=2023.2.0 pkg-config -c conda-forge -y  
pip install mkl==2023.2.0 
export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH  
python -m pip install -U audiocraft

Happy generating!
