Audio editors today usually expect template-like commands (e.g., “add birds”, “remove rain”), and most operate only on mono audio. In practice, users want to give a single declarative instruction (“make it sound like a quiet sunny forest”) and let the system work out the steps, while preserving the spatial cues of stereo audio.
SmartDJ = Planner + Editor
- ALM (audio language model) Planner: perceives the original audio + interprets the goal → emits an edit recipe (a sequence of atomic steps).
- LDM (latent diffusion model) Editor: executes the atomic steps one at a time, in order, in stereo latent space.
The intermediate plan is natural language, so it's inspectable, editable, and enables human-in-the-loop workflows.
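
The planner/editor split can be sketched as a small control loop. This is only an illustration of the data flow described above, not SmartDJ's actual code: `EditStep`, `declarative_edit`, `toy_planner`, `toy_editor`, and the optional `review` hook are all hypothetical names, and the toy stand-ins do no real audio processing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types for illustration; none of these names come from SmartDJ.
Stereo = List[List[float]]  # two channels of samples: [left, right]


@dataclass(frozen=True)
class EditStep:
    """One atomic step of the plan, kept as plain natural language."""
    instruction: str


Planner = Callable[[Stereo, str], List[EditStep]]
Editor = Callable[[Stereo, EditStep], Stereo]
Reviewer = Callable[[List[EditStep]], List[EditStep]]


def declarative_edit(
    audio: Stereo,
    goal: str,
    planner: Planner,
    editor: Editor,
    review: Optional[Reviewer] = None,
) -> Tuple[Stereo, List[EditStep]]:
    """Plan once from (audio, goal), then apply the steps in order."""
    plan = planner(audio, goal)
    if review is not None:
        # Human-in-the-loop: the plan is text, so it can be inspected or edited.
        plan = review(plan)
    for step in plan:
        audio = editor(audio, step)
    return audio, plan


def toy_planner(audio: Stereo, goal: str) -> List[EditStep]:
    # Stand-in for the ALM planner: returns a fixed recipe regardless of input.
    return [EditStep("remove rain"), EditStep("add birds"), EditStep("add wind")]


def toy_editor(audio: Stereo, step: EditStep) -> Stereo:
    # Stand-in for the LDM editor: copies both channels unchanged.
    return [list(channel) for channel in audio]


if __name__ == "__main__":
    clip = [[0.1, 0.2], [0.3, 0.4]]
    edited, plan = declarative_edit(
        clip, "make it sound like a quiet sunny forest", toy_planner, toy_editor
    )
    for number, step in enumerate(plan, start=1):
        print(number, step.instruction)
```

Passing a `review` callable (for example, one that drops or rewrites a step) is one way to model the human-in-the-loop workflow: the editor only ever sees the reviewed plan.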