Guiding Audio Editing with Audio Language Model

University of Pennsylvania  
SmartDJ (Ours) • Audio Editing Examples

High‑level audio editing examples

High-level instruction: “Make this sound like a workshop by the dock”

ALM-inferred atomic editing steps:

  • Remove the sound of metal knock
  • Add the sound of seagulls squawking at left by 3dB
  • Turn down the sound of motorboat running by 2dB
  • Add the sound of waves lapping at right by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)
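The atomic steps listed in these examples follow a handful of fixed phrasings (remove, add at a direction with a level, turn up/down by a level, change direction). As a rough illustration of how such steps could be turned into structured operations, here is a minimal Python sketch. The `AtomicEdit` class, the `parse_step` function, and the regular expressions are hypothetical helpers written for this page's examples; they are not part of SmartDJ, and the sign convention for `gain_db` is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomicEdit:
    """Hypothetical structured form of one atomic editing step."""
    action: str                      # "add", "remove", "turn_up", "turn_down", "change_direction"
    sound: str                       # description of the target sound event
    direction: Optional[str] = None  # "left", "right", or "front" when given
    gain_db: Optional[float] = None  # assumed convention: negative for "turn down"


_DB = r"(?P<db>-?\d+(?:\.\d+)?)\s*dB"   # accepts both "3dB" and "3 dB"
_DIR = r"(?P<direction>left|right|front)"

# Order matters only for readability; each pattern is anchored at both ends.
_PATTERNS = [
    ("add", re.compile(rf"^Add the sound of (?P<sound>.+?) at {_DIR} by {_DB}$", re.I)),
    ("turn_up", re.compile(rf"^Turn up the sound of (?P<sound>.+?) by {_DB}$", re.I)),
    ("turn_down", re.compile(rf"^Turn down the sound of (?P<sound>.+?) by {_DB}$", re.I)),
    ("change_direction", re.compile(rf"^Change the sound of (?P<sound>.+?) to {_DIR}$", re.I)),
    ("remove", re.compile(r"^Remove the sound of (?P<sound>.+)$", re.I)),
]


def parse_step(text: str) -> AtomicEdit:
    """Parse one atomic step string into an AtomicEdit, or raise ValueError."""
    text = text.strip()
    for action, pattern in _PATTERNS:
        match = pattern.match(text)
        if match is None:
            continue
        groups = match.groupdict()
        gain = float(groups["db"]) if groups.get("db") is not None else None
        if action == "turn_down" and gain is not None:
            gain = -gain  # store "turn down by 2dB" as a -2 dB change
        return AtomicEdit(action, groups["sound"], groups.get("direction"), gain)
    raise ValueError(f"Unrecognized atomic step: {text!r}")


if __name__ == "__main__":
    steps = [
        "Remove the sound of metal knock",
        "Add the sound of seagulls squawking at left by 3dB",
        "Turn down the sound of motorboat running by 2dB",
        "Add the sound of waves lapping at right by 2dB",
    ]
    for step in steps:
        print(parse_step(step))
```

A structured form like this makes it straightforward to apply the steps one at a time, in the listed order, to the audio being edited.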

High-level instruction: “Make this sound like a protest in a city”

ALM-inferred atomic editing steps:

  • Turn up the sound of emergency siren by 3dB
  • Remove the sound of man speech
  • Add the sound of crowd chanting at front by 3dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a serene beach”

ALM-inferred atomic editing steps:

  • Remove the sound of whistling
  • Turn up the sound of wave crash by 4dB
  • Add the sound of seagulls calling at front by 3dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a busy city street”

ALM-inferred atomic editing steps:

  • Add the sound of distant sirens at left by 3 dB
  • Add the sound of footsteps on pavement at right by 2 dB
  • Turn down the sound of engine rev by 2dB
  • Remove the sound of bell ring

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a cozy living room”

ALM-inferred atomic editing steps:

  • Add the sound of fireplace crackle at left by 3dB
  • Turn down the sound of woman speech by 2dB
  • Remove the sound of cat meow

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like in an outdoor concert”

ALM-inferred atomic editing steps:

  • Remove the sound of whistle
  • Turn down the sound of woman speech by 2dB
  • Add the sound of guitar strumming at left by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a busy office”

ALM-inferred atomic editing steps:

  • Add the sound of phone ringing at right by 3dB
  • Turn up the sound of typewriter type by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a quiet workshop”

ALM-inferred atomic editing steps:

  • Add the sound of soft hammering at right by 2dB
  • Remove the sound of tractor thud
  • Change the sound of object crumple to left

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a city park”

ALM-inferred atomic editing steps:

  • Add the sound of bicycle bells at left by 3dB
  • Add the sound of footsteps on gravel at right by 2dB
  • Remove the sound of engine rev
  • Turn up the sound of baby laugh by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Craft this sound like a peaceful farm night”

ALM-inferred atomic editing steps:

  • Add the sound of crickets chirping at front by 2dB
  • Turn up the sound of snore by 3dB
  • Turn down the sound of goat bleat by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a busy daycare center”

ALM-inferred atomic editing steps:

  • Turn up the sound of child cry by 3dB
  • Remove the sound of car engine
  • Add the sound of toys clattering at left by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

High-level instruction: “Make this sound like a military training ground”

ALM-inferred atomic editing steps:

  • Add the sound of cannon fire at right by 4dB
  • Turn up the sound of clank by 2dB

  • Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Atomic editing action: Add

    Edit instruction: “Add the sound of water falling at the front with -1 dB”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Edit instruction: “Add the sound of engine revs at the left with 0 dB”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Edit instruction: “Add the sound of music playing and people singing at the right with 0 dB”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Atomic editing action: Remove

    Edit instruction: “Remove the sound of baby crying at the front”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Remove the sound of man speaking at left”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Atomic editing action: Extract

    Edit instruction: “Extract the sound of water pours, horn honks, and man speaks at the front”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Extract the sound of whistles”

    Original

    ZETA

    AudioEditor

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Atomic editing action: Change sound direction

    Edit instruction: “Change the sound of woman speaking, food frying at the front to the right”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Change the sound of whistling and male speech to the left”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Atomic editing action: Turn up/down

    Edit instruction: “Turn up the sound of waves crashing, wind blows by 6 dB”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)

    Edit instruction: “Turn down the sound of typewriter by 6 dB”

    Original

    Audit

    SmartDJ (Ours)

    Target (Ground truth)
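The Turn up/down instructions above state a level change in decibels. For intuition about what "6 dB" means for a waveform, the sketch below converts a dB change into a linear amplitude factor using the standard amplitude convention, factor = 10^(dB/20). The function names `db_to_amplitude` and `apply_gain` are hypothetical, and whether SmartDJ applies level changes in exactly this way is an assumption; the page does not say.

```python
from typing import List


def db_to_amplitude(gain_db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor (20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: List[float], gain_db: float) -> List[float]:
    """Scale every sample of an isolated sound event by the given dB change."""
    factor = db_to_amplitude(gain_db)
    return [sample * factor for sample in samples]


if __name__ == "__main__":
    # +6 dB roughly doubles the amplitude; -6 dB roughly halves it.
    print(round(db_to_amplitude(6.0), 3))
    print(round(db_to_amplitude(-6.0), 3))
```

Under this convention, turning a sound up by 6 dB scales its amplitude by about 2, and turning it down by 6 dB scales it by about 0.5, while all other sound events in the mix are left unchanged.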