Riffusion Release v0.3 – Stable Diffusion for audio by haykmartiros

Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This release contains a full rewrite of the Riffusion codebase to go from a hack to a quality software project.

Rename the repository from riffusion-inference to riffusion.
SpectrogramParams class that contains all conversion parameters, with sane defaults.
SpectrogramConverter class that converts between spectrogram tensors and audio.
SpectrogramImageConverter class that converts between spectrogram images and audio.
Leverage pydub AudioSegment in more places rather than raw numpy arrays.
Move common code into the util package.
Cache more computation and be careful about error checking.
Move third-party integrations into the integrations package. They now share most of their code, which simplifies them considerably.
Add pyproject.toml for tool configuration.
Overhaul README with more descriptive instructions.

🚨 This release is API compatible with the web app, but code that used this repository directly will need to be updated.

Extensible command line interface for performing common tasks. See the README for details.

$ python -m riffusion.cli -h
usage: cli.py [-h] {audio-to-image,image-to-audio,sample-clips,print-exif} ...

positional arguments:
  {audio-to-image,image-to-audio,sample-clips,print-exif}
    audio-to-image      Compute a spectrogram image from a waveform.
    image-to-audio      Reconstruct an audio clip from a spectrogram image.
    sample-clips        Slice an audio file into clips of the given duration.
    print-exif          Print the params of a spectrogram image as saved in the exif data.

options:
  -h, --help            show this help message and exit
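
As a rough illustration of how a subcommand-based interface like this can be laid out, here is a minimal sketch using Python's standard argparse module. It is not the Riffusion implementation: the `build_parser` function and the `--audio` flag are hypothetical, and only two of the four subcommands are modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per task (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="cli.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Each task registers its own subparser, so adding a task is one more block.
    a2i = subparsers.add_parser(
        "audio-to-image", help="Compute a spectrogram image from a waveform."
    )
    a2i.add_argument("--audio", required=True)  # hypothetical flag name
    a2i.add_argument("--image", required=True)

    exif = subparsers.add_parser(
        "print-exif", help="Print the params of a spectrogram image."
    )
    exif.add_argument("--image", required=True)
    return parser


args = build_parser().parse_args(["print-exif", "--image", "spectrogram.jpg"])
print(args.command, args.image)
```

Keeping each task in its own subparser is one common way to make a CLI extensible, since new tasks do not touch the arguments of existing ones.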

Extensible Streamlit app for interactive exploration of Riffusion. See the README for details.

Riffusion can now run on MPS and CPU backends in addition to CUDA. See the README for details.

It also detects the available devices and falls back gracefully when the requested one is not usable.
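
Device detection with fallback could look roughly like the sketch below. The function names `resolve_device` and `detect_available`, and the preference order cuda, then mps, then cpu, are assumptions made for illustration and are not taken from the Riffusion code. The probe relies only on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and degrades to CPU when PyTorch is not installed.

```python
import warnings


def resolve_device(requested, available):
    """Return `requested` if usable, otherwise the best available fallback.

    `available` is the set of backend names usable on this machine.
    The preference order cuda > mps > cpu is an assumption of this sketch.
    """
    if requested in available:
        return requested
    for candidate in ("cuda", "mps", "cpu"):
        if candidate in available:
            warnings.warn(f"Device {requested!r} unavailable, using {candidate!r}")
            return candidate
    raise RuntimeError("No usable device found")


def detect_available():
    """Probe PyTorch for usable backends; CPU is always assumed usable."""
    available = {"cpu"}
    try:
        import torch
    except ImportError:
        return available  # no PyTorch installed: CPU only
    if torch.cuda.is_available():
        available.add("cuda")
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch
    if mps is not None and mps.is_available():
        available.add("mps")
    return available


# A request for CUDA on a machine with only MPS and CPU falls back to MPS.
print(resolve_device("cuda", {"cpu", "mps"}))
```

Separating the probe from the decision keeps the fallback logic testable on machines that have no GPU at all.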

Closes: #15

Add tools to encode and decode stereo audio as spectrograms, using the G and B channels for left and right.
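
The channel layout can be sketched in plain Python as follows. This is an illustration only, not the library's code: `encode_stereo` and `decode_stereo` are hypothetical names, the inputs are assumed to be already scaled to 0-255, and leaving the R channel at zero is an assumption, since the notes only specify G and B.

```python
def encode_stereo(left, right):
    """Pack two equally sized 2-D grids of 0-255 values into RGB pixels.

    Left goes to G and right goes to B, as the release notes describe.
    Setting R to zero is an assumption of this sketch.
    """
    if len(left) != len(right) or any(
        len(a) != len(b) for a, b in zip(left, right)
    ):
        raise ValueError("left and right must have the same shape")
    return [
        [(0, l, r) for l, r in zip(row_l, row_r)]
        for row_l, row_r in zip(left, right)
    ]


def decode_stereo(pixels):
    """Recover the (left, right) grids from the G and B channels."""
    left = [[g for _, g, _ in row] for row in pixels]
    right = [[b for _, _, b in row] for row in pixels]
    return left, right


left = [[0, 128], [255, 64]]
right = [[10, 20], [30, 40]]
pixels = encode_stereo(left, right)
print(pixels[0][1])  # (0, 128, 20)
print(decode_stereo(pixels) == (left, right))  # True
```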

Add the ability to store spectrogram conversion parameters in EXIF metadata of the images, and the ability to decode back to audio from those params. This allows more flexibility for usage without assuming default parameters.

The SpectrogramParams class has methods to convert to and from EXIF.

$ python -m riffusion.cli print-exif --image spectrogram.jpg
NUM_FREQUENCIES      =             512
STEP_SIZE_MS         =              10
MAX_VALUE            =      46801012.0
MIN_FREQUENCY        =               0
WINDOW_DURATION_MS   =             100
MAX_FREQUENCY        =           10000
PADDED_DURATION_MS   =
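
To show the round-trip idea, here is a minimal sketch of a parameter class that converts to and from a tag dictionary. `ParamsSketch` is a hypothetical stand-in, not the real SpectrogramParams: it models only a few of the tags printed above, its defaults are copied from that sample printout for illustration, and it uses a plain dict where the real class works with image EXIF data.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ParamsSketch:
    """Hypothetical stand-in for a few SpectrogramParams fields."""

    num_frequencies: int = 512
    step_size_ms: int = 10
    window_duration_ms: int = 100
    min_frequency: int = 0
    max_frequency: int = 10000

    def to_exif(self) -> dict:
        """Map each field to an upper-case tag name, as in the printout."""
        return {name.upper(): value for name, value in asdict(self).items()}

    @classmethod
    def from_exif(cls, exif: dict) -> "ParamsSketch":
        """Rebuild params from tags, ignoring tags this sketch does not model."""
        known = {name.upper(): name for name in cls.__dataclass_fields__}
        return cls(
            **{known[tag]: value for tag, value in exif.items() if tag in known}
        )


params = ParamsSketch(step_size_ms=20)
tags = params.to_exif()
# MAX_VALUE is not modeled by this sketch, so from_exif skips it.
restored = ParamsSketch.from_exif({**tags, "MAX_VALUE": 46801012.0})
print(restored == params)  # True
```

Storing the parameters alongside the image means a decoder can use the values the image was created with instead of assuming defaults.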
