Setup & Installation Guide

Follow this guide to configure your local environment, install dependencies, and run inference using the distilled DanceOPD student flow-matching model.

Prerequisites

Before installing, ensure your system meets the following software and hardware requirements:

Python 3.10 or higher
PyTorch 2.1.0 or higher (with CUDA support for GPU acceleration)
NVIDIA GPU with at least 12 GB of VRAM
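
The requirements above can be checked programmatically before installing anything else. The following is a minimal sketch, not part of the DanceOPD repository: the helper names `parse_version` and `check_environment` are hypothetical, and the VRAM check reads only the first CUDA device and measures memory in decimal gigabytes.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a version string such as '2.1.0+cu121' into a tuple like (2, 1, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '2.1' to (2, 1, 0) so tuple comparison behaves
        parts.append(0)
    return tuple(parts)


def check_environment(min_python=(3, 10), min_torch=(2, 1, 0), min_vram_gb=12):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        found = sys.version.split()[0]
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, found {found}")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
        return problems  # the remaining checks need torch

    import torch

    if parse_version(torch.__version__) < min_torch:
        problems.append(f"PyTorch 2.1.0+ required, found {torch.__version__}")
    if not torch.cuda.is_available():
        problems.append("CUDA is not available to PyTorch")
    else:
        # Assumption: only device 0 is inspected; memory is in decimal GB.
        total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
        if total_gb < min_vram_gb:
            problems.append(f"{min_vram_gb} GB of VRAM required, found {total_gb:.1f} GB")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Running the script prints either a confirmation or one line per unmet requirement.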

Installation

Clone the repository from GitHub and install the required packages in your Python environment:

Terminal Setup

git clone https://github.com/worldbench/DanceOPD.git
cd DanceOPD
pip install -r requirements.txt

Running Inference

Use the following script to load the distilled student model and run text-to-image synthesis:

Python Generation Script

import torch
from diffusers import FlowMatchEulerDiscreteScheduler
from model import DanceOPDStudentPipeline

# Load the distilled student model
pipeline = DanceOPDStudentPipeline.from_pretrained(
    "worldbench/DanceOPD-Student", 
    torch_dtype=torch.float16
)
pipeline.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipeline.scheduler.config
)
pipeline.to("cuda")

# Generate an image in 20 inference steps (no classifier-free guidance required)
prompt = "A professional dancer performing on stage, dramatic lighting"
image = pipeline(prompt, num_inference_steps=20).images[0]
image.save("output_generation.png")
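
To generate images for several prompts in one run, the single call above can be wrapped in a loop. This is a sketch under the assumption that the pipeline is invoked exactly as in the script above (`pipeline(prompt, num_inference_steps=...)` returning an object with `.images`); the helpers `slugify` and `generate_all` and the `outputs` directory are hypothetical names introduced for illustration.

```python
import re
from pathlib import Path


def slugify(prompt, max_length=40):
    """Build a filesystem-friendly name from a prompt."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return slug[:max_length].rstrip("_") or "image"


def generate_all(pipeline, prompts, output_dir="outputs", num_inference_steps=20):
    """Run the pipeline once per prompt and save each result as a PNG."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, prompt in enumerate(prompts):
        # Same call pattern as the generation script above.
        image = pipeline(prompt, num_inference_steps=num_inference_steps).images[0]
        path = out / f"{index:03d}_{slugify(prompt)}.png"
        image.save(path)
        saved.append(path)
    return saved
```

With a loaded `pipeline`, calling `generate_all(pipeline, ["A professional dancer performing on stage, dramatic lighting"])` writes one numbered PNG per prompt and returns the list of saved paths.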

Running Local Image Editing

To edit an existing image, specify the task type, the input image, and an edit instruction:

Python Editing Script

# Perform a local edit on the image generated above (reuses `pipeline` and `image`)
edited_image = pipeline.edit(
    image=image,
    task="local_edit",
    instruction="change the dress color to a vibrant red",
    num_inference_steps=20
).images[0]
edited_image.save("output_edit.png")

Hyperparameter Guidelines

When running distillation on your own datasets, we recommend the following training configuration:

Batch Size: 64 (distributed across 8 GPUs)
Learning Rate: 5e-5 with a linear warmup of 500 steps
Optimization Steps: 50,000 for stable convergence
Sample Routing Ratio: 40% Text-to-Image, 40% Local Edit, 20% Global Style
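
The recommended values can be collected in a small configuration object. The sketch below is illustrative and does not reflect the repository's actual training code. It makes three assumptions: the batch size of 64 is the global batch (so 8 samples per GPU), the learning rate stays constant after warmup, and the task keys `text_to_image` and `global_style` are hypothetical names (only `local_edit` appears in this guide).

```python
import random
from dataclasses import dataclass, field


@dataclass
class DistillConfig:
    global_batch_size: int = 64   # assumed to be the total across all GPUs
    num_gpus: int = 8
    learning_rate: float = 5e-5
    warmup_steps: int = 500
    total_steps: int = 50_000
    routing: dict = field(default_factory=lambda: {
        "text_to_image": 0.4,     # hypothetical task key
        "local_edit": 0.4,
        "global_style": 0.2,      # hypothetical task key
    })

    @property
    def per_gpu_batch_size(self):
        return self.global_batch_size // self.num_gpus


def lr_at_step(step, config):
    """Linear warmup to the base rate, then constant (post-warmup behavior is assumed)."""
    if step < config.warmup_steps:
        return config.learning_rate * (step + 1) / config.warmup_steps
    return config.learning_rate


def sample_task(config, rng):
    """Draw the task for one training sample according to the routing ratio."""
    tasks = list(config.routing)
    weights = list(config.routing.values())
    return rng.choices(tasks, weights=weights, k=1)[0]


if __name__ == "__main__":
    cfg = DistillConfig()
    rng = random.Random(0)
    print("per-GPU batch size:", cfg.per_gpu_batch_size)
    print("learning rate at step 250:", lr_at_step(250, cfg))
    print("first sampled task:", sample_task(cfg, rng))
```

Keeping the routing ratio as a dictionary of weights makes it easy to verify that the weights sum to 1 and to adjust the task mix without touching the sampling logic.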