Setup & Installation Guide
Follow this guide to configure your local environment, install dependencies, and run inference using the distilled DanceOPD student flow-matching model.
Prerequisites
Before installing, ensure your system meets the following software and hardware requirements:
- Python 3.10 or higher
- PyTorch 2.1.0 or higher (with CUDA support for GPU acceleration)
- NVIDIA GPU with at least 12 GB of VRAM
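The requirements above can be checked programmatically before installing anything else. The following is a minimal sketch, not part of the DanceOPD repository: the helper names `parse_version`, `check_requirements`, and `detect_environment` are hypothetical, and VRAM is measured in decimal gigabytes on GPU 0 only, which is an assumption about how the 12 GB figure should be read.

```python
import sys


def parse_version(text):
    """Return (major, minor) from a version string such as '2.1.0+cu121'."""
    core = text.split("+")[0]          # drop a local build tag like '+cu121'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def check_requirements(python_version, torch_version, vram_gb):
    """Compare detected values against the minimums listed above.

    torch_version and vram_gb may be None when PyTorch or a CUDA GPU
    could not be detected. Returns a list of problems; an empty list
    means every check passed.
    """
    problems = []
    if tuple(python_version[:2]) < (3, 10):
        problems.append("Python 3.10 or higher is required")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < (2, 1):
        problems.append("PyTorch 2.1.0 or higher is required")
    if vram_gb is None:
        problems.append("No CUDA GPU detected (needs a CUDA-enabled PyTorch build)")
    elif vram_gb < 12:
        problems.append("At least 12 GB of VRAM is required")
    return problems


def detect_environment():
    """Collect the Python version, PyTorch version, and VRAM of GPU 0."""
    torch_version, vram_gb = None, None
    try:
        import torch
    except ImportError:
        return sys.version_info, torch_version, vram_gb
    torch_version = torch.__version__
    if torch.cuda.is_available():
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        # Assumption: compare in decimal GB, since nominal "12 GB" cards
        # usually report slightly less than 12 GiB.
        vram_gb = total_bytes / 1e9
    return sys.version_info, torch_version, vram_gb


if __name__ == "__main__":
    issues = check_requirements(*detect_environment())
    for issue in issues:
        print("FAIL:", issue)
    if not issues:
        print("All prerequisites satisfied")
```

Keeping the comparison logic in `check_requirements` separate from detection means the thresholds can be verified without a GPU present.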
Installation
Clone the repository from GitHub and install the required packages in your Python environment:
Terminal Setup
git clone https://github.com/worldbench/DanceOPD.git
cd DanceOPD
pip install -r requirements.txt
Running Inference
Use the following script to load the distilled student model and run text-to-image synthesis:
Python Generation Script
import torch
from diffusers import FlowMatchEulerDiscreteScheduler
from model import DanceOPDStudentPipeline
# Load the distilled student model
pipeline = DanceOPDStudentPipeline.from_pretrained(
    "worldbench/DanceOPD-Student",
    torch_dtype=torch.float16,
)
pipeline.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipeline.scheduler.config
)
pipeline.to("cuda")
# Generate an image in 20 steps (no classifier-free guidance required)
prompt = "A professional dancer performing on stage, dramatic lighting"
image = pipeline(prompt, num_inference_steps=20).images[0]
image.save("output_generation.png")
Running Local Image Editing
To edit an existing image, pass the input image, the task type, and an editing instruction:
Python Editing Script
# Perform a local area edit on the generated image
edited_image = pipeline.edit(
    image=image,
    task="local_edit",
    instruction="change the dress color to a vibrant red",
    num_inference_steps=20,
).images[0]
edited_image.save("output_edit.png")
Hyperparameter Guidelines
When running distillation on your own dataset, we recommend the following training configuration:
- Batch Size: 64 (distributed across 8 GPUs)
- Learning Rate: 5e-5 with a linear warmup of 500 steps
- Optimization Steps: 50,000 for stable convergence
- Sample Routing Ratio: 40% Text-to-Image, 40% Local Edit, 20% Global Style
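The recommended values can be gathered into a single configuration object. The sketch below is illustrative only and does not reflect the repository's actual training code: `DistillConfig`, `lr_at`, and `route_task` are hypothetical names, the task labels `text_to_image` and `global_style` are invented for the example (only `local_edit` appears in this guide), the batch size of 64 is assumed to be the global total across all 8 GPUs, and the learning rate is assumed to stay constant after warmup because no decay schedule is specified.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DistillConfig:
    # Values copied from the guideline list above.
    global_batch_size: int = 64   # assumed to be the total across all GPUs
    num_gpus: int = 8
    learning_rate: float = 5e-5
    warmup_steps: int = 500
    total_steps: int = 50_000
    # Task names other than "local_edit" are hypothetical labels.
    routing: dict = field(default_factory=lambda: {
        "text_to_image": 0.4,
        "local_edit": 0.4,
        "global_style": 0.2,
    })

    @property
    def per_gpu_batch_size(self):
        return self.global_batch_size // self.num_gpus


def lr_at(step, cfg):
    """Linear warmup from 0 to the base rate, then constant (assumed: no decay)."""
    scale = min(1.0, step / cfg.warmup_steps)
    return cfg.learning_rate * scale


def route_task(cfg, rng):
    """Draw the task type for one training sample according to the routing ratio."""
    tasks = list(cfg.routing)
    weights = list(cfg.routing.values())
    return rng.choices(tasks, weights=weights, k=1)[0]


if __name__ == "__main__":
    cfg = DistillConfig()
    rng = random.Random(0)  # fixed seed so the draw is repeatable
    counts = {task: 0 for task in cfg.routing}
    for _ in range(10_000):
        counts[route_task(cfg, rng)] += 1
    print("per-GPU batch size:", cfg.per_gpu_batch_size)
    print("lr at step 250:", lr_at(250, cfg))
    print("task counts:", counts)
```

Sampling the task per example, rather than fixing the composition of each batch, keeps the 40/40/20 ratio in expectation while letting individual batches vary; whether the original training setup routes per sample or per batch is not stated here.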