DanceOPD

Setup & Installation Guide

Follow this guide to configure your local environment, install dependencies, and run inference using the distilled DanceOPD student flow-matching model.

Prerequisites

Before installing, ensure your system meets the following software and hardware requirements:

  • Python 3.10 or higher
  • PyTorch 2.1.0 or higher (with CUDA support for GPU acceleration)
  • NVIDIA GPU with at least 12 GB of VRAM
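The requirements above can be checked programmatically before installing anything else. The following is a minimal sketch, not part of the DanceOPD repository: the helper names `parse_version` and `check_environment` are hypothetical, and the VRAM check reads only the first visible GPU and uses decimal gigabytes, which is an assumption about how the 12 GB figure is meant.

```python
import re
import sys

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 1, 0)
MIN_VRAM_GB = 12  # interpreted as decimal GB (assumption)


def parse_version(text):
    """Turn a version string such as '2.1.0+cu121' into (2, 1, 0)."""
    core = text.split("+")[0]  # drop local build suffix like '+cu121'
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits only
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if parse_version(torch.__version__) < MIN_TORCH:
        problems.append(f"PyTorch 2.1.0+ required, found {torch.__version__}")
    if not torch.cuda.is_available():
        problems.append("No CUDA device is visible to PyTorch")
        return problems
    # Only the first GPU (index 0) is inspected in this sketch.
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"At least {MIN_VRAM_GB} GB of VRAM required, found {vram_gb:.1f} GB")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment OK" if not issues else "\n".join(issues))
```

Running the file directly prints either a confirmation or one line per unmet requirement.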

Installation

Clone the repository from GitHub and install the required packages in your Python environment:

Terminal Setup
git clone https://github.com/worldbench/DanceOPD.git
cd DanceOPD
pip install -r requirements.txt
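
After `pip install` finishes, it can be useful to confirm that every package named in `requirements.txt` is actually present in the active environment. The sketch below is an illustration using only the standard library; the function names are hypothetical, and it deliberately skips option lines (such as `-r other.txt`) and direct URL requirements rather than trying to resolve them.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from requirements-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options, and URL requirements
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group())
    return names


def missing_packages(names):
    """Return the names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        absent = missing_packages(requirement_names(handle))
    print("All requirements installed" if not absent else f"Missing: {absent}")
```

Run it from the repository root so that the relative path to `requirements.txt` resolves.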

Running Inference

Use the following script to load the distilled student model and run text-to-image synthesis:

Python Generation Script
import torch
from diffusers import FlowMatchEulerDiscreteScheduler
from model import DanceOPDStudentPipeline

# Load the distilled student model
pipeline = DanceOPDStudentPipeline.from_pretrained(
    "worldbench/DanceOPD-Student", 
    torch_dtype=torch.float16
)
pipeline.scheduler = FlowMatchEulerDiscreteScheduler.from_config(
    pipeline.scheduler.config
)
pipeline.to("cuda")

# Generate an image in 20 steps (no classifier-free guidance required)
prompt = "A professional dancer performing on stage, dramatic lighting"
image = pipeline(prompt, num_inference_steps=20).images[0]
image.save("output_generation.png")

Running Local Image Editing

To edit an existing image, pass the input image, the task type, and a text instruction:

Python Editing Script
# Reuses `pipeline` and `image` from the generation script above.
# Perform a local edit on the generated image
edited_image = pipeline.edit(
    image=image,
    task="local_edit",
    instruction="change the dress color to a vibrant red",
    num_inference_steps=20
).images[0]
edited_image.save("output_edit.png")

Hyperparameter Guidelines

When running distillation on your own dataset, we recommend the following training configuration:

  • Batch Size: 64 in total, distributed across 8 GPUs
  • Learning Rate: 5e-5 with a linear warmup of 500 steps
  • Optimization Steps: 50,000 for stable convergence
  • Sample Routing Ratio: 40% Text-to-Image, 40% Local Edit, 20% Global Style
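
The configuration above can be expressed as a small, framework-independent sketch. It is an illustration, not the repository's training code: it reads the batch size of 64 as a global value split evenly across GPUs, holds the learning rate constant after warmup (the guide does not specify a decay schedule), and uses the hypothetical task names `text_to_image` and `global_style` alongside `local_edit`, the one task name that appears in the editing script.

```python
import random

GLOBAL_BATCH_SIZE = 64
NUM_GPUS = 8
PEAK_LR = 5e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 50_000

# Sample routing ratio; only "local_edit" is a name taken from the guide.
ROUTING = {"text_to_image": 0.4, "local_edit": 0.4, "global_style": 0.2}

# Assumes the global batch is split evenly across GPUs.
PER_GPU_BATCH_SIZE = GLOBAL_BATCH_SIZE // NUM_GPUS


def learning_rate(step):
    """Linear warmup from 0 to PEAK_LR over WARMUP_STEPS, then constant.

    The constant phase is an assumption; swap in a decay if needed.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR


def sample_task(rng):
    """Pick the task for one training sample according to ROUTING."""
    tasks = list(ROUTING)
    weights = list(ROUTING.values())
    return rng.choices(tasks, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("per-GPU batch size:", PER_GPU_BATCH_SIZE)
    print("lr at step 250:", learning_rate(250))
    print("first tasks:", [sample_task(rng) for _ in range(5)])
```

Drawing the task per sample keeps the 40/40/20 mix correct in expectation; if every batch must match the ratio exactly, a fixed per-batch allocation would be the alternative design.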