
Evaluation & Benchmarks

The performance of DanceOPD is evaluated using GEditBench, a rigorous benchmark designed to test multi-capability image generation and editing models. The metrics show substantial improvements in composition tasks while maintaining high generation quality.

GEditBench Results

In multi-task image generation training, standard models suffer from capability interference. This degradation is measured by comparing student models trained on combined objectives. DanceOPD addresses this interference, achieving significant performance increases:

  • Local and Global Composition: A 16.1% improvement over the best competing multi-task distillation baselines.
  • Text-to-Image and Editing: An 8.1% improvement compared to naive on-policy distillation setups.
  • Fidelity Retention: The text-to-image anchor generation quality remains within 0.1% of dedicated single-task models.

Comparative Analysis

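Training Methodology              | T2I FID (Lower is Better) | Local Editing (CLIP Score) | Global Editing (LPIPS) | Overall Quality Score
----------------------------------|---------------------------|----------------------------|------------------------|------------------------------------
Single-Task Teacher               | 16.10                     | 82.4%                      | 0.125                  | High (Inference-Heavy)
Naive Student (Average Gradients) | 19.85                     | 72.1%                      | 0.180                  | Mediocre (Capability Interference)
Off-Policy Distillation           | 18.42                     | 74.2%                      | 0.155                  | Standard (Covariate Shift)
DanceOPD (On-Policy)              | 16.82                     | 89.4%                      | 0.118                  | Excellent (Distilled)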
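
The rows of the table can be restated as relative differences against any reference row. The following is a minimal sketch with the numbers copied from the table; the helper names `relative_change` and `compare` are illustrative and not part of any DanceOPD release, and the CLIP score percentages are treated as plain numbers.

```python
# Values copied from the comparison table; CLIP scores are kept in percent.
results = {
    "Single-Task Teacher": {"fid": 16.10, "clip": 82.4, "lpips": 0.125},
    "Naive Student": {"fid": 19.85, "clip": 72.1, "lpips": 0.180},
    "Off-Policy Distillation": {"fid": 18.42, "clip": 74.2, "lpips": 0.155},
    "DanceOPD": {"fid": 16.82, "clip": 89.4, "lpips": 0.118},
}


def relative_change(value, reference):
    """Signed change of `value` relative to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


def compare(method, reference, table=results):
    """Relative change of every metric of `method` against `reference`."""
    return {
        metric: round(relative_change(table[method][metric], table[reference][metric]), 1)
        for metric in table[method]
    }


if __name__ == "__main__":
    for reference in ("Single-Task Teacher", "Naive Student", "Off-Policy Distillation"):
        print(reference, compare("DanceOPD", reference))
```

For FID a negative change is an improvement, and for CLIP score a positive one is. The sketch assumes, as the ranking in the table suggests, that lower LPIPS is also better here.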

Analysis of Image Fidelity

The primary challenge in combining image editing with text-to-image synthesis is that editing operations tend to restrict the model's capacity to generate diverse structures from scratch. Naive distillation models exhibit high FID (Fréchet Inception Distance) scores; since lower FID is better, this indicates a reduction in image diversity and quality.
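
For reference, FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images (in standard practice, Inception-v3 features). The sketch below shows that formula in NumPy under the assumption that the feature vectors have already been extracted; it is not the evaluation code behind the numbers reported here, and the function names are illustrative.

```python
import numpy as np


def feature_statistics(features):
    """Mean vector and covariance matrix of an (n_samples, dim) feature array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)) for two Gaussians."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)

    diff = mu1 - mu2
    # Tr((S1 S2)^(1/2)) equals the sum of the square roots of the eigenvalues
    # of S1 S2. For positive semi-definite covariances these eigenvalues are
    # real and non-negative, so small numerical noise is clipped away.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(500, 8))       # stand-in for real-image features
    generated = rng.normal(0.3, 1.0, size=(500, 8))  # stand-in for generated-image features
    print(frechet_distance(*feature_statistics(real), *feature_statistics(generated)))
```

The distance is zero when both feature distributions have the same mean and covariance. It grows when the means drift apart (a quality shift) or when the covariances differ (for example, reduced diversity in the generated set).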

By isolating optimization paths through sample routing, DanceOPD maintains high diversity. Its FID of 16.82 is within 0.72 points of the specialized single-task teacher's 16.10 and well below the naive student's 19.85, while requiring only a fraction of the computational footprint. This makes the model practical for applications that demand both high-fidelity synthesis and precise layout adjustments.
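
This page does not spell out how sample routing is implemented. As a purely hypothetical illustration of the general idea, the sketch below assumes each training sample carries a task label and is supervised only by the teacher for that task, in contrast to a naive target that averages all teachers. The function names, array shapes, and the mean-squared-error loss are assumptions, not DanceOPD's actual training code.

```python
import numpy as np


def routed_targets(teacher_outs, task_ids):
    """Pick, for each sample, the output of the teacher matching its task.

    teacher_outs: (num_tasks, batch, dim) outputs of every teacher on the batch
    task_ids:     (batch,) integer task label per sample
    """
    teacher_outs = np.asarray(teacher_outs, dtype=float)
    task_ids = np.asarray(task_ids)
    batch_idx = np.arange(teacher_outs.shape[1])
    return teacher_outs[task_ids, batch_idx]  # (batch, dim)


def averaged_targets(teacher_outs):
    """Naive alternative: every sample is pulled toward the mean of all teachers."""
    return np.asarray(teacher_outs, dtype=float).mean(axis=0)


def distillation_loss(student_out, targets):
    """Mean squared error between student outputs and teacher targets."""
    student_out = np.asarray(student_out, dtype=float)
    return float(((student_out - targets) ** 2).mean())


if __name__ == "__main__":
    # Two hypothetical tasks (0 = text-to-image, 1 = editing), batch of 2, dim 2.
    teachers = [
        [[1.0, 1.0], [5.0, 5.0]],  # teacher for task 0
        [[3.0, 3.0], [2.0, 2.0]],  # teacher for task 1
    ]
    student = [[0.0, 0.0], [0.0, 0.0]]
    task_ids = [0, 1]
    print(distillation_loss(student, routed_targets(teachers, task_ids)))
    print(distillation_loss(student, averaged_targets(teachers)))
```

With routing, each sample contributes a loss against exactly one teacher, so objectives for different tasks never pull on the same sample. In an on-policy setup the batch would additionally be drawn from the student's own generations, which this sketch leaves out.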