Visual Diffusion Models are Geometric Solvers

Abstract

In this paper we show that visual diffusion models can serve as effective geometric solvers: they can directly reason about geometric problems by working in pixel space. We first demonstrate this on the Inscribed Square Problem, a long-standing problem in geometry that asks whether every Jordan curve contains four points forming a square. We then extend the approach to two other well-known hard geometric problems: the Steiner Tree Problem and the Simple Polygon Problem.

Our method treats each problem instance as an image and trains a standard visual diffusion model that transforms Gaussian noise into an image representing a valid approximate solution that closely matches the exact one. The model learns to transform noisy geometric structures into correct configurations, effectively recasting geometric reasoning as image generation.

Unlike prior work that necessitates specialized architectures and domain-specific adaptations when applying diffusion to parametric geometric representations, we employ a standard visual diffusion model that operates on the visual representation of the problem. This simplicity highlights a surprising bridge between generative modeling and geometric problem solving.

Beyond the specific problems studied here, our results point toward a broader paradigm: operating in image space provides a general and practical framework for approximating notoriously hard problems, and opens the door to tackling a far wider class of challenging geometric tasks.

Method

Our approach is based on a simple idea: reformulate classical geometric problems as conditional image generation tasks. Given a problem instance (e.g., a closed curve, a set of terminal points), we encode it as a binary image and train a diffusion model to generate the corresponding solution image.
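As an illustration of the encoding step, the following minimal sketch rasterizes a set of terminal points into a binary condition image. The function name `rasterize_points`, the 128x128 resolution, the disc radius, and the convention that coordinates lie in the unit square are all assumptions made for this example; the source does not specify them.

```python
import numpy as np

def rasterize_points(points, size=128, radius=2):
    """Render points given in [0, 1]^2 as filled discs on a binary image.

    Resolution and disc radius are illustrative choices, not values
    taken from the paper.
    """
    img = np.zeros((size, size), dtype=np.uint8)
    ys, xs = np.mgrid[0:size, 0:size]          # row and column index grids
    for x, y in points:
        cx, cy = x * (size - 1), y * (size - 1)  # map unit square to pixels
        img[(xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2] = 1
    return img

# Three hypothetical terminal points
terminals = [(0.2, 0.3), (0.8, 0.25), (0.5, 0.9)]
condition = rasterize_points(terminals)
```

A closed curve could be rendered the same way by sampling it densely and drawing each sample as a small disc.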

Each geometric problem instance is represented visually as a rasterized curve or a set of points. A conditional UNet-based diffusion model receives this condition image concatenated with a noise image and iteratively denoises the noise into a solution image. The same architecture is used across all three tasks; only the training data changes.
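The conditioning-by-concatenation idea can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the DDPM-style ancestral update, the linear noise schedule, the step count, and the `denoiser` callable (a placeholder standing in for a trained UNet that predicts noise) are all assumptions.

```python
import numpy as np

def sample(denoiser, condition, steps=50, seed=0):
    """Run a DDPM-style reverse process conditioned on a binary image.

    `denoiser(net_in, t)` is a hypothetical callable that takes a
    (2, H, W) array (noisy image stacked with the condition) and a
    timestep, and returns the predicted noise of shape (H, W).
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)   # illustrative linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(condition.shape)  # start from Gaussian noise
    for t in reversed(range(steps)):
        # Condition by channel concatenation: [noisy image, condition]
        net_in = np.stack([x, condition], axis=0)
        eps = denoiser(net_in, t)
        # Posterior mean of the standard DDPM reverse step
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

def dummy_denoiser(net_in, t):
    # Placeholder for a trained network: always predicts zero noise
    return np.zeros_like(net_in[0])

solution = sample(dummy_denoiser, np.zeros((16, 16)), seed=0)
```

Changing `seed` changes the initial noise, which is how a single condition image can lead to different sampled solutions.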

The generated image is then decoded back into the geometric domain through simple post-processing to recover the symbolic solution structure.
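The source does not detail the post-processing, so the following is only one plausible sketch of a decoding step: threshold the generated image and take the centroid of each connected blob as a recovered point (for example, a square corner or a Steiner point). The function name `blob_centroids`, the threshold value, and the use of 4-connectivity are assumptions.

```python
from collections import deque

import numpy as np

def blob_centroids(image, threshold=0.5):
    """Return (x, y) centroids of 4-connected blobs above `threshold`."""
    mask = np.asarray(image) > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    centroids = []
    # np.nonzero scans in row-major order, so blobs are found top to bottom
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        queue, pixels = deque([(r0, c0)]), []
        seen[r0, c0] = True
        while queue:                      # breadth-first flood fill
            r, c = queue.popleft()
            pixels.append((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                    seen[rr, cc] = True
                    queue.append((rr, cc))
        rows, cols = zip(*pixels)
        centroids.append((float(np.mean(cols)), float(np.mean(rows))))
    return centroids

# Toy image with two 3x3 blobs
toy = np.zeros((20, 20))
toy[2:5, 3:6] = 1.0
toy[10:13, 14:17] = 1.0
points = blob_centroids(toy)
```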

Results

We evaluate our approach on three classical problems. In each case, the same UNet architecture and diffusion framework are applied; only the training data changes.

Inscribed Square Problem

A closed curve with multiple inscribed squares.

Toeplitz (1911) conjectured that every simple closed curve contains an inscribed square. We render each curve as a binary image (condition) and train the model to generate the corresponding inscribed square image (solution). Different random seeds yield diverse valid squares on the same curve.
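Once four candidate corners have been recovered from a generated image, whether they actually form a square can be checked with a simple distance test: among the six pairwise squared distances, the four smallest must be equal (the sides) and the two largest must equal twice that value (the diagonals). The sketch below is an illustrative validity check, not the evaluation procedure used by the authors; the helper name `is_square` and the tolerance are assumptions, and points decoded from pixels would need a much looser tolerance.

```python
import itertools
import math

def is_square(points, rel_tol=1e-6):
    """Check whether four 2D points are the corners of a square."""
    if len(points) != 4:
        return False
    # Sorted squared distances between all six pairs of points
    d2 = sorted(
        (ax - bx) ** 2 + (ay - by) ** 2
        for (ax, ay), (bx, by) in itertools.combinations(points, 2)
    )
    side = d2[0]
    if side <= 0:                      # coincident points
        return False
    sides_ok = all(math.isclose(d, side, rel_tol=rel_tol) for d in d2[:4])
    diags_ok = all(math.isclose(d, 2 * side, rel_tol=rel_tol) for d in d2[4:])
    return sides_ok and diags_ok

axis_aligned = is_square([(0, 0), (1, 0), (1, 1), (0, 1)])
rotated = is_square([(1, 0), (0, 1), (-1, 0), (0, -1)])
rectangle = is_square([(0, 0), (2, 0), (2, 1), (0, 1)])
```

A full check for the Inscribed Square Problem would additionally verify that each corner lies on the input curve, for instance by measuring its distance to the nearest curve pixel.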

Inscribed Square Problem: denoising trajectory from noise to predicted inscribed squares across multiple seeds.

Steiner Tree Problem

The Euclidean Steiner Tree Problem asks for the shortest network connecting a set of terminal points, where the network may include additional junctions (Steiner points) that are not among the terminals. Using the exact same architecture and training paradigm, we render terminal locations as the condition image and optimal Steiner trees as solution images. The model learns to place Steiner points and edges from visual examples alone.
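To make the objective concrete, the following small example compares the length of a spanning tree with that of a Steiner tree for three terminals at the corners of a unit equilateral triangle, where the optimal Steiner point is known to be the centroid. The helper `network_length` and the node and edge layout are hypothetical names introduced for this illustration.

```python
import math

def network_length(nodes, edges):
    """Total Euclidean length of a network given as index pairs into `nodes`."""
    return sum(math.dist(nodes[i], nodes[j]) for i, j in edges)

# Terminals: corners of an equilateral triangle with side 1
terminals = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]

# Spanning tree that uses terminals only: two sides of the triangle
spanning_length = network_length(terminals, [(0, 1), (1, 2)])

# Steiner tree: add the centroid as a Steiner point and connect it to
# every terminal (the three edges meet at 120 degrees)
steiner_point = (0.5, math.sqrt(3) / 6)
steiner_length = network_length(
    terminals + [steiner_point], [(0, 3), (1, 3), (2, 3)]
)
```

Here the spanning tree has length 2, while the Steiner tree has length equal to the square root of 3 (about 1.732), which shows why adding Steiner points can shorten the network.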

Steiner Tree Problem: denoising trajectory from noise to predicted Steiner trees across multiple seeds.

Maximum Area Polygon Problem

Given a set of points, the task is to find the simple polygon whose vertices are exactly the given points and whose area is as large as possible, an NP-hard optimization problem. Again with the same model and paradigm, point locations are rendered as the condition and optimal maximum-area polygons as the solution. The model produces diverse polygon shapes from the same point set across different seeds.
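A candidate polygon for this problem can be scored and validated with two standard routines: the shoelace formula for area and a pairwise edge-crossing test for simplicity. The sketch below is illustrative rather than the authors' evaluation code; the function names are assumptions, and the crossing test only detects proper crossings, so it assumes points in general position (no three collinear, no repeated vertices).

```python
def polygon_area(vertices):
    """Area of a polygon via the shoelace formula (vertices in order)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0

def _orient(a, b, c):
    # Sign of the cross product (b - a) x (c - a)
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def _properly_cross(p, q, r, s):
    # Segments pq and rs cross if each one separates the other's endpoints
    return (_orient(p, q, r) * _orient(p, q, s) < 0
            and _orient(r, s, p) * _orient(r, s, q) < 0)

def is_simple(vertices):
    """True if no two non-adjacent edges properly cross."""
    n = len(vertices)
    edges = [(vertices[i], vertices[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j == i + 1 or (i == 0 and j == n - 1):
                continue  # adjacent edges share a vertex
            if _properly_cross(*edges[i], *edges[j]):
                return False
    return True

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1)]   # same points, crossing order
square_area = polygon_area(square)
```

The two example polygons use the same four points in different orders, which mirrors the problem itself: the point set is fixed and only the visiting order determines validity and area.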

Maximum Area Polygon Problem: denoising trajectory from noise to predicted polygons across multiple seeds.

BibTeX

@misc{goren2025visualdiffusionmodelsgeometric,
      title={Visual Diffusion Models are Geometric Solvers},
      author={Nir Goren and Shai Yehezkel and Omer Dahary and Andrey Voynov and Or Patashnik and Daniel Cohen-Or},
      year={2025},
      eprint={2510.21697},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2510.21697},
}