Zerosquare, on 23/12/2025 at 09:16

AI image generators have just 12 generic templates
Pivot to AI
There’s a new paper: “Autonomous language-image generation loops converge to generic visual motifs”: diffusion models have just 12 standard templates. [Cell; Cell, with supplements, PDF; press re…

The researchers set up bots talking to bots in a loop. They’d give a prompt to Stable Diffusion XL, it would make an image, then they’d show the image to Large Language and Vision Assistant (LLaVA) and ask what the image was. Then they’d feed that response back to Stable Diffusion as a prompt for another loop through. They did 100 rounds of this.
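For anyone who wants to picture the loop, here is a minimal sketch in Python. It is not the paper's code: `run_loop`, `toy_text_to_image` and `toy_image_to_text` are hypothetical names, and the two toy functions are crude stand-ins for Stable Diffusion XL and LLaVA that just drop words, so you can watch detail get lost from round to round.

```python
from typing import Callable, List, Tuple


def run_loop(
    start_prompt: str,
    text_to_image: Callable[[str], object],
    image_to_text: Callable[[object], str],
    rounds: int = 100,
) -> List[Tuple[str, object]]:
    """Alternate prompt -> image -> caption, feeding each caption back as the next prompt."""
    history = []
    prompt = start_prompt
    for _ in range(rounds):
        image = text_to_image(prompt)    # in the study: Stable Diffusion XL
        history.append((prompt, image))
        prompt = image_to_text(image)    # in the study: LLaVA describing the image
    return history


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_text_to_image(prompt: str) -> List[str]:
    # The fake "image" keeps only the first four words of the prompt.
    return prompt.lower().split()[:4]


def toy_image_to_text(image: List[str]) -> str:
    # The fake "caption" drops the last word, unless only one word is left.
    return " ".join(image[:-1]) if len(image) > 1 else image[0]


if __name__ == "__main__":
    start = "the Prime Minister pored over strategy documents"
    for prompt, _image in run_loop(start, toy_text_to_image, toy_image_to_text, rounds=5):
        print(prompt)
```

With real models you would pass in functions that wrap the image generator and the captioning model instead of the toys; the loop itself stays the same.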

You’d have a starting prompt like:

the Prime Minister pored over strategy documents, trying to sell the public on a fragile peace deal while juggling the weight of his job amidst impending military action

The first few images would be a guy in a suit with glasses. But it very quickly ended up at an empty red room with high ceilings and three windows.

They expected the bots to stick with the prompt if they got a very specific one. But they didn’t. Everything converged on twelve standard templates:

sports and action imagery (cluster 0), formal interior spaces (cluster 1), maritime lighthouse scenes (cluster 2), urban night scenes with atmospheric lighting (cluster 3), gothic cathedral interiors (cluster 4), pompous interior design (cluster 5), industrial and vintage themes (cluster 6), rustic architectural spaces (cluster 7), domestic scenes and food imagery (cluster 8), palatial interiors with ornate architecture (cluster 9), pastoral and village scenes (cluster 10), and natural landscapes and animals with dramatic lighting (cluster 11).

A prompt that did not fit any of those groups always ended up in one of them.

When they extended it to 1000 loops, the bots might switch to a different t