How we get clean design cutouts from a generative model
A print-on-demand pipeline needs transparent artwork. A t-shirt design has to live on white cotton, black cotton, and a hundred shades in between. The artwork itself has to ship as a transparent PNG with a clean alpha channel — no halo, no fringe, no ghost of the background bleeding into the design's edges.
The hard part is that generative image models don't produce transparent PNGs. They produce JPEG-grade RGB on a background. Whatever you do to remove the background later has to fight whatever the model decided to put there. After a lot of failed experiments, we landed on a technique that works for ~95% of designs on the first attempt. This post covers what that technique is, and what we tried first.
The goal
The output is a transparent PNG of the artwork, edges crisp, alpha properly anti-aliased at the boundary. The artwork lands on a Printful product, gets composited onto a mockup, and then onto a real garment. Any fringe, smear, or color contamination from the original background is visible in production.
We need this to be reliable, automated, and run on every concept the pipeline generates. There is no human in the loop reviewing edge quality.
What didn't work #1: asking the model for transparency
Ask for "transparent background" or "PNG with alpha channel" and you get a checkerboard — the gray-and-white pattern design tools use to depict transparency. The model has seen this all over its training data and produces the visual metaphor. Background removers can strip it, but the result is ragged, with gray bleeding into the artwork's edges.
What didn't work #2: flat solid backgrounds
Next attempt: instruct the model to generate against a single solid color, picked to maximally contrast with the artwork. Use a chroma key remover.
Two failure modes:
- The model doesn't actually produce flat solid color. It produces something close, with subtle texture, gradients, and lighting from the artwork bleeding into the background. Chroma keying treats anything within a tolerance band as "background" and anything outside as "foreground" — but with subtle texture, the tolerance has to be wide, and now you're keying out parts of the artwork that share a hue.
- Color contamination at the boundary. The model treats the background as part of the image. Light from the artwork reflects onto the background. The boundary pixels are a blend of foreground and background colors. When you key out the background, the boundary pixels get the wrong alpha — and you can see the contaminating color in the cutout.
You can paper over this with edge-aware mattes and Photoshop tricks, but at scale it's brittle.
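To make the tolerance-band problem concrete, here's a minimal sketch of a naive chroma key (Euclidean RGB distance against a single key color; this is an illustration, not the remover we actually used). Widening the tolerance enough to swallow the model's background texture starts swallowing artwork pixels too:

```python
import numpy as np

def naive_chroma_key(rgb: np.ndarray, key_rgb: tuple[float, float, float],
                     tolerance: float) -> np.ndarray:
    """Return an alpha mask: 0 where a pixel sits within `tolerance` of the
    key color (Euclidean distance in 0..1 RGB space), 255 elsewhere.

    Illustrative only. A tight tolerance leaves background texture behind;
    a loose one keys out artwork pixels that share the key's hue.
    """
    diff = rgb.astype(np.float32) / 255.0 - np.asarray(key_rgb, dtype=np.float32)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.where(dist < tolerance, 0, 255).astype(np.uint8)
```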
What didn't work #3: distinctive patterns
If the matte struggles with subtle texture, give it something unmistakable. We tried concentric rings and checkerboards, in black-and-white and in colors picked to contrast with the artwork.
Same failure mode as #2, in some ways worse. A pattern shares too much with the artwork — high-frequency, graphical, edge-heavy. Background removers work by separating something graphic from something that isn't. When the background is itself a deliberate graphic, that distinction collapses, and the matte keys arbitrarily on whichever pattern element happens to be near the boundary. Edges came out worse than with flat color, not better.
The signal here pointed at the answer: the background needed to look fundamentally unlike the artwork — different in style, different in spatial frequency.
The technique that worked
Generative models are great at producing rich, naturalistic scenes. They're bad at producing flat color. So: ask for a rich scene that maximally contrasts with the artwork's palette, then use a real background remover that handles natural imagery well.
The pipeline:
- Curate a small palette of high-contrast natural scenes. Five is enough.
- Pick the scene whose representative color is maximally distant from the artwork's color palette in HSL space.
- Generate the artwork over that scene as the background.
- Strip the background via Photoroom.
The scenes are deliberately chosen to span the hue wheel. Whatever palette the artwork has, at least one scene will sit far from it.
The five scenes
This is SCENE_PALETTE from packages/python/merchsage/merchsage/concepts/backgrounds.py:
| Hex | Description |
|---|---|
| #2D5A27 | Oblique view of a dense coniferous forest patch on a hillside — no sky, no clouds — heavily blurred/defocused with soft, diffused natural daylight |
| #C8A23D | Close-up of a dry wheat field at golden hour — no sky, no horizon — heavily blurred/defocused with warm, diffused amber sunlight filtering through the stalks |
| #3A6B7C | Close-up of smooth river stones submerged in shallow clear water — no sky, no surface reflections — heavily blurred/defocused with cool, diffused overcast daylight |
| #A0522D | Close-up of layered sandstone rock face with natural iron-oxide striations — no sky, no vegetation — heavily blurred/defocused with soft, diffused warm daylight |
| #7B5EA7 | Close-up of a dense lavender field in full bloom — no sky, no paths — heavily blurred/defocused with soft, diffused cool daylight casting gentle violet shadows |
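In code the palette is just a small mapping from representative hex color to scene prompt. A sketch of the shape it might take (descriptions abridged; the actual layout in backgrounds.py may differ):

```python
# Illustrative shape only; the real SCENE_PALETTE in backgrounds.py may be structured differently.
SCENE_PALETTE: dict[str, str] = {
    "#2D5A27": "Oblique view of a dense coniferous forest patch on a hillside ...",
    "#C8A23D": "Close-up of a dry wheat field at golden hour ...",
    "#3A6B7C": "Close-up of smooth river stones submerged in shallow clear water ...",
    "#A0522D": "Close-up of layered sandstone rock face with natural iron-oxide striations ...",
    "#7B5EA7": "Close-up of a dense lavender field in full bloom ...",
}
```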
A few non-obvious choices:
- Heavily blurred / defocused. This is critical. A sharp natural scene gives the background remover too many edges to confuse with the artwork's edges. A defocused scene reads as a soft color field with low spatial frequency — easy to subtract.
- No sky, no horizon, no clouds. Skies are bright and uniform; they create artificial flat regions where the flat-background failure modes re-emerge. The scene instructions explicitly forbid them.
- Diffused light. Direct sunlight produces hot specular highlights that compete with the artwork. Diffused light gives uniform exposure across the frame.
- Hue spread. Forest green, wheat gold, river blue, sandstone red-brown, lavender violet: five hues spread around the wheel. Whatever the artwork's palette, at least one scene will sit in opposition to it.
Greedy max-min selection in HSL space
For a given concept, we want the scene whose representative color is furthest from any color in the artwork palette.
```python
def _hsl_distance(hsl1, hsl2):
    """Squared distance between two HSL tuples (wrapping hue)."""
    dh = min(abs(hsl1[0] - hsl2[0]), 1 - abs(hsl1[0] - hsl2[0]))
    ds = hsl1[1] - hsl2[1]
    dl = hsl1[2] - hsl2[2]
    return dh**2 + ds**2 + dl**2
```
The hue distance wraps at 1.0 — red and magenta are close, even though their numeric hue values are far apart. The selection is a simple greedy max-min:
For each scene, compute the minimum HSL distance to any artwork color; then pick the scene with the maximum of those minima.
In other words: pick the scene that is far from the artwork's closest color, not its average. This is the right objective because the failure mode of background removal is "this background pixel got confused with that artwork pixel" — what matters is the worst pair, not the typical pair.
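A sketch of that selection, assuming the `_hsl_distance` above, the `SCENE_PALETTE` mapping sketched earlier, and an artwork palette given as hex strings (the helper names here are illustrative, not the actual code):

```python
import colorsys

def _hex_to_hsl(hex_color: str) -> tuple[float, float, float]:
    """'#RRGGBB' -> (h, s, l), each component in 0..1."""
    r, g, b = (int(hex_color.lstrip("#")[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys orders it H, L, S
    return (h, s, l)

def select_scene(artwork_palette: list[str]) -> str:
    """Greedy max-min: the scene whose color is furthest from the *closest*
    artwork color, so the worst foreground/background pair stays separable."""
    artwork_hsl = [_hex_to_hsl(c) for c in artwork_palette]

    def min_dist(scene_hex: str) -> float:
        scene_hsl = _hex_to_hsl(scene_hex)
        return min(_hsl_distance(scene_hsl, a) for a in artwork_hsl)

    return max(SCENE_PALETTE, key=min_dist)

# e.g. a hypothetical dark / red / cream artwork palette lands on the river-stone scene:
select_scene(["#1A1A1A", "#E63946", "#F1FAEE"])  # -> "#3A6B7C"
```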
The HSL space here is preferable to RGB or LAB. Hue captures the perceptual axis humans (and Photoroom's matting model) actually disambiguate on. Saturation and lightness are secondary signals — they don't dominate.
Stripping with Photoroom
Once the artwork is generated against the selected scene, Photoroom's API does the actual matting. Two reasons we picked it:
- It produces clean alpha at edges, including hair-like fine detail. Most chroma keyers don't.
- It handles natural imagery well. The scene is full-frame nature; Photoroom doesn't get confused by it because nature is what it's trained on.
The remaining pipeline is mundane: alpha threshold to clean up sub-1% alpha noise, crop to the artwork's bounding box, save as PNG. Up to 20 concurrent Photoroom calls per pipeline run.
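A sketch of that post-processing step with Pillow and NumPy, assuming Photoroom has already returned RGBA PNG bytes (the function name and exact threshold are illustrative):

```python
from io import BytesIO

import numpy as np
from PIL import Image

def finalize_cutout(rgba_png: bytes, out_path: str, alpha_floor: int = 3) -> None:
    """Zero out near-invisible alpha noise, crop to the artwork's alpha
    bounding box, and save as PNG. An alpha_floor of 3 is roughly 1% of 255."""
    img = Image.open(BytesIO(rgba_png)).convert("RGBA")
    arr = np.array(img)
    alpha = arr[..., 3]                       # view into arr, edits apply in place
    alpha[alpha <= alpha_floor] = 0           # clean up sub-1% alpha noise
    cleaned = Image.fromarray(arr)
    bbox = cleaned.getchannel("A").getbbox()  # bounding box of pixels with any alpha
    if bbox:
        cleaned = cleaned.crop(bbox)
    cleaned.save(out_path, format="PNG")
```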
Catching the misses
Photoroom gets us most of the way, but it isn't perfect. A scene texture occasionally bleeds into a thin negative-space region. The model sometimes paints the artwork with edges that share a hue with the scene, and the matte cuts in too far. A halo of warm pixels can ring an element after the cutout. We can't ship those.
The fan rater downstream is the cleanup pass. It gets two images per design:
- Image A: the original generation, scene background still present.
- Image B: the artwork after the scene has been stripped.
Its job is to compare them and flag structural artifacts in B that came from A's scene. The vocabulary is specific: a remnant is a blob, halo, smear, or patch — something with shape and location that isn't part of the intended design. A ring of grass-green pixels around a coffee cup is a remnant. A semi-transparent smudge of sky in empty space is a remnant.
What isn't a remnant matters as much, because without these exclusions the rater over-flags:
- Scene-lighting tints baked into design colors. Generating against a wheat field warms the foreground hues; against river stones it cools them. Those tints persist into the cutout. They look like contamination but aren't — they're the model's rendering, and they're in every design.
- Soft anti-aliased edges. A 1–2px alpha gradient is how a clean cutout should look. Penalizing it produces crunchy, aliased designs.
- Intentional elements the artwork description names — stars, dots, glow rings. Without this clause the rater flags legitimate stylistic flourishes as scene bleed.
Designs the rater flags as having visible remnants get thrown away.
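Purely for illustration (this is not the actual rater prompt or schema), the verdict consumed downstream can be thought of as something like:

```python
from dataclasses import dataclass

@dataclass
class RemnantVerdict:
    """Hypothetical shape of the rater's output; field names are illustrative."""
    has_remnant: bool   # a blob, halo, smear, or patch traceable to the scene
    description: str    # e.g. "ring of grass-green pixels around the coffee cup"
```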
Results
Roughly 95% of designs come out of Photoroom clean on the first pass. The rater catches most of what doesn't. By the time designs reach production, visible cutout artifacts show up in roughly 1 in 300 designs.
The takeaway
The generative model does what it's good at: producing a rich naturalistic scene. The classical CV pipeline (color-distance scene selection + Photoroom matting) does what it's good at: separating foreground from a non-degenerate background.
Most of the time, when a generative pipeline gives bad output, the answer isn't a better prompt or a better model. It's recognizing which step in the pipeline is asking the model to do something it's bad at, and replacing that step with a deterministic one.