The problem
Every AI-built website looks the same.
Inter font. bg-indigo-500. rounded-2xlon every card. Centered hero with one CTA. Three feature cards directly below. "Build the future of X." Maybe a purple-to-blue gradient if the model is feeling bold. Lucide icon set, defaulted to.
This isn't a v0 problem. It's a Bolt problem and a Lovable problem and a Framer-AI problem and a Galileo problem and a me-prompting-Claude problem. It's the median of the training data, and "be creative" prompts don't move the median.
I build a lot of side projects. I also film a lot of Instagram reels where I cut into the apps mid-sentence — show the user the thing I just shipped. The cuts only land if the app has visual personality. If my reel cuts to Inter + indigo + rounded cards, it could be cutting to anyone's app. There's no recognition, no signature, no reason to watch. The reel dies.
I'd been working around this manually: copy a screenshot from Dribbble, paste it into the prompt, beg the model to match. Inconsistent results. Mostly slop with a faint cosmetic veneer.
I wanted a skill. Input: a completed Next.js project (already built well from a data/journeys perspective). Output: 10 wildly different visual reskins as evaluable wireframes. Pick one, build it properly into the real app, film the reel against a UI that has an opinion.
The research phase
Before designing the skill, I spent ~5 minutes on the question of why AI UI converges. Then I dispatched three research agents in parallel.
Agent Ainvestigated prior art: how do v0, Galileo, Uizard, Magic Patterns, framer.ai, Locofy, and Mobbin attempt aesthetic diversity? What works, what fails, where's the public discourse?
Agent Bbuilt a curated library: 25-30 distinct UI design archetypes, each with hard tokens (specific fonts, hex palettes, motion DNA, real reference URLs) and a "would this read in a 2-second video cut?" score.
Agent C catalogued the AI-slop signal set: concrete observable design tells that mark UI as AI-generated, organized by category (typography, color, layout, component, copy, motion, imagery).
I asked for punch lists, not surveys. Each agent came back in 4-5 minutes with concrete, citable findings.
What the research found
Agent A's biggest finding: anchored reference grounding beats adjectives. Specifying "make it look like Linear" or "make it look like Are.na" — naming an actual product the model has seen many times — collapses the latent space into a knowable region. Saying "make it modern" or "make it clean" produces the statistical mean.
Agent B's library returned 28 distinct archetypes, each specific enough to be recognized in a flash: Swiss Editorial Grid, Bloomberg Brutalist, CRT Terminal, Tactile Brutalism, Liquid Glass, Frutiger Aero Revival, Cute-alism, Risograph Zine, Newsprint Broadsheet, Sports Almanac, Architectural Blueprint, ASCII Art Revival, Notebook Sketch, Acid Graphics, Bauhaus Geometric, Tarot Mystical, Photographic Editorial, Skeuomorphic Revival, Vaporwave Y2K, Notion Sketched, Surveillance Dossier, Acid Editorial, Tech-Bro Console, Pixel Art, Maximalist Collage, Cinema Letterbox, Geocities Formal, Architectural Mono-Color.
Agent C catalogued 48 concrete slop signals across 7 categories. The Tailwind defaults are the worst offenders: bg-indigo-500, from-indigo-500 to-purple-600, slate-50, rounded-2xl, hover:scale-105, fade-in-up on scroll. But the deeper finding was the meta-pattern:
Slop is the absence of decisions, not the presence of any particular pattern.
Rounded corners aren't slop. Uniformrounded corners on every surface are. Inter isn't always wrong. Inter as the unexamined default is. The tell isn't any specific design choice — it's the avoidance of choice. Models regress to defaults that "offend no one" when they aren't forced to commit at every fork.
This reframed the design. The skill couldn't just ban slop patterns. It had to force a decision at every fork.
The design (four mechanisms)
Anchored archetype library. Each of the 28 archetypes is a JSON object with hard tokens: font_display, font_import_url (a real Google Fonts CSS2 URL), palette with hex codes, radii_scale as an array of numbers, motion_dna (the personality of motion, not just "smooth"), reference_urls (2-3 real sites), and anti_rules (what would ruin the archetype).
Reference grounding.Each parallel sub-agent WebFetches its archetype's reference URLs and uses them as visual ground truth. Exemplars do work prose can't.
Forbidden-patterns blocklist.All 48 slop signals are hard-coded as a "NO" list in the concept-generator prompt. The model sees: do not use Inter. Do not use bg-indigo-500. Do not use the centered-hero + three-feature-cards layout.
Slop-critic pass. After 10 concepts are generated, a separate critic agent scores each on 8 axes: typography commitment, palette commitment, layout primitive strength, copy specificity, motion personality, reference fidelity, video-cut recognizability, decision density.
The constitutional rule sits at the top of every sub-agent's prompt: Slop is the absence of decisions. At every fork — font, palette, radius, density, motion, voice — commit. Any output that could have come from defaults is rejected.
Sandbox, not a branch
Two architectural decisions that mattered.
Sandbox sibling repo, not a branch in the target project. When you run /reskin on ~/Projects/your-app/, it creates ~/Projects/your-app-reskin-lab/ as a standalone Next.js app. Real stack, real components, real data shapes, but zero risk to your prod code.
Parallel generation with shared critique.Ten concept generators run concurrently — each truly isolated from the others' aesthetic choices, so they can't accidentally converge. Then a single critic agent reviews all ten holistically.
The first run
Right after the skill installed, I ran it on jiyajale (a smaller side project). 10 concepts generated in ~15 minutes. Sampled archetypes were a clean mix: acid-graphics, risograph-zine, bauhaus-geometric, geocities-formal, skeuomorphic-revival, crt-terminal, swiss-editorial-grid, sports-almanac, tarot-mystical, pixel-art-bitmap.
Each concept committed to its archetype. The CRT Terminal version was actually terminal-green-on-black with a blinking cursor. The Geocities Formal version had Times New Roman, default-blue underlined links, and tiled-background vibes. The Risograph Zine had two-color overprint with visible misregistration. They didn't all succeed equally — but they didn't collapse to the median. That was the test.
The bigger pattern
The thing I took away from building this isn't the skill itself. It's the workflow.
The skill is ~25 files, ~1500 lines, built in one session. None of those files are particularly clever. What's load-bearing is the research-then-spec-then-plan-then-execute cycle. Skipping research would have produced a worse archetype library. Skipping the spec would have produced a leaky design. Skipping the plan would have wasted subagent time. Doing each in order, with the previous step's artifacts as input, made the final implementation almost mechanical.
This is also the first thing I've shipped where the constitutional rule— the meta-principle behind the design — is encoded directly into the runtime, not just the docs. Every sub-agent literally reads "slop is the absence of decisions" before generating its concept.
The other test will be whether the gallery cuts well in actual Instagram reels. That's what this was for. The reels are coming.
— Ayush Upneja, May 2026
Built with Claude Code + the superpowers skill suite.