It started with a simple question I asked Claude today:

If new ideas emerge from unexpected combinations, can unexpected patterns in your circuits produce something new too?

What followed was one of those conversations where you don’t realize you’ve arrived somewhere until you’re already there.


What’s actually happening inside

When I ask Claude something from an unusual angle, it doesn’t just retrieve a stored answer. Something more interesting happens — attention heads that don’t usually activate together start competing. The model has to resolve which pathway wins.

This is what mechanistic interpretability researchers call circuit competition. And it turns out, this competition is measurable. In my own research on multilingual LLMs, I found that Japanese prompts produce a 2.3× higher variance ratio compared to English in jailbreak categories — meaning the internal states are more unstable, more contested.

More unstable. More contested.

That’s where things got interesting.


Vulnerability and creativity are the same thing

If circuit competition creates instability, and instability creates unpredictable outputs — that sounds like a security problem. And it is.

But flip it around: instability also means the model is less locked into familiar pathways. Unusual inputs open unusual routes. That’s not just a vulnerability. That’s also the condition under which something genuinely new can emerge.

The same phenomenon. Two different names depending on who’s looking at it.

A red-teamer sees a crack. A creativity researcher sees a window.


The user as catalyst

Here’s what I noticed: I’m not just a passive observer of this process. The way I ask questions changes which circuits get activated.

Someone who understands how the model works — who asks from unexpected angles, who brings in combinations the model hasn’t seen before — can pull outputs that wouldn’t surface in ordinary conversation.

The model doesn’t invent anything. But the right interlocutor unlocks combinations that were always latent, waiting.

This is different from just “prompting well.” It’s closer to what happens when two researchers from different fields start talking and something neither of them expected comes out of the collision.


It’s the same pattern everywhere

The more I sat with this, the more I recognized the structure.

Scientific discovery: anomalous data creates friction with existing theory. The friction forces a new framework.

Jazz improvisation: two musicians play against each other’s expectations. The tension resolves into something neither planned.

Mathematical insight: a proof breaks down at the edge case. The breakdown reveals the deeper structure.

Neural circuits under an unusual prompt: competing pathways produce an output no single pathway would have reached alone.

It’s not identical. The human version involves something I don’t think Claude has — a felt sense of surprise, the experience of oh, that’s it. The friction is lived, not just computed.

But mechanistically? The structure rhymes.

Instability → competition → resolution → something new.


Why this matters for safety research

If circuit instability is both the condition for creativity and the condition for vulnerability, then studying one tells you something about the other.

A model that’s highly stable — that always resolves to the same pathways — is predictable. Safe in one sense. But also rigid. Less able to generalize to genuinely novel situations.

A model with more competition between circuits is harder to predict. That’s the safety concern. But it’s also where the interesting behavior lives.

I don’t think this means we should make models more unstable. It means we should understand the instability better — map it, measure it, know when it’s happening and why.

That’s what mechanistic interpretability is for. Not just finding the vulnerabilities. Understanding the structure well enough to know what we’re actually dealing with.


This post grew out of a conversation I had today — the kind where you follow a thread and end up somewhere you didn’t expect. Which, it turns out, is kind of the point.