Jannik Brinkmann

I am a Ph.D. student interested in the structure and interpretation of natural and artificial systems. I am advised by Christian Bartelt and affiliated with the Interpretable Neural Networks group, led by David Bau. I currently red-team frontier models at OpenAI, building on our Agents of Chaos study of the risks of multi-agent systems.

Selected Publications

Safety · ICLR 2026
A jailbreak built for one model often breaks another. We trace that transfer to shared internal representations: models that represent concepts similarly inherit each other's vulnerabilities.
Interpretability · Computational Linguistics 2025
We propose a perspective on interpretability grounded in causal mediation analysis.
Interpretability · NAACL 2025 · Oral
Even models trained almost entirely on English are fluent in other languages. We trace that fluency to shared, language-agnostic representations of grammar that are reused across languages.
Tools · ICLR 2025
The internals of the largest open models are effectively out of reach for most researchers. NNsight and NDIF open them up through a transparent interface for running interventions at scale.
Interpretability · NeurIPS 2024
Sparse autoencoders decompose activations into candidate features, but there's no ground truth for whether a decomposition is faithful. Board-game models offer one: the true state of the board is known.

Positions

Mar 2026 – Present
OpenAI · Contractor
Multi-agent red-teaming of frontier models in collaboration with OpenAI's safety team.
Jun 2025 – Oct 2025
J.P. Morgan AI Research · Research Intern
Synthetic data generation and post-training methods for mathematical reasoning in language models, resulting in a NeurIPS 2025 workshop paper.
Nov 2024 – Mar 2025
NYU Center for Data Science · Visiting Researcher
Safety and adversarial robustness of language models with Prof. He He, resulting in a paper on the cross-model transferability of jailbreak attacks.
Apr 2024 – Aug 2024
Northeastern University · Visiting Researcher
Structure and interpretation of neural networks with Prof. David Bau, resulting in a journal publication and conference papers at ICLR, NAACL (Oral), and NeurIPS.