Circuit Tracer – An open-source tool from Anthropic for tracing the internal decisions of AI models

What is Circuit Tracer?

Circuit Tracer is an open-source tool developed by Anthropic for studying the internal mechanisms of large language models (LLMs). It generates attribution graphs to reveal the internal steps a model takes when producing specific outputs. These graphs help researchers trace the model’s decision-making process, visualize relationships between features, and test various hypotheses.

Circuit Tracer supports several popular open-weights models, such as Gemma and Llama, and provides an interactive visualization interface powered by Neuronpedia, so users can explore and analyze model behavior in the browser.

Key Features of Circuit Tracer

Attribution Graph Generation: Reveals the model’s decision paths by showing how input tokens, internal features, and output logits influence one another.
Visualization & Interactivity: Provides an intuitive, interactive interface for viewing and manipulating attribution graphs, aiding understanding and sharing.
Model Interventions: Allows users to modify feature values and observe output changes to validate model behavior.
Multi-Model Support: Compatible with mainstream open-weights models such as Gemma and Llama, enabling comparative research.
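
As a rough illustration of the Model Interventions idea above, the following Python sketch overrides one feature's activation in a toy one-layer model and recomputes an output logit. Everything in it is hypothetical: the function name `logit_from_features`, the weights, and the single-layer setup are assumptions made for illustration and are not the circuit-tracer API. In a real model, an intervention would also propagate through later layers.

```python
# Toy sketch only: hypothetical names and weights, not the circuit-tracer API.

def logit_from_features(acts, dec, unembed_row, overrides=None):
    """Recompute one output logit from feature activations.

    acts        -- feature activations (one float per feature)
    dec         -- decoder rows; dec[f] is the vector feature f writes
    unembed_row -- vector that maps the written output to a single logit
    overrides   -- optional {feature_index: new_activation} intervention
    """
    overrides = overrides or {}
    # Apply the intervention: replace selected activations, keep the rest.
    patched = [overrides.get(i, a) for i, a in enumerate(acts)]
    # Sum each feature's contribution (activation times decoder vector).
    written = [sum(a * row[d] for a, row in zip(patched, dec))
               for d in range(len(dec[0]))]
    # Project the written vector onto the logit direction.
    return sum(w * u for w, u in zip(written, unembed_row))


acts = [1.0, 1.0, 0.0]                       # feature 2 is inactive
dec = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
unembed_row = [1.0, 2.0]

baseline = logit_from_features(acts, dec, unembed_row)
ablated = logit_from_features(acts, dec, unembed_row, overrides={0: 0.0})
amplified = logit_from_features(acts, dec, unembed_row, overrides={1: 2.0})

print(baseline, ablated, amplified)
```

Comparing `baseline` with `ablated` and `amplified` is the basic pattern: if a feature matters for an output, switching it off or scaling it up should shift that output in the expected direction.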

Technical Principles of Circuit Tracer

Transcoders: Circuit Tracer uses pretrained transcoders to generate attribution graphs. A transcoder is an auxiliary network trained to approximate the output of one of the model’s MLP layers from that layer’s input, passing through a sparse set of features that are easier to interpret than raw neurons. With transcoders standing in for the MLP layers, Circuit Tracer can represent the model’s computation as relationships between these features.
Direct Effect Computation: Circuit Tracer computes the direct impact of each non-zero transcoder feature, transcoder error node, and input token on other non-zero transcoder features and the output logits.
Graph Pruning: The generated graphs are pruned to remove nodes and edges with minimal influence, retaining only parts that significantly affect model decisions. Users can customize pruning parameters (e.g., node and edge thresholds) to control graph complexity and clarity.
Interactive Visualization Interface: A web-based interface allows users to directly view and manipulate attribution graphs in their browser. Features include node labeling, grouping, and annotation to help users better understand and analyze the model’s internal mechanisms.
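
The transcoder, direct-effect, and pruning steps above can be sketched on a toy example. This is a minimal illustration under strong assumptions, not Circuit Tracer's implementation: it uses a single layer whose output maps linearly to one logit, it covers only feature-to-logit and error-to-logit edges (not feature-to-feature or token edges), and the pruning rule is a simplified stand-in. All function names, weights, and the `keep_fraction` parameter are hypothetical.

```python
# Toy sketch only: hypothetical names and weights, not the circuit-tracer API.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def transcoder_forward(mlp_input, enc, enc_bias, dec):
    """Encode an MLP input into sparse (ReLU) feature activations, then
    decode them into an approximation of the MLP output."""
    acts = [max(dot(row, mlp_input) + b, 0.0) for row, b in zip(enc, enc_bias)]
    recon = [sum(a * dec_row[d] for a, dec_row in zip(acts, dec))
             for d in range(len(dec[0]))]
    return acts, recon

def direct_effects_on_logit(acts, dec, true_mlp_out, recon, unembed_row):
    """Direct effect of each active feature and of the error node on one logit.

    The error node carries whatever the transcoder failed to reconstruct,
    so the edge weights sum to the logit produced by the true MLP output.
    """
    edges = {}
    for f, a in enumerate(acts):
        if a != 0.0:  # only non-zero features become nodes
            edges[f"feature_{f}"] = a * dot(dec[f], unembed_row)
    error = [t - r for t, r in zip(true_mlp_out, recon)]
    edges["error"] = dot(error, unembed_row)
    return edges

def prune_edges(edges, keep_fraction=0.8):
    """Keep the strongest edges until they cover keep_fraction of the total
    absolute weight (a simplified stand-in for threshold-based pruning)."""
    total = sum(abs(w) for w in edges.values())
    kept, running = {}, 0.0
    ranked = sorted(edges.items(), key=lambda kv: abs(kv[1]), reverse=True)
    for name, weight in ranked:
        if total == 0.0 or running >= keep_fraction * total:
            break
        kept[name] = weight
        running += abs(weight)
    return kept


mlp_input = [1.0, 2.0]
enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
enc_bias = [0.0, -1.0, 0.0]
dec = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_mlp_out = [3.0, 1.25]   # what the real MLP produced (made-up values)
unembed_row = [1.0, 2.0]

acts, recon = transcoder_forward(mlp_input, enc, enc_bias, dec)
edges = direct_effects_on_logit(acts, dec, true_mlp_out, recon, unembed_row)
pruned = prune_edges(edges, keep_fraction=0.8)
print(edges)
print(pruned)
```

In this toy setup the edge weights add up exactly to the logit computed from the true MLP output, which is why the error node is included as a node in its own right. Raising `keep_fraction` keeps more low-weight edges, trading clarity for completeness, which mirrors the role of the node and edge thresholds described above.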

Project Links for Circuit Tracer

Project Website: https://www.anthropic.com/research/open-source-circuit-tracing
GitHub Repository: https://github.com/safety-research/circuit-tracer

Application Scenarios for Circuit Tracer

Model Behavior Research: Analyze a model’s decision-making process through attribution graphs to understand the internal logic behind specific outputs.
Multilingual Model Analysis: Investigate internal representations in multilingual models (e.g., Llama) and explore cross-lingual processing mechanisms.
Multi-Step Reasoning Studies: Examine model behavior in multi-step reasoning tasks to reveal the step-by-step logic and reasoning process.
Model Optimization and Improvement: Use feature interventions to test hypotheses and verify whether the model behaves as expected, providing evidence that can inform later improvements to the model.
Education and Communication: Present complex model decision processes visually through the interactive interface, which makes teaching and knowledge sharing easier.