Overview
The Logit Lens skill lets developers and researchers look inside the 'black box' of an LLM by mapping hidden states from intermediate layers back into vocabulary space. It applies the model's final layer norm and unembedding head to each layer's hidden state, as if that layer were the last one, and reads off the resulting token distribution. The layer-by-layer readout typically shows predictions moving from generic tokens in early layers to specific, factual outputs in later ones. This makes the skill useful for mechanistic interpretability, for debugging unexpected model behavior, and for locating the layer at which a model appears to retrieve a piece of information or make a logical leap during its forward pass.
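The readout described above can be sketched in a few lines of plain Python. This is a minimal toy illustration, not the skill's actual implementation: the three-token vocabulary, the identity unembedding matrix, the hand-written hidden states, and the helper names (`layer_norm`, `softmax`, `logit_lens`) are all assumptions made for the example. With a real model, the per-layer hidden states, final layer norm parameters, and unembedding weights would come from the model itself.

```python
import math

def layer_norm(h, gamma, beta, eps=1e-5):
    # Normalize one hidden vector to zero mean and unit variance, then scale and shift.
    mean = sum(h) / len(h)
    var = sum((x - mean) ** 2 for x in h) / len(h)
    return [g * (x - mean) / math.sqrt(var + eps) + b
            for x, g, b in zip(h, gamma, beta)]

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, gamma, beta, unembed, vocab):
    # For each layer, apply the FINAL layer norm and unembedding to that
    # layer's hidden state and record the top token with its probability.
    readout = []
    for h in hidden_states:
        normed = layer_norm(h, gamma, beta)
        logits = [sum(w * x for w, x in zip(row, normed)) for row in unembed]
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        readout.append((vocab[best], probs[best]))
    return readout

# Hypothetical toy setup: 3 tokens, hidden size 3, one vector per layer
# (the hidden state at the last token position).
vocab = ["the", "Paris", "London"]
unembed = [[1.0, 0.0, 0.0],   # one row per vocabulary token
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
gamma = [1.0, 1.0, 1.0]       # final layer norm scale (assumed)
beta = [0.0, 0.0, 0.0]        # final layer norm shift (assumed)
hidden_states = [
    [2.0, 0.5, 0.3],          # early layer: leans toward a generic token
    [1.0, 0.9, 0.2],          # middle layer: still generic, but closer
    [0.1, 3.0, 0.4],          # late layer: leans toward the specific answer
]

readout = logit_lens(hidden_states, gamma, beta, unembed, vocab)
for layer, (token, prob) in enumerate(readout):
    print(f"layer {layer}: top token = {token!r}, p = {prob:.3f}")
```

In this toy run the top token switches from the generic `the` to the specific `Paris` at the last layer, which is the kind of transition the lens is meant to surface. The same loop applies to a real model once each layer's hidden state is available, since only the final layer norm and unembedding weights are reused.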