Constrained Decoding Visualizer

At each generation step t, the model produces a probability distribution p_t over the vocabulary. A grammar constraint defines which tokens are valid given the current parse state. The feasible mass Z_t, the sum of p_t(i) over all valid tokens i, measures how much of the model's probability already falls on grammar-valid tokens, that is, how far the constraint "agrees" with what the model wanted to say.
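
As a minimal sketch of the feasible mass, the snippet below sums the model's probabilities over a set of valid token ids. The function name `feasible_mass`, the five-token vocabulary, and the choice of valid ids are all made up for illustration; a real grammar engine would supply the valid set from its parse state.

```python
import math

def feasible_mass(probs, valid_ids):
    """Return Z_t: the total probability assigned to grammar-valid tokens."""
    return math.fsum(probs[i] for i in valid_ids)

# Toy 5-token vocabulary (illustrative numbers, not real model output).
p_t = [0.5, 0.2, 0.15, 0.1, 0.05]

# Assume the grammar currently allows only tokens 1 and 3.
Z_t = feasible_mass(p_t, valid_ids={1, 3})
print(f"Z_t = {Z_t:.2f}")  # roughly 0.30: most of the mass is on invalid tokens
```

Here the model's preferred token (index 0, probability 0.5) is not valid, so Z_t comes out low.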

The constrained distribution, q_t(i) = p_t(i) / Z_t for valid tokens and zero otherwise, is what we actually sample from. When Z_t is low (red), the constraint is fighting the model: the valid tokens together carry little probability mass. When Z_t is high (green), the model naturally wants to produce grammar-valid tokens.
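
The renormalization and sampling step can be sketched as follows. This is an illustration under stated assumptions, not the visualizer's actual code: the name `constrain`, the boolean-mask representation, and the toy numbers are hypothetical, and raising an error when Z_t is zero is one possible policy that the description above does not specify.

```python
import random

def constrain(probs, valid_mask):
    """Return (q_t, Z_t): the renormalized distribution and the feasible mass."""
    Z = sum(p for p, ok in zip(probs, valid_mask) if ok)
    if Z <= 0.0:
        # Assumption: treat "no valid token has any mass" as an error.
        raise ValueError("no valid token has nonzero probability")
    # Valid tokens are rescaled by 1 / Z_t; invalid tokens get zero.
    q = [p / Z if ok else 0.0 for p, ok in zip(probs, valid_mask)]
    return q, Z

# Toy 5-token vocabulary; assume the grammar allows only tokens 1 and 3.
p_t = [0.5, 0.2, 0.15, 0.1, 0.05]
mask = [False, True, False, True, False]

q_t, Z_t = constrain(p_t, mask)

# Sample the next token from q_t; zero-weight tokens are never drawn.
token = random.choices(range(len(q_t)), weights=q_t, k=1)[0]
print(f"Z_t = {Z_t:.2f}, sampled token = {token}")
```

Rescaling preserves the model's relative preferences among valid tokens: token 1 stays twice as likely as token 3, as it was under p_t.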