Watch how grammar constraints reshape a language model's token distribution, step by step.
At each generation step, the model produces a distribution p_t over all tokens.
A grammar constraint defines which tokens are valid given the current parse state.
The feasible mass Z_t = sum of p_t(i) for valid tokens i
tells us how much the constraint "agrees" with what the model wanted to say.
The constrained distribution q_t(i) = p_t(i) / Z_t for valid tokens (zero otherwise)
is what we actually sample from. When Z_t is low (red), the constraint is fighting the model:
the valid tokens together carry little of the probability mass. When Z_t is high (green), the model
already wants to produce grammar-valid tokens.
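
To make the two definitions concrete, here is a minimal Python sketch of a single generation step. The function name `constrain`, the four-token vocabulary, and the probabilities are all made up for illustration; a real grammar engine would supply the set of valid token indices from its parse state.

```python
def constrain(p, valid):
    """Return (Z, q) for one step.

    p     -- list of model probabilities p_t(i), one per token index
    valid -- set of token indices the grammar allows at this parse state
    """
    # Feasible mass: total model probability that lands on valid tokens.
    Z = sum(p[i] for i in valid)
    if Z == 0.0:
        # Assumption of this sketch: treat "no mass on any valid token" as an error.
        raise ValueError("no valid token has nonzero probability")
    # Constrained distribution: renormalize valid tokens, zero out the rest.
    q = [p[i] / Z if i in valid else 0.0 for i in range(len(p))]
    return Z, q


# Hypothetical toy vocabulary and model distribution.
vocab = ["{", "}", "hello", "42"]
p = [0.1, 0.1, 0.6, 0.2]

# Suppose the grammar only allows "{" or "42" here.
valid = {0, 3}

Z, q = constrain(p, valid)
print(f"Z_t = {Z:.2f}")
for token, prob in zip(vocab, q):
    print(f"q_t({token!r}) = {prob:.2f}")
```

In this toy step the model puts most of its mass on "hello", which the grammar forbids, so Z_t is low and the renormalization has to scale the surviving tokens up substantially.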