> is masking expensive?
It's not expensive per se; it's a single element-wise multiplication of the output vector.
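For scale, a hedged sketch of that per-step operation (numpy-style; the vocabulary size and allowed token ids here are made up for illustration):

    import numpy as np

    def apply_mask(probs: np.ndarray, token_mask: np.ndarray) -> np.ndarray:
        """Zero out tokens the grammar disallows, then renormalize."""
        masked = probs * token_mask   # the whole per-step cost: one element-wise multiply
        return masked / masked.sum()

    vocab_size = 32_000
    probs = np.full(vocab_size, 1.0 / vocab_size)  # stand-in for the model's softmax output
    token_mask = np.zeros(vocab_size)
    token_mask[[101, 202, 303]] = 1.0              # hypothetical allowed ids for this grammar state
    constrained = apply_mask(probs, token_mask)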
The real "expense" is that you need to precompute a mask for every element of your grammar, because they are too costly to recompute on the fly, and LLM tokens do not map cleanly onto grammar elements. (Consider JSON: a single LLM token often combines several special characters, such as a quote followed by a colon, or a brace followed by a quote.)
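A toy sketch of that precomputation (the tiny vocabulary and the regex standing in for a real parser state are made up): for each grammar state you scan the vocabulary once and keep the tokens whose text the grammar accepts, precisely because one token can span several grammar symbols.

    import re
    import numpy as np

    # Hypothetical vocabulary; note tokens like '":' that merge grammar symbols.
    toy_vocab = ['{', '{"', '"name', '":', '": "', '}', 'hello', '42']

    # Grammar state: we've emitted '{"name' and need the key closed and a colon.
    # A real implementation would walk a parser state machine, not a regex.
    def allowed_after_key(token_text: str) -> bool:
        return re.fullmatch(r'"\s*:\s*"?', token_text) is not None

    mask = np.array([1.0 if allowed_after_key(t) else 0.0 for t in toy_vocab])
    # Only '":' and '": "' survive; masks like this are built once per grammar
    # state, not recomputed at every decoding step.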
None of this is hard to compute; it's just more work to implement.