They don't charge anything extra for code execution; you just pay for input and output tokens. The above example used 10 input tokens and 1,531 output tokens, which at Gemini 2.5 Flash's rates with thinking enabled ($0.15/million input, $3.50/million output) works out to 0.536 cents (just over half a cent) for this prompt.
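To sanity-check that arithmetic:

    # Gemini 2.5 Flash pricing with thinking enabled, per million tokens
    INPUT_RATE = 0.15   # USD per 1M input tokens
    OUTPUT_RATE = 3.50  # USD per 1M output tokens

    input_tokens, output_tokens = 10, 1_531
    cost = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    print(f"${cost:.6f}")  # $0.005360, i.e. about 0.536 cents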
This is so much cheaper than re-prompting the model after each tool use.
I wish this were extended further: you could give the model an API endpoint that it can call to execute JS code, with the only requirement that your API respond within 5 seconds (maybe less, actually).
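Just to make the idea concrete, here's a rough sketch of what such an endpoint could look like; the route, request shape, and 5-second budget are all made up here, and it just shells out to Node to run the JS:

    import json
    import subprocess
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class ExecHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Hypothetical request shape: {"code": "<js source>"}
            body = self.rfile.read(int(self.headers["Content-Length"]))
            code = json.loads(body)["code"]
            try:
                # Run the model-supplied JS under Node with a hard 5s budget
                result = subprocess.run(["node", "-e", code],
                                        capture_output=True, text=True, timeout=5)
                payload = {"stdout": result.stdout, "stderr": result.stderr}
            except subprocess.TimeoutExpired:
                payload = {"error": "execution exceeded the 5s deadline"}
            data = json.dumps(payload).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    HTTPServer(("", 8080), ExecHandler).serve_forever()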
I wonder if this is what OpenAI is planning to do in the upcoming API update to support tools in o3.
I imagine there wouldn't be much cost to the provider for the API call there, so much longer timeouts may be possible. It's not like this would hold up the LLM in any way; execution would get suspended while the call is made, and the TPU/GPU would serve another request.
They need to keep the KV cache to avoid prompt reprocessing, so they would have to move it to RAM/NVMe during longer API calls to free the GPU for another request.
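Something like this, assuming a PyTorch-style per-layer (key, value) cache; the shapes and layout are invented for illustration, and it needs a CUDA device:

    import torch

    # Hypothetical per-request KV cache: one (key, value) pair per layer
    kv_cache = [(torch.randn(1, 8, 1024, 128, device="cuda"),
                 torch.randn(1, 8, 1024, 128, device="cuda"))
                for _ in range(32)]

    def offload(cache):
        # Copy keys/values to host RAM while the tool call is in flight,
        # freeing GPU memory for another request
        return [(k.cpu(), v.cpu()) for k, v in cache]

    def restore(cache):
        # Bring the cache back when the tool responds, so the prompt
        # never has to be reprocessed
        return [(k.cuda(), v.cuda()) for k, v in cache]

    kv_cache = offload(kv_cache)   # suspend: GPU memory is freed
    kv_cache = restore(kv_cache)   # resume: decoding continues where it left off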
This common feature requires the user of the API to implement the tool; in this case, the user is responsible for running the code the API outputs. The post you replied to suggests that Gemini will run the code for the user behind the API call.
I wish Gemini could do this with Go. It generates plenty of junk/non-parseable code, and I have to feed it the error messages and hope it corrects them properly.
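That loop can at least be automated; a rough sketch, where generate(prompt) is a hypothetical wrapper around the Gemini call:

    import subprocess
    import tempfile

    def check_go(source: str) -> str:
        # Return parse errors for a Go source string, or "" if it's clean
        with tempfile.NamedTemporaryFile(suffix=".go", mode="w", delete=False) as f:
            f.write(source)
            path = f.name
        # gofmt -e reports all syntax errors rather than stopping early
        return subprocess.run(["gofmt", "-e", path],
                              capture_output=True, text=True).stderr

    def fix_loop(prompt: str, max_rounds: int = 3) -> str:
        source = generate(prompt)  # hypothetical wrapper around the Gemini API
        for _ in range(max_rounds):
            errors = check_go(source)
            if not errors:
                return source
            # Feed the error messages straight back to the model
            source = generate(f"{prompt}\n\nFix these errors:\n{errors}\n\n{source}")
        return source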
My llm-gemini plugin supports that: https://github.com/simonw/llm-gemini
I ran that just now and got this: https://gist.github.com/simonw/cb431005c0e0535343d6977a7c470...
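For reference, it's exposed as a model option, so usage looks something like this (the model ID and prompt here are illustrative):

    llm -m gemini-2.5-flash -o code_execution 1 \
      'use python to calculate the factorial of 13'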