Hacker News | steren's comments

At Google we call it KTLO ("Keep The Lights On")

Check out this sample of using gVisor to spin up code sandboxes (potentially running on Cloud Run): https://github.com/GoogleCloudPlatform/cloud-run-sandbox


I've also been creating HTML tools over the years when I couldn't find client-side-only websites for PDF to SVG, CSV to Sheet, Audio to Video, or Video to MP4.

I list them at https://client-side.app/


I was already working on the same thing in 2018: Google Cloud Run (https://cloud.run/). We just kept shipping, and shipping, and shipping...


I myself have wondered why visionOS Safari didn't lean more into the idea that the DOM has semantics (e.g. <header>, <footer>, <nav>) and that CSS can already convey depth information.

I love the idea. Good luck with the project. (I maintain a <stereo-img> web component, see https://stereo-img.steren.fr/, and I had fun adding Ray tracing to DOM elements: https://rtx-on.steren.fr/)


> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama came out on top. We hope to be able to publish these results soon.
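For context, the throughput side of such a benchmark boils down to timing a token stream. The sketch below is an illustration, not our actual harness: `measure_throughput` works against any iterable of streamed chunks (e.g. what a streaming HTTP client yields when talking to a vLLM or Ollama endpoint).

```python
import time
from typing import Iterable, Tuple

def measure_throughput(token_stream: Iterable[str]) -> Tuple[int, float]:
    """Return (token_count, tokens_per_second) for a stream of token chunks.

    The stream can be any iterable, e.g. the chunks yielded by a
    streaming client talking to an inference server.
    """
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else 0.0)
```

Startup time would be measured separately, from process launch to the first successful response.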


You need to benchmark against llama.cpp as well.


Did you test multi-user cases?


Assuming this is equivalent to parallel sessions, I would hope so; that is the entire point of vLLM.


vLLM and Ollama assume different settings and hardware. vLLM, backed by paged attention, expects many requests from multiple users, whereas Ollama is usually for a single user on a local machine.
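A toy illustration of why paging the KV cache helps with many concurrent requests. The 16-token page size and the request lengths are made-up numbers for illustration, not vLLM's actual configuration:

```python
import math

def paged_kv_slots(seq_len: int, page_size: int = 16) -> int:
    # Paged attention allocates the KV cache in fixed-size pages,
    # wasting at most one partially filled page per sequence.
    return math.ceil(seq_len / page_size) * page_size

def contiguous_kv_slots(max_seq_len: int) -> int:
    # A naive allocator reserves the full maximum length up front
    # for every request, regardless of how long it actually runs.
    return max_seq_len

# With many short concurrent requests, paging reserves far fewer slots:
lengths = [120, 35, 600, 80]                      # actual generated lengths
paged = sum(paged_kv_slots(n) for n in lengths)    # 864 slots
naive = contiguous_kv_slots(2048) * len(lengths)   # 8192 slots
```

For a single local user the fragmentation problem mostly disappears, which is why Ollama can get away without it.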


Because Gemini CLI is OSS, you can also find the system prompt at: https://github.com/google-gemini/gemini-cli/blob/4b5ca6bc777...


Also, a GPU instance needs 5s to start. The rest of the time depends on how large the model is, so a "very small weak model" can load much faster than in 20s.


> Google's pricing also assumes you're running it 24/7 for an entire month

What makes you think that?

Cloud Run's [pricing page](https://cloud.google.com/run/pricing) explicitly says: "charge you only for the resources you use, rounded up to the nearest 100 millisecond"

Also, Cloud Run's [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling) is in effect, scaling down idle instances after a maximum of 15 minutes.
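The 100 ms rounding is easy to model. A minimal sketch; the unit prices here are placeholder arguments, not real rates (see the pricing page for per-region prices):

```python
import math

def billable_seconds(usage_seconds: float) -> float:
    # Usage is rounded up to the nearest 100 ms (0.1 s).
    return math.ceil(usage_seconds * 10) / 10

def request_cost(usage_seconds: float,
                 vcpus: float, gib_memory: float,
                 price_per_vcpu_s: float, price_per_gib_s: float) -> float:
    # Prices are caller-supplied placeholders for illustration.
    t = billable_seconds(usage_seconds)
    return t * (vcpus * price_per_vcpu_s + gib_memory * price_per_gib_s)
```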

(Cloud Run PM)


Because the pricing when creating an instance shows me the cost for the entire month, then works out the average hourly price from that. This is just creating a GPU VM instance; I don't see how to view the cost of different NVIDIA GPUs without it.

If you wanted to show hourly pricing, you would show that first and calculate the monthly price from the hourly rate. I have no idea whether the monthly cost includes a sustained-usage discount, or what the hourly cost is for running it for just an hour.


> Because the pricing when creating an instance shows me the cost for the entire month

Are you referring to the GCP pricing calculator?

> This is just creating a GPU VM instance

Maybe you are referring to the Compute Engine VM creation page? Cloud Run is a different GCP service.

The Cloud Run Service creation UI doesn't show the cost.


Cloud Run PM lead here.

Sign up for new GPU types at https://docs.google.com/forms/d/e/1FAIpQLSdZk5sCsDUjAoYQX-sq...

