Hacker News | steren's comments

At Google we call it KTLO ("Keep The Lights On")

Check out this sample of using gVisor to spin up code sandboxes (potentially running on Cloud Run): https://github.com/GoogleCloudPlatform/cloud-run-sandbox


I've also been creating HTML tools over the years when I couldn't find client-side-only websites for PDF to SVG, CSV to Sheet, Audio to Video, or Video to MP4.

I list them at https://client-side.app/


I was already working on the same thing in 2018: Google Cloud Run (https://cloud.run/). We just kept shipping, and shipping, and shipping...


I myself have wondered why visionOS Safari didn't lean more into the idea that the DOM has semantics (e.g. <header>, <footer>, <nav>) and that CSS can already convey depth information.

I love the idea. Good luck with the project. (I maintain a <stereo-img> web component, see https://stereo-img.steren.fr/, and I had fun adding Ray tracing to DOM elements: https://rtx-on.steren.fr/)


> I would never want to use something like ollama in a production setting.

We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama came out on top. We hope to be able to publish these results soon.
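For context, the throughput side of such a benchmark boils down to timing a token stream. The sketch below is an illustration, not our actual harness: `measure_throughput` works against any iterable of streamed chunks (e.g. what a streaming HTTP client yields when talking to a vLLM or Ollama endpoint).

```python
import time
from typing import Iterable, Tuple

def measure_throughput(token_stream: Iterable[str]) -> Tuple[int, float]:
    """Return (token_count, tokens_per_second) for a stream of token chunks.

    The stream can be any iterable, e.g. the chunks yielded by a
    streaming client talking to an inference server.
    """
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count, (count / elapsed if elapsed > 0 else 0.0)
```

Startup time would be measured separately, from process launch to the first successful response.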


You need to benchmark against llama.cpp as well.


Did you test multi-user cases?


Assuming this is equivalent to parallel sessions, I would hope so; that is the entire point of vLLM.


vLLM and Ollama assume different settings and hardware. vLLM, backed by paged attention, expects many requests from multiple users, whereas Ollama is usually for a single user on a local machine.
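A toy illustration of why paging the KV cache helps with many concurrent requests. The 16-token page size and the request lengths are made-up numbers for illustration, not vLLM's actual configuration:

```python
import math

def paged_kv_slots(seq_len: int, page_size: int = 16) -> int:
    # Paged attention allocates the KV cache in fixed-size pages,
    # wasting at most one partially filled page per sequence.
    return math.ceil(seq_len / page_size) * page_size

def contiguous_kv_slots(max_seq_len: int) -> int:
    # A naive allocator reserves the full maximum length up front
    # for every request, regardless of how long it actually runs.
    return max_seq_len

# With many short concurrent requests, paging reserves far fewer slots:
lengths = [120, 35, 600, 80]                      # actual generated lengths
paged = sum(paged_kv_slots(n) for n in lengths)    # 864 slots
naive = contiguous_kv_slots(2048) * len(lengths)   # 8192 slots
```

For a single local user the fragmentation problem mostly disappears, which is why Ollama can get away without it.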


Because Gemini CLI is OSS, you can also find the system prompt at: https://github.com/google-gemini/gemini-cli/blob/4b5ca6bc777...


Also, a GPU instance needs 5s to start. The rest of the time depends on how large the model is, so a "very small weak model" can load much faster than in 20s.


> Google's pricing also assumes you're running it 24/7 for an entire month

What makes you think that?

Cloud Run's [pricing page](https://cloud.google.com/run/pricing) explicitly says: "charge you only for the resources you use, rounded up to the nearest 100 millisecond"

Also, Cloud Run's [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling) is in effect, scaling down idle instances after a maximum of 15 minutes.
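The 100 ms rounding is easy to model. A minimal sketch; the unit prices here are placeholder arguments, not real rates (see the pricing page for per-region prices):

```python
import math

def billable_seconds(usage_seconds: float) -> float:
    # Usage is rounded up to the nearest 100 ms (0.1 s).
    return math.ceil(usage_seconds * 10) / 10

def request_cost(usage_seconds: float,
                 vcpus: float, gib_memory: float,
                 price_per_vcpu_s: float, price_per_gib_s: float) -> float:
    # Prices are caller-supplied placeholders for illustration.
    t = billable_seconds(usage_seconds)
    return t * (vcpus * price_per_vcpu_s + gib_memory * price_per_gib_s)
```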

(Cloud Run PM)


Because the pricing when creating an instance shows me the cost for the entire month, then works out the average hourly price from that. This is just creating a GPU VM instance; I don't see how to view the cost of different NVIDIA GPUs without it.

If you wanted to show hourly pricing, you would show that first and calculate the monthly price from the hourly rate. I have no idea whether the monthly cost includes a sustained-usage discount, or what the hourly cost is for running it for just an hour.


> Because the pricing when creating an instance shows me the cost for the entire month

Are you referring to the GCP pricing calculator?

> This is just creating a GPU VM instance

Maybe you are referring to the Compute Engine VM creation page? Cloud Run is a different GCP service.

The Cloud Run Service creation UI doesn't show the cost.


Cloud Run PM lead here.

Sign up for new GPU types at https://docs.google.com/forms/d/e/1FAIpQLSdZk5sCsDUjAoYQX-sq...

