Warning: most cloud providers (Google, Amazon, Microsoft) require you to accept unlimited liability to use their services.
If you're running a business and you have lawyers, then fair enough — just play the game. But for individuals, it seems crazy that so many of us accept this sort of thing. Good luck contesting the charge with your credit card company when you already agreed to a contract that said Google could bill you thousands of dollars and then you used thousands of dollars worth of their service.
Big cloud providers are not your friend. They do not care if they destroy the lives of you and your family, unless it's happening so often that it's making mainstream news.
My advice is to go and delete your cloud accounts, and only use services that offer hard spending caps, and ideally prepaid accounts.
Maybe this doesn't leave many options. Oh well. Maybe if you can't afford big lawyers then you also can't afford the risks of using big cloud.
This is just a single data point but I had a surprise bill with Google. I talked to the support and got it waived off.
I used Amazon EC2 instances for years and I always felt in control. There were never any surprises. I knew even in the worst case situation I would be okay because I had faith in the Amazon support. With Google I felt insecure. I never played with any of Google cloud services since then.
Amazon's customer first policy is really true. They try their absolute best to make sure there are no surprises to a great extent. Even the UI is very intuitive.
Same here - incidentally, it was also one of the weirdest interactions with customer support I've ever had. I suspect the first point of contact was some sort of LLM/chatbot that desperately wanted to make sure I was feeling fine and that there was nothing to worry about. When I was forwarded to the billing support team the interaction went back to normal: a couple of messages back and forth, some homework to set the real budget limit (the quota is just for alarms), and they waived the charge.
Same here. GCP waived a surprise bill of $4,500 when I accidentally left a TPUv1 running for a month many years ago on a personal project. (I was just toying around with the new TPU for an hour or so in my own free time, and didn't realize that unlike a GPU, the TPU has to be shut off separately from the CPU/VM or else it keeps charging by the hour.)
Amazon definitely also has its share of billing issues.
A personal example would be that we reserved an instance based on information given by our AWS account manager.
Said instance turned out to have exactly the issue my original question to the account manager was about; he had answered incorrectly.
The reserved instance team then refused to refund us, but also refused to say how much they would prorate if we were to upgrade instead.
I simply don’t accept this argument, primarily because the way AWS handles NAT gateway fees is really only explainable as something that is designed to be predatory.
Yeah, I have spent much more than $14k to date and would have spent much more over time, losing my business isn't rational. I think it's just another "Google can't do customer support to literally save their life" example.
All of the cloud services I have are set up only with privacy.com cards. I have each individual card limited to just above the expected monthly spend. Even if there's a (reasonable) spike I can see it, and I have to take manual action before the charge will go through.
That's not what privacy.com does or is for. They advertise it, but I've had transactions blow right through the façade. Specifically, the New York Times: after my trial subscription ended, I watched the stupendously expensive charges bounce, but they kept trying, eventually tried a different way, and it went through.
I emailed support, and here's what I got back:
> Hi, $firstname. I've been reviewing your dispute and wanted to touch base with you to explain what happened.
> It appears that the disputed charge is a "force post" by the merchant. This happens when a merchant cannot collect funds for a transaction after repeated attempts and completes the transaction without an authorization — it's literally an unauthorized transaction that's against payment card network rules. It's a pretty sneaky move used by some merchants, and unfortunately, it's not something Privacy can block.
What's interesting is that they seem to be glossing over the truth. It's not unauthorized, per se, it's using a prior authorization code. And it's intended for processing offline transactions. It seems like 'force' is an industry term and a bit hyperbolic when used in lay discussion.
>It's literally an unauthorized transaction that's against payment card network rules. It's a pretty sneaky move used by some merchants, and unfortunately, it's not something Privacy can block
Have you found a site that does "block" this? Did you communicate with your credit card company about this? I am wondering
Use a prepaid card that you bought at a grocery store a few cities away from your hometown with cash while wearing a mask and not bringing any phones with you or driving a car that logs its location or beacons any identifying signals.
I think that might finally allow you to pay for the New York Times on your own terms and not worry about their hounds sniffing you out.
Having talked to credit card issuers about this, what they told me was to close the account. They said they had no way to ever stop the charges from coming in.
In my case, even closing an account wasn't sufficient. A charge posted to a credit card I'd closed more than a year prior, and the card issuer was legally obligated to process the charge because of the renewal contract I had apparently signed with the merchant. This led to a single late payment, which in turn caused my credit score to tank by ~90 points just as I was applying for mortgages. I try not to think about what that, plus the year of waiting while mortgage rates climbed to nearly 6%, will have cost me if I'm lucky enough to outlast my thirty-year fixed mortgage.
Edit: and the dark Lord surely reserves a particularly unpleasant circle of hell for loan officers who encourage borrowers to consider a 5/1 adjustable rate because "we know rates will fall next year."
Doesn't stop them from trying to collect after the transaction is declined. It's not a prepaid service, you're agreeing to pay the charges _after_ you've used the service.
Will they pursue? Do they have enough info to pursue? Who knows, but they can if they want to.
This is very much not what privacy.com is for, and it won't protect you from $14k in BigQuery bills. There is no clause in the GCP contract (or any other contract, for that matter) which says "if your payment method is invalid when we go to collect what you owe us, we forfeit all right to be paid."
For small charges they might just give up because it's not worth it, but when dealing with a $14k bill you should assume that they will at the very least hand the debt off to a collections agency if you try to just ignore it.
You're still liable to Google/whoever for the full amount, so it is only a temporary reprieve. Which can be useful, but does not solve the main problem.
IANAL, but if this happened to me I would be gathering as many examples as I could of this having happened to other people. The angle being: Google knows this is a huge issue. Effectively, they know that they have (presumably accidentally) created a really dangerous trap for small players, and have chosen to do nothing about it.
In some jurisdictions I think that reduces the legitimacy of their claim that you actually owe them money.
EDIT: Even better, focus on the examples where Google "forgave" the debt; you could argue that those examples prove that Google knows it's at least partly their fault.
I think we (the developer community) need to start pushing back against this abuse; it's getting out of control.
The thing that bothers me the most is I caught this $14k charge b/c I'm a small fry and that money matters to me. How many big accounts just wouldn't notice that? I can't help but think a very non-trivial % of all cloud revenue is just obscure fees that nobody notices - engineers doing the engineering, accounts receivable pays the bills, and the cloud providers get fat.
I honestly think it would be better if they didn't have the option to "forgive the debt" — at least without following up by eliminating the trap that created said debt.
How often is one of these accidental debts created? How often do customers just pay up because it's small enough that it's not worth fighting? How often does AWS (or Google or whoever) decide whether to forgive the debt based on PR damage control rather than the legitimacy of the debt? Jeez I hope someone leaks those numbers one day.
It reminds me of all those horror stories of hospital visits in the USA, where the first bill you receive is just a test to see if they can squeeze that much out of you, but if you know what you're doing or just can't pay then the actual bill is way lower. It's all just yucky.
If big cloud providers couldn't selectively choose which of these debts to enforce, I bet there would be a media shitstorm and then they would suddenly discover that it's not all thaaaaat hard to implement real time billing and hard caps after all.
Well, the "trap" is the lack of hard limits which, if implemented, would enable some companies to blow up their businesses. Which arguably is a better outcome than people who can't afford it getting big bills. But it is a tradeoff even aside from the providers arguably collecting some money people didn't intend to give them.
To be honest, even the official guide [1] for BQ doesn't include any information about query costs, budgets, or service-limit mechanisms [2].
I think the HTTP Archive team could add something in that regard.
PS: When I was an instructor for some AWS cloud training, the first 2 hours were spent solely on setting up billing and budgets to avoid exactly this kind of situation. No one would start the training without all those locks in place.
Yeah, I'm basically just having to write this off, so it sucks for me (a lot: I'm bootstrapping a startup), but I'm more worried about other people (especially students) getting caught up in what feels like a scam, given that the language on the website doesn't, ya know, mention the risk of being charged $14k.
The getting started guide linked by the website states:
> Note: The size of the tables you query are important because BigQuery is billed based on the number of processed data. There is 1TB of processed data included in the free tier, so running a full scan query on one of the larger tables can easily eat up your quota. This is where it becomes important to design queries that process only the data you wish to explore
Could this be a bigger warning? Sure.
Is something a scam just because they don't explain the general implications of entering your payment information to a usage-billed product? Not really.
There's "scam" in the sense of "it didn't do what they said"/"charged me more than they said", and there's a more colloquial "scam" where the UI is designed to obscure the cost of a task (quintessential dark pattern stuff). I don't think the reporter is saying "they lied about big query", they're saying the UI is set up to make extremely expensive mistakes very easy, and it's set up to hide the actual cost of the query.
Estimating the total cost of a query is obviously fraught, but from the UI and other comments it sounds like BigQuery knows up front how much data a query will require, and there's at least a minimum cost per TB, so the UI could just say "this will cost at least $X". Instead it shows a very basic "this will process X PB of data" text. So they're charging by TB but showing the usage in PB, which is a) a 1000x smaller number, and b) visually similar to "TB".
It's very hard to see that as anything other than "designed to obscure cost": there's no reason not to say "this will cost $X" when the cost is per TB; the pricing is per TB but they're showing PB; the checkbox and the textual description are smaller than the other text on the page; and there's no ability to specify a cost cap.
I understand the argument against hard circuit-breakers ("yeah, seems like a good idea, but then I had a good traffic spike and went down"). But it makes even me cautious about scenarios where I could just fat-finger something. There are some controls, but there are no guarantees in most cases.
This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars.
Last week I ran a script on BigQuery for historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever, and they won’t remove the fee.
This official website should be updated to warn people Google is apparently now hosting this dataset to make money. I don’t think that was the original mission, but that’s what it is today, there’s basically zero customer support, and you can lose $14k in the blink of an eye.
Academics, especially grad students, need to be aware of this before they give a credit card number to Google. In fact, I’d caution against using this dataset whatsoever with this new business model attached.
The real issue here is that you didn't quite understand what BigQuery was when you pressed the button.
What it is, roughly, is a publicly-accessible data supercomputer. If you lost $14k in the blink of an eye, then I would think you consumed at least $4k of Google's actual resources -- maybe $7k. Maybe more. That thing can move some serious data, and you apparently moved around over 2PB.
Google bears some significant responsibility for not making the cost transparent to you, it's true. But on the other hand, don't they deserve some significant credit for making such awesome power available to a lowly peon with a credit card?
This happens because Google hides the query cost behind its abstracted "TBs scanned" (for their data format, not even open-source so it's hard to estimate in advance) or even worse "slots" mechanism. Only a fraction of people try to understand how much these slots cost and most of them are the people who got an unexpected bill after using BigQuery and became more aware of how the product works.
If GCP returned the query cost in the API and showed it directly in the console when you run a query, it would be much easier for their users. Unfortunately that's not in Google's interest, for obvious reasons.
Exactly. Even after seeing the issue I can't make heads or tails of what the hell "TBs scanned" means relative to row counts, etc. Likewise, it assumes a lot of knowledge about what the tables include; on a dataset you didn't build yourself, how can you know the tables are optimized to lower your costs? Hell, how can you even know what the costs are?
"TBs scanned" is the number of tebibytes of stored data that the system had to scan to serve your query. This is how BQ is billed, in the on-demand model.
The console shows you this number (in very small letters) after you have entered the query but before you press go. In the on-demand billing model, which is what you were using, you can multiply this number by $6.25 to understand your query cost, exactly.
It's a design that's hostile to new customers, I agree. But it is comprehensible.
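To put numbers on it, a quick sketch of the arithmetic (Python; uses the $6.25/TiB on-demand rate above, and incidentally squares with the "over 2PB" estimate elsewhere in the thread):

    RATE_PER_TIB = 6.25  # USD per TiB scanned, on-demand model

    def query_cost_usd(bytes_scanned: int) -> float:
        return bytes_scanned / 2**40 * RATE_PER_TIB

    # ~2,240 TiB (about 2.2 PiB) scanned comes to exactly $14,000:
    print(query_cost_usd(2240 * 2**40))  # -> 14000.0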
There should be a cost estimate displayed prominently by default, and an option to turn it off for power users who know what they're doing (but keep the current less-prominently displayed amount of data estimate).
If the latter... I'm not sure that it's explicitly against the rules, but coopting a name of something as your handle just to complain about it is in poor taste and probably should be.
> The worst part is you posting this to hackernews under the username ‘httparchive’ to make it look like it was the httparchive posting this themselves.
This was the last comment in TFA, so it seems like they just used it because it was the topic...
BigQuery provides various methods to estimate cost:
Use the query dry run option to estimate costs before running a query using the on-demand pricing model.
Calculate the number of bytes processed by various types of query.
Get the monthly cost based on projected usage by using the Google Cloud Pricing Calculator.
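For what it's worth, the dry run is scriptable; a minimal sketch with the google-cloud-bigquery Python client (the table name is a placeholder):

    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    # Dry runs are free: nothing executes, but the job reports exactly
    # how many bytes the query would scan.
    job = client.query("SELECT url FROM `some.dataset.pages`", job_config=config)
    print(f"Would scan {job.total_bytes_processed / 2**40:.2f} TiB")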
When I use the BQ interface, it estimates the bytes for each query in real time before I run it, does that turn off if the query is too big? I guess that isn't directly a cost estimate, but if I saw hundreds of TB I'd think twice before hitting "Run"...
Well, sure. But it is convenient to have lots of sample data. Also you get the first TiB per month free in BQ.
Also note that anyone can make a dataset available for public use, where they pay the storage and the consumer pays the compute. The official Google datasets are just curated and maintained by Google itself.
If you're going to make a throwaway account to criticize a website, you shouldn't use that website name as your username. That makes you look like a troll even if you have legitimate complaints.
I used this data when I was a grad student, back when there wasn't a fee for it, so I'm mostly concerned students will get hit with charges that will make it so they can't buy groceries.
The website has the Internet Archive logo on it, and it looks like a public resource for researchers, and it used to be free to use.
The point of this is for the HTTP Archive to make it clear this is a paid product from Google Cloud, not a "public service".
There are multiple notes about cost. In particular, this one stands out.
> Note: The size of the tables you query are important because BigQuery is billed based on the number of processed data. There is 1TB of processed data included in the free tier, so running a full scan query on one of the larger tables can easily eat up your quota. This is where it becomes important to design queries that process only the data you wish to explore
You have to give them a credit card in order to use the free tier, and they refuse to implement any features that would let you add safeguards (beyond setting an alert so you can find out after you've already spent the money).
Edit: I apologize; they did in fact add something beyond alerts: https://cloud.google.com/billing/docs/how-to/notify#cap_disa... ...which is less them implementing a feature and more telling you how to badly implement it yourself. I don't believe this changes the gist of my comment, but it is worth pointing out in the interest of precision.
Edit 2: Per https://news.ycombinator.com/item?id=39447499 , GCP actually does have a way to cap some resources. It still strikes me as the most "how can we technically claim to be supporting that feature request while still making it as easy as possible to spend more money than you intended to" but there it is.
There are countless companies who specialise in managing cloud costs because of how difficult it is to know when and for what you are going to be charged. Especially for things like data transfer.
And by default they don't have a daily spending limit so it's very easy to see a major cost over-run at the beginning.
the data is a public service. the platform allowing you to query it is not.
you can print at a public library. each page costs a small amount. the printer in that case is the service. and if you print out millions of pages, you may owe hundreds of thousands of dollars.
slow down a bit, lest you blow off your other foot.
I frequently see these kinds of surprising billing anecdotes across many cloud providers. Why don't they provide a way to set a hard budget limit applied to the entire account? I tried to see what can be done for GCP, and it seems pretty daunting.
The reasons are probably quite complicated; some are bound by hard technical limits on how quickly a system can react, and thus on whether a hard limit can actually be hard. But realistically that's largely solvable by making it a softer hard limit (e.g. you set a limit of $1000 and the terms say you pay that plus whatever is used before the limit kicks in: more than $1000, but way less than $14,000).
All of those technical reasons aside though, the commercial reason is obvious - people's mistakes and overages are a great source of revenue and profit. Companies refund the times where it'd be enough to lose the customer, or when it hits HN, but they make more money every time someone pays up. They have no incentive to fix it. It's part of the business model.
There is also the fact that if a company has critical systems go down because GCP hit some hard budget limit, it will be reported in the press as "Netflix down globally due to issue with Google Cloud".
Google doesn't want the bad press. Most real companies would prefer to have a big bill when their product surges in popularity than have unexpected downtime at the worst time.
oh there are - billing systems at scale almost exclusively work on logs. Logs can take minutes or hours to aggregate and transmit to a central place.
Ever notice how your "1GB" data plan sometimes lets you use 5GB if you happen to be roaming in another country and downloading something fast over 5G...? Same reason.
They are also constantly checking your account, and just as easily as you can lock an account, they can soft-lock it via a flag when the billing system says "enough". The few cents in between they swallow easily (as with so many other inaccuracies).
Because then we'd see articles about how the next start up missed their opportunity whenever their site unexpectedly got discussed on the latest Rogan episode and subsequently was taken offline by the limits being tripped.
There's no "right" answer. In one case, it's checked the wrong box and got a $14K bill. In the other case, it's I checked the wrong box and my startup missed its one window. There are in-between levels of alerting etc. for both populations but they're probably unsatisfactory for the extreme conditions.
To be clear: I'd be very in favor of the major cloud providers having a "DO NOT! DO NOT! use this for production; your content could be deleted at any time if you screw up" mode. But I suspect most people wouldn't use it.
I don't see the problem. Don't set a budget limit if you don't want your app to go offline. Lots of people wouldn't mind if their app went offline for a bit; they'd prefer not to suddenly get a $10,000 bill.
Google App Engine used to have that but, presumably in the interest of additional profit, they removed it. Now I have to make do with an alert that warns me long after I could hypothetically have been bankrupted, which can happen in seconds.
The OP is probably a good person with strong interest in data science and building projects.
If it were "oh, here's your $500 charge, upgrade your quota for more", then fair enough, I made a mistake. But $14k without an explicit quota upgrade is not OK.
Unfortunately, if the customer has written their applications in such a way that they're effectively locked to the platform... they won't have much choice until they can dis-entangle themselves.
tbh, I have worked with AWS for at least 10 years, and recently their field support has been quite proactive about helping avoid those scenarios (e.g. they helped save hundreds of thousands on a single-digit-millions account).
This was one of the main selling points for all portfolio companies of the group to adopt AWS in their digital transformation projects.
Of limited use to a nobody who wants to run <$100/year of cloud spend and doesn't have an account manager.
I would love to kick the tires on some AWS stuff, but the threat of unlimited ruin is not worth it. Sure, maybe the gods would take pity on me and wipe the debt, but far easier to just run with someone who caps costs. My toy project can gladly go down if the alternative is a huge unexpected bill.
My cynical self sees it as how cloud providers aim to make the most money: by making billing opaque and waiting for buzzword-happy project leads to mandate stuff be put on their service without understanding what the end billing will be.
I can't say that's for certain what it is. I just know that the hallmark of any business with otherwise-incomprehensible recurring charges is that they can hit you with the charge after the fact, leaving you little recourse to avoid paying it without a ton of work for yourself or your team.
Notice that their "solution" is to tell you how, if you want, you can spin up what is effectively your own custom service to watch spend and, if it goes over some threshold, delete the entire project[0] after some delay. This is the malicious-compliance version of letting you set a limit.
[0] At least, that's how I interpret "This example removes Cloud Billing from your project, shutting down all resources. Resources might not shut down gracefully, and might be irretrievably deleted. There is no graceful recovery if you disable Cloud Billing.
You can re-enable Cloud Billing, but there is no guarantee of service recovery and manual configuration is required."
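For completeness, Google's documented approach boils down to a Pub/Sub-triggered Cloud Function along these lines (a sketch modeled on their docs; the project ID is a placeholder, and note that it detaches billing outright rather than capping spend):

    import base64
    import json

    from googleapiclient import discovery

    PROJECT_ID = "my-project"  # placeholder
    PROJECT_NAME = f"projects/{PROJECT_ID}"

    def stop_billing(event, context):
        # Budget alerts arrive as Pub/Sub messages carrying the current
        # spend and the configured budget amount.
        data = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        if data["costAmount"] <= data["budgetAmount"]:
            return  # still under budget
        billing = discovery.build("cloudbilling", "v1", cache_discovery=False)
        # Detaching the billing account shuts down all billable resources,
        # possibly unrecoverably -- hence "malicious compliance".
        billing.projects().updateBillingInfo(
            name=PROJECT_NAME, body={"billingAccountName": ""}
        ).execute()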
> Custom quota is proactive, so you can't run an 11 TB query if you have a 10 TB quota. Creating a custom quota on query data lets you control costs at the project level or at the user level.
Oh, good catch! Yes, that does look like something that can be coerced into limiting it. Having actually tried to click through, it is very much not as simple as "don't spend more than $X"; the doc points to https://console.cloud.google.com/iam-admin/quotas and you have to find and set the right quota, but yes that can probably help.
Fits in well with everyone's natural first query (SELECT * FROM everything), so people can see the type of data it's returning in order to narrow it down.
Not specifically because of BigQuery, but I have taken to adding " LIMIT 10" to that for my default query because of accidentally locking up 10TB databases too many times.
So now you say on your webpage that your pricing is $1/TB or something. Great. But there is a caveat: the amount you pay depends on some complex factors, such as the size of the table or the duration of your code. If the factors are so simple that no more than grade-school arithmetic is required to calculate my costs, then fine. But what if it gets a little more complex than that, such as "table size is 1PB and cost per 1TB is $1"? Did you know 1PB = 1000TB rather than 0.001TB? What about "you need another $10 query to figure out the size of the table"? Or "the cost depends on the number of function calls your code makes, and if you accidentally recursed too many times you can't limit it"? Or "the server is $5/mo but the IP is $1/h and outbound traffic is $10/GB, and if someone downloads 1TB from your server you will pay $10,000 within 2 hours"?
At some point the factors behind billing become non-trivial, and when every sentence in a 10-page document could have 100x'd your costs, what makes this service different from a scam? You could just let me set a billing cap so I never pay anything beyond $10, so that "$10" is the only thing I have to care about, couldn't you?
> At some point the factors related to billing is going to become non-trivial
First response on the OG link covers this with a screenshot: the size of the query is previewed beforehand, and you have to check a checkbox to acknowledge it. (I dare say listing it in PB instead of $$$ is still a scummy move, etc. - but they do resolve about half of your concerns right there)
Not the same thing, but: some pre-Web Usenet programs would have warnings before "expensive" operations:
> Version 4.3 patch 30 of rn’s Pnews.SH (September 5, 1986, published to support the new top-level groups) introduced the “thousands of machines” message:
> > This program posts news to thousands of machines throughout the entire civilized world. Your message will cost the net hundreds if not thousands of dollars to send everywhere. Please be sure you know what you are doing.
The dataset IS free to download, but running a query against it on Google Cloud is what costs $$$. BigQuery is basically renting servers to scan through the data; that's the fee.
The complaint says there should be a warning that processing fees can be high. Go to the front page and check out the links: nothing about cost. Someone follows that path and $14k is gone without a word about it. That's the path people are sent down from the website, and it explicitly talks about using BQ for analysis.
A simple "running queries over the whole dataset can cause significant costs due to the size of the dataset" should be enough. And I think that's a valid and fair point.
The whole part of accusing Google should just be ignored.
BQ charges you based on the volume of data being scanned. I think this is a situation which involves scanning the whole dataset again and again without fully understanding how it works. I’ve worked with much larger datasets on BQ (petabyte scale) and managed to not spend more than $1000 in an hour. Also, BQ tells you how much data will be processed BEFORE you run the query, which makes it easier to understand the cost implications.
Again, you could fit the whole dataset in memory in an EC2 instance and do your thing.
> Last week I ran a script on BigQuery for historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever,
This comment kind of suggests that you do not understand how BigQuery bills. The archive pays for the storage, but you have to pay for the queries. You would also have had to attach a billing account to run those queries. Running BigQuery searches is not free.
Expensive lesson, but on the surface this one appears to be your error.
It seems excessive to allow a USD 14k spend on a newly created account, or an account with no prior big spend. If I were Google, I wouldn't allow it without the customer explicitly raising limits or increasing quotas. Otherwise there's a big chance the customer can't pay and Google just lost those resources; unless, that is, the resources don't really cost you anything and your pricing is predatory.
Yes and no, I ran the script before and the fee wasn't that high (they jacked it up last summer). Usually I have to jump through a ton of hoops just to add more CPU cores to my VMs so I "trusted" that GCP would warn me if I ever made an error.
One of the bigger issues is they charged my card before I literally had any notice what the bill was - it wasn't even in the dashboard yet. I would have terminated the script ASAP had I gotten *any* warning.
I am sorry, but this seems to be more of a "too long; didn't read" situation. The HTTP Archive clearly mentions that the data is available for offline processing or for querying online on BQ. And in the "Getting started" section of the instructions, it is mentioned multiple times how BQ will charge you. Even if it weren't mentioned anywhere, it's a little presumptuous to assume a tool that processes TBs of data again and again will not charge you money for doing so.
> Note: BigQuery has a free tier that you can use to get started without enabling billing. At the time of this writing, the free tier allows 10GB of storage and 1TB of data processing per month. Google also provides a $300 credit for new accounts.
> Note: The size of the tables you query are important because BigQuery is billed based on the number of processed data. There is 1TB of processed data included in the free tier, so running a full scan query on one of the larger tables can easily eat up your quota. This is where it becomes important to design queries that process only the data you wish to explore
> When we look at the results of this, you can see how much data was processed during this query. Writing efficient queries limits the number of bytes processed - which is helpful since that's how BigQuery is billed. Note: There is 1TB free per month
This comment reminds me of unsafe pedestrian crosswalks in car-centric cities.
Sure, a crosswalk may have an extensive system to warn drivers of pedestrians, but that doesn't change the fact a driver hits a pedestrian there at least once a month. It only has to happen once to ruin someone's life.
For cloud providers, the obvious solution is hard budget limits. Ask people to set a hard budget limit before they get the opportunity to drown themselves in debt. Free up some workload off of the support team in the process.
Hard budget limits change the process to avoid these charges almost entirely. Warnings merely tell people that the provider knows the process lands users in debt, and ask them to please use the broken process correctly to avoid the severe financial consequences.
Yes, sure, there's stuff I could have done better, like staying up all night reading the fine print. But that's not the point: this is a *warning* to other people who see the Internet Archive logo, the word "public", and for some dumb reason also trust Google. I'm hoping this doesn't happen to others; I learned a costly lesson.
I'm on OP's side - even if I knew I'd be paying to run some queries against this dataset, I never would have thought it could reach 5 figures in such a short time. And you can't argue that the billing is straightforward. The "Getting Started" guide for the HTTP Archive doesn't even describe what indexes are available/commonly used for limiting the scanned rows.
If Google provides a credit limited to $300 for new accounts, then it has the ability to limit spend.
It should make this available.
To be fair: I'm sure they don't withhold this limit to make money from this rare case, but to avoid the far more common case of an established business going offline because someone forgot to update a limit.
My view on such matters is that it's best to pick a solution with a fallback option, which usually means open-source software. That is to say, if you choose a cloud service, it's preferable that it's built on some open-source software; if the costs become uncontrollable, you can still fall back to the open-source version. For instance, CelerData has built its cloud service on StarRocks, and it's said that many users have used it to replace Snowflake and BigQuery. Of course, you could also opt for Elasticsearch's cloud service, and if problems arise, replace it with OpenSearch.
It's interesting that in the post there's a maintainer pointing out that there's a very tiny little checkbox that says "this will process X PB of data". Given that Google knows how much your queries cost per TB, it does seem like a "dark UI" design to not just say how much the query will actually cost.
Similarly that checkbox being a tiny part of the UI, and not allowing people to set up cost limits on a query (or not having them at the account level), does seem very much like an "encourage people to overspend" UX. I'm sure "overspend to the level of a $14k bill to an individual" is not intended, but that's a reasonably predictable occasional outcome for this design.
So on the one hand, yes they did click a checkbox saying they were aware of the amount of data being processed, but OTOH the UI seems to be specifically designed to encourage this kind of mistake.
This is a matter of a user not having read some fine print, which doesn't mean they're necessarily at fault. The only way to know which of the user, httparchive.org, or Google BQ is most responsible is to know how often similar situations arise in this specific context (i.e. using BQ by way of httparchive.org).
Can you please pick a different username that we can rename your account to? Some users are complaining that your current username is misleading (e.g. here: https://news.ycombinator.com/item?id=39447421)
Edit: since I didn't hear back from you, I've consed a 'not' onto the username 'httparchive'. If you prefer a different name, feel free to contact us at hn@ycombinator.com.
> This website makes it seem like this “public” dataset is for the community to use, but it is instead a for-profit money maker for Google Cloud and you can lose tens of thousands of dollars.
you didn't understand what you were doing. HA's datasets are public and free. It is not a "for-profit money maker for Google Cloud". Sorry, sucks for you, but blaming the restaurant because you bit off more steak than you could chew is not how this works.
Wow. No guard rails whatsoever on queries like this?
Their UI clearly has all the info needed in order to put guard rails in place (aka big scary warning dialog in red), as it's already giving a non-obvious warning about the expected data usage.
Blaming users for this seems like a bastard act. Talk about causing further reputational damage... :( :( :(
I don't disagree with the premise that Google should be responsible, and I explicitly acknowledge that the average computer-interested person trying out BigQuery has no clue how sharp a knife it is; they really do need to be protected from themselves. I was in this boat only a few months ago. One thing I will say, though, is that the documentation is actually quite comprehensive. Personally, after taking the time to RTFM and actually understand things like columnar storage, partitioned and clustered tables, etc., I was able to optimize costs quite a bit for our use case, and I'm quite pleased with the product overall. It just takes time to learn; it's a (necessarily, imo) intricate machine.
Could you explain the steps you went through that led to you using BigQuery? The reason I ask is most of us probably use GCP and only ever interact with BigQuery via GCP. But it seems your entry point was a bit different to most (e.g. seems you might have clicked on a link to GCP from HTTP Archive, or perhaps something else?).
FWIW I use BigQuery a lot and as a rough guide I assume about 1c per GB scanned. So if I query a dataset that's 1TB, that's about $10. If the same data were stored on a relational db, the same query would take about a day (or at least a good part of a day). Because BigQuery returns a result so quickly (e.g. <1 minute) it can be easy to miss the insane amount of work it did to get there. So I could see someone accidentally putting that ~1min (but 1TB!) query into a loop or something, and boom, there's your $15k bill. Accidents happen.
Also FWIW, I've found although the big 3 cloud's pricing is tricky (since there are so many services), I find them much better than the PaaS built on top of the big 3 clouds. My suspicion is that the PaaS's have a strong incentive to obscure their pricing because customers can typically see what their costs are (e.g. if they buy some compute from AWS at $0.16/hr and sell it for $1.40/hr, that can be seen as a bit of a rip, hence they try to obscure it). But I think the big 3 are not too bad at this practice. It really bugs me when anyone deliberately obscures their prices, and it's often an indicator of more shady practices to come.
I was doing historical evaluation for a few sites, so I was running a query for each month going back to 2016 for each site. I've done this before with no real issues, and if I knew the charges were rapidly exploding I'd have halted the script immediately - but instead it ran for 2 hours and the first notice I got was the CC charge.
My guess is you were querying all the data each time.
If you instead filter out the rows you are interested in (e.g. the particular "few sites" by their URL) and put that in a new table, querying the resulting, tiny table will be very cheap.
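A sketch of that pattern (the table and column names are made up; the real HTTP Archive schema will differ):

    from google.cloud import bigquery

    client = bigquery.Client()
    # One full scan up front to pull out just the sites of interest...
    client.query("""
        CREATE TABLE my_dataset.my_sites AS
        SELECT *
        FROM `httparchive.some_dataset.some_table`  -- placeholder name
        WHERE page IN ('https://a.example/', 'https://b.example/')
    """).result()
    # ...after which each per-month query in the loop hits the tiny copy.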
I haven't looked at the exact schema for this dataset but for this type of query pattern to be efficient the data would need to be partitioned by date.^[1] I'm guessing that it's not partitioned this way and therefore each of these queries that was looking at "one month" of data was doing a full table scan, so if you queried N months you did N table scans even though the exact same query results could have been achieved even without partitioning by doing one table scan with some kind of aggregation (e.g. GROUP BY) clause.
I wouldn’t expect either of those filters to utilize a partition key if one exists. So yeah, you probably did a full table scan every time. Is the partitioning documented somewhere?
Yeah, 'LIKE' ops usually give you a full table scan, which is brutal. If it was my own data I'd chop the fields up and index them properly - which is the issue here, it's not your data, so you don't get a say in the indexes, but you do have to pay per row scanned even if you can't apply an index of your own.
Seems like an ideal case for pre-processing. You still have to do one full scan but you only have to do one scan.
I’m not familiar with your use case or BigQuery but in Redshift I’d just do a COPY to a local table from S3 then do a CREATE TABLE AS SELECT with some logic to split those URLs for your purpose.
You might even be able to do it all in one step with Spectrum.
Have you tried getting in touch with GCP to see if they would refund the charge? I've heard plenty of stories of cloud services refunding large one-off accidental spends like this one.
"Last week I ran a script on BigQuery for historical HTTP Archive data and was billed $14,000 by Google Cloud with zero warning whatsoever, *and they won’t remove the fee.*"
BigQuery is an amazing product and there are good reasons to use it.
One place I worked at had a table with 100 billion rows. And some other tables as well. If a manager asked for an ad-hoc query, it was 5 minutes of writing a SQL query including JOINs (which didn't need to worry about which fields were indexed etc. e.g. you could write WHERE then a regex), and $15 and 5 minutes later I'd have the answer. Apparently 100s of VMs were started and stopped to answer that query, but it all happened automatically, at very low cost.
The person responding to the complaint was quick to point out that the size of data displayed in the UI relates directly to estimated cost. I see no reason the estimated cost shouldn't be shown in the UI as well.
The cloud isn't something I'd ever use my private credit card on, there are just too many ways to screw it up if you're not very careful and know what you're doing. I don't think I would have hit this particular issue, but that is mainly because I've read a bunch of stories of this kind and BigQuery is one of the things I associate with "can get very expensive very quickly" based on those.
I know the explanations and justifications for it, but for personal use a service where I can't put a hard limit on usage is simply not acceptable for me. It's just not worth the risk.
IANAL, but this can still be risky in the US: if you aren't careful to demonstrate a clear separation between your business funds and your personal funds, those pursuing you for money owed may be able to pierce the veil, costing you a huge benefit of the LLC.
Is there a guide or someone I should talk to about how to do this?
I've long wondered what I can do with an LLC to protect me from debts like this, but I don't know how to get more information about it. Particularly as I'd be the sole owner, I don't really understand what the LLC does/doesn't do.
If you had just $1,000 (and made a few hundred a year), is it worth doing?
Maybe, maybe not. It depends on the risk you're trying to contain, not just the routine income.
The short, short version is: you have to have a reason for the LLC that isn't just "contain some risks". Something like "this is a legal entity for my side project bilombinaboloa.com, which I'm hoping will one day become a company and make me rich" will work; "I pay my expenses via this and take my income directly" will not.
Read your sibling comment... basically doing this solely for the purpose of trying to avoid debt won't work unless the creditor is just too lazy to pursue it.
Of course. If you have a viable business and the debt is related to that, that's exactly what the corporate veil is for. If you just want to hedge your bets on your personal GCP bill for hobby stuff, not so much.
There is a really easy fix to this problem: setting billing limits. This can be done with almost all cloud providers and it takes almost no time. These incidents just show a lack of professionalism on the part of the person incurring the costs. I personally did this on the first day I set up a cloud computing account, when I was still doing my BS in college. It is not that hard, folks. Set the billing limits.
The main reason I'd use a personal account for one of the big cloud providers would be to learn stuff. At that point a lack of professionalism is kinda expected, because learning stuff is the whole point.
And my understanding is that almost none of the ways of setting limits are actual hard limits, only alerts and some hacked-together emergency abort scripts. Correct me if I'm wrong, but can you actually, robustly limit the cost of services that can spend that much money in an hour or so? It doesn't help much if I get an email about it and read it two hours later.
I understand the downvotes, but I would still say that being aware of the rough estimated cost of each service you use is an integral part of an engineer's job. After all, we care a lot about CPU cycles, and those are measured in femtodollars.
That's not sufficient, you also must not make mistakes.
I have very limited cloud experience, but I did make a mistake that led to a rather slow but constant cost. The amount was small enough not to matter in a professional context, but the memorable part was that I couldn't easily pinpoint the source with the AWS tools and my limited understanding of them. The categories and labels were too broad, and it took a while until I figured out what went wrong. There are certainly better tools to investigate this, but I didn't know them. In the end it was simply luck that the mistake fell into an area of insignificant amounts of money; it could easily have been significantly more if a few parameters of the same mistake had been different.
You can hold someone to an 'engineer's' responsibilities when they are being paid like an engineer, in a setting that provides them with the protections of an engineer.
Until that point, they are just an individual who got screwed by disguised billing practices.
> It is not that hard folks. Set the billing limits.
Excellent idea. Please describe how to create an account on AWS or GCP that is not allowed to spend more than $100/mo. Since it is "a really easy fix" and "takes almost no time" it should be easy to explain, right?
That's probably enough for 99% of people, and if you're highly motivated, you could make that trigger an SNS notification that trips a circuit breaker.
No, that's really not good enough. I don't want to need to be "highly motivated" in order to set a limit, I want to say this thing cannot use more than this many dollars each month, no conditions no exceptions no questions. If I make a fun little side project and it hits the front page of HN, I don't want to quibble about whether I cut it off in time or some hacked together little script turns things off correctly, I want it capped.
There are limitations to what you can get with spending limit accounts, but Azure has (always?) had more options for people looking for hard billing caps than the other two big providers.
While you can footgun yourself with hard limits I tend to think that learners/hobbyists should, in general, be able to access at least many services with an ironclad guarantee that they can't be billed for over a certain monthly amount or a total number.
I'm much more inclined to shrug if a startup screws themselves over with a hard spending limit than if a student screws themselves over because of a lack of one.
So honestly if that's true I might have to try Azure, thanks. However, when the claim was "This can be done with almost all cloud providers" I feel comfortable wanting an answer for the other two of the big three.
True, these giants make their own lives easier and don't build many billing controls into the infrastructure. It is your money, so it is your responsibility to protect it. Use billing alerts, hacks, and research things carefully before jumping in with both feet.
There are plenty of APIs on the internet that are free that query a database for information. If queries are too expensive it's not viable to run for free.
Or, GCP could implement cost/resource/use limits, which would allow them to give away whatever they wanted for free without any concern about people over using it, while also allowing people to avoid shooting their own feet off.
I don’t disagree but how does that work exactly? When you hit the quota the query gets cancelled? That’s definitely already a feature of Redshift Spectrum with WLM. Does BigQuery offer something similar?
My first choice would be something like "this query will cost $13953, which exceeds your default cap of $100; please click the confirm button if you really want to run it". (The dollars could be CPU-minutes or whatever if you want to use resource based limits, which might play nicer with a free tier)
Edit: rereading, I think this is actually for non-interactive scripts, in which case yes it should just cancel the query
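Nothing stops a client-side wrapper from doing this today; a rough sketch (assuming the $6.25/TiB on-demand rate and a made-up default cap):

    from google.cloud import bigquery

    def run_with_cap(client: bigquery.Client, sql: str, cap_usd: float = 100.0):
        # Dry-run first to learn exactly how many bytes would be scanned.
        dry = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True))
        cost = dry.total_bytes_processed / 2**40 * 6.25
        if cost > cap_usd:
            # Interactive confirmation; a non-interactive script should
            # just raise here and cancel, as noted in the edit above.
            if input(f"This query will cost ~${cost:,.2f}. Run? [y/N] ") != "y":
                raise RuntimeError("query aborted: over cost cap")
        return client.query(sql).result()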
You can set the size limit for individual queries. Plus the custom quotas and everything.
Part of the problem is that the OP wrote a script with a loop. So say you set the limit to 50 GiB per query, but then write a script that runs a 49 GiB query 1000 times...
That type of batch process should be designed much more carefully to consider costs.
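For reference, the per-query size limit is `maximum_bytes_billed`: the job fails up front, with no charge, if it would scan more than the cap. A sketch (placeholder query) that also shows why it doesn't stop the loop case:

    from google.cloud import bigquery

    client = bigquery.Client()
    config = bigquery.QueryJobConfig(maximum_bytes_billed=50 * 2**30)  # 50 GiB
    # This job errors out before billing anything if it would scan more
    # than 50 GiB -- but a loop of 49 GiB queries sails right through.
    client.query("SELECT url FROM `some.dataset.pages`", job_config=config).result()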
The article doesn't say anything about a loop, and the estimated usage by the Google responder makes it seem like the cost is from a single "SELECT *".
> I was doing historical evaluation for a few sites, so I was running a query for each month going back to 2016 for each site. I've done this before with no real issues, and if I knew the charges were rapidly exploding I'd have halted the script immediately - but instead it ran for 2 hours and the first notice I got was the CC charge.
So it looks like a loop of ((6 * 12) + 2) * #sites iterations with a full table scan every time.
I've forgotten more SQL than most people ever learn. Time is also valuable, and I make trade-offs. Should I spend hours (i.e. $$$) to optimize, or run a non-optimized query in the background for a different cost? I didn't think the time/benefit/cost equation favored tuning; if I had known, I'd have spent time on tuning. If you offer something for "free", then change the cost, and have no alerting mechanism for inefficient queries, it's impossible to evaluate the trade-offs.
It's rarely interesting logic that makes a query expensive, because the per-query charge is based not on compute cycles but on the amount of data scanned. This is sufficient:
    SELECT * FROM super_wide_table_with_lots_of_text
    WHERE NOT filter_on_partitions_or_clusters
SELECT * is dangerous because it's a column store: you really need to look at the schema and select only the columns you want. And when exploring the data it's important to use sane limits and pull from a single partition.
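By way of contrast, a sketch of the cheap version of the same exploration (assumes an ingestion-time-partitioned table; column and table names are illustrative):

    from google.cloud import bigquery

    client = bigquery.Client()
    client.query("""
        SELECT url, status                        -- only the columns you need
        FROM `project.dataset.super_wide_table`
        WHERE _PARTITIONDATE = DATE '2024-01-01'  -- one partition, not a full scan
        LIMIT 100                                 -- trims output, NOT bytes billed
    """).result()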