How Uber tests payments in production (alvaroduran.com)
227 points by ohduran on Aug 7, 2024 | 144 comments


Isn't that what everybody does in the industry?!

Every single place that I ever worked at in the past 20 years tests payments using real cards and real API endpoints. Yes, refunds cost a few pennies and sometimes can't be automated, but most payment providers simply do not offer testing APIs of a sufficient quality.

Situations where a testing endpoint has one set of bugs not found in production, and vice versa, were so ubiquitous from the mid-2000s to the mid-2010s that many teams chose against using testing endpoints altogether - it's too much work to work around bugs unique to an environment that no real customer actually hits. And now a whole generation of developers grew up in a world of bad testing APIs from PayPal, Authorize.net, BrainTree, BalancedPayments (remember them?), early Stripe, etc. So now it has become institutional knowledge: "do not use testing endpoints for payments".

To be exact, people often use testing endpoints in the early stages of development, when you don't have any payment code at all; but before the product launches, things get switched to production endpoints, and from that point on testing endpoints aren't used at all. Even for local development, people usually use corporate cards if necessary.

I have a suspicion that things may be different in the US, where many payment providers' testing environments simulate a typical domestic US scenario: credit cards rather than debit, no 3-D Secure, no strict payment jurisdiction restrictions, etc.


I've worked on adding Google Pay / Apple Pay to the mobile app of a large European ecommerce company, and that's more or less how we went about it.

Start with the sandbox / test environments; once you get reasonable responses end to end, release the thing behind a feature flag. The backend moves slowly, so you add stuff to the mobile app that really belongs in the backend, but f it, because it's the only way you meet the deadline; you will convince someone on the backend later (aka never) that it's backend responsibility.

Ask (pressure) the developers into buying some cheap stuff from your shop with their own credit cards, because the company is a behemoth, approving real credit cards for testing would just take ages, and you want to release yesterday. It annoys you, but you realize that 5x5 euros is worth getting this done rather than starting to fight a losing battle against company processes. Cancellation is possible, but it will take a couple of days. If there's any issue, you debug it across a bunch of teams and/or companies.

Things start to work most of the time, so it's time to release the stuff to x percent of your users. Check analytics and error logs frequently. Some production users got their payments through, so increase the rollout percentage. You discover more and more undocumented error codes, and you improve the error messages so that users don't retry 10 times with a card that has insufficient funds. After a couple of weeks, things start to stabilize, and you move on to a new feature...
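For what it's worth, the rollout-percentage part was nothing fancy - roughly this kind of deterministic bucketing on the user id (a sketch only; the feature name and numbers are made up to illustrate the idea):

    import hashlib

    def in_rollout(user_id: str, feature: str, rollout_percent: int) -> bool:
        # Hash user id + feature name into a stable bucket in [0, 100), so the same
        # user keeps seeing the feature as the percentage is ramped up.
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < rollout_percent

    # e.g. show the new payment option to 5% of users first, then ramp up
    show_google_pay = in_rollout("user-123", "google_pay_checkout", 5)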

The test environments were so complicated and had so many caveats that whenever I had to do something, I had to re-read the docs and our notes to know all the "traps" we had already discovered. For people who didn't work on this payment feature from start to finish, testing in the officially recommended test channels was hopeless.


>Ask (pressure) the developers into buying some cheap stuff from your shop with their own credit cards

This is illegal. I've always refused such "requests" and asked for a company expense card.


If the employee can get it refunded with an expense report, like any other work expense, then in my experience (in California) it's not illegal. I've made plenty of work expenditures with my personal CC that I get reimbursed for by the company with an expense report. But *pressuring* employees to do it is plain wrong (and may be illegal).


Coming back here, not sure if anyone still reads this thread a day later...

I am not sure about the legality of it here in Germany, but I'm not really sure I could even prove it.

Pressuring us into buying these items is hard to prove; nobody said we need to do anything. Do you want to be someone who makes sure the feature can launch on time and works correctly, or do you want to complain (rightfully) that using your own cards should not be necessary for testing a feature?

It was, though, implied that 1. we need to make sure the product works and ships on time, and 2. you can't do it without using your own cards.

I know people on the team who simply didn't test, but as it was a feature I was mainly responsible for and genuinely interested in, I wanted the launch to be successful.

We also eventually got the money back (most of the money? didn't check them all).

In the end, it was in total about 25 euros, and that's not a sum that I would sue my employer over, especially as I was "happy enough" at the company.


It is wrong and definitely illegal in California:

>Here in the state of California, labor laws define that an employer cannot require a team member to take on expenses that are an integral part of the job.

https://www.asmlawyers.com/what-your-california-employer-can...


Two comments. (From a non-CA legal jurisdiction context.)

You can always ask (but not pressure or require). Make sure you make it clear there's no downside to them refusing.

Another option I've used is to hand cash to co-workers and ask them to spend it on their credit card for testing. I've rarely had anyone refuse that. (The few who did were very junior staff members who were maybe right at the credit limit on their cards, I suspect.)


There are always downsides:

Your personal card can get fraud-flagged, which is a huge pain to fix.

It can also get banned by Stripe/Braintree/etc, which will really mess things up until your bank issues you a new card number.

Never use a personal card for testing, maybe with the exception of being the sole proprietor of the business or if it's a hobby project.


Illegal in what jurisdiction?


California is an example, but I’m unclear about other states. Honestly, I’m unclear across the board because there are a lot of employee-made purchases that are conditions of employment (phone, computer), and it could be argued that this purchase is a similar necessity, especially if it’ll be refunded.


Any job that asks you to buy your own equipment, especially a computer, is a scam (unless you are a freelancer, in which case you should already have equipment)


[flagged]


I don't think so.

It would be illegal not to reimburse the expense in, say, California; https://casetext.com/statute/california-codes/california-lab...


I'm in China; fun fact, it is also illegal! Now, is it enforced? Probably only a little bit more than in California!


Why not change the backend yourselves? Don’t you have access to the repo?


>Isn't that what everybody does in the industry?!

We built against Stripe's sandbox and never had to test in production, so I never had to use a real credit card. It may have happened when going live for the first time, but that would have been two or three charges across one-time payments and recurring payments (hardly what you'd call robust testing). Issues observed in production can usually be reproduced in the sandbox. There are some other caveats between the environments that I'm forgetting, but I don't think we ran into those.

We also had an ACH payment provider (add your bank account, verify it, we deduct from it, etc.) that also had a sandbox and had no issues there either.


> We built against Stripe's sandbox and never had to test in production, so I never had to use a real credit card.

Did it, got bit in the ass when some workflow was disabled in production and not in their sandbox. I don't recall the exact thing, but it's always fun to push to production all sure of yourself, only to have to roll back fast and ponder why you're getting some fun message in your logs. At least the error message was clear that what we were doing was available only in the sandbox.

It was some years ago so I don't remember if it was in Stripe Connect or during the mandatory 2FA rollout.


> got bit in the ass when some workflow was disabled in production and not in their sandbox

What Stripe workflow was this? Or was it specific to your codebase?


Stripe is the best I've used, but it has a ton of issues to this day:

    A) Account settings are specified separately in prod / staging
    B) Differences between those environments are not automatically reported in a useful way
    C) Only one staging env per customer. Want to check what a new setting will do? Every developer is getting that setting turned on.


> C) Only one staging env per customer. Want to check what a new setting will do? Every developer is getting that setting turned on.

Stripe Sandboxes[1] aim to solve this problem!

(Disclaimer: I work for Stripe but not on this feature)

[1] https://docs.stripe.com/sandboxes


"up to six" is definitely an improvement, but still a long way from "ephemeral test environments on demand".


We also built our Stripe integration using only the testing tools they provide. The first real payment in production was completed by our product guy, but only as a sanity check. Maybe we got lucky, or maybe Stripe is just that easy to integrate with, but we haven't been surprised by any production behavior.


A "sanity check" is a test. Alway always always test that real prod payments end up in the bank account they're supposed to before whoever can rollback goes home for the day. _Always._ And document the existence/process/details/outcome of that "sanity check" test.

DAMHIK.


Not documenting the existence/process/details/outcome is exactly the distinction that I intended to make when I called it a "sanity check". I was really just offering a data point, not a comprehensive guide to best practice.


Slightly off-topic, but has anybody else seen an issue with stripe's live checkout flow when you collect a phone number? People paying with Apple Pay will frequently have the first 2 or 3 digits doubled! So like 313105551212 will come through. Didn't seem to happen in the test environment.

(Oh also sometimes it defaults users (in the U.S.) to Anguilla as their country code (also +1) but then gives users an error their phone number is invalid.)


Same at my current and previous company - the Braintree and Stripe sandboxes worked fine for us, and post-release it was just a case of monitoring both integrations (and doing a test sale on prod if it was a quiet period).


> Situations where a testing endpoint has one set of bugs not found in production, and vice versa, were so ubiquitous from the mid-2000s to the mid-2010s

Honestly, that's still the case, at least with Adyen. At $pastJob we had a pretty robust regression suite that'd regularly run into breaking changes in their test environment (against _old_ versioned API endpoints!). We seriously questioned whether anyone else used them, since they were never aware until we submitted tickets. This also falls apart as soon as you use any of their "special" features that require an account rep to enable; they just don't work in the sandbox.

Another pain point is that the test payment methods offered are static. You can't set up cards for specific scenarios, e.g. tokenize, successful payment, then it expires -- you can only test an expired card as a one-off.


I ran into just such a bug with Adyen a couple of hours ago: a couple of payment methods have completely different behaviours in test vs. production, and we ended up having to test in prod anyway.


Stripe considers using a real card to test a zero-tolerance fireable offence.

I, as the name on record for an organization with Stripe, made an actual, legitimate payment to said organization (and did not refund it). Stripe's automated system caught the payment and terminated my account automatically by the time I woke up the next morning; fortunately I was able to reach a real human and explain that in addition to working for the non-profit, I was also donating to it, and they begrudgingly restored the account.


My company has hundreds/thousands of ecommerce clients, a huge portion of which use Stripe. We do real-card testing every time we deploy code that could affect payments, and it's never been an issue.


It’s pretty clearly stipulated in their TOS, for what it’s worth.


I don't see anything referring to testing in https://stripe.com/en-ca/legal/ssa other than that you are provided test keys and live keys.


Stripe will take pretty much any excuse to terminate an account and pocket the money. At this point it might as well be part of their business model.


But why? Why is testing with real payment methods a bad idea for Stripe?


> but most payment providers simply do not offer testing APIs of a sufficient quality.

Moreover, sometimes they vehemently oppose testing via real payments and reserve the right to cancel the contract should this happen.

To this day I have a distaste for working with payments.


How do those places ever pass a PCI audit? One of the first things the auditor asks is "please show me proof that your testing is never done with real credit cards"

(Unless they're getting their test environments PCI certified, which sounds like a waste of money.)


>Every single place that I ever worked at in the past 20 years tests payments using real cards and real API endpoints. Yes, refunds cost a few pennies and sometimes can't be automated, but most payment providers simply do not offer testing APIs of a sufficient quality.

I think this means their own real-money credit cards, that they spend a few bucks on for testing purposes. Not customer credit-card data.


Most companies are never PCI audited because they're using a provider who already has been (like Stripe)


At volume you can no longer self-certify to be PCI SAQ-D. There are limits based on transaction count or volume.


Makes sense. What are those volume limits like? I suspect the lion's share of companies using providers like Stripe are under those limits (since the largest companies are willing to trade simplicity for better rates).

edit: according to a quick search, it looks like it's 6M transactions/year to require an audit vs self-assessment


Don’t do it in a staging/test environment. As a sibling comment stated: smoke test in production with corporate cards.


My last company, Rainforest QA, developed a way to issue virtual credit-card numbers for just this purpose - kinda like the ones from privacy.com. Customers use them for testing in prod. Simple, effective - and it tests the exact same flow as customers.

Before this, we found a lot of teams using either their own corporate cards or pre-paid Visa type things. All a pain to manage, balance-wise.

Seemingly, the biggest problem left with doing this is production metrics; these transactions in prod tend to affect the main metrics - either your own, or payment-related ones.

I can't find anything current, but - https://www.businesswire.com/news/home/20180417005414/en/Rai... covers it.


Is doing a smoke test in prod with corporate cards bad practice?

We are rolling out subscriptions with Stripe, and an internal business unit will actually be using the service, so they put it on a company card. Basically they're our first live customer to test all the prod systems. No refunds or anything.


No, it is not bad practice. Only developers who don't care about money actually coming through the door -- the same ones that get caught up in a local maximum trying to perfect the imperfect -- say that it's bad and that you should not do it.


Regardless of what developers think, the payment providers generally forbid it. For example, Stripe says:

> Don’t use real card details. The Stripe Services Agreement prohibits testing in live mode using real payment method details. Use your test API keys and the card numbers below.

https://docs.stripe.com/testing
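For reference, a test-mode charge is just the normal API call made with a test key and one of the test payment methods from those docs - roughly something like this in Python (the amount and the key are placeholders):

    import stripe

    stripe.api_key = "sk_test_..."  # test-mode secret key, never the live one

    # pm_card_visa is one of Stripe's predefined test payment methods
    intent = stripe.PaymentIntent.create(
        amount=1000,  # smallest currency unit, i.e. $10.00
        currency="usd",
        payment_method="pm_card_visa",
        payment_method_types=["card"],
        confirm=True,
    )
    assert intent.status == "succeeded"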


I have to imagine they’d only care if you were running significant volumes of test transactions and refunding them, like if you were using live credentials in a dev environment.

Either way I’d be hard pressed to deploy significant changes to payment-related code in production without at the very least seeing a real $1 charge go through and everything work as expected. The risk of a ToS enforcement for this seems much lower than the risk of some bad logic in an if (env == ‘prd’) making customers unable to give you money.


This is my read on it as well, but I'd really like an official clarification.

Rare production smoke tests are in a gray area. They may be technical violations but they're allowed to happen as long as they stay infrequent and above-board (company card, small amount, no chargeback, etc.).


I think you're misunderstanding here; people are talking about a smoke "test" using a real credit card against the real production payment system, using production API keys/authentication/etc, with real money moving around. No payment provider forbids that.


I can't find any clarification on this by a search alone. When I went looking for it in the actual services agreement, I couldn't even find any clause about testing at all.


That's precisely what they all forbid?


They forbid using a valid credit card to make a purchase on a production system?


Among other things, you're not allowed to use your own credit card to make a purchase where the money will come back to you, because credit card issuers want to charge cash advance rates for that.


But it's not a purchase. You're not exchanging real goods or services for that money (unless your smoke tests run a lot deeper than mine). Your motives may be benign, but from a legal/regulatory perspective, it's a suspicious transaction.


Yes. By the letter of the agreement, you are not to use your cards to do test purchases against your account.

You occasionally see complaints about payment processors when microbusinesses do this and get banned. So it is something that does get checked occasionally. (There's a top-level comment about this.)

I think the payment processor doesn't want you to do it because you may issue many transactions and then refund them which incurs cost, or you may be using it for manufactured spend which incurs issuer ire. Maybe it's a brown M&M thing; if you didn't read that part of the agreement, you didn't read anything else, and they may as well kick you out early and avoid hassle.


Generally speaking, no one is getting banned from Stripe for the occasional transaction tested in production, come on. If this was the case, virtually every company I've ever worked at would be banned from Stripe. It's reasonable to confirm your system actually works once deployed outside of test environments.

No disagreement from me that that is what the letter of the Stripe service agreement says, but what happens in reality is clearly different. I take that rule as trying to encourage people to use the very good test environments Stripe offers, or to limit the scale of test transactions in production, rather than trying to shut down a paying user (the company) for trying a legitimate transaction in prod with a legitimate card. I have no idea why you would want to risk the first-ever transaction in prod being performed by a real customer - why leave it to chance that it is not set up correctly?

I have also been on calls with Stripe support staff where we tried a card transaction in production for testing purposes, FWIW.


Also very interested. I wonder if we could get a comment from a Stripe person, or a recent ex-Stripe person (like @patio11), to clarify what’s allowed and what’s just ignored.


It's not allowed, to make sure that all customers are always in violation of the agreement :)


Their lawyers are going to tell them to go ahead and speak plainly here.

Yeah no.


This is also true for cryptocurrency. There's always a testnet, and you use it whenever possible because transaction fees are high, but it never works the same as the real thing.


I built software for the self-service automats of a very large European rail company, and I remember spending whole afternoons testing every possible scenario listed in our testing book.

Those required manually purchasing tickets to many different cities, using the credit card terminal on the machine. They even had fake Discover, VISA, Mastercards you were to use to make those test purchases (to this day I don't know who came up with those fake cards and how that part of the system worked).

But no, not everybody tests payment in production. It was a while ago, but I don't think they'd have switched the entire philosophy by that much since then.


PA DSS/PCI DSS standards say you aren’t supposed to test with real cards. It’s a certification criterion. I had to make my own

PCI DSS (Payment Card Industry Data Security Standard) - Requirement 6.4.3 states:

"Production data (live PANs) are not used for testing or development." This requirement is aimed at ensuring that real cardholder data (Primary Account Numbers or PANs) is not used in non-production environments, such as testing or development environments, to minimize the risk of exposure and unauthorized access.

PA DSS (Payment Application Data Security Standard) - Requirement 6.3.4 states:

"Production data (real PANs, track data, or other real cardholder data) is not used for testing or development."


From any engineering post I have seen about Uber, it seems like a deceptive marketing/hiring tactic when analyzed. They seem to repurpose standard practices into a blog post (this one sounds like it might have been derived that way).


Absolutely not. Production is still an impenetrable fortress at a lot of places, or at least it's perceived to be.


I seem to remember about 4111 1111 1111 1111 times that I tested a payment system with a card that wasn’t real, although I acknowledge that when I was done convincing myself that I was a good programmer, I would almost always be disabused of this notion after using a real number.


Testing against the test environment works well for us, even for in-person terminal payments. It even makes it much easier to simulate edge cases.


Standard practice is to use the testing API for development and the real API for verification.


Doing penny testing yourself is different from letting a chunk of your user base test it


"Isn't it what's everybody does in the industry?!"

Everybody does. Some do it manually, some let their QA people use their private credit cards - or so I've heard.


Eh, no? I've never tested payment code using real payments. Ever. The idea of doing it with real payments is pretty out there in my book even :)

Then again, every payment provider/bank I've integrated with had decent testing endpoints, and we often even support them in production; i.e., you can select a staging/testing env of your provider to test the order flow or whatever.


Then you just had the customer test it with a real payment.

That's pretty out there in my book.


Amen. If you don’t test it, you will test it with your customer and will eventually fail.


Nooope! Services like Stripe let you test the payment workflow in their test environment, which works exactly like the production one but without the payments going outside Stripe.

Let us not normalise bad practices.


I see several comments calling this piece "fluffy" without much real insight - I have to respectfully disagree - I'm 48 and wrote my first code at 8, still write code for my self at 48, have managed teams, held all manner of roles and done some startups. This article is solid gold.

I'm surprised people think this article doesn't have much important to say. I suspect their code probably crashes a lot in production, and will still kill many startups or otherwise end up destroying significant amounts of shareholder value.

They think the article is banal and obvious. They will not really take the key insights to heart and truly live it.

Crowdstrike is the perfect example of this!!!

And for every CrowdStrike there are tons of startups that don't make the news but end up burning their early-adopter users through an inability to deal with bugs properly, delaying their own success unnecessarily, or even turning what would have been massive business successes into technical morasses. Imagine failing to capture your business's full potential because of a bad approach to software defects!


You don’t really get at what you think the substance of this piece is. It’d be helpful if you pointed that out instead of just going on about how phenomenal it is.


It’s a parody of the writing style of the article itself, all excitement and noise, saying little to nothing.


I think he was being sarcastic, but can't tell exactly.


To me that's the mark of a high quality sarcastic reply.


Haha, that’s masterful. I had no idea but reading it again now it feels so obvious. :D


This article can be rewritten into one line:

“Not all bugs can be found until you deploy to production. So deploying to production can be called ‘testing in production’”


The (somewhat obvious) parts about staged rollouts and selection criteria for initial deployments are useful. If CrowdStrike had rolled out to a small demographic first, billions of dollars could have been spared the shredder.


Sometimes it's not so simple. If your prod is already broken, a slow rollout becomes a liability. CrowdStrike didn't have any real reason for a global push, but if it were to patch a 0-day already being exploited, customers might rather risk downtime than breaches.


Eventually you’re going to make a change that completes 100% of the production rollout. That change should be tested too, as any change is a new opportunity to break something.


I bet staged rollouts are on some poor PO's backlog. We don't have the bandwidth for such niceties! :')


Everyone has a test environment, the lucky ones have a separate production environment.


I thought the colour and anecdotes were useful towards conveying the message. Sometimes only after you've experienced something for yourself does the reduced pithy one liner make sense and resonate.


From my experience building medium-scale ecommerce systems, along with innumerable payment integrations of various flavours, this isn’t unreasonable, for a few reasons.

Firstly, payment service providers honestly suck at providing a coherent staging environment. Either it’ll be out of date, or ahead of production, or full of garbage data that you can’t clear that breaks their outputs, or just plain not representative of the production environment. You’ll have stuff check out perfectly in staging only to be a hot mess on their live environment.

Secondly, if you’re doing this stuff at scale, it’s not as simple as “make an API call and get a result” - you’ve got your egress and ingress to worry about, at different levels (NAT, load balancing, packet routing, http(s) proxies), and there’s a host of stuff that can go wrong for subtle reasons.

We used to (for they are now just a shopify shop since my departure a decade ago) do exactly as is described - test in staging as much as it is useful, and then go live with an immediate test built into the deployment toolchain, with automatic rollback in case of failure for any reason.

It worked. The only payment issues we ever had, after realising that testing on staging was damned near meaningless, were on the side of the payment gateway.


> test in staging as much as it is useful, and then go live with an immediate test built into the deployment toolchain, with automatic rollback in case of failure for any reason

I'm curious about the logistics of automatically testing payments after a deployment?

Does your automated test place an order with a valid, working credit card? Does your test include going through 3D Secure too? Do you then automatically cancel the order? How do you make sure that whole unusual process doesn't get blocked by unusual activity fraud detection? Whose credit card is it? Have broken tests ever led to the test order getting fulfilled?


> Does your automated test place an order with a valid, working credit card?

Yep. An organisation-owned card for the purpose, with details used by Selenium for the test. Details stored securely, I might add, as PCI DSS and ISO 27001 were important to us.

> Does your test include going through 3D Secure too?

Yeah. Same bank card always being used meant we could automate the flow.

> Do you then automatically cancel the order?

It went through the whole despatch process, including label production with couriers etc., and was then cancelled as that tests everything including CANCEL/VOID and the whole critical flow.

> How do you make sure that whole unusual process doesn't get blocked by unusual activity fraud detection?

By using the same card over and over, placing an order for a normal item, talking to our bank when it did occasionally get flagged.

> Whose credit card is it?

The business's.

> Have broken tests ever led to the test order getting fulfilled?

Yup. We had a few that appeared on our doorstep, both due to our error, and client error.
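Pulling that together, the shape of the check was roughly this (a sketch from memory; the URLs, element ids and cancel endpoint are invented for illustration, and the card details are read from the environment here only for brevity - in reality they came out of our secrets store):

    import os

    import requests
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def post_deploy_payment_check() -> None:
        driver = webdriver.Chrome()
        try:
            # Hypothetical shop URLs and element ids, standing in for the real checkout
            driver.get("https://shop.example.com/products/smoke-test-item")
            driver.find_element(By.ID, "add-to-basket").click()
            driver.get("https://shop.example.com/checkout")
            driver.find_element(By.ID, "card-number").send_keys(os.environ["SMOKE_CARD_PAN"])
            driver.find_element(By.ID, "card-expiry").send_keys(os.environ["SMOKE_CARD_EXPIRY"])
            driver.find_element(By.ID, "card-cvc").send_keys(os.environ["SMOKE_CARD_CVC"])
            driver.find_element(By.ID, "pay-button").click()
            # The 3-D Secure challenge step is elided; using the same card every time
            # made the issuer's flow predictable enough to script.
            WebDriverWait(driver, 120).until(EC.url_contains("/order-confirmation/"))
            order_id = driver.find_element(By.ID, "order-id").text
        finally:
            driver.quit()

        # Cancel straight away so the whole critical flow, including CANCEL/VOID at the
        # gateway, gets exercised (hypothetical internal endpoint).
        resp = requests.post(
            "https://backoffice.example.com/api/orders/cancel",
            json={"order_id": order_id, "reason": "post-deploy smoke test"},
            timeout=30,
        )
        resp.raise_for_status()

    if __name__ == "__main__":
        # Any exception exits non-zero, which the deployment toolchain treats as "roll back".
        post_deploy_payment_check()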


People are calling this article fluffy but I agree with you that if people haven't worked with a lot of payment gateways before then it's good advice never to fully trust the test environments. Many years ago I had a big launch turn to chaos when test worked perfectly but the payment gateway's validation on production turned out to be different and declined every payment without returning a meaningful error.

A lot of work has gone into getting a customer to make a purchase, so it's the worst time to fail. Nothing beats testing on production with a real card.


Extremely fluffy piece. 20% in and not one valuable piece of information


Whenever I see an article with more than a few sentences that seem to be arbitrarily bolded, I know it isn't worth reading. Haven't had a failed case so far.


How would you know if you did?


you read HN comments and confirm that you did well to skip the rtfa part


I'm beginning to think that Substack is the new Medium, and this cannot and will never be solved.

It would be better, and more respectful of the reader's time, to get to the point of the article rather than stuffing it with more words that waste the reader's time.

When I come across articles which are needlessly long, I either skip them or I use a summarizer and leave the page.

There will always be elaborate clickbait content like this (clickbait title, actual answer at the end of the article 90% of the time), but it just trains the reader to scroll to the end for the answer most of the time, achieving the opposite of what the writer wants.


Stopped reading after this

> The reason I know this is because I’ve built and maintained systems that handle close to 100,000 payments a day.

That's 1.16 payments per second.


1 payment per second. 1 payment per Uber ride. 10 dollars per ride. 864k per day. ~315M per year. It's not a small system, and it could be some mixture of one market at Uber or some percentage of rides (eg one payment provider).

(it could be 2 payments per ride or drivers get batched payouts but w/e).

There are obviously bigger payment platforms (eg Stripe or GPay / Apple Pay or Amazon), but not all of us work in payments either.


The author works at Kiwi.com, and says there are '70,000 times a day a customer clicks the Pay button on our platform' which is a little under 1 QPS.

Uber, by comparison:

'Trips during the quarter grew 21% YoY to 2.8 billion, or approximately 30 million trips per day on average.'

That would be about 350 payments per second if load was evenly distributed.


> Extremely fluffy piece. 20% in and not one valuable piece of information

What? Your mind wasn't totally blown by the advice "Instead, the lesson should be this: to test your payment systems in sandbox for an amount of time that’s reasonable. And not a second more."? /s


We worked with a payment processor to implement billing for our services via credit card. According to the payment processor, the QA environment for one of the major credit cards had been broken for a while, so we tested in production.

We were testing billing customers who were going to pay us, so putting a small charge on a corporate card that was going to come back to us wasn't a big deal; I just remember being slightly surprised that testing something like credit card payments was done against the production environment.


Yes, this article is probably longer and fluffier than it needs to be but there are some real truths here.

Payments are one of the original service-oriented architecture systems; in production, your payment is processed by at least three or four parties, each of which will call several systems or sub-systems to process a payment.

This method clearly works for Uber, who have a lot of payments going through their systems most of which are of a relatively small value. Dropping a payment and either asking the user to pay via a different option or simply writing off the revenue for a handful of transactions is probably workable for them.

I have the opposite, the number of transactions we process is relatively low, but the average value of these transactions is high, well in excess of 1000 USD. This leads to the following issue:

1. Screwing up a payment and asking the user to try again can be a big hit to user confidence.

2. We can't write off even a single payment/transaction; they're too high-value to write off.

3. Processing fees and refunds for making test transactions in production are too expensive. If a test costs more than $10 (to test in production we must test with production transaction values), that's going to rack up quickly.


Uber had, and probably still has, a sophisticated setup for directing prod traffic for specific requests to/from developer laptops, for isolating test tenancies in prod services, for simulating trips using test tenancies, for automatically detecting and rolling back deployments based on everything from the usual observability metrics to black box testing against prod, and last but not least, good unit test coverage.

I bet their payments team runs code before it gets deployed. The article seems to imply that Uber engineers don't bother to test code before they land it, when in reality they do test it, and they also catch other stuff afterwards too.


> software is not like other machines. Most machines, in time, rot and decay. But software is just information: if it’s correct, it stays that way. Hardware does need replacement, but the correct software that runs on it keeps running.

Unless you have some empowered person or group in your organization, levels above your team, that is allowed to constantly move the goalposts because of “cybersecurity!!1”, and even the most mundane internal-only systems have to be kept on the latest versions of everything ever, just so their scanning software shows “green”. Probably because their own OKRs are based on how many green circles they keep, or something.

They’re cyberaccountants.


> First, you have to copy all production data. It’s expensive, and a reckless breach in privacy and security, but it’s doable.

So, what does "doable" mean in this context? We unnecessarily increased the attack surface for production data and until today haven't suffered a data breach because of it?

A staging env with actual prod data now needs to be treated as a production environment. A system is only as secure as its weakest link, so an attacker will have an easier time getting into that "staging" environment where things are tested out, no?



I tried explaining to people that you’re dealing with systems so antiquated that places still accept Diner’s Club cards. Accepting a credit card at all was a big deal, because you literally copied a number and hoped it worked. People have cards that don’t have an email associated with them. Furthermore, there are a ton of settlement nuances. It’d be like building a browser if you were an alien who was given the RFC specs.

I’ve worked with giant companies working directly with providers. Testing legalese and reality are far apart. In no scenario would we have the customer “test” a major new feature rollout. We’d have a budget, and someone would make a real purchase and then donate the goods to charity - or, usually, it was office candy for a month. I doubt the budget was even touched. We likely had provisions that prevented a $10k charge on a $15 product; that never happened. The only issue was that it’d skip normal QA (India has weird rules), and usually it would actually be a frivolous purchase or purchases on corporate and private cards.


i've built tons of very intricate payments systems over the past 10 years and i honestly have no idea how "payments engineer" is even worthy of a distinct job title. it's a thing people do in the course of building products. ridiculous


A couple of large corporations I worked for had two instances of prod, geographically isolated, with one acting as a fallback in case the primary went down. This isn't particularly novel at all, but what I was always interested in was using a similar setup for testing production prior to flipping the release live.

Effectively you'd just have prod and staging with identical deployment configuration. The benefit would be promoting the exact staging release to prod as soon as tests pass.

That said, I've never tried this and I'm sure there are good arguments for avoiding the added complexity of regularly flipping production between two different environments.


This sounds like canarying, which is fine


Are canary releases handled this way? I always thought they were effectively a public staging, with the next prod release generally meaning a rebuild from the same codebase as the canary release rather than a full switch over from one prod environment to the next.

Edit: it's worth noting that I'm specifically thinking about software along the lines of a hosted service or web application, where you could swap it out on hardware you own. Native apps, like the actual web browser, wouldn't fit this model since the binaries live on the client.


Usually I've seen it as: you have some system with replicated jobs, and you update the code or config for a few of the jobs and wait before doing the rest. It is sort of a public staging. You can also canary native binaries pushed to clients, which is a different mechanism but the same idea: you're 99...% sure the change is safe, but it's still better not to release it globally.

I guess your case isn't quite the same idea, since you're not testing with real usage but with your own staging tests. But I don't see anything wrong with that.
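Very roughly, the idea in code (a sketch only; the deploy and error-rate hooks are passed in as parameters because they're whatever your infra happens to provide):

    from typing import Callable, Sequence

    def canary_deploy(
        replicas: Sequence[str],
        deploy: Callable[[str], None],                 # pushes the new version to one replica
        error_rate: Callable[[Sequence[str]], float],  # observed error rate for a set of replicas
        canary_fraction: float = 0.05,
        max_error_rate: float = 0.01,
    ) -> bool:
        """Update a small slice of replicas first; only roll out the rest if the
        canary's error rate stays below the threshold."""
        n = max(1, int(len(replicas) * canary_fraction))
        canary, rest = replicas[:n], replicas[n:]
        for replica in canary:
            deploy(replica)
        # In reality you would soak for a while and watch more than one metric here.
        if error_rate(canary) > max_error_rate:
            return False  # stop; the remaining replicas stay on the old version
        for replica in rest:
            deploy(replica)
        return True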


Just an idea, can't you just swap staging and production? So, actually the system you've tested goes live by switching nothing more than a pointer (no deployment involved).

Won't raising support cost at some point suggest it's cheaper having two swappable live systems than the alternative?


I agree with others calling this fluffy. I bounced after this:

> to test your payment systems in sandbox for an amount of time that’s reasonable. And not a second more.

For an amount of time that is reasonable? And not a second more? What is this drivel?


What do people with smaller companies do to test with real cards? The terms of credit cards usually disallow using your own card to make a purchase from yourself.


Using a personal card to purchase a product from a company you either work for or own isn't disallowed.


They do it anyway


> I really like how Charity Majors put it: “staging is just a glorified laptop”. Only production is production.

Production is also just a glorified laptop.


Everybody has a testing environment. Some people are lucky enough to have a totally separate environment to run production in.


I expected a deeper and more detailed article, but my hopes were trashed just after the first, poor introductory section.


To be honest, errors in payment processing are hard to create and reproduce in test. Plus there are errors that apparently never occur anywhere except in production.

So yeah, "testing" in production is normal for all payment systems.


Pure graphomania.

Look, ma, I'm a blogger! Wait, no scratch that - I'm a WRITER!


cc: 4242424242424242

cvc: 424

exp: 2/4/24

Fond memories of speed-running the checkout flow in Stripe sandbox.


Payment systems are the blogs of the early 2000s.


I couldn't bother myself to read the whole article. Got GPT-4 to summarize the main points. Not as much insight as I thought I would get going in.

1. *Testing in Staging vs. Production*:
   - Most engineers prefer testing in staging due to a sense of control.
   - There's a misconception that it's an either/or situation between staging and production testing. In reality, both are necessary.

2. *Importance of Production Testing*:
   - Staging environments can’t replicate all possible real-world scenarios.
   - Production testing is essential to identify complex, real-world issues missed in staging.

3. *Uber's Approach to Testing*:
   - Uber tests its payment systems in production.
   - They have developed tools (Cerberus and Deputy) to facilitate transparent interaction with real systems and gather responses effectively.

4. *Every Deployment as an Experiment*:
   - Every deployment is treated as a hypothesis to be validated against business metrics.
   - Metrics and monitoring are crucial to determine the success of deployment.

5. *First Rollout Region*:
   - Uber chooses a specific first rollout region to minimize risk and impact.
   - Initial rollouts are conducted in regions that are small but significant for practical monitoring.

6. *Canary Deployments*:
   - Uber conducts canary deployments to a subset of users to detect and mitigate potential issues early.
   - This approach helps in identifying and fixing issues with minimal impact.

7. *Examples of Issues Discovered Early*:
   - Uber detected significant issues with GooglePay during its cautious rollout in Portugal, which would have been difficult to identify in a staging environment alone.

8. *Philosophy on Software Quality*:
   - True robustness and resiliency come from real-world usage and the continuous fixing of encountered issues.
   - Only production can provide the real stakes and conditions needed for thorough validation.

9. *Author and Newsletter*:
   - Alvaro Duran, author of “The Payments Engineer Playbook”, emphasizes the importance of sharing and learning from real-world experiences in payments systems.
   - Encourages readers to engage with the content and share it with colleagues for broader impact.


Articles like these need a TL;DR: testing in prod is a tale as old as time.

It would have been more insightful to cover the underlying infra/tech that enables this seamlessly.


> For Uber, every deployment is an experiment

Blindly experimenting without a clear hypothesis is a great way to ship statistical noise.


my hot take is to test in every environment... what a concept. the even deeper hot take here is to reimplement mocks of your integrated environments AND THEN IMPLEMENT THEIR SYSTEMS! the process of good testing has a side effect of eventually eliminating technical debt, because those same set of tests that ensure your application is working can test if your reimplementation of your upstream integration is working! ta da you are now a growth company.


Reminder that if you test a live payment on a new Stripe deployment, you will get INSTANTLY banned. Don't do a live test with a credit card in your name!!


It seems entirely natural to do this. What should you do instead?


With Stripe, the testing environment is sufficiently powerful that you don't need to test in production. With the test environment, you should have enough confidence that the integration will work. If you feel the need to do a payment after going live, ask a friend to do it, not someone from your household.


HTF does stripe know you from Adam?

Or do they just ban the card used to make the first payment on every integration (while ringing a bell and high fiving each other)?


Stripe knows your name, which you had to submit to go live. If your first payment is with a credit card in your name, particularly if it's a large amount (which the fraud system flags as money laundering), you will get banned with 100% certainty. Ask a friend who doesn't share your last name.


> For Uber, every deployment is an experiment

Me: Let's do that!

Boss: Ummm...


Testing in Production

The Crowdstrike philosophy /s


TLDR: some bugs can only happen in a real production environment, so expect them and be ready when deploying. Thinking your deploy will be ok because staging env passed all tests is delusional.


Yes, exactly this. I test staging before every deployment, and prod after every deployment. Thirty $2 credit card payments per month on my personal credit card is a small price to pay for the peace of mind that the next $800 order won't fail.


[flagged]


That is a blatantly incorrect summary. Might you have dismissed the article too early?


The summary is generated by AI.


Why post an AI summary if you know it's wrong?


Sorry - it seems a fairly accurate summary. Quite impressed by Llama 3 there.

I mean there is even a pull quote in the article:

Not all bugs can be found until you are in production - therefore some testing must be done in production [and by implication you need to test carefully and be able to rollback]


Or, one could test with a production-parallel deployment. Clone all requests to a parallel test system, use the same production data for enrichment and validation in both the current production system and the new one, and automatically compare the outputs from both systems for those fields that have to be the same between the systems, and test the expected changed outputs automatically.

Once there are no errors in the new system, you start switching over the systems in a controlled manner where the new system increasingly takes on the production role, and the old one still processes cloned requests for a while as a sanity check…

This way you don’t need an unrealistic staging environment, and you are not introducing any errors into production.

It worked more than 20 years ago when I architected this for a system that had to process 50M transactions every hour.
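A bare-bones sketch of the cloning-and-comparison part (endpoints and field names are invented for illustration; in the real system this sat at the ingress layer rather than in application code, and the parallel system's own outbound calls were intercepted rather than sent - see the replies below):

    import logging
    from concurrent.futures import ThreadPoolExecutor

    import requests

    PROD_URL = "https://payments-prod.internal/process"            # hypothetical endpoints
    CANDIDATE_URL = "https://payments-candidate.internal/process"

    # Fields that must be identical between the two systems; everything else
    # (timestamps, trace ids, ...) is ignored by the comparison.
    MUST_MATCH = ("status", "amount", "currency", "auth_code")

    def process_with_shadow(payload: dict) -> dict:
        with ThreadPoolExecutor(max_workers=2) as pool:
            prod_future = pool.submit(requests.post, PROD_URL, json=payload, timeout=5)
            cand_future = pool.submit(requests.post, CANDIDATE_URL, json=payload, timeout=5)
        prod_resp = prod_future.result().json()
        try:
            cand_resp = cand_future.result().json()
            diffs = {field: (prod_resp.get(field), cand_resp.get(field))
                     for field in MUST_MATCH
                     if prod_resp.get(field) != cand_resp.get(field)}
            if diffs:
                logging.warning("shadow mismatch for %s: %s", payload.get("request_id"), diffs)
        except Exception:
            logging.exception("candidate system failed on cloned request")
        # Only the production system's answer is ever returned to the caller.
        return prod_resp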


If you rely on a card-processor or a banking API this has some limitations.


Nothing’s stopping you from cloning those responses as well… Compare calls, clone responses.


Charge customers twice?


I’m sure my former employer would have loved that.

But no, you don’t send two requests. You compare the calls as they are generated, but you only send one - from the production system. And then you clone the response for the tested system.


Yeah, that’s an obvious workaround that I somehow overlooked lol. I hope I haven’t made any decisions during my career based on that particular lack of imagination.




