
Startups? I work for a finance firm, and while we certainly have a need for large farms of servers to store data, my current team keep talking about web request latencies as an important infrastructure concern when the literal maximum number of users is in the tens of thousands.

Which is, you know, 1 maybe 2 machines with nginx plus 1 for redundancy. Our internal services are slow because people cohost them with batch jobs, and no other reason.



It's so frustrating, because when it all boils down to it, the story is really quite simple:

1) What metrics reflect your customer experience? Monitor them, alarm on them. Target them as a priority for improvement. Check in on them weekly at a minimum.

2) What metrics make up the metrics that reflect the customer experience? Monitor them as well, and consider whether alarming on them is the right thing to do. Use these to direct what you target to solve 1.

3) What do you need for SPOF resilience? If you've got something critical, you need a minimum of two servers running it, at no more than, say, 40% CPU usage overall, so that you've got enough headroom to cope with a single host failure and any unexpected increase in work.
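
To make the arithmetic behind point 3 concrete, here's a rough sketch. It assumes load is spread evenly across hosts and that CPU is the only bottleneck, which is a simplification; the function names are made up for illustration.

```python
def utilization_after_failures(hosts: int, utilization: float, failed: int = 1) -> float:
    """Per-host CPU utilization once `failed` hosts drop out.

    Assumes the same total work is redistributed evenly over the survivors.
    """
    if failed >= hosts:
        raise ValueError("no surviving hosts to take the load")
    return utilization * hosts / (hosts - failed)


def max_safe_utilization(hosts: int, ceiling: float = 1.0, failed: int = 1) -> float:
    """Highest steady-state utilization that keeps survivors at or below `ceiling`."""
    if failed >= hosts:
        raise ValueError("no surviving hosts to take the load")
    return ceiling * (hosts - failed) / hosts


if __name__ == "__main__":
    # Two hosts at 40%: the survivor of a single failure lands at 80%.
    print(f"{utilization_after_failures(2, 0.40):.0%}")
    # Two hosts can never run above 50% and still survive a failure.
    print(f"{max_safe_utilization(2):.0%}")
```

With two hosts at 40%, losing one puts the survivor at 80%, which leaves about 20% for the unexpected work increase. The hard limit for two hosts is 50%, so 40% is that limit minus a safety margin.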



