As for autoscaling, our hands are tied as long as we're running on Nomad. Right now our autoscaler is nothing more than some Ruby that loops over data from Prometheus and changes counts in Nomad. It's slow and buggy, but worse, we don't have control over where Nomad places VMs or which ones it stops when scaling down.
We're working on a replacement for Nomad (called flyd) that gives us full control over VMs. Once apps are running on that, we can do a lot of cool things. Better autoscaling is one, but I'm really excited about suspending idle VMs that our proxy wakes up on demand. That'll cover most use cases without forcing customers to worry about counts or blowing through a budget.
I’d love to hear more about this move away from Nomad.
We haven’t had a great time with Nomad, but we’re not sure if it’s just our limited understanding. It doesn’t help that very few people out there know it.
We'll write about it when the time comes. To be fair, Nomad and Consul have served us well. Most of our troubles stem from abusing them in ways they weren't designed to handle.