I was positively surprised by EFS a while back: great performance for a good price, as long as you didn't store much (storage itself is an expensive $0.30/GB-month).
I hadn't noticed they moved to a new "Elastic Throughput" pricing model, with the old one now labeled "Legacy"... it now charges $0.03/GB for reads and $0.06/GB for writes on top of the same $0.30/GB-month storage. Terrible pricing.
They also have two archive tiers that infrequently accessed files can age into, which means the larger volumes I use are paying the full $0.30/GB for only about half of their data.
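To make the pricing concrete, here's a minimal cost sketch using the rates quoted above ($0.30/GB-month storage, $0.03/GB reads, $0.06/GB writes). The function name and the example volumes are mine, and it deliberately ignores archive tiers and any free allowances; check the current AWS price list before relying on it.

```python
def efs_monthly_cost(stored_gb, read_gb, written_gb,
                     storage=0.30, read_rate=0.03, write_rate=0.06):
    """Rough monthly EFS bill in USD under Elastic Throughput pricing.

    Rates are the ones quoted in this thread; archive tiers and free
    allowances are ignored, so treat this as an upper-bound sketch.
    """
    return stored_gb * storage + read_gb * read_rate + written_gb * write_rate

# 1 TB stored, read through twice, 100 GB written in a month:
print(efs_monthly_cost(1000, 2000, 100))  # 300 + 60 + 6 = about 366 USD
```

Note how quickly the throughput charges dominate: reading your data back just a few times costs more than a month of storing it.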
I will have to see if their NFSv4 implementation has improved any. When I added support for EFS to my company's NFSv4 client we ran into a couple of performance bottlenecks and just general spec non-conformance.
Specifically, we noticed lack of support for these features:
- session trunking (and, in general, multiple channels)
- multiple concurrent requests on a channel (à la ca_maxrequests)
- callbacks
ca_maxoperations was quite low (10, I think?), and they cap the number of clients that can access EFS in parallel.
What this amounts to is that you can reach acceptable performance on bulk file reads/writes, but metadata-heavy access patterns have no hope of reaching the advertised IOPS. It's a shame, because metadata performance is something NFSv4 otherwise excels at.
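A back-of-envelope model of why those limits bite: with NFSv4.1 session slots capping in-flight requests and each metadata operation costing a network round trip, sustainable ops/sec is bounded by concurrency divided by latency (Little's law). The slot count and round-trip time below are illustrative assumptions, not measurements of EFS.

```python
def max_metadata_ops_per_sec(slots, rtt_seconds):
    # Little's law: concurrency = throughput * latency, so the best case
    # for serial-round-trip metadata ops is throughput = slots / rtt.
    return slots / rtt_seconds

# Assume ~10 usable concurrent slots (in the spirit of the low
# ca_maxoperations value mentioned above) and a 1 ms network round trip:
print(max_metadata_ops_per_sec(10, 0.001))  # ~10,000 ops/s at best
```

Even with generous assumptions, that ceiling sits far below headline IOPS figures, which is consistent with the observation that metadata-heavy workloads can't reach them.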
As others have mentioned, among non-native services (not sure if that's acceptable to you, but it is to some) there are many third-party solutions that work better over NFSv4.
On the one hand, nifty! And I’m sure that serious engineering went into building out a system that can handle the amount of fan-out that might happen.
On the other hand, the pricing is absurd. A very nice 30.72TB NVMe SSD for datacenter use is $3400. Storing that much data on EFS costs as much as that SSD in less than two weeks. (I read the GB-month fee in the docs several times; I don’t think it’s a typo.) If you read the data four times, you will pay as much as that SSD costs.
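The claims above check out arithmetically. A quick sanity check using the thread's quoted prices ($0.30/GB-month storage, $0.03/GB reads) and the 30.72 TB drive (taking vendor-decimal GB):

```python
drive_price = 3400.0          # USD, the NVMe SSD quoted above
capacity_gb = 30.72 * 1000    # vendor-decimal TB -> GB

storage_per_month = capacity_gb * 0.30    # 9216 USD/month just for storage
storage_per_day = storage_per_month / 30  # ~307 USD/day
payback_days = drive_price / storage_per_day
print(round(payback_days, 1))             # ~11 days: "less than two weeks"

reads_to_match = drive_price / (capacity_gb * 0.03)
print(round(reads_to_match, 1))           # ~3.7 full reads: "read it four times"
```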
This is so absurdly expensive that, for many workloads, it will be much cheaper to just replicate the data into big SSDs on each server. (Of course, doing this on AWS may be a bit unpleasant.)
And EFS’s performance seems lackluster. One of those SSDs has maybe 10x the IOPS and 1/10 the latency of EFS.
The only time it makes sense to use EFS is when you want to have multiple servers with concurrent, consistent read+write access to a shared network filesystem. That's an expensive architectural decision that's best avoided if at all possible, to be sure; personally, I would prefer not to build a service that takes a dependency on EFS. But comparing it to "just" using an SSD with some sort of replication is kind of silly.
> concurrent, consistent read+write access to a shared network filesystem
I don't think this is an "architectural decision", more a hold-over from legacy systems that had NFS dependencies. Main (public) example that comes to mind is Jira Data Center: https://confluence.atlassian.com/adminjiraserver/running-jir... If the business made a decision to buy it, then you need NFS, and EFS can fill that. In this sense, I think of EFS as being in the class of AWS products like Amazon MQ for RabbitMQ - the architectural decision was made outside of the context of your company's stack, but AWS still has a managed offering to support it.
Modern systems will typically pick S3 instead, as it's much cheaper. I'd be hard-pressed to imagine a need for which NFS is a reasonable modern decision in greenfield development.
You could compare a five-node Ceph cluster, for instance. That’s five servers and three NVMe drives minimum just to store your first byte. Then you have to deal with support, etc. Pricing probably comes out about the same in the end.
You use EFS as a shared folder that you can share between a number of different workloads. If you want a POSIX compatible shared filesystem in the cloud, you're going to pay for it.
For example, I set up Developer Workspaces that can mount an EFS share on their Linux box, and anything they put there is accessible from the Kubernetes Jobs they kick off and from their JupyterHub workspace.
I can either pay AWS to do it for me, or I can figure out how to get a 250k IOPS GlusterFS cluster working across multiple AZs in a region. I think the math works out to around the same cost at the end of the day.
If you don’t need this level of durability, then plain old local filesystems can work, too. XFS or ZFS or whatever, on a single machine, serving NFS, should nicely outperform EFS.
(If you have a fast disk. Which, again, AWS makes unpleasant, but which real hardware has no trouble with.)
This depends on your use case, what types of storage you use, your familiarity with tuning systems, setting up RAID layouts, etc.
I love ZFS. It's incredibly powerful. It's also incredibly easy to screw up when designing your pool layout, especially if you intend to grow your storage. And that's not counting the effort needed to make your filesystem redundant across datacenters, or even just between racks in the same closet.
At the end of the day, if I screw up setting something on EFS I can always create a new EFS filesystem and move my data over. If I screw up a ZFS layout, I'm going to need a box of temporary drives to shuffle data onto while I remake an array.
> At the end of the day, if I screw up setting something on EFS I can always create a new EFS filesystem and move my data over. If I screw up a ZFS layout, I'm going to need a box of temporary drives to shuffle data onto while I remake an array.
True, but…
At EFS pricing, this seems like the wrong comparison. There’s no fundamental need to ever grow a local array to compete — buy an entirely new one instead. Heck, buy an entirely new server.
Admittedly, this means that the client architecture needs to support migration to a different storage backend. But, for a business where the price is at all relevant, using EFS for a single month will cost as much as that entire replacement server, and a replacement server comes with compute, too. And many more IOPS.
In any case, AWS is literally pitching using EFS for AI/ML. For that sort of use case, just replicate the data locally if you don’t have or need the absurdly fast networks that could actually be performant. Or use S3. I’m having trouble imagining any use case where EFS makes any sort of sense for this.
Keep in mind that the entire “pile” fits on ~$100 of NVMe SSD with better performance than EFS can possibly offer. Those fancy “10 trillion token” training sets fit in a single U.2 or EDSFF slot, on a device that speaks PCIe x4 and costs <$4000. Just replicate it and be done with it.
Buuttt... you're trying to compare apples (a rack in a DC) to oranges (an AWS Native Solution that spans multiple DCs). And that's before you get into all the AWS bullshit that fucking sucks, but it sucks more to do it yourself.
A rack in a DC isn't a solution that's useful to people who are already in AWS.
I had to deal with a few nightmare scenarios for a client: runaway costs from heavy reads/writes by a piece of software they were using that ran SQLite on EFS, with ECS as the infrastructure layer.
Short answer: you can do it, but it will get expensive if you do any significant volume of writes.
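A rough illustration of why sustained writes get expensive at the $0.06/GB write rate quoted elsewhere in this thread. The write rate I plug in is an assumed steady workload, not from the incident above, and SQLite's journaling amplifies logical writes, so real bills would run higher still.

```python
def monthly_write_cost(mb_per_sec, rate_per_gb=0.06):
    """Throughput-only cost of a steady write stream, in USD/month."""
    gb_per_month = mb_per_sec * 86400 * 30 / 1000  # decimal GB, 30-day month
    return gb_per_month * rate_per_gb

# a modest, steady 1 MB/s of writes:
print(round(monthly_write_cost(1.0), 2))  # ~155.52 USD/month, before storage
```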
I do wish AWS would release a SQLite-compatible database, though. Maybe this week during re:Invent.
I went through AWS’ example workloads e.g. analytics, CMS, web serving. Those all seem to be better / more commonly served using other storage (EBS, RDBMS, S3).
Can someone share how high-throughput EFS has been helpful?
I’ve never had much luck with NFS in the past besides shared home directories. Anything concurrent always seemed to have consistency issues.
We looked at EFS to pair with Fargate to run a small server... we only needed a tiny bit of persistent shared storage (which served pages and config to connect to the DB)... but EFS was atrociously slow for large numbers of small files. I anticipate high throughput would help for applications that read/write lots of small files serially, like websites built on frameworks that read thousands of files to start up or to render a page.
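A toy demonstration of that serial small-file pattern: many tiny files opened one after another. Run locally this is near-instant, but on a network filesystem like EFS each open/read/close adds at least one round trip, so total time grows roughly as file count times per-op latency. The file count and sizes here are arbitrary.

```python
import os
import tempfile
import time

# Create 1000 files of 100 bytes each, then read them back serially,
# mimicking a framework that opens thousands of files to render a page.
with tempfile.TemporaryDirectory() as d:
    for i in range(1000):
        with open(os.path.join(d, f"f{i}"), "w") as f:
            f.write("x" * 100)

    start = time.perf_counter()
    total = 0
    for name in sorted(os.listdir(d)):
        with open(os.path.join(d, name)) as f:
            total += len(f.read())
    elapsed = time.perf_counter() - start

print(total, f"{elapsed:.3f}s")  # 100000 bytes; add ~1 ms per file on a remote FS
```

At even 1 ms of per-file latency, those same 1000 files would take about a second, which is how "atrociously slow for large numbers of small files" happens.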
NFSv4 is designed to alleviate the consistency shortcomings of NFSv3, but unfortunately I'm not aware of any client implementations that exploit the full spec as intended. The Linux implementation is improving steadily.