Updated: April 20, 2021
Amazon S3 is brilliant. It makes serving up large amounts of data really easy, and it’s generally pretty cheap.
However, if you ever need to serve up terabytes of data each month, it can get pretty costly.
Recently a new business joined one of my platforms and immediately started using 10 times more bandwidth than all the existing users combined.
Here’s the quick and easy way I dramatically reduced this cost without severely compromising any of the benefits a service like S3 normally provides.
Step 1: Migrate Data to Wasabi
Wasabi is an S3-compatible service with much lower costs than Amazon, at the expense of fewer features (and the added complexity of moving ‘outside’ the AWS ecosystem).
Contrary to popular belief, Wasabi’s egress bandwidth isn’t necessarily free – they’ll contact you pretty quickly if you start using too much (as I found out the hard way, necessitating step 2 below). You can elect to pay $0.04 per GB of egress bandwidth though, which is still substantially better than Amazon’s pricing.
Wasabi seems to be the most reliable and performant alternative to Amazon that I tested. I also tried out DigitalOcean’s Spaces, as well as Linode’s Object Storage. In my opinion, neither is ready for production use.
DigitalOcean’s service suffered from poor performance, so I immediately ruled it out… My customers would have a heart attack if their content crawled along at 300k/s as it did while I was testing the platform!
Linode was fast, but seemingly unreliable. While testing it for approximately 48 hours, users started reporting 403 errors when accessing files stored on the service.
In addition, there was a notable bug in their management interface which prevented seeing more than 70 top level folders within a bucket.
While this didn’t affect anything past the top level, and didn’t affect the actual storage… the existence of the bug makes me believe that the service is likely not being used for production.
Getting back to Wasabi. Migrating data was a piece of cake with Minio’s mc mirror feature, and implementing it within our platform was as easy as swapping out the endpoint and keys – neato!
Performance is great, it seems reliable, and really is a drop-in replacement for S3 – at least for what I needed.
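To give a feel for how small the "swap the endpoint and keys" change can be, here is a minimal sketch in Python. It is not the platform's actual code. It assumes an S3 client that accepts boto3-style keyword arguments, and it assumes https://s3.wasabisys.com is the right Wasabi endpoint for your bucket's region (check your own account). The helper name s3_client_kwargs is made up for illustration.

```python
# Assumption: Wasabi's default-region endpoint; other regions use other hosts.
WASABI_ENDPOINT = "https://s3.wasabisys.com"


def s3_client_kwargs(access_key, secret_key, endpoint_url=None):
    """Build keyword arguments for an S3 client (hypothetical helper).

    With endpoint_url=None the client talks to Amazon S3 as usual;
    passing an endpoint points the same code at an S3-compatible service.
    """
    kwargs = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if endpoint_url is not None:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


# Usage with boto3 (left as a comment so the sketch runs without it installed):
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs(KEY, SECRET, WASABI_ENDPOINT))
```

Everything else (bucket names, object keys, upload and download calls) stays the same, which is what makes this kind of migration close to drop-in.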
Depending on your requirements, switching to Wasabi could potentially save 80 – 90% over S3 without taking any other steps.
Step 2: Caching With Minio
After digging into what was contributing to the overall bandwidth use, I found that only about 20% of the data being stored on S3 (now Wasabi) was ‘hot’ for any given week.
That is to say, in any given 7 day period, only about 20% of the total data stored was ever accessed.
Having a relatively small subset of data actually being used makes this a great opportunity to implement caching to further reduce the egress bandwidth that Wasabi needs to serve.
Assuming, of course, that there’s an easy way to cache content from S3 AND that the bandwidth for the cache server itself is cheaper!
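The trade-off is easy to put into numbers. The sketch below is a rough model, not a billing calculator: it assumes every requested byte leaves the cache server, and only cache misses are also pulled from the origin. The $0.04 per GB origin price is the Wasabi figure mentioned above; the monthly volume, hit ratio and cache bandwidth price in the example are made-up values you should replace with your own.

```python
def monthly_egress_cost(total_gb, hit_ratio, origin_price_per_gb, cache_price_per_gb):
    """Return (origin_cost, cache_cost) in dollars for one month.

    total_gb  -- data served to users in the month
    hit_ratio -- fraction of that data served from the cache (0.0 to 1.0)
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    origin_gb = total_gb * (1.0 - hit_ratio)   # only misses hit the origin
    origin_cost = origin_gb * origin_price_per_gb
    cache_cost = total_gb * cache_price_per_gb  # everything leaves the cache
    return origin_cost, cache_cost


# Illustrative numbers only: 50 TB served, 90% hit ratio,
# $0.04/GB at the origin, hypothetical $0.01/GB at the cache server.
origin, cache = monthly_egress_cost(50_000, 0.90, 0.04, 0.01)
print(f"origin: ${origin:,.2f}  cache: ${cache:,.2f}  total: ${origin + cache:,.2f}")
```

If the cache server's bandwidth costs as much as the origin's, the cache_cost term wipes out the saving, which is why the second assumption matters as much as the first.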
Enter Minio Gateway.
Minio Gateway allows you to configure Minio (an S3-compatible server) to function as a proxy for another S3-compatible service (and some other object storage services, for that matter), while caching both objects and their associated metadata.
I won’t go over the configuration here, as it’s pretty straightforward. Just point it at your source S3 service, tell it to enable caching, give it an access key… and that’s just about it!
After that, Minio Gateway will operate as an LRU cache for objects and metadata, and it performs great.
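If LRU is new to you: "least recently used" means that when the cache is full, whatever was touched longest ago gets thrown out first. The toy below shows the idea with an in-memory, size-bounded cache. It is only an illustration of the eviction policy, not how Minio implements its cache, and the class name is made up.

```python
from collections import OrderedDict


class LRUObjectCache:
    """Toy byte-bounded LRU cache: key -> bytes, least recently used first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()

    def get(self, key):
        """Return cached bytes (marking them recently used) or None on a miss."""
        data = self._items.get(key)
        if data is None:
            return None
        self._items.move_to_end(key)
        return data

    def put(self, key, data):
        """Store data, evicting least recently used objects if over budget."""
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        if len(data) > self.max_bytes:
            return  # larger than the whole cache: do not cache it
        self._items[key] = data
        self.used_bytes += len(data)
        while self.used_bytes > self.max_bytes:
            _, evicted = self._items.popitem(last=False)
            self.used_bytes -= len(evicted)


cache = LRUObjectCache(max_bytes=10)
cache.put("a", b"1234")
cache.put("b", b"1234")
cache.get("a")           # "a" is now the most recently used
cache.put("c", b"1234")  # over budget, so "b" (least recently used) is evicted
```

With only about 20% of the data being hot in a given week, a policy like this keeps the hot set on the cache server and lets the cold objects fall out on their own.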
There is one caveat that may or may not be a serious issue for your implementation though…
Minio caches a new object by creating a lock as it downloads it from the source S3 service. It then downloads/streams the content to whichever device requested it, at the rate that the requesting device can receive the content.
This lock means that any subsequent attempt to access the same object must wait until the first device (or rather, the device that caused the object to be cached) has finished downloading it before any data at all will be sent. Until that happens, the read will just block.
For small objects, this is probably a non-issue. But for big objects, you may run into a scenario where lots of users are waiting hours for some dialup user to download the object!
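To see how bad this can get, here is a back-of-the-envelope sketch. It assumes the behaviour described above (later readers receive nothing until the first download finishes) and that the first client's speed is the only bottleneck. The function name and the example figures are made up for illustration.

```python
def blocked_seconds(object_bytes, first_client_bytes_per_sec):
    """Worst-case wait for later readers while the first download holds the lock."""
    if first_client_bytes_per_sec <= 0:
        raise ValueError("client speed must be positive")
    return object_bytes / first_client_bytes_per_sec


# Example: a 1 GiB object first requested over a 56 kbit/s dial-up line.
one_gib = 1024 ** 3
dialup = 56_000 / 8  # 56 kbit/s expressed in bytes per second
hours = blocked_seconds(one_gib, dialup) / 3600
print(f"everyone else waits about {hours:.1f} hours")
```

For a 1 MB thumbnail the same sum comes out at a couple of minutes on dial-up and well under a second on broadband, which is why small objects are rarely a problem.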
Step 3: High(ish) Availability
One downside to implementing a cache is that there is now a single point of failure where there wasn’t before.
In my implementation, I work around this by monitoring the cache server and automatically re-routing any object requests directly to Wasabi if the cache service is unavailable.
It’s not ideal, as any currently downloading files will fail and need to be manually restarted.
Of course, it would be possible to run multiple Minio Gateway servers for redundancy, with a load balancer routing between them, but that’s overkill for my implementation, at least for the time being 🙂
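The fallback described above can be sketched in a few lines. This is a simplified illustration rather than the monitoring actually running on the platform: it assumes the cache server answers Minio's liveness check at /minio/health/live, and both endpoint URLs are placeholders (cache.example.com is hypothetical, and the Wasabi endpoint depends on your region).

```python
import http.client
import urllib.request

CACHE_ENDPOINT = "https://cache.example.com"  # hypothetical cache server
ORIGIN_ENDPOINT = "https://s3.wasabisys.com"  # assumption: your Wasabi endpoint


def cache_is_healthy(base_url, timeout=2.0):
    """Return True if the liveness probe answers with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False


def pick_endpoint(probe=cache_is_healthy):
    """Use the cache when it is up, otherwise go straight to the origin."""
    return CACHE_ENDPOINT if probe(CACHE_ENDPOINT) else ORIGIN_ENDPOINT
```

In practice you would probably run the probe on a timer and remember the result, rather than checking on every request, so that a slow or dead cache server doesn't add a timeout to every download.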
Step 4: Monitoring
Minio Gateway provides Prometheus metrics with token authentication built right in, and it takes only a few seconds to set up.
As an aside, I really wish every Prometheus exporter had authentication built in – looking at you, node_exporter!
From there, it’s as simple as setting up a few pretty charts in Grafana and it’s all done!
As I write this, around 99% of data is being served from the cache – a big win!
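If you want to sanity-check a figure like that outside Grafana, the sketch below pulls the metrics with a bearer token and works out a hit ratio. Treat it as a rough outline under several assumptions: the metrics path is passed in as a default you should verify for your Minio version, the parser only handles plain "name value" lines (no timestamps), and the metric names in the sample are hypothetical stand-ins for whatever cache counters your server actually exposes.

```python
import urllib.request


def fetch_metrics(base_url, token, path="/minio/prometheus/metrics"):
    """Fetch the Prometheus text output using a bearer token.

    The default path is an assumption; check it against your Minio version.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_prometheus_text(text):
    """Parse 'series value' lines into {series: float}, skipping comments."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "}" in line:  # series with labels, e.g. name{disk="x"} 1
            cut = line.rindex("}") + 1
            series, rest = line[:cut], line[cut:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            samples[series] = float(rest[0])
    return samples


def hit_ratio(hits, misses):
    """Fraction of lookups served from the cache (0.0 if there were none)."""
    total = hits + misses
    return hits / total if total else 0.0


# Hypothetical sample; real metric names will differ.
SAMPLE = """\
# TYPE cache_hits_total counter
cache_hits_total 990
cache_missed_total 10
"""
metrics = parse_prometheus_text(SAMPLE)
ratio = hit_ratio(metrics["cache_hits_total"], metrics["cache_missed_total"])
print(f"cache hit ratio: {ratio:.0%}")
```

The same ratio works whether the counters track requests or bytes; for bandwidth bills, bytes are the number that matters.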