Scaling Mastodon: moving media assets to Object Storage

⚠️ I have written a follow-up post about moving from B2 to Scaleway as my Object Storage provider. This post is still valid; the only difference is that I'm no longer using B2, which might be perfectly fine for your needs 👍


When I installed my private personal Mastodon instance, I quickly noticed Mastodon has storage issues, so I tried to improve things by setting up a bunch of cron jobs to purge cached files.

All the commands used are available out of the box, no hacking involved, but it's something that's not really highlighted in the installation guides and it's quite important.

I've been battling with a steady reduction of disk space for the last month. Not terrible but not great.

After reading from a lot of other users that the actual cost of moving the cached files to object storage, with a CDN in front, was so low, I decided to make the jump. An external object storage provider was the next step.

Why?

Disk space, obviously 😄 The reality is that VPS providers don't give you that much disk space. My VPS provider, Linode, offers great shared CPU plans such as:

  • 5$ for 1 CPU, 1GB of RAM and 25 GB of disk space

It's quite cheap, if you ask me 💜

But that's not enough for Mastodon's media usage. Storage space should be cheap nowadays, and it actually is, but you have to use object storage. Regular SSD disk space was not meant for this use case.

Linode also offers object storage, nice. In fact (at the time of writing) they offer 250GB for only 5$ extra. Five dollars is not much, but it's a flat fee 😢 If you go with AWS or BackBlaze the price starts from 0$ 🤩 Unless you already have two hundred GB of media, you're probably better off starting with a dynamically priced provider, just like I did 😉

What's my setup?

  • BackBlaze: an object storage provider with lower prices than Amazon's famous S3, and with an S3-compatible API (be mindful when creating an account to select the correct region, US vs EU). It has a partnership with Cloudflare for a very cheap, close to free, CDN. However, you need to move your domain's DNS to Cloudflare if you want to use their CDN.
  • CloudFront from Amazon Web Services: I didn't move my domain's DNS to Cloudflare, so I used AWS. It has an absurdly high free tier which I can't possibly make use of. My forecasted CDN usage is less than 2% of the free tier 🤭
  • AWS CLI: I used it to do the initial sync between my disk and the S3 bucket in BackBlaze. It takes a while, so sit tight and grab a coffee.
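For illustration, the initial sync could look roughly like the sketch below. It is not the exact command used for this instance: the media path assumes a standard Mastodon install under ~/live, the bucket and endpoint are the same placeholders as in the configuration, and the `sync_media` function and `DRY_RUN` switch are hypothetical helpers added so the command can be inspected before it runs.

```shell
#!/bin/sh
# Sketch: one-off copy of local Mastodon media into an S3-compatible bucket.
# Every value here is a placeholder (assumption); replace with your own.
MEDIA_DIR="${MEDIA_DIR:-$HOME/live/public/system}"
S3_BUCKET="${S3_BUCKET:-XXXX}"
S3_ENDPOINT="${S3_ENDPOINT:-https://XXXXX.backblazeb2.com}"

# sync_media is a hypothetical helper, not part of Mastodon or the AWS CLI.
# With DRY_RUN=1 (the default) it only prints the command it would run.
sync_media() {
  set -- aws s3 sync "$MEDIA_DIR" "s3://$S3_BUCKET" --endpoint-url "$S3_ENDPOINT"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

Calling `sync_media` prints the command; `DRY_RUN=0 sync_media` actually runs it, using whatever credentials your AWS CLI is configured with. `aws s3 sync` only copies files that are missing or changed, so it is safe to run again if the first pass gets interrupted.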

Configuring your .env.production file

The file lives in your Mastodon folder: ~/live/.env.production

Here's what I had to add; customize the values based on your S3 bucket and URLs:

S3_ENABLED=true
S3_PROTOCOL=https
S3_ENDPOINT=https://XXXXX.backblazeb2.com
S3_HOSTNAME=XXXX.backblazeb2.com
S3_BUCKET=XXXX
AWS_ACCESS_KEY_ID=XXXX
AWS_SECRET_ACCESS_KEY=XXXX
S3_ALIAS_HOST=media.ricard.social
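Before restarting Mastodon it can be handy to double-check that none of these keys was forgotten or left empty. The snippet below is a minimal sketch of such a check; `missing_s3_keys` and `POST_KEYS` are hypothetical names, and the key list simply mirrors the variables shown above rather than an official list of required settings.

```shell
#!/bin/sh
# Keys taken from the configuration shown in this post (not an official list).
POST_KEYS="S3_ENABLED S3_PROTOCOL S3_ENDPOINT S3_HOSTNAME S3_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_ALIAS_HOST"

# missing_s3_keys FILE
# Prints, space separated, every key from POST_KEYS that has no non-empty
# KEY=value line in FILE. Prints an empty line when nothing is missing.
missing_s3_keys() {
  env_file="$1"
  missing=""
  for key in $POST_KEYS; do
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      missing="${missing}${missing:+ }${key}"
    fi
  done
  echo "$missing"
}
```

For example, `missing_s3_keys ~/live/.env.production` prints nothing but an empty line when every key is present.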

What's the Mastodon retention policy?

I've set up the purging of attachments and previews older than 7 days, running every night.

Before doing the S3 migration I had it set to one day, but considering how cheap it is, seven feels like a nice number.
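As a rough sketch, a nightly purge job could wrap Mastodon's `tootctl` like this. It assumes that "attachments and previews" map to `tootctl media remove` and `tootctl preview_cards remove`, and that Mastodon lives in ~/live; the `purge_media` function, the `DRY_RUN` switch and the script path in the crontab comment are hypothetical.

```shell
#!/bin/sh
# Sketch of a nightly purge script. Example crontab entry (hypothetical path):
#   0 3 * * * DRY_RUN=0 /home/mastodon/purge-media.sh
MASTODON_DIR="${MASTODON_DIR:-$HOME/live}"

# purge_media DAYS
# Removes cached remote media and preview cards older than DAYS days.
# With DRY_RUN=1 (the default) it only prints the tootctl commands.
purge_media() {
  days="${1:-7}"
  for target in media preview_cards; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "bin/tootctl $target remove --days $days"
    else
      (cd "$MASTODON_DIR" && RAILS_ENV=production bin/tootctl "$target" remove --days "$days")
    fi
  done
}
```

To use it as a script, add a final line such as `purge_media 7`. Keeping the number of days as an argument makes it easy to go from one day to seven once storage stops being the bottleneck.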

How cheap is the whole thing?

I'm honestly surprised how low the cost of the whole move is. For the past month:

  1. BackBlaze S3 bucket:
    • 💾 Disk usage (60 GB): USD $0.16
    • 📈 Transactions: USD $0.25
    • 📉 Downloaded GB: USD $0.00
    • 💰 Cost: USD $0.41
  2. Amazon Web Services CloudFront CDN:
    • 📈 Transactions: 1% of the free tier
    • 📉 Downloaded GB: 1.2% of the free tier
    • 💰 Monthly cost: free
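The arithmetic behind the BackBlaze total is simple enough to script if you want to track it month over month. This is only a sketch: `sum_usd` and `per_gb` are hypothetical helpers, and the per-GB figure is an effective rate derived from this one bill, not the provider's list price.

```shell
#!/bin/sh
# sum_usd AMOUNT...
# Adds up dollar amounts and prints the total with two decimals.
sum_usd() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# per_gb COST GIGABYTES
# Prints the effective cost per GB with four decimals.
per_gb() {
  awk -v cost="$1" -v gb="$2" 'BEGIN { printf "%.4f\n", cost / gb }'
}

sum_usd 0.16 0.25 0.00   # storage + transactions + downloads
per_gb 0.16 60           # effective storage cost per GB for this month
```

The first call reproduces the USD $0.41 total above; the second gives a feel for how little each stored GB ended up costing.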

Conclusion

There is no doubt that a move to object storage should be your default if you want to host a Mastodon instance. It doesn't matter whether it's a private single-user instance, or one with 10 users or 10,000 users. Hosting the media files in an S3-compatible bucket will be far cheaper than keeping them locally on the same server.

Have you moved your files to object storage? Have you encountered any issues?

Any other cloud storage provider you recommend?

Comments

  1. @ricard I did consider adding relays to mine but they either seemed really unreliable or filled with tons of random instances that I wasn't 100% happy with. I definitely recommend https://github.com/g3rv4/FakeRelay/ if you haven't tried it, I've found it super useful as you can pull in hashtags from the big instances that don't always make it to relays.
