Scaling Mastodon: moving media assets to Object Storage

⚠️ I have written a follow-up post about moving from B2 to Scaleway as my Object Storage provider. This post is still valid; the only difference is that I'm no longer using B2, which might still be perfectly fine for your needs 👍


When I installed my private personal Mastodon instance I quickly noticed Mastodon has storage issues, so I tried to improve things by setting up a bunch of cron jobs to purge cached files.

All the commands used are available out of the box, no hacking involved, but it's something that's not really highlighted in the installation guides and it's quite important.

I've been battling with a steady reduction of disk space for the last month. Not terrible but not great.

After reading from many other users that the actual cost of moving the cached files to object storage, with a CDN in front, was so low, I decided to make the jump. An external object storage provider was the next step.

Why?

Disk space, obviously 😄 The reality is that VPS providers don't give you that much disk space. My VPS provider, Linode, offers great shared CPU plans such as:

  • $5 for 1 CPU, 1 GB of RAM and 25 GB of disk space

It's quite cheap, if you ask me 💜

But that's not enough for Mastodon's media usage. Storage space should be cheap nowadays, and it actually is, but you have to use object storage. Regular SSD disk space is not meant for this use case.

Linode also offers object storage, nice. In fact (at the time of writing) they offer 250 GB for only $5 extra. Five dollars is not much but it's a flat fee 😢 If you go with AWS or BackBlaze the price starts from $0 🤩 Unless you already have two hundred GB of media, you're probably better off starting with a dynamically priced provider, just like I did 😉

What's my setup?

  • BackBlaze: the object storage provider, with lower prices than Amazon's famous S3 and an S3-compatible API (be mindful when creating an account: select the correct region, US vs EU). It has a partnership with Cloudflare for a very cheap, close to free, CDN. However, you need to move your domain's DNS to Cloudflare if you want to use their CDN.
  • CloudFront from Amazon Web Services: I didn't move my domain's DNS to Cloudflare, so I used AWS. It has an absurdly high free tier which I can't possibly use up. My forecasted CDN usage is less than 2% of the free tier 🤭
  • AWS CLI: I used it to do the initial sync between my disk and the S3 bucket in BackBlaze. It takes a while, so sit tight and grab a coffee.
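
For reference, the initial sync could look roughly like the sketch below. The media path, bucket name and endpoint are placeholders (assumptions, not my real values), and the RUN switch is only there so the command gets printed for review before anything is uploaded. The --endpoint-url option is what points the AWS CLI at a non-Amazon, S3-compatible provider.

```shell
#!/bin/sh
# Sketch: initial sync of local Mastodon media to an S3-compatible bucket.
# All three values are placeholders (assumptions); adjust them to your setup.
MEDIA_DIR="${MEDIA_DIR:-/home/mastodon/live/public/system}"
BUCKET="${BUCKET:-NAME_OF_THE_BUCKET}"
ENDPOINT="${ENDPOINT:-https://XXXXX.backblazeb2.com}"

# With RUN=1 the sync is executed; otherwise the command is only printed.
sync_media() {
  if [ "${RUN:-0}" = "1" ]; then
    aws s3 sync "$MEDIA_DIR" "s3://$BUCKET" --endpoint-url "$ENDPOINT" --acl public-read
  else
    echo "aws s3 sync $MEDIA_DIR s3://$BUCKET --endpoint-url $ENDPOINT --acl public-read"
  fi
}

sync_media
```

Run it once as-is to check the printed command, then again with RUN=1 to start the upload.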

Configuring your .env.production file

The file lives in your Mastodon folder: ~/live/.env.production

Here's what I had to add; customize the values based on your S3 provider and URLs:

S3_ENABLED=true
S3_PROTOCOL=https
S3_ENDPOINT=https://XXXXX.backblazeb2.com
S3_HOSTNAME=XXXX.backblazeb2.com
S3_BUCKET=XXXX
AWS_ACCESS_KEY_ID=XXXX
AWS_SECRET_ACCESS_KEY=XXXX
S3_ALIAS_HOST=media.ricard.social
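
Before restarting Mastodon it can be worth checking that none of these settings were left out or empty. The snippet below is only a sketch: the function name and the idea of treating an empty value as missing are my assumptions, and the list of keys is simply the one shown above.

```shell
#!/bin/sh
# Sketch: report which of the S3 settings listed above are missing or empty.
REQUIRED_KEYS="S3_ENABLED S3_PROTOCOL S3_ENDPOINT S3_HOSTNAME S3_BUCKET AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_ALIAS_HOST"

# missing_s3_keys FILE: print one line per key that has no non-empty value.
missing_s3_keys() {
  env_file="$1"
  for key in $REQUIRED_KEYS; do
    grep -Eq "^${key}=.+" "$env_file" || echo "$key"
  done
}

# Default path is the usual Mastodon location; pass another path as $1 if needed.
ENV_FILE="${1:-$HOME/live/.env.production}"
if [ -f "$ENV_FILE" ]; then
  missing_s3_keys "$ENV_FILE"
fi
```

No output means every key is present with a value.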

What's the Mastodon retention policy?

I've set the retention for attachments and previews to 4 days, with the purge running every night.

Before doing the S3 migration I had it set to one day but, considering how cheap it is, four feels like a nice number.
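
As a rough sketch, the nightly purge boils down to two tootctl calls. The tootctl path and the two variable names are assumptions for illustration. The script only prints the commands so they can be reviewed first; pipe its output to sh to actually run them.

```shell
#!/bin/sh
# Sketch of the nightly purge; TOOTCTL and RETENTION_DAYS are illustrative names.
TOOTCTL="${TOOTCTL:-/home/mastodon/live/bin/tootctl}"
RETENTION_DAYS="${RETENTION_DAYS:-4}"

# Print the purge commands for cached attachments and link previews.
purge_commands() {
  echo "RAILS_ENV=production $TOOTCTL media remove --days $RETENTION_DAYS"
  echo "RAILS_ENV=production $TOOTCTL preview_cards remove --days $RETENTION_DAYS"
}

purge_commands
```

Changing RETENTION_DAYS in one place keeps attachments and previews on the same retention window.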

How cheap is the whole thing?

I'm honestly surprised how low the cost of the whole move is. For the past month:

  1. BackBlaze S3 bucket:
    • 💾 Disk usage (60 GB): USD $0.16
    • 📈 Transactions: USD $0.25
    • 📉 Downloaded GB: USD $0.00
    • 💰 Cost: USD $0.41
  2. Amazon Web Services CloudFront CDN:
    • 📈 Transactions: 1% of the free tier
    • 📉 Downloaded GB: 1.2% of the free tier
    • 💰 Monthly cost: free
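
To make the arithmetic explicit, here is a small sketch that adds up the BackBlaze line items above and compares the total with the $5 flat fee mentioned earlier. The variable names are mine; the figures are the ones from this post.

```shell
#!/bin/sh
# Figures from the bill above; flat_fee is the Linode object storage price quoted earlier.
storage=0.16
transactions=0.25
downloads=0
flat_fee=5

# awk does the decimal arithmetic, since plain sh only handles integers.
total=$(awk -v s="$storage" -v t="$transactions" -v d="$downloads" 'BEGIN { printf "%.2f", s + t + d }')
saving=$(awk -v f="$flat_fee" -v t="$total" 'BEGIN { printf "%.2f", f - t }')

echo "Object storage this month: \$${total}"
echo "Difference versus a \$${flat_fee} flat fee: \$${saving}"
```

With these numbers the total comes to $0.41, which is $4.59 below the flat fee.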

Conclusion

There is no doubt that a move to object storage should be your default if you plan to host a Mastodon instance. It doesn't matter whether it's a private single-user instance, 10 users, or 10,000 users. Hosting the media files in an S3-compatible bucket will be far cheaper than keeping them locally on the same server.

Have you moved your files to object storage? Have you encountered any issues?

Any other cloud storage provider you recommend?

Comments

  1. Nicolas says:

    Hi thanks for the info.
    Could you make a post on how to move an existing system directory from a Linux instance to a S3 bucket ?
    That's what I'll need to do if I move my data there.

    Thanks !

    1. Hey Nicolas

      Read this post where I showcase using the “aws” CLI commands to do exactly that:

      https://ricard.dev/moving-s3-from-backblaze-b2-to-scaleway/

      This one should also be read:

      https://ricard.dev/scaling-mastodon-moving-media-assets-to-object-storage/

      Long story short, for example to upload assets from your disk to an S3 bucket (to the Scaleway S3 Bucket)

      aws s3 sync . s3://NAME_OF_THE_BUCKET --acl public-read --exclude ".DS_Store"
  2. @ricard @anlomedad While it takes a while to do so you might not want to run it every night (as well as some providers charging for the API calls to check all the files), I didn’t see a command in there to remove orphans ( https://docs.joinmastodon.org/admin/tootctl/#media-remove-orphans ): tootctl media remove-orphans. It will go through for media files that aren’t connected to stuff anymore, just taking up space.

  3. @ricard Thank you for your follow-up post. Fun to read. Challenge for my instance is running it with $0 cost with minimum maintenance cost(my time). So I am focusing more on automated disaster recovery(with cloudinit, ansible, docker) than scaling at the moment. It’s tiny instance only for me and my bots running on Oracle cloud’s free-tier. No plan to expand its size but try to maximize what I have. The biggest challenge now is its 20G S3 allowance. Your article saved lots of my S3 space. :+1:

  4. C. W. Smith says:

    Used this to get to Object Storage on Linode. But I seem to have done something wrong because all the media is blurred or showing broken links for newly posted content.

    1. Very strange, they do say it’s S3 compatible.

      1. Have you tried comparing their docs? https://www.linode.com/docs/products/storage/object-storage/
      2. What do you mean by broken links? The image src points to your CDN but it doesn’t load?
      3. Have you checked if the attachments are in the actual S3 storage?
      4. Check the CDN, see if it’s reaching the S3 origin

      Hope this helps

  5. @ricard I did consider adding relays to mine but they either seemed really unreliable or filled with tons of random instances that I wasn't 100% happy with. I definitely recommend https://github.com/g3rv4/FakeRelay/ if you haven't tried it, I've found it super useful as you can pull in hashtags from the big instances that don't always make it to relays.

  6. @ricard I’m running MinIO on a $30 xeon server that’s in my garage, so I don’t think it gets any cheaper than that. I was briefly considering trying out backblaze or R2, but I think I prefer self-hosting object storage. I thought I had read somewhere that there were a couple alternatives to the Pleroma-FE frontend.

  7. Mastodon’s built-in CLI gives you the ability to clean attachments and previews from remote accounts, purging the disk cache. This is fantastic and you couldn’t possibly survive without it.

    My current crontab that runs every 3 hours:

    0 */3 * * * /bin/bash /home/mastodon/purge-media.sh

    As of Mastodon 4.1.0, we have new available commands. Here’s the content of my purge-media.sh script:

    #!/bin/bash
    
    # Prune remote accounts that never interacted with a local user
    RAILS_ENV=production /home/mastodon/live/bin/tootctl accounts prune;
    
    # Remove remote statuses that local users never interacted with older than 4 days
    RAILS_ENV=production /home/mastodon/live/bin/tootctl statuses remove --days 4;
    
    # Remove media attachments older than 4 days
    RAILS_ENV=production /home/mastodon/live/bin/tootctl media remove --days 4;
    
    # Remove all headers (including people I follow)
    RAILS_ENV=production /home/mastodon/live/bin/tootctl media remove --remove-headers --include-follows --days 0;
    
    # Remove link previews older than 4 days
    RAILS_ENV=production /home/mastodon/live/bin/tootctl preview_cards remove --days 4;
    
    # Remove files not linked to any post
    RAILS_ENV=production /home/mastodon/live/bin/tootctl media remove-orphans;

    ⚠️ If you’ve never run these commands before I’d suggest running them one by one (not in a cronjob) as each might take several hours (or days) to run. The size of the cached media and database will depend on how many people you follow, how many are on your instance, how many relays you have added to your instance, etc.

    With about 10 relays added to my single-user instance and the bash script above, I’m at around 30 GB in my S3 object storage.

    Do you have any other tips on how to keep a Mastodon instance lean?

    👋 Don’t miss the follow up post: Scaling Mastodon: moving media assets to Object Storage
