Improving Mastodon’s disk usage

Mastodon's built-in CLI gives you the ability to clean attachments and previews from remote accounts, purging the disk cache. This is fantastic and you couldn't possibly survive without it.

My current crontab runs this job hourly:

@hourly /bin/bash /home/mastodon/purge-media.sh

Here's the content of the purge-media.sh script:

#!/bin/bash
RAILS_ENV=production /home/mastodon/live/bin/tootctl statuses remove --days 4
RAILS_ENV=production /home/mastodon/live/bin/tootctl media remove --days 4
RAILS_ENV=production /home/mastodon/live/bin/tootctl preview_cards remove --days 4
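
If you'd rather preview what the hourly job would run before letting it loose, a dry-run wrapper along these lines could help. This is only a sketch: the TOOTCTL, DAYS and DRY_RUN variables and the purge function are names made up for illustration, not anything Mastodon reads, and the tootctl calls are the same three as in the script above.

```bash
#!/bin/bash
# Sketch only: TOOTCTL, DAYS and DRY_RUN are illustrative variable names,
# not settings that Mastodon itself reads.
TOOTCTL="${TOOTCTL:-/home/mastodon/live/bin/tootctl}"
DAYS="${DAYS:-4}"

purge() {
  local target
  for target in statuses media preview_cards; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command instead of running it
      echo "RAILS_ENV=production $TOOTCTL $target remove --days $DAYS"
    else
      RAILS_ENV=production "$TOOTCTL" "$target" remove --days "$DAYS"
    fi
  done
}
```

Calling `DRY_RUN=1 purge` prints the three commands without touching anything; calling `purge` on its own runs them for real.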

The size of the cached media will depend on how many people you follow and/or how many relays you have added to your instance.

I have about 10 relays, and with the commands above I'm at around 60 GB in my S3 object storage.

How about avatars and headers?

Update: a command for purging avatars and headers has been added and should be included in an upcoming release (> 4.0.2).

Edited: December 2022

Here's where things get disappointing: a GitHub issue from 2018 outlines the problem of caching avatars and headers for users from outside your instance. Yes, it makes sense that these are downloaded to your instance, but it doesn't make sense that they are kept forever (?).

There should be a way to purge them after checking whether these are accounts you're interacting with or not. The closest you get is:

tootctl accounts refresh --all

Which does:

Refetch remote user data and files for one or multiple accounts.

But this is not good enough ❌

You have yet another command at your disposal:

tootctl accounts cull

Remove remote accounts that no longer exist. Queries every single remote account in the database to determine if it still exists on the origin server, and if it doesn't, then remove it from the database. Accounts that have had confirmed activity within the last week are excluded from the checks, in case the server is just down.

Again, this is not good enough ❌

I refuse to believe that after running these commands I'm still stuck with:

Avatars: 920 MB (61.3 KB local)
Headers: 1.95 GB (101 KB local)

I couldn't care less for user headers 🤷‍♂️

💡 Idea: replace all headers with a 1x1 pixel

Update: I've ended up simply removing all the files inside the accounts/headers folder. Nothing seems to break and it's just faster and cleaner ✅

Edited: November 14th 2022
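
For what it's worth, removing the files can be done with a single find call. Here's a minimal sketch, assuming the headers live on local disk under the path used later in this post; purge_files and its "dry" argument are hypothetical names for illustration.

```bash
#!/bin/bash
# Sketch only: purge_files and its optional "dry" argument are hypothetical.
# Deletes every regular file under the given folder, keeping the folders.
purge_files() {
  local dir="$1" mode="${2:-}"
  [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
  if [ "$mode" = "dry" ]; then
    # List what would be deleted
    find "$dir" -type f -print
  else
    find "$dir" -type f -delete
  fi
}
```

Running `purge_files /home/mastodon/live/public/system/cache/accounts/headers dry` first lists the files, and dropping the `dry` argument deletes them.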

What if we replace all header files (jpg, png, jpeg, webp...) with the smallest image file possible? A 1x1 pixel file.

Yes, you would still have thousands of files, but they would take up way less space.

Well, that's exactly what I did. A small bash script (I'm no expert!) that loops through all the image files inside the cache/accounts/headers folder and replaces each one with a link to a 1x1 pixel file of the same type (a hard link, since the script calls ln without -s).

#!/bin/bash
# Replace every cached remote header with a hard link to a 1x1 pixel file.
# Run from the folder that holds pixel.jpeg, pixel.jpg, pixel.png and pixel.webp.
HEADERS=/home/mastodon/live/public/system/cache/accounts/headers

# -print0 / read -d '' keeps file names with spaces intact
find "$HEADERS" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.webp' \) -print0 |
while IFS= read -r -d '' file; do
  ext="${file##*.}"
  # Lowercase the extension so .JPG and friends match too (bash 4+)
  case "${ext,,}" in
    jpeg) echo "This is a JPEG! $file"; SOURCE="pixel.jpeg" ;;
    jpg)  echo "This is a JPG! $file";  SOURCE="pixel.jpg" ;;
    png)  echo "This is a PNG! $file";  SOURCE="pixel.png" ;;
    webp) echo "This is a webP! $file"; SOURCE="pixel.webp" ;;
    *)    continue ;;
  esac
  rm "$file"
  ln "$SOURCE" "$file"
done

GitHub repo with the code, ready for you to check out

Did it work?

Yes! The headers folder has shrunk considerably (for a single-user instance):

2GB ➡️ 250MB

Nothing seems to be broken. The user profiles load and the images load; the only difference is that headers show the 1-pixel replacement instead of the originally cached image.

⚠️ One catch: Mastodon's built-in usage check still thinks I have 2GB in headers. My guess is that the image sizes are stored in the database (?)

tootctl media usage
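
Since that number may not reflect what's really on disk, you can check the folder directly. A rough sketch, assuming the headers were replaced with hard links as in the script above; count_linked is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch only: count_linked is a hypothetical helper.
# Prints how many files under DIR share an inode with PIXEL,
# i.e. how many of them are hard links to that pixel file.
count_linked() {
  local dir="$1" pixel="$2"
  find "$dir" -type f -samefile "$pixel" | wc -l | tr -d ' '
}
```

For example, `count_linked /home/mastodon/live/public/system/cache/accounts/headers pixel.png` tells you how many headers now point at the PNG pixel, and `du -sh` on the same folder shows the space actually used, since du counts a hard-linked file only once.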

💡 Re-compress instead of replace

We can also re-compress the files, if you don't want to delete the headers or avatars. Personally I run the following on the avatars cache:

find -name '*.jpg' -print0 | xargs -0 jpegoptim --verbose --preserve --threshold=1 --max=45
find -name '*.jpeg' -print0 | xargs -0 jpegoptim --verbose --preserve --threshold=1 --max=45
find -name '*.png' -print0 | xargs -0 pngquant --verbose --ext=.png --force --speed 10 --quality 45-50 --skip-if-larger
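
To see whether re-compressing is worth it, you could total up the bytes per file type before and after. A minimal sketch; total_bytes is a hypothetical helper, and because it reads every file it will be slow on a large cache.

```bash
#!/bin/bash
# Sketch only: total_bytes is a hypothetical helper.
# Sums the size in bytes of all files under DIR whose name matches PATTERN
# (case-insensitive), by streaming their contents through wc.
total_bytes() {
  local dir="$1" pattern="$2"
  find "$dir" -type f -iname "$pattern" -exec cat {} + | wc -c | tr -d ' '
}
```

Run something like `total_bytes . '*.jpg'` inside the avatars cache, run the commands above, then run it again and compare the two numbers.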

Conclusion

Keeping the size of a single-user Mastodon instance under control is not trivial.

Had I known this before getting started, I would probably have installed a Pleroma or Akkoma instance instead, both of which are way more lightweight. Granted, the UI is not as good (if you want a multi-column layout), but maybe you don't need all of Mastodon's features. I am currently too invested to switch, but I would highly encourage you to check out Pleroma and Akkoma (a fork) before installing Mastodon.

Do you have any other tips on how to keep a Mastodon instance lean?

⚠️ Don't miss the follow-up post: Scaling Mastodon: moving media assets to Object Storage
