Improving Mastodon’s disk usage

Mastodon's built-in CLI gives you the ability to clean attachments and previews from remote accounts, purging the disk cache. This is fantastic and you couldn't possibly survive without it.

My current crontab looks something like this:

0 3 * * * RAILS_ENV=production /home/mastodon/live/bin/tootctl media remove --days 1
0 4 * * * RAILS_ENV=production /home/mastodon/live/bin/tootctl preview_cards remove --days 1
0 5 * * * RAILS_ENV=production /home/mastodon/live/bin/tootctl statuses remove --days 1

I basically run these 3 jobs at 3AM, 4AM and 5AM every day to keep the single-user instance thin. It works: the size is kept down to something like this.

Attachments: 494 MB (0 Bytes local)
Custom emoji: 43.4 MB (0 Bytes local)
Preview cards: 13.3 MB

Q: Why do you run the commands cleaning 1 day old media?
A: My server doesn't have much space and I want to keep it as lean as possible.
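If you would rather not guess how long each job takes, the three crontab entries could be chained in a single script and scheduled once. This is only a minimal sketch: the function name cleanup and the TOOTCTL and DAYS variables are hypothetical, and the tootctl path is the one from my crontab.

```shell
#!/bin/bash
# Run the three cleanup jobs back to back instead of at fixed hours.
# TOOTCTL and DAYS are hypothetical, overridable variables.
TOOTCTL="${TOOTCTL:-/home/mastodon/live/bin/tootctl}"
DAYS="${DAYS:-1}"

cleanup() {
    local job
    for job in media preview_cards statuses; do
        echo "== tootctl $job remove --days $DAYS"
        # Stop at the first failing job so the error is not buried.
        RAILS_ENV=production "$TOOTCTL" "$job" remove --days "$DAYS" || return 1
    done
}
```

Calling cleanup at the end of the script and pointing one crontab line at it would replace the three separate entries.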

How about avatars and headers?

Here's where things get disappointing. A GitHub issue from 2018 outlines the problem of caching avatars and headers for users from outside your instance. Yes, it makes sense that these are downloaded to your instance, but it doesn't make sense that they are kept forever (?).

There should be a way to purge them, to check if these are accounts you're interacting with or not. In a way you have:

tootctl accounts refresh --all

Which does:

Refetch remote user data and files for one or multiple accounts.

But this is not good enough ❌

You have yet another command at your disposal:

tootctl accounts cull

Remove remote accounts that no longer exist. Queries every single remote account in the database to determine if it still exists on the origin server, and if it doesn't, then remove it from the database. Accounts that have had confirmed activity within the last week are excluded from the checks, in case the server is just down.

Again, this is not good enough ❌

I refuse to believe that after running these commands I'm still stuck with:

Avatars: 920 MB (61.3 KB local)
Headers: 1.95 GB (101 KB local)

I couldn't care less for user headers 🤷‍♂️

💡Idea: replace all headers with 1x1 pixel

Update: I've ended up simply removing all the files inside the accounts/headers folder. Nothing seems to break and it's just faster and cleaner ✅

Edited: November 14th 2022
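The removal mentioned in the update can be sketched as a small function with a dry run. This is an illustration rather than the exact command I ran: purge_headers is a hypothetical name, and it relies on the -delete option that GNU and BSD find both provide.

```shell
#!/bin/bash
# List (default) or delete (with --delete) every file under a folder.
# purge_headers is a hypothetical helper name.
purge_headers() {
    local dir="$1" mode="${2:-}"
    [ -d "$dir" ] || { echo "Not a directory: $dir" >&2; return 1; }
    if [ "$mode" = "--delete" ]; then
        # Removes regular files only and leaves the directory tree in place.
        find "$dir" -type f -delete
    else
        # Dry run: only print what would be removed.
        find "$dir" -type f -print
    fi
}
```

Running purge_headers /home/mastodon/live/public/system/cache/accounts/headers first, and adding --delete only once the list looks right, keeps the step reversible until the last moment.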

What if we replace all header files (jpg, png, jpeg, webp...) with the smallest image file possible? A 1x1 pixel file.

Yes, you would still have thousands of files but taking way less space.

Well, that's exactly what I did. A small bash script (I'm no expert!) that loops through all the image files inside the cache/headers folder and replaces them with a pixel.

#!/bin/bash
# Replace every cached remote header image with a 1x1 pixel of the same format.
# Expects pixel.jpg, pixel.jpeg, pixel.png and pixel.webp in the current directory.
HEADERS_DIR="/home/mastodon/live/public/system/cache/accounts/headers"

# -print0 plus read -d '' keeps file names with spaces intact.
find "$HEADERS_DIR" -type f \( -iname '*.jpg' -o -iname '*.jpeg' -o -iname '*.png' -o -iname '*.webp' \) -print0 |
while IFS= read -r -d '' file; do
    # Pick the placeholder matching the extension (compared in lowercase).
    ext="${file##*.}"
    case "${ext,,}" in
        jpeg) SOURCE="pixel.jpeg" ;;
        jpg)  SOURCE="pixel.jpg" ;;
        png)  SOURCE="pixel.png" ;;
        webp) SOURCE="pixel.webp" ;;
        *)    continue ;;
    esac
    echo "Replacing $file with $SOURCE"
    # cp overwrites the file in place, so no separate rm is needed.
    cp "$SOURCE" "$file"
done

GitHub repo with the code ready for you to checkout

Did it work?

Yes! The headers folder has shrunk considerably (for a single-user instance):

2GB ➡️ 250MB

Nothing seems to be broken. The user profiles load, the images load, the only thing is that they're the replaced 1 pixel images instead of the originally cached header.
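To double-check that no header was missed, a sketch like the following lists any file still larger than a few kilobytes. The helper name find_large and the 4 KiB default are hypothetical choices, not part of the original script.

```shell
#!/bin/bash
# List files under a directory that are larger than a threshold in KiB.
# find_large is a hypothetical helper; 4 KiB is an arbitrary default.
find_large() {
    local dir="$1" min_kib="${2:-4}"
    # find rounds sizes up to whole KiB units before comparing.
    find "$dir" -type f -size +"${min_kib}"k -print
}
```

An empty result from find_large /home/mastodon/live/public/system/cache/accounts/headers would suggest every header is now a placeholder.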

⚠️ The only catch is that Mastodon's built-in usage check still thinks I have 2GB in headers. My guess is the size of the images must be stored in the database (?)

tootctl media usage
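Since the tootctl numbers may not reflect what is actually on disk, the real size can be checked with du. A minimal sketch, assuming the cache layout seen in the script's path (public/system/cache/accounts); disk_usage is a hypothetical helper name:

```shell
#!/bin/bash
# Print the on-disk size of each subfolder of a cache directory.
# disk_usage is a hypothetical helper name.
disk_usage() {
    local cache_dir="$1"
    # -s: one total per argument, -h: human-readable sizes
    du -sh "$cache_dir"/*/ 2>/dev/null
}
```

For example, disk_usage /home/mastodon/live/public/system/cache/accounts should print one line for avatars and one for headers.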

💡 Re-compress instead of replace

We can also re-compress the files, if you don't want to delete the headers or avatars. Personally I run the following on the avatars cache:

find -name '*.jpg' -print0 | xargs -0 jpegoptim --verbose --preserve --threshold=1 --max=45
find -name '*.jpeg' -print0 | xargs -0 jpegoptim --verbose --preserve --threshold=1 --max=45
find -name '*.png' -print0 | xargs -0 pngquant --verbose --ext=.png --force --speed 10 --quality 45-50 --skip-if-larger

Conclusion

Keeping the size of a single-user Mastodon instance down is not trivial.

Had I known this before getting started, I would probably have installed a Pleroma or Akkoma instance instead, as both are way more lightweight. Granted, the UI is not as good (if you want a multi-column layout), but maybe you don't need all of Mastodon's features. I am currently too invested to switch, but I would highly encourage you to check out Pleroma and Akkoma (a Pleroma fork) before installing Mastodon.

Do you have any other tips on how to keep a Mastodon instance lean?
