How to create a Mastodon user sitemap.xml

Out of the box, Mastodon does not generate a sitemap.xml. If you want to submit a user's sitemap to Google or other search engines, here's how to do it.

💡 My setup is meant for a single-user Mastodon instance.

🤔 Why?

It's simple: I don't have Elasticsearch set up on my Mastodon instance, and I want a way to search my posts.

🧑‍💻 I'm sold, tell me more

I went a step further and used GitHub Actions to automate this for me, so I don't have to install anything on my instance. Here's an overview of what we'll do:

  • GitHub Actions runs a daily scheduled job
  • The job uses the Python script to create a sitemap
  • The job publishes the sitemap.xml to GitHub Pages so it's publicly available
  • The Mastodon instance fetches the file using wget

0. Mastodon Sitemap Generator

We will use this awesome Python script 💜

https://github.com/binfalse/mastodon-sitemap
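To give an idea of what such a generator boils down to, here is a minimal sketch of the sitemap-building step using only the Python standard library. It is not the code from the script above: the function name `build_sitemap`, the `max_urls` parameter and the example.social URLs are my own assumptions for illustration, and the posts are hard-coded instead of being fetched from the Mastodon API.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(posts, max_urls=500):
    """Return sitemap XML (as a str) for an iterable of (url, datetime) pairs.

    Only the newest `max_urls` posts are kept (hypothetical helper).
    """
    newest_first = sorted(posts, key=lambda p: p[1], reverse=True)[:max_urls]
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, updated in newest_first:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = updated.strftime("%Y-%m-%d")
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Made-up post URLs; a real run would get these from the instance.
    demo_posts = [
        ("https://example.social/@me/1", datetime(2023, 1, 1)),
        ("https://example.social/@me/2", datetime(2023, 2, 1)),
    ]
    print(build_sitemap(demo_posts))
```

The real script takes care of talking to the instance with your access token; the sketch only shows the shape of the file that ends up in sitemap.xml.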

1. 🐙 Using GitHub Actions

I've forked the script and added the GitHub Action below.

⚠️ Please note: I've changed the path where the file is created (from tmp to dist), and made the same change in the .py script. So I'd recommend forking my repository if you want to use this as is.

This GitHub Action will:

  • 1️⃣ Set up Python
  • 2️⃣ Install the 3 dependencies
  • 3️⃣ Run the script
  • 4️⃣ Move the generated dist/sitemap.xml into the gh-pages branch so it's hosted publicly.
name: Build and Deploy
on:
  schedule:
      - cron: "0 4 * * *"
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout 🛎️
        uses: actions/checkout@v3
        with:
            persist-credentials: false

      - name: Install Python
        uses: actions/setup-python@v4
        with:
            python-version: '3.9'
            cache: 'pip'
            cache-dependency-path: 'mastodon-sitemap.py'

      - name: Install Deps
        run: pip3 install argparse Mastodon.py sitemap_python

      - name: Build
        run: |
          python3 mastodon-sitemap.py --instance https://ricard.social        \
              --access-token ${{ secrets.ACCESS_TOKEN }} \
              --max-urls 500                               \
              --overwrite                                 \
              ./dist/sitemap.xml

      - name: Deploy 🚀
        uses: JamesIves/github-pages-deploy-action@releases/v4
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          BRANCH: gh-pages
          FOLDER: dist

https://github.com/quicoto/mastodon-sitemap

2. ⬇️ Fetch the file into the instance

Google Webmaster Tools doesn't let you submit a sitemap hosted on another domain 😢 Otherwise we could just submit the public GitHub Pages URL of the file 🤭

No worries, we will add a daily cron job that simply downloads the file into the instance's public folder. Couldn't get any easier! 😉

cron job

@daily /bin/bash /home/mastodon/fetch-sitemap.sh

fetch-sitemap.sh

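#!/bin/bash

# cron starts in the user's home directory; stop if the folder is missing.
cd "$HOME/live/public" || exit 1
# -O overwrites the old file; plain wget would save sitemap.xml.1 instead.
wget -q -O sitemap.xml https://quicoto.github.io/mastodon-sitemap/sitemap.xml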
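If you'd rather not risk serving a half-downloaded or broken file, the same fetch step could be sketched in Python roughly like this. This is an alternative I'm assuming for illustration, not part of my actual setup: `install_sitemap` and `fetch_and_install` are hypothetical names, and the check only looks at the root element, not the full sitemap schema.

```python
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path


def install_sitemap(data: bytes, dest: Path) -> None:
    """Check that `data` looks like a sitemap, then atomically replace `dest`."""
    root = ET.fromstring(data)  # raises ParseError on malformed XML
    # Strip the "{namespace}" prefix ElementTree puts in front of the tag.
    if root.tag.rsplit("}", 1)[-1] != "urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    # Write to a temp file in the same folder so os.replace is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.chmod(tmp_name, 0o644)  # mkstemp creates 0600; the web server must read it
    os.replace(tmp_name, dest)


def fetch_and_install(url: str, dest: Path) -> None:
    """Download `url` and install it at `dest` (not called automatically)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        install_sitemap(response.read(), dest)
```

Calling `fetch_and_install` with the GitHub Pages URL and the path to live/public/sitemap.xml would leave the previous sitemap untouched whenever the download is not valid XML with a urlset root.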

3. ✅ Done

Now the file is available under my domain and Google is starting to index my posts:

https://ricard.social/sitemap.xml

Conclusion

Sure, this is not as good as adding Elasticsearch to your Mastodon instance, but it can be a nice workaround if you want to search your posts without it. Personally, the less bloated my server and my instance are, the better. This way I outsource the indexing to Google and the generation to GitHub.

Once the posts are indexed you can simply run this query on Google to find what you're looking for.

site:ricard.social SOME_TEXT
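If you want to turn that query into a link (for a bookmark or a small search box, say), a tiny sketch could look like this. The helper name `site_search_url` is made up for this example; it only URL-encodes the same site: query shown above.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, text: str) -> str:
    """Build a Google search URL restricted to `domain` (hypothetical helper)."""
    query = f"site:{domain} {text}"
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("ricard.social", "SOME_TEXT"))
```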
