How to create a Mastodon user sitemap.xml

Out of the box, Mastodon does not come with sitemap.xml generation built in. If you want to submit a user's sitemap to Google or other search engines, here's how to do it.

💡 My setup is meant for a single-user Mastodon instance.

🤔 Why?

It's simple: I don't have Elasticsearch set up on my Mastodon instance and I want a way to search my posts.

🧑‍💻 I'm sold, tell me more

I went a step further and used GitHub Actions to automate this, so I don't have to install anything on my instance. Here's an overview of what we'll do:

  • GitHub Actions to run a daily scheduled job
  • GitHub Actions will use the Python script to create a sitemap
  • GitHub Actions will move the sitemap.xml to GitHub Pages so it's publicly available
  • Fetch the file into our Mastodon instance using wget.

0. Mastodon Sitemap Generator

We will use this awesome Python script 💜

https://github.com/binfalse/mastodon-sitemap
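
To give you an idea of what ends up in the file, here's a minimal sketch of building a sitemap from a list of post URLs using only Python's standard library. This is not the code from the script linked above, just an illustration of the sitemap format; the function name build_sitemap and the example.social URLs are made up for the example.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org)
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as a str for an iterable of (url, last_modified) pairs.

    Hypothetical helper for illustration; last_modified is a datetime.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in entries:
        node = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the URL for us
        ET.SubElement(node, "loc").text = url
        # A plain YYYY-MM-DD date is a valid lastmod value
        ET.SubElement(node, "lastmod").text = last_modified.date().isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Placeholder posts; a real generator would fetch these from the Mastodon API
    posts = [
        ("https://example.social/@me/1", datetime(2023, 1, 5, tzinfo=timezone.utc)),
        ("https://example.social/@me/2", datetime(2023, 1, 6, tzinfo=timezone.utc)),
    ]
    print(build_sitemap(posts))
```

Each post becomes one url entry with its address and last modification date, which is all a search engine needs to discover it.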

1. 🐙 Using GitHub Actions

I've forked the script and added the GitHub Action below.

⚠️ Please note: I've changed the path where the file is created (from tmp to dist), and made that change in the .py script as well. So I'd recommend forking my repository if you want to use this as is.

This GitHub Action will:

  • 1️⃣ Set up Python
  • 2️⃣ Install the 3 dependencies
  • 3️⃣ Run the script
  • 4️⃣ Move the generated dist/sitemap.xml into the gh-pages branch so it's hosted publicly.

name: Build and Deploy
on:
  schedule:
    - cron: "0 4 * * *"
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout 🛎️
        uses: actions/checkout@v3
        with:
          persist-credentials: false
      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          cache: 'pip'
          cache-dependency-path: 'mastodon-sitemap.py'
      - name: Install Deps
        run: pip3 install argparse Mastodon.py sitemap_python
      - name: Build
        run: |
          python3 mastodon-sitemap.py --instance https://ricard.social \
            --access-token ${{ secrets.ACCESS_TOKEN }} \
            --max-urls 500 \
            --overwrite \
            ./dist/sitemap.xml
      - name: Deploy 🚀
        uses: JamesIves/github-pages-deploy-action@releases/v4
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          BRANCH: gh-pages
          FOLDER: dist

https://github.com/quicoto/mastodon-sitemap

2. ⤵️ Fetch the file into the instance

Google Webmaster Tools (now Google Search Console) doesn't let you submit a sitemap hosted on another domain 😢 Otherwise we could just submit the public GitHub URL of the file 🤭

No worries, we will add a daily job to simply download the file into our folder. Couldn't get any easier! 😉

cron job

@daily /bin/bash /home/mastodon/fetch-sitemap.sh

fetch-sitemap.sh

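#!/bin/bash
# Absolute path so the script works from any directory (assumes the standard /home/mastodon/live install)
cd /home/mastodon/live/public || exit 1
# -O overwrites the previous copy; without it wget would save sitemap.xml.1, sitemap.xml.2, ...
wget -q -O sitemap.xml https://quicoto.github.io/mastodon-sitemap/sitemap.xml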
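
If you'd rather not risk replacing a good sitemap with a broken download, here's a minimal Python sketch of the same fetch step. It only replaces the file when the download parses as a sitemap, and it swaps the file in atomically. The helper names (is_sitemap, install_sitemap) are made up for this example, and the destination path assumes the standard /home/mastodon/live install, so adjust it to your setup.

```python
import os
import tempfile
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

SITEMAP_URL = "https://quicoto.github.io/mastodon-sitemap/sitemap.xml"
# Assumed location of Mastodon's public folder; change it if yours differs
DEST = Path("/home/mastodon/live/public/sitemap.xml")
URLSET_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}urlset"


def is_sitemap(data: bytes) -> bool:
    """Return True if data is well-formed XML whose root element is a sitemap urlset."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    return root.tag == URLSET_TAG


def install_sitemap(data: bytes, dest: Path) -> bool:
    """Write data to dest atomically; leave dest untouched if data is not a sitemap."""
    if not is_sitemap(data):
        return False
    # Write next to the destination so os.replace stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    # mkstemp creates the file as 0600; the web server needs to read it
    os.chmod(tmp_name, 0o644)
    os.replace(tmp_name, dest)
    return True


if __name__ == "__main__":
    with urllib.request.urlopen(SITEMAP_URL, timeout=30) as response:
        payload = response.read()
    if not install_sitemap(payload, DEST):
        raise SystemExit("Download did not look like a sitemap; kept the old file")
```

You could call this from the same daily cron job instead of the shell script, at the cost of needing Python on the server.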

3. ✅ Done

Now the file is available under my domain and Google is starting to index my posts:

https://ricard.social/sitemap.xml

Conclusion

Sure, this is not as good as adding Elasticsearch to your Mastodon instance, but it's a nice workaround if you want to search your posts without it. Personally, the less bloated my server and my instance are, the better. This way I outsource the indexing to Google and the sitemap generation to GitHub.

Once the posts are indexed you can simply run this query on Google to find what you're looking for.

site:ricard.social SOME_TEXT
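
If you want a bookmarkable link for that search, here's a small Python sketch that builds the URL. The function name is made up for this example, and it assumes Google's usual /search?q= address.

```python
from urllib.parse import urlencode


def google_site_search_url(domain: str, text: str) -> str:
    """Build a Google search URL restricted to one domain (hypothetical helper)."""
    query = f"site:{domain} {text}"
    # urlencode takes care of escaping spaces and the colon
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_site_search_url("ricard.social", "SOME_TEXT"))
```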
