Out of the box, Mastodon does not generate a sitemap.xml. If you want to submit a user's sitemap to Google or other search engines, here's how to do it.
💡 My setup is meant for a single-user Mastodon instance.
🤔 Why?
It's simple: I don't have Elasticsearch set up on my Mastodon instance and I want a way to search my posts.
🧑‍💻 I'm sold, tell me more
I went a step further and used GitHub Actions to automate this for me, so I don't have to install anything on my instance. Here's an overview of what we'll do:
- GitHub Actions to run a daily scheduled job
- GitHub Actions will use the Python script to create a sitemap
- GitHub Actions will move the sitemap.xml to GitHub Pages so it's publicly available
- Fetch the file into our Mastodon instance using wget.
0. Mastodon Sitemap Generator
We will use this awesome Python script:
https://github.com/binfalse/mastodon-sitemap
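To give an idea of what such a generator produces, here is a minimal sketch that turns a list of post URLs into a sitemap.xml document using only the Python standard library. It is not the code of the script linked above: the `build_sitemap` function, the shape of the `posts` dictionaries, and the example URLs are assumptions made for illustration, and fetching the posts from the Mastodon API is left out entirely.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(posts, max_urls=500):
    """Return sitemap XML (as bytes) for a list of post dicts.

    Each post is a hypothetical dict with a "url" key and an optional
    "lastmod" key holding a W3C date string such as "2023-01-15".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # Keep at most max_urls entries, mirroring the --max-urls idea.
    for post in list(posts)[:max_urls]:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = post["url"]
        if post.get("lastmod"):
            ET.SubElement(url, "lastmod").text = post["lastmod"]
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Placeholder URLs, not real posts.
    example_posts = [
        {"url": "https://example.social/@me/1", "lastmod": "2023-01-15"},
        {"url": "https://example.social/@me/2"},
    ]
    print(build_sitemap(example_posts).decode("utf-8"))
```

The real script does more than this (it talks to the instance with an access token), but the output has the same basic shape: one `<url>` entry with a `<loc>` per post.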
1. Using GitHub Actions
I've forked the script and added the GitHub Action below.
⚠️ Please note: I've changed the path where the file is created (from tmp to dist) and applied the same change in the .py script. So I'd recommend forking my repository if you want to use this as is.
This GitHub Action will:
- 1️⃣ Set up Python
- 2️⃣ Install the 3 dependencies
- 3️⃣ Run the script
- 4️⃣ Move the generated dist/sitemap.xml into the gh-pages branch so it's hosted publicly.
name: Build and Deploy
on:
  schedule:
    # Every day at 04:00 UTC
    - cron: "0 4 * * *"
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout 🛎️
        uses: actions/checkout@v3
        with:
          persist-credentials: false
      - name: Install Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          cache: 'pip'
          cache-dependency-path: 'mastodon-sitemap.py'
      - name: Install Deps
        run: pip3 install argparse Mastodon.py sitemap_python
      - name: Build
        # ACCESS_TOKEN is a repository secret holding the Mastodon access token
        run: |
          python3 mastodon-sitemap.py --instance https://ricard.social \
            --access-token ${{ secrets.ACCESS_TOKEN }} \
            --max-urls 500 \
            --overwrite \
            ./dist/sitemap.xml
      - name: Deploy 🚀
        uses: JamesIves/github-pages-deploy-action@releases/v4
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          BRANCH: gh-pages
          FOLDER: dist
https://github.com/quicoto/mastodon-sitemap
2. ⤵️ Fetch the file into the instance
Google Webmaster Tools doesn't let you submit a sitemap hosted on another domain 😢 Otherwise we could just submit the public GitHub URL of the file 🤭
No worries, we will add a daily job that simply downloads the file into our instance's public folder. It couldn't get any easier!
cron job
@daily /bin/bash /home/mastodon/fetch-sitemap.sh
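fetch-sitemap.sh
#!/bin/bash
# Absolute path (assumes the standard /home/mastodon/live install), so the
# script does not depend on the directory cron starts it in.
cd /home/mastodon/live/public || exit 1
# -O overwrites the existing file; plain wget would save sitemap.xml.1 instead.
wget -q -O sitemap.xml https://quicoto.github.io/mastodon-sitemap/sitemap.xml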
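If you'd like the daily fetch to be a bit more defensive, the same step could be sketched in Python: download the file, check that it parses as XML, and only then swap it into place, so a failed or truncated download never replaces a good sitemap. This is an alternative sketch, not the setup described in this post. The `install_sitemap` helper is hypothetical, and `PUBLIC_DIR` assumes the instance lives in `/home/mastodon/live`.

```python
import os
import tempfile
import urllib.request
from xml.etree import ElementTree as ET

SITEMAP_URL = "https://quicoto.github.io/mastodon-sitemap/sitemap.xml"
PUBLIC_DIR = "/home/mastodon/live/public"  # assumption: standard install path


def install_sitemap(data, dest_dir, name="sitemap.xml"):
    """Validate data as XML, then atomically replace dest_dir/name."""
    # Raises ET.ParseError before touching anything if the download is broken.
    ET.fromstring(data)
    # Write to a temp file in the same directory so os.replace is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file as 0600; the web server needs to read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(dest_dir, name))
    except Exception:
        os.unlink(tmp_path)
        raise


def main():
    with urllib.request.urlopen(SITEMAP_URL, timeout=30) as response:
        install_sitemap(response.read(), PUBLIC_DIR)


if __name__ == "__main__":
    main()
```

The cron entry would then call this script with `python3` instead of running the shell script.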
3. Done
Now the file is available under my domain and Google is starting to index my posts:
https://ricard.social/sitemap.xml
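Before submitting the file, a quick sanity check can help. The sitemap protocol expects every listed URL to live on the same host as the sitemap itself, so the sketch below lists any `<loc>` entries that point elsewhere. The `foreign_urls` function is a hypothetical helper, and it assumes the sitemap uses the standard sitemaps.org namespace.

```python
from urllib.parse import urlsplit
from xml.etree import ElementTree as ET

# <loc> tag qualified with the sitemaps.org namespace.
LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def foreign_urls(sitemap_xml, expected_host):
    """Return the <loc> URLs whose host differs from expected_host."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iter(LOC_TAG)
        if loc.text and urlsplit(loc.text.strip()).hostname != expected_host
    ]


if __name__ == "__main__":
    # Placeholder sitemap content for illustration only.
    sample = (
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://ricard.social/a</loc></url>"
        b"<url><loc>https://other.example/b</loc></url>"
        b"</urlset>"
    )
    print(foreign_urls(sample, "ricard.social"))
```

An empty list means every entry is on the expected host.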
Conclusion
Sure, this is not as good as adding Elasticsearch to your Mastodon instance, but it can be a nice workaround if you want to search your posts without it. Personally, the less bloated my server and my instance are, the better. This way I outsource the indexing to Google and the sitemap generation to GitHub.
Once the posts are indexed, you can simply run this query on Google to find what you're looking for:
site:ricard.social SOME_TEXT
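If you want to bookmark or script that search, here is a small sketch that builds the corresponding Google search URL. The `google_site_search_url` function is a hypothetical helper written for this example; it only URL-encodes the same `site:` query.

```python
from urllib.parse import urlencode


def google_site_search_url(domain, text):
    """Build a Google search URL restricted to one domain."""
    query = f"site:{domain} {text}"
    # urlencode escapes the colon and turns the space into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(google_site_search_url("ricard.social", "SOME_TEXT"))
```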
@ricard What would be the benefit of adding this? Indexability of a user's Mastodon timeline?
@remkus That's my thought, I felt I wanted to search in my old posts. But alas, I have no Elasticsearch set up.