Archiving Mastodon Toots to Hugo

PESOS - Publish Elsewhere, Syndicate (to your) Own Site

TL;DR

Run a Go script to copy the toots from your Mastodon archive into your Hugo site (source). Visit this site’s Mastodon category for an illustration.

PESOS

The rise of social media has dramatically reduced the cost of authoring and sharing information. One downside is that I’ve relinquished content ownership to each site’s admins or owners. No doubt there are benefits to this arrangement, among them that I don’t need to manage the technical bits, but I’d like to maintain a copy in case of the worst. I learned this is abbreviated PESOS (publish to third party, archive to self), and while it’s less desirable than POSSE (publish to self, syndicate to third parties), it’s something I’d like to keep doing.

Taking inspiration from Ross Baker’s Mastodon to Hugo archive script, I decided to create a Go version of his Scala script. It’s been a while since I wrote Scala on a regular basis, and while I could read Ross’ code, I’m not confident I’d be able to make changes that preserved the pure functional aspects. Plus, I wasn’t looking forward to keeping JVM and Scala dependencies up to date for this script.

I’m sharing this for anyone interested in the PESOS philosophy. You’ll probably need to tweak the source a bit for your own use case and Hugo theme, but the code will get you most of the way there. The script will only archive public toots or self-replies.

Steps

Request Archive

The first step is to request a copy of your Mastodon archive. This will take a bit of time to create. I received an email from Hachyderm when the archive was ready for download.

Expand Archive

After the archive is created, download it and expand it somewhere on your machine. The script does not unpack the ZIP itself; it expects an already expanded directory. The outbox.json file in the archive is an Activity Streams serialization and its contents drive the archive process.
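To get a feel for what outbox.json contains, here is a minimal sketch of decoding it in Go. The outbox and activity types are my own illustrative names, not the structs the script uses, and only a handful of fields are mapped. The object field is kept as raw JSON on the assumption that it is not always a nested object (some activity types may carry a plain URL string instead).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// outbox and activity are illustrative types (assumptions), not the
// script's own structs. Only a few fields are mapped.
type outbox struct {
	OrderedItems []activity `json:"orderedItems"`
}

type activity struct {
	ID        string   `json:"id"`
	Type      string   `json:"type"`
	Published string   `json:"published"`
	To        []string `json:"to"`
	// Kept raw so decoding does not assume whether "object" is a nested
	// object or a plain string.
	Object json.RawMessage `json:"object"`
}

// parseOutbox decodes the bytes of an outbox.json file into its activities.
func parseOutbox(data []byte) ([]activity, error) {
	var box outbox
	if err := json.Unmarshal(data, &box); err != nil {
		return nil, fmt.Errorf("decoding outbox.json: %w", err)
	}
	return box.OrderedItems, nil
}

func main() {
	// Inline sample; a real run would read outbox.json from the
	// expanded archive directory instead.
	sample := []byte(`{"orderedItems":[{"id":"a","type":"Create","object":{"id":"n"}}]}`)
	items, err := parseOutbox(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("decoded %d activities\n", len(items))
}
```

In a real run the bytes would come from reading outbox.json in the expanded archive directory rather than from an inline sample.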

Modify Templates

Each toot’s Markdown frontmatter and content are rendered using Go’s text/template package. The parameter map passed to Execute is:

templateParamMap := map[string]interface{}{
    "ExecutionTime": nowTime,
    "Toot":          eachItem,
}

The Toot object is an ActivityEntry struct with fields extracted from deserializing the outbox.json OrderedItems contents.
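As a rough illustration of how that parameter map flows through text/template, here is a self-contained sketch. The toot and tootObject types are hypothetical stand-ins for the script's ActivityEntry, trimmed to the two fields used, and the "# rendered" line is something I added purely to show ExecutionTime in use; it is not part of the script's template.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"time"
)

// Hypothetical stand-ins for the script's ActivityEntry struct.
type tootObject struct{ ID string }

type toot struct {
	Published string
	Object    tootObject
}

// A trimmed frontmatter template. The "# rendered" line is illustrative only.
const frontmatterTemplate = `---
title: "Mastodon - {{ .Toot.Published }}"
canonical: {{ .Toot.Object.ID }}
# rendered {{ .ExecutionTime.Format "2006-01-02" }}
---
`

// render parses tmplText and executes it against params.
func render(tmplText string, params map[string]interface{}) (string, error) {
	tmpl, err := template.New("toot").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, params); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// renderSample builds the same map shape as templateParamMap with fixed values.
func renderSample() (string, error) {
	params := map[string]interface{}{
		"ExecutionTime": time.Date(2024, time.March, 9, 0, 0, 0, 0, time.UTC),
		"Toot": toot{
			Published: "2024-02-02T05:01:37Z",
			Object:    tootObject{ID: "https://hachyderm.io/users/mweagle/statuses/111860128005683093"},
		},
	}
	return render(frontmatterTemplate, params)
}

func main() {
	out, err := renderSample()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because text/template does no HTML escaping, values land in the frontmatter verbatim, so anything containing quotes or colons may need extra care in your own templates.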

The templates make assumptions about my site and the theme I’m using. For example, the frontmatter template includes a hardcoded reference to a site asset:

---
title: "Mastodon - {{ .Toot.Published }}"
subtitle: ""
canonical: {{ .Toot.Object.ID }}
description:
image: "/images/mastodon.png"
...

Change both TEMPLATE_TOOT_FRONTMATTER and TEMPLATE_TOOT to match your site’s active theme.

Modify Identity Constants

Two identity values are hardcoded as well. They are used to determine toot visibility and reply threads.

var HOST = "hachyderm.io"
var USER = "mweagle"

If there’s interest, I will turn these into command line arguments.
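For a sense of why the host and user matter, here is one plausible way to express the two checks. This is a sketch under my own assumptions, not the script's actual logic: isPublic and isSelfReply are hypothetical helper names, visibility is taken to mean the Activity Streams public collection appears in the "to" list, and a self-reply is taken to be an inReplyTo URL under the account's own https://HOST/users/USER/ prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// The Activity Streams public collection, as seen in the "to" field.
const publicCollection = "https://www.w3.org/ns/activitystreams#Public"

var HOST = "hachyderm.io"
var USER = "mweagle"

// isPublic reports whether the public collection is among the recipients.
// (Hypothetical helper; the script's own check may differ.)
func isPublic(to []string) bool {
	for _, addr := range to {
		if addr == publicCollection {
			return true
		}
	}
	return false
}

// isSelfReply reports whether inReplyTo points at one of the account's own
// statuses. An empty string (not a reply) returns false.
func isSelfReply(inReplyTo string) bool {
	prefix := fmt.Sprintf("https://%s/users/%s/", HOST, USER)
	return strings.HasPrefix(inReplyTo, prefix)
}

func main() {
	fmt.Println(isPublic([]string{publicCollection}))
	fmt.Println(isSelfReply("https://hachyderm.io/users/mweagle/statuses/1"))
	fmt.Println(isSelfReply("https://example.social/users/other/statuses/1"))
}
```

How the two predicates are combined into a final keep-or-skip decision is left to the script itself.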

Run

The script accepts two parameters:

  • input: Path to the root of the expanded archive. This is the directory housing outbox.json
  • output: Path to which the toots should be rendered
    • ‼️ Note that all contents in this directory will be deleted during rendering

Example:

go run mastodon_to_hugo.go \
    --input "$HOME/Downloads/mastodon-archive" \
    --output "./blog/content/mastodon"
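If you are adapting the script and want to see how two such parameters can be read, the standard flag package covers it. The sketch below is an assumption about the shape, not a copy of the script's argument handling; parseArgs is a hypothetical helper. Go's flag package accepts both -input and --input spellings.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseArgs reads --input and --output from args and requires both.
// (Hypothetical helper; the script's own parsing may differ.)
func parseArgs(args []string) (input, output string, err error) {
	fs := flag.NewFlagSet("mastodon_to_hugo", flag.ContinueOnError)
	fs.StringVar(&input, "input", "", "root of the expanded archive (contains outbox.json)")
	fs.StringVar(&output, "output", "", "directory to render toots into (existing contents are deleted)")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	if input == "" || output == "" {
		return "", "", errors.New("both --input and --output are required")
	}
	return input, output, nil
}

func main() {
	// Fixed arguments for illustration; a real program would pass os.Args[1:].
	in, out, err := parseArgs([]string{
		"--input", "./mastodon-archive",
		"--output", "./blog/content/mastodon",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("input:", in)
	fmt.Println("output:", out)
}
```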

Output

Assuming all goes well, the script will emit some summary stats:

time=2024-03-09T23:14:11.187-08:00 level=INFO msg="Welcome to Hugodon!"
time=2024-03-09T23:14:11.267-08:00 level=INFO msg="Toots filtered" totalCount=1498 filteredCount=963
time=2024-03-09T23:14:11.383-08:00 level=INFO msg="Deleting existing directory contents" path=./blog/content/mastodon
time=2024-03-09T23:14:11.608-08:00 level=INFO msg="Publishing statistics" totalTootCount=1498 renderedTootCount=963 filteredTootCount=535 replyThreadCount=224 mediaFilesCount=43
time=2024-03-09T23:14:11.616-08:00 level=INFO msg="Toot replication complete"

and a series of Page Bundles containing your toots. Markdown files are created in the output directory using the following path template:

{year}/{monthNumber}/{lastPathComponentOfTootObjectID}

For example, using the hardcoded templates, the source entry:

{
    "id": "https://hachyderm.io/users/mweagle/statuses/111860128005683093/activity",
    "type": "Create",
    "actor": "https://hachyderm.io/users/mweagle",
    "published": "2024-02-02T05:01:37Z",
    "to": [
        "https://www.w3.org/ns/activitystreams#Public"
    ],
    "cc": [
        "https://hachyderm.io/users/mweagle/followers"
    ],
    "object": {
        "id": "https://hachyderm.io/users/mweagle/statuses/111860128005683093",
        "type": "Note",
        "summary": null,
        "inReplyTo": null,
        "published": "2024-02-02T05:01:37Z",

is rendered to mastodon/2024/02/111860128005683093.
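The mapping from a toot to its bundle directory can be sketched as below. bundlePath is a hypothetical helper of mine, and it assumes the year and month are taken from the published timestamp as written (UTC in the archive) rather than converted to local time.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"time"
)

// bundlePath builds {year}/{monthNumber}/{lastPathComponentOfTootObjectID}
// from a toot's published timestamp and object ID.
// (Hypothetical helper; the script's own implementation may differ.)
func bundlePath(published, objectID string) (string, error) {
	t, err := time.Parse(time.RFC3339, published)
	if err != nil {
		return "", fmt.Errorf("parsing published time: %w", err)
	}
	u, err := url.Parse(objectID)
	if err != nil {
		return "", fmt.Errorf("parsing object ID: %w", err)
	}
	// path.Base returns the final element of the URL path.
	return fmt.Sprintf("%04d/%02d/%s", t.Year(), int(t.Month()), path.Base(u.Path)), nil
}

func main() {
	p, err := bundlePath(
		"2024-02-02T05:01:37Z",
		"https://hachyderm.io/users/mweagle/statuses/111860128005683093",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p) // prints 2024/02/111860128005683093
}
```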

Notes

  • All self-reply threads are appended to the primary Toot’s markdown file
  • Toot attachments are copied to the page bundle directory. Attachments can be referenced in the toot template via the ActivityObjectAttachment.BaseFilename field value
  • ActivityFeed tags include a leading # character. This is stripped from the ActivityObjectTag.Name field
  • Only Hashtag tag types are deserialized
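The last two notes can be illustrated with a short sketch. The tag type and hashtagNames function are illustrative names of my own, not the script's ActivityObjectTag, and the sample input assumes tag entries carry "type" and "name" keys.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tag is an illustrative type (assumption), mapping only two fields.
type tag struct {
	Type string `json:"type"`
	Name string `json:"name"`
}

// hashtagNames decodes a JSON tag array, keeps only Hashtag entries,
// and strips the leading # from each name.
func hashtagNames(raw []byte) ([]string, error) {
	var tags []tag
	if err := json.Unmarshal(raw, &tags); err != nil {
		return nil, fmt.Errorf("decoding tags: %w", err)
	}
	names := make([]string, 0, len(tags))
	for _, t := range tags {
		if t.Type != "Hashtag" {
			continue // skip mentions and any other tag types
		}
		names = append(names, strings.TrimPrefix(t.Name, "#"))
	}
	return names, nil
}

func main() {
	raw := []byte(`[
		{"type": "Hashtag", "name": "#golang"},
		{"type": "Mention", "name": "@someone@example.social"}
	]`)
	names, err := hashtagNames(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names) // prints [golang]
}
```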

Source

Get the source here.