TL;DR;
Run a Go script to copy your Mastodon archive into your Hugo site (source). Visit this site’s Mastodon category for an illustration.
PESOS
The rise of social media has dramatically reduced the cost of authoring and sharing information. One downside of this movement is that I’ve relinquished content ownership to each site’s admins or owners. No doubt there are benefits to this arrangement, among them that I don’t need to manage the technical bits, but it would be nice for me to maintain a copy in case of the worst. I learned this is abbreviated PESOS (publish to third party, archive to self) and while it’s less desirable than POSSE (publish to self, syndicate to third parties), it’s something I’d like to keep doing.
Taking inspiration from Ross Baker’s Mastodon to Hugo archive script, I decided to create a Go version of his Scala script. It’s been a while since I wrote Scala on a regular basis and while I could read Ross’ code, I’m not confident I’d be able to make changes that preserved the pure functional aspects. Plus, I wasn’t looking forward to keeping the JVM and Scala dependencies up to date for this script.
I’m sharing this for anyone interested in the PESOS philosophy. You’ll probably need to tweak the source a bit for your own use case and Hugo theme, but the code will get you most of the way there. The script will only archive public toots or self-replies.
Steps
Request Archive
The first step is to request a copy of your Mastodon archive. This will take a bit of time to create. I received an email from Hachyderm when the archive was ready for download.
Expand Archive
After the archive is created, download it locally and expand it somewhere. The script expects an already-expanded archive directory, not the ZIP file itself. The outbox.json file in the archive is an Activity Streams serialization and its contents drive the archive process.
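To make the shape of outbox.json a little more concrete, here is a minimal sketch of decoding it in Go. The type and function names (outbox, activity, note, decodeNotes) are hypothetical and are not taken from the script; only the JSON keys come from the archive format. The sketch assumes that non-Create activities (for example boosts) may carry a bare URL instead of an object, so the object is decoded lazily.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// outbox mirrors the top level of outbox.json; only orderedItems is read here.
type outbox struct {
	OrderedItems []activity `json:"orderedItems"`
}

// activity is a hypothetical, trimmed-down view of one outbox entry.
type activity struct {
	Type      string          `json:"type"`
	Published string          `json:"published"`
	To        []string        `json:"to"`
	Object    json.RawMessage `json:"object"` // decoded later, only for Create
}

// note holds the few object fields this sketch cares about.
type note struct {
	ID        string  `json:"id"`
	Type      string  `json:"type"`
	InReplyTo *string `json:"inReplyTo"` // stays nil when the JSON value is null
}

// decodeNotes returns the objects of all Create activities in the outbox.
func decodeNotes(data []byte) ([]note, error) {
	var box outbox
	if err := json.Unmarshal(data, &box); err != nil {
		return nil, fmt.Errorf("decoding outbox: %w", err)
	}
	var notes []note
	for _, item := range box.OrderedItems {
		if item.Type != "Create" {
			continue // assumption: other activity types are not toots we authored
		}
		var n note
		if err := json.Unmarshal(item.Object, &n); err != nil {
			return nil, fmt.Errorf("decoding object: %w", err)
		}
		notes = append(notes, n)
	}
	return notes, nil
}

func main() {
	// Inline sample; a real run would read outbox.json from the archive root.
	sample := []byte(`{"orderedItems":[
	  {"type":"Create","published":"2024-02-02T05:01:37Z",
	   "object":{"id":"https://hachyderm.io/users/mweagle/statuses/111860128005683093",
	             "type":"Note","inReplyTo":null}},
	  {"type":"Announce","object":"https://example.com/users/someone/statuses/1"}
	]}`)
	notes, err := decodeNotes(sample)
	if err != nil {
		panic(err)
	}
	for _, n := range notes {
		fmt.Println(n.ID)
	}
}
```

Keeping the object as json.RawMessage means one unexpected entry shape does not abort decoding of the whole file.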
Modify Templates
Each toot’s Markdown frontmatter and content are rendered using Go’s text/template package. The parameter map passed to Execute is:
templateParamMap := map[string]interface{}{
	"ExecutionTime": nowTime,
	"Toot":          eachItem,
}
The Toot object is an ActivityEntry struct with fields extracted from deserializing the outbox.json OrderedItems contents.
The templates make assumptions about my site and the theme I’m using. For example, the frontmatter template includes a hardcoded reference to a site asset:
---
title: "Mastodon - {{ .Toot.Published }}"
subtitle: ""
canonical: {{ .Toot.Object.ID }}
description:
image: "/images/mastodon.png"
...
Change both TEMPLATE_TOOT_FRONTMATTER and TEMPLATE_TOOT to conform with your site’s active theme.
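As a rough illustration of how a parameter map and a frontmatter template fit together, the sketch below renders a shortened template with text/template. The struct types, the renderFrontmatter helper, and the trimmed template are stand-ins invented for this example; the real ActivityEntry has more fields and the real templates are longer.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
	"time"
)

// Hypothetical stand-ins for the script's types; only the fields used below.
type activityObject struct {
	ID string
}

type activityEntry struct {
	Published string
	Object    activityObject
}

// A shortened frontmatter template, assumed for illustration.
const frontmatterTemplate = `---
title: "Mastodon - {{ .Toot.Published }}"
canonical: {{ .Toot.Object.ID }}
---
`

// renderFrontmatter executes the template with the same two keys the
// parameter map in the text uses: ExecutionTime and Toot.
func renderFrontmatter(toot activityEntry) (string, error) {
	tmpl, err := template.New("frontmatter").Parse(frontmatterTemplate)
	if err != nil {
		return "", err
	}
	params := map[string]interface{}{
		"ExecutionTime": time.Now(),
		"Toot":          toot,
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, params); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderFrontmatter(activityEntry{
		Published: "2024-02-02T05:01:37Z",
		Object: activityObject{
			ID: "https://hachyderm.io/users/mweagle/statuses/111860128005683093",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because text/template resolves names at execution time, a typo in a field name surfaces as an error from Execute rather than at compile time, so it is worth rendering one toot after each template change.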
Modify Identity Constants
There are two constants that are hardcoded as well. These values are used to determine toot visibility and reply threads.
var HOST = "hachyderm.io"
var USER = "mweagle"
If there’s interest I will turn these into command-line arguments.
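One way the two values could drive filtering is sketched below. This is an assumption about the logic, not the script’s actual code: a toot is kept when it is addressed to the Activity Streams public collection and is either not a reply or a reply to one of my own statuses. The toot type and the helper names are hypothetical, and the sketch only inspects the "to" list.

```go
package main

import (
	"fmt"
	"strings"
)

// publicAudience is the Activity Streams public collection identifier.
const publicAudience = "https://www.w3.org/ns/activitystreams#Public"

var (
	HOST = "hachyderm.io"
	USER = "mweagle"
)

// toot is a hypothetical, minimal view of an outbox entry.
type toot struct {
	To        []string
	InReplyTo string // empty when the toot is not a reply
}

// isPublic reports whether the toot is addressed to the public collection.
func isPublic(t toot) bool {
	for _, audience := range t.To {
		if audience == publicAudience {
			return true
		}
	}
	return false
}

// isSelfReply reports whether the toot replies to a URL under this account.
func isSelfReply(t toot) bool {
	prefix := "https://" + HOST + "/users/" + USER + "/"
	return strings.HasPrefix(t.InReplyTo, prefix)
}

// shouldArchive keeps public toots that are top-level or self-replies.
func shouldArchive(t toot) bool {
	if !isPublic(t) {
		return false
	}
	return t.InReplyTo == "" || isSelfReply(t)
}

func main() {
	reply := toot{
		To:        []string{publicAudience},
		InReplyTo: "https://hachyderm.io/users/mweagle/statuses/111860128005683093",
	}
	fmt.Println(shouldArchive(reply))
}
```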
Run
The script accepts two parameters:
- input: Path to the root of the expanded archive. This is the directory housing outbox.json
- output: Path to which the toots should be rendered
- ‼️ Note that all contents in this directory will be deleted during rendering
Example:
go run mastodon_to_hugo.go \
  --input "$HOME/Downloads/mastodon-archive" \
  --output "./blog/content/mastodon"
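For readers adapting the script, here is a minimal sketch of how two such parameters can be read with Go’s standard flag package. The parseArgs helper and the help strings are hypothetical; the script’s own argument handling may differ. Since the output directory is emptied during rendering, the sketch refuses to proceed unless both values are given explicitly.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// parseArgs reads --input and --output and requires both to be set.
func parseArgs(args []string) (input, output string, err error) {
	fs := flag.NewFlagSet("mastodon_to_hugo", flag.ContinueOnError)
	fs.StringVar(&input, "input", "", "root of the expanded archive (contains outbox.json)")
	fs.StringVar(&output, "output", "", "directory to render toots into (contents are deleted)")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	if input == "" || output == "" {
		return "", "", errors.New("both --input and --output are required")
	}
	return input, output, nil
}

func main() {
	// Fixed arguments for illustration; a real program would pass os.Args[1:].
	input, output, err := parseArgs([]string{
		"--input", "./mastodon-archive",
		"--output", "./blog/content/mastodon",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("input:", input)
	fmt.Println("output:", output)
}
```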
Output
Assuming all goes well, the script will emit some summary stats:
time=2024-03-09T23:14:11.187-08:00 level=INFO msg="Welcome to Hugodon!"
time=2024-03-09T23:14:11.267-08:00 level=INFO msg="Toots filtered" totalCount=1498 filteredCount=963
time=2024-03-09T23:14:11.383-08:00 level=INFO msg="Deleting existing directory contents" path=./blog/content/mastodon
time=2024-03-09T23:14:11.608-08:00 level=INFO msg="Publishing statistics" totalTootCount=1498 renderedTootCount=963 filteredTootCount=535 replyThreadCount=224 mediaFilesCount=43
time=2024-03-09T23:14:11.616-08:00 level=INFO msg="Toot replication complete"
It will also produce a series of Page Bundles containing your toots. Markdown files will be created in the output directory using the following path template:
{year}/{monthNumber}/{lastPathComponentOfTootObjectID}
For example, using the hardcoded templates, the source entry:
{
  "id": "https://hachyderm.io/users/mweagle/statuses/111860128005683093/activity",
  "type": "Create",
  "actor": "https://hachyderm.io/users/mweagle",
  "published": "2024-02-02T05:01:37Z",
  "to": [
    "https://www.w3.org/ns/activitystreams#Public"
  ],
  "cc": [
    "https://hachyderm.io/users/mweagle/followers"
  ],
  "object": {
    "id": "https://hachyderm.io/users/mweagle/statuses/111860128005683093",
    "type": "Note",
    "summary": null,
    "inReplyTo": null,
    "published": "2024-02-02T05:01:37Z",
is rendered to mastodon/2024/02/111860128005683093.
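The mapping from an entry to its path can be sketched as follows. The bundlePath helper is a hypothetical name, and the sketch assumes the year and month are taken from the published timestamp in UTC; the script may derive them differently.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"time"
)

// bundlePath builds {year}/{monthNumber}/{lastPathComponentOfTootObjectID}
// from a toot's published timestamp and its object ID URL.
func bundlePath(published, objectID string) (string, error) {
	t, err := time.Parse(time.RFC3339, published)
	if err != nil {
		return "", fmt.Errorf("parsing published time: %w", err)
	}
	u, err := url.Parse(objectID)
	if err != nil {
		return "", fmt.Errorf("parsing object ID: %w", err)
	}
	t = t.UTC() // assumption: bucket by UTC date
	return fmt.Sprintf("%04d/%02d/%s", t.Year(), int(t.Month()), path.Base(u.Path)), nil
}

func main() {
	p, err := bundlePath(
		"2024-02-02T05:01:37Z",
		"https://hachyderm.io/users/mweagle/statuses/111860128005683093",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // prints 2024/02/111860128005683093
}
```

The result is relative to the output directory, which is why the example above lands under mastodon/.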
Notes
- All self-reply threads are appended to the primary Toot’s markdown file
- Toot attachments are copied to the page bundle directory. Attachments can be referenced in the toot template via the ActivityObjectAttachment.BaseFilename field value
- ActivityFeed tags include a leading # character. This is stripped from the ActivityObjectTag.Name field
- Only Hashtag tag types are deserialized
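The tag handling described in the notes can be sketched like this. The tag type and the hashtagNames helper are hypothetical stand-ins (the script’s own type is ActivityObjectTag), and the sample JSON is made up to show one hashtag and one mention.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// tag is a minimal, hypothetical view of one entry in an object's "tag" array.
type tag struct {
	Type string `json:"type"`
	Name string `json:"name"`
}

// hashtagNames keeps only Hashtag entries and strips the leading "#".
func hashtagNames(raw []byte) ([]string, error) {
	var tags []tag
	if err := json.Unmarshal(raw, &tags); err != nil {
		return nil, fmt.Errorf("decoding tags: %w", err)
	}
	names := []string{}
	for _, t := range tags {
		if t.Type != "Hashtag" {
			continue // e.g. mentions are skipped
		}
		names = append(names, strings.TrimPrefix(t.Name, "#"))
	}
	return names, nil
}

func main() {
	sample := []byte(`[
	  {"type":"Hashtag","name":"#golang"},
	  {"type":"Mention","name":"@someone@example.com"}
	]`)
	names, err := hashtagNames(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names) // prints [golang]
}
```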
Source
Get the source here.