“Hypothesis. We ended up wanting to scale workers up because we were getting a lot of stuck workers due to file system issues. Then when things resolved, we actually had too many workers hitting the database all at once, then we got too much database contention which locked up those workers, leading us to reduce workers, causing a vicious cycle depending on which was misbehaving more, postgres or NFS.”
https://hazelweakly.me/blog/scaling-mastodon/
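
The feedback loop in the quote can be made concrete with a toy simulation. This is only a sketch under invented assumptions, not a model of the actual Mastodon deployment: the function name `simulate`, the `db_capacity` threshold, the "half the workers get stuck" rule, and the doubling and halving reactions are all hypothetical choices made for illustration.

```python
def simulate(steps, nfs_outage_steps, db_capacity=20, start_workers=10):
    """Toy model of the scale-up/scale-down cycle (all numbers are made up).

    Returns a list of (step, workers, stuck_workers, cause) tuples, where
    cause is "nfs", "postgres", or None when nothing is misbehaving.
    """
    workers = start_workers
    history = []
    for t in range(steps):
        if t in nfs_outage_steps:
            # Assumption: a file system problem leaves half the workers stuck.
            stuck, cause = workers // 2, "nfs"
        elif workers > db_capacity:
            # Assumption: every worker beyond the database's capacity
            # ends up locked on contention.
            stuck, cause = workers - db_capacity, "postgres"
        else:
            stuck, cause = 0, None
        history.append((t, workers, stuck, cause))

        # Operator reaction, applied before the next step.
        if cause == "nfs":
            workers *= 2                    # scale up to replace stuck workers
        elif cause == "postgres":
            workers = max(1, workers // 2)  # scale down to relieve the database
    return history


if __name__ == "__main__":
    for step, workers, stuck, cause in simulate(4, nfs_outage_steps={0, 1}):
        print(f"step={step} workers={workers} stuck={stuck} cause={cause}")
```

In this run the file system misbehaves for the first two steps, so the worker count doubles twice. Once the file system recovers, the enlarged pool exceeds the assumed database capacity and has to be cut back, which mirrors the over-correction the quote describes.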