Bash scripts as a static site generator

Update 2: I no longer use Bash as a static site generator, and have moved on to a custom-made one. Check it out if you're interested.

Update: A day after I wrote all this, I moved over from the Python ecosystem to the Node ecosystem, since it has more web-related tools available, just in case. (I also minified everything while I was at it.)

The underlying Bash script, however, hasn't changed much. I switched Markdown parsers, implemented optimizations for HTML and CSS, and parallelized most of the work, and it's still somewhat fast:

./ 2.11s user 0.09s system 324% cpu 0.677 total

I still have more plans to overhaul all this, but the core concepts will probably still be there.

So, if you take a peek at the footer of any of my posts now, you'll see a notice saying that this site is "Proudly built with Bash".

Why Bash, and not any of the countless static site generators out there?

So, now that we've covered why, let's cover how.

First of all, the latest version of the build script is available here in case you want to follow along.

The first thing we need is something to convert Markdown to HTML. I only used Python-Markdown here because that's what I used originally. If there's a faster, more portable, or otherwise desirable alternative, let me know!

The core functionality of the script is to iterate over all Markdown files in the directory, and convert them into HTML. All the rest is extra decoration.
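As a rough sketch of that core loop (not the actual build script): the function name `build_posts` and the `MD_TO_HTML` variable are made up for illustration, and the default assumes the `markdown_py` command that Python-Markdown installs. Any command that takes a file and prints HTML to stdout should work in its place.

```bash
#!/usr/bin/env bash
# Hypothetical converter command; markdown_py ships with Python-Markdown.
# Override MD_TO_HTML to use any parser that reads a file and prints HTML.
: "${MD_TO_HTML:=markdown_py}"

# build_posts SRC_DIR OUT_DIR
# Convert every *.md file in SRC_DIR into a matching .html file in OUT_DIR.
build_posts() {
  local src_dir=$1 out_dir=$2 md name
  mkdir -p "$out_dir"
  for md in "$src_dir"/*.md; do
    [ -e "$md" ] || continue          # no matches: the glob stays literal
    name=$(basename "$md" .md)
    "$MD_TO_HTML" "$md" > "$out_dir/$name.html"
  done
}
```

Calling `build_posts posts public` would then leave one HTML fragment per post in `public/`.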

First of all, we'll need a common header and footer, to allow for both valid HTML, and a common stylesheet. This is trivial via cat.
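A minimal sketch of the `cat` approach, assuming the header and footer live in their own files (the function name `wrap_page` and the file names in the usage note are placeholders, not necessarily what the real script uses). Passing `-` to `cat` makes it read the converted post from stdin, between the two files.

```bash
#!/usr/bin/env bash
# wrap_page HEADER_FILE FOOTER_FILE
# Print the header, then whatever arrives on stdin, then the footer.
wrap_page() {
  local header=$1 footer=$2
  cat "$header" - "$footer"
}
```

Something like `markdown_py post.md | wrap_page header.html footer.html > post.html` would then produce a complete page.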

Then, ideally we'd need to be able to have the post title in the HTML <title> element, which requires:

For the first one, while it's trivial via something like sed, that isn't the option I chose, since I'll need more flexibility in a second.

For the second one, I just trust myself to be consistent, and always follow a specific format with my posts, mainly, having the first line of all posts be a level 1 header, with a space between the hash and the actual title.
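Under that convention, pulling the title out of a post can be as small as the sketch below. The function name `post_title` is hypothetical, and it simply assumes the first line is `# ` followed by the title, with no validation.

```bash
#!/usr/bin/env bash
# post_title FILE
# Print the title of a post, assuming its first line is "# Some title".
post_title() {
  local first_line prefix='# '
  first_line=$(head -n 1 "$1")
  # Strip the leading "# " if present; other first lines pass through as-is.
  printf '%s\n' "${first_line#"$prefix"}"
}
```

For a post starting with `# My first post`, `post_title` prints `My first post`.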

Thankfully, markdownlint nags me into following that formatting (and more) in my posts.

Now we have an HTML representation of our Markdown text, wrapped in a common template to make everything fit in. Now what?

For my previous way of doing things, that would be it. I'd push the HTML, and share the link where I felt appropriate. That, however, has some downsides, like not being discoverable at all.

If you didn't know the link to the post, or the repo, you'd have no idea that post ever existed, even if you were on another post on the same site!

Now, I still don't put much effort into being "discoverable" (e.g. recommending my other posts on a single post's page), but at least now I have an index with a list of posts, which is better than not having one.

Also, slight tangent; my old index page was awful. It loaded one webfont (Raleway via Google Fonts) and Font Awesome via JS (ugh...), just to show what my current index shows above the post list.

One webfont and one script, for nothing but four links. :man_facepalming:

Anyway, let's move on from the old.

Now, remember how I said we'd need more flexibility than sed? This is where that comes into play. To show a list of posts in the index, we'd need to be able to loop over a list, which IIRC sed alone can't do.

I decided to just search for template engines in Bash, and found esh, which allows us to template HTML with Bash. That's exactly what I needed.

Now, to link to a post, we need the file name, and the post title. That's the perfect use case for dictionaries/maps, right? Well, this is Bash, so of course it will be harder than necessary.

Also, ideally we should be able to sort all posts by creation date (newest first).

Now, we could use Bash's associative arrays, but they don't keep their insertion order. That means we can't reverse the file loop and then just iterate over the result on render, and we can't reverse the iteration on render either to get a newest-first sort. Instead, we still reverse the file loop, but use a regular array with semicolon-separated key and value pairs. That also leaves room for extra data (just add more semicolons!) in case there's a need for it, but I haven't needed anything extra so far.
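Here is one way that could look, as a sketch rather than the real script. The names `collect_posts`, `print_index`, and `POSTS` are invented for the example, and it assumes file names contain no semicolons and that each post starts with a `# Title` line.

```bash
#!/usr/bin/env bash
# collect_posts DIR
# Fill the global array POSTS with "file.html;Title" entries, walking the
# files in reverse so the highest-sorting file name comes first.
collect_posts() {
  local dir=$1 files i name title prefix='# '
  POSTS=()
  files=("$dir"/*.md)                 # globs expand in sorted order
  for (( i = ${#files[@]} - 1; i >= 0; i-- )); do
    [ -e "${files[i]}" ] || continue  # no matches: the glob stays literal
    name=$(basename "${files[i]}" .md)
    title=$(head -n 1 "${files[i]}")
    POSTS+=("$name.html;${title#"$prefix"}")
  done
}

# print_index
# Split each entry on its first semicolon and print one list item per post.
print_index() {
  local entry
  for entry in "${POSTS[@]}"; do
    printf '<li><a href="%s">%s</a></li>\n' "${entry%%;*}" "${entry#*;}"
  done
}
```

Because the split happens on the first semicolon only, a title may contain semicolons of its own. Extra fields would need a slightly more careful split.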

Also, this sorting is based on the file name, so that's where the numbers in the Markdown files in the Git repo come from. File names are sorted from oldest to newest, and we reverse that to achieve newest to oldest sort within our index.

Lessons Learned