The Perpetual βeta Got a Search Engine

Stylised rendering of a web search field and a magnifying glass

I have used a static site generator — nanoc — to publish this website for some time now. I have enjoyed the benefits of a static website and experienced only a handful of drawbacks. One obstacle, however, has long bothered me: my inability to add a search engine to the site.

The principal reason I couldn’t add a search engine was an architectural choice I made early on: there should be no server-side scripting. In eliminating any reliance on server scripts, I’ve made the Perpetual βeta resistant to hacking (no software bugs to compromise); I have realised significant performance gains (it’s just static files all the way down); and the entire website is dependency-free and highly portable (I can zip up the files and move them to another server in minutes).

It is axiomatic that a traditional search engine relies on software running on the server. These programs maintain an index of the content, respond to POST and GET requests from the client, perform the searches themselves and generate a results page to return to the user. So how could I deploy a search engine without having this back-end application available?1

I found my solution in a third-party JavaScript library called Lunr.2  This library takes a JSON-encoded corpus — which I have configured nanoc to produce at build time — as its indexing source and uses a Solr-like scoring formula to rank its results.

What makes this approach so cool is that, because it is entirely client-side, searching is fast. We’re talking immediate results. With no round-trip to the server there is simply no delay in processing a search query. An additional bonus: one can still perform searches when the client is offline,3 which is great for PWA websites like the Perpetual βeta.

I have made the search engine available from the Archives page.4  The way I have implemented it, queries and results happen in-page; they do not add to the browser’s history. However, the system appends a query string to the URL, making the results idempotent. Thus you can bookmark, link to and otherwise use these URLs as you would any other, and know that you will get the same results page back. For example: here’s a search for “amiga” and one for “macOS”.
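The pattern described above can be sketched with standard browser APIs. The function names here are illustrative, not the site’s actual code; the key idea is using `history.replaceState` rather than `pushState`, so the address bar reflects the query without creating a new history entry.

```javascript
// Build a bookmarkable URL for a query, e.g. /archives/?q=amiga
function searchUrl(base, query) {
  const url = new URL(base);
  url.searchParams.set('q', query);
  return url.toString();
}

// Read the query back out on page load, so a shared or bookmarked
// link re-runs the same search and yields the same results page.
function queryFrom(urlString) {
  return new URL(urlString).searchParams.get('q');
}

// In the browser, after rendering results one would update the URL
// in place (no new history entry) with something like:
//   history.replaceState(null, '', searchUrl(location.href, query));

console.log(searchUrl('https://example.com/archives/', 'amiga'));
console.log(queryFrom('https://example.com/archives/?q=amiga'));
```

On page load, a non-null result from `queryFrom(location.href)` would trigger the search immediately, which is what makes the result URLs shareable.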

You might wonder if all this utility comes at a cost. Just how big is the search corpus the browser has to download and evaluate? At the time of writing, for this entire website, it’s 226KB. It’s less than 90KB when gzipped, which is equivalent to a standard, non-animated ad unit.

  1. I discounted third-party options like Google Site Search as I don’t want advertisements in the results; nor do I want to introduce unnecessary, external dependencies. Additionally, I want complete control of the output. ↩︎

  2. Via Jakub Chodounský ↩︎

  3. Although the client might not have, in her local cache, all the content that the search engine results might link to. ↩︎

  4. Note: As the search engine is entirely JavaScript driven, it is not available if the user disables JavaScript. ↩︎