Last night I started to dive into creating a web-crawler for some ancient forum software that didn't have text searching. I came across crawler4j which is a really cool Java package. A couple hours in and I was able to crawl the whole site and drop the content into a database, but then what to do with it?

I started to look at options for Google-like text searching where you type into a search box and get relevant results. A lot of places pointed to using Solr to do just that. Solr looked great, and as a bonus it is built using Java with which I'm pretty familiar. However, when I started playing around with the tutorials I realized that running a server cluster so that I could search 300MB of text in a database was going to be overkill.

With my next search I came across the SQLite Full Text Search (FTS) feature so I decided to give it a try. From the sqlite3 command line I was able to do blazing fast text searching and the results were perfect. I was also impressed by the performance, but how to make that available online? It turns out that PHP can basically connect to a SQLite database out of the box and in 9 lines of code I had a RESTful API returning JSON results from my database.

<?php
$dir = 'sqlite:forum.db';
 
$dbh = new PDO($dir) or die("cannot open database");
$stmt = $dbh->prepare('SELECT folder, post_number as post_id, time as date, subject, snippet(forumdata) as snippet FROM forumdata WHERE forumdata match ?');
$stmt->execute(array($_GET['q']));
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC), JSON_UNESCAPED_SLASHES);
?>

Creating a front-end on top of the API, I wanted to quickly render the results onto a template. In the past I'd used handlebars.js to do just this and though about doing so again, but a bit more searching led me to try out pure.js. Wow is it easy! My first prototype went something like this:

<ul>
  <li><a class="subject link@href" href="#"></a><br><span class="snippet"></span></li>
</ul>

<script>
$p('ul').autoRender($.getJSON("/search.php?q=test"));
</script>

That was it. It would render my the subject, link, and snippet from the JSON array and into a list of link and Google-style snippet results. The entire front-end between HTML, JavaScript and PHP was less than 20 lines of code.

I'm still super-impressed by how quickly the PHP and SQLite database perform. Every query I've thrown at it (including just the letter "e") returns in under a second. And the end result worked out pretty well. I was able to throw the SQLite database file and PHP search onto a Virtual Private Server I already run on DreamHost. Even with 50 concurrent users on the site, my server isn't even really taxed.

Feel free to take a look at the result of my musings on https://ry4n.pw/drforumsearch/ and see the the source code on github.