Last night I started to dive into creating a web-crawler for some ancient forum software that didn't have text searching. I came across crawler4j which is a really cool Java package. A couple hours in and I was able to crawl the whole site and drop the content into a database, but then what to do with it?
I started to look at options for Google-like text searching where you type into a search box and get relevant results. A lot of places pointed to using Solr to do just that. Solr looked great, and as a bonus it is built using Java with which I'm pretty familiar. However, when I started playing around with the tutorials I realized that running a server cluster so that I could search 300MB of text in a database was going to be overkill.
With my next search I came across the SQLite Full Text Search (FTS) feature so I decided to give it a try. From the sqlite3 command line I was able to do blazing fast text searching and the results were perfect. I was also impressed by the performance, but how to make that available online? It turns out that PHP can basically connect to a SQLite database out of the box and in 9 lines of code I had a RESTful API returning JSON results from my database.
<?php $dir = 'sqlite:forum.db'; $dbh = new PDO($dir) or die("cannot open database"); $stmt = $dbh->prepare('SELECT folder, post_number as post_id, time as date, subject, snippet(forumdata) as snippet FROM forumdata WHERE forumdata match ?'); $stmt->execute(array($_GET['q'])); echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC), JSON_UNESCAPED_SLASHES); ?>
Creating a front-end on top of the API, I wanted to quickly render the results onto a template. In the past I'd used handlebars.js to do just this and though about doing so again, but a bit more searching led me to try out pure.js. Wow is it easy! My first prototype went something like this:
<ul> <li><a class="subject link@href" href="#"></a><br><span class="snippet"></span></li> </ul> <script> $p('ul').autoRender($.getJSON("/search.php?q=test")); </script>
That was it. It would render my the
I'm still super-impressed by how quickly the PHP and SQLite database perform. Every query I've thrown at it (including just the letter "e") returns in under a second. And the end result worked out pretty well. I was able to throw the SQLite database file and PHP search onto a Virtual Private Server I already run on DreamHost. Even with 50 concurrent users on the site, my server isn't even really taxed.