Posted by Steve on Tue 16 Aug 2005 at 21:31
When you have a dynamic website with all the content loaded from a database backend you're most likely going to wish visitors to be able to search it. Using the builtin MySQL FULLTEXT indexes makes this very simple.
Many websites, including this one, are implemented by having their contents stored in a database and rendered upon demand by visiting clients. There are several benefits of this approach:
(Of course there are drawbacks with dynamic websites; if you're not careful a sudden inrush of visitors to your site can cause it to become overloaded and fail - a so-called Slashdotting. We've survived two ;)
When it comes to searching such a site it's tempting to construct your own SQL statements, in a fashion similar to this:
select * from articles WHERE body LIKE '%blah%' ;
However if you start down this path you quickly begin to realise that you need to handle more complex cases, such as logical operators, finding items which contain some, or all, of the given terms for example.
Instead of implement the logical searches yourself it is better to leaverage the power of the database and allow it to do the work for you.
This is where the MySQL FULLTEXT search comes into play. Rather than constructing your SQL search string and executing it by hand you can allow the database to do the hard work.
To do this you must first add index columns upon your tables, either when you create them - or by adding them later.
Given the table:
CREATE TABLE `articles` ( `id` int(11) NOT NULL default '0', `title` varchar(65) default NULL, `topic` varchar(25) NOT NULL default '', `author` varchar(25) NOT NULL default '', `ondate` datetime NOT NULL default '0000-00-00 00:00:00', `body` text NOT NULL, );
You may add an index upon it with the following query:
ALTER TABLE articles ADD FULLTEXT(title,body);
This will create an index upon the two columns "title" and "body", which can later be used to perform searches.
By default the indexes you create will only include words of between four and twenty characters in length, this is something that you can adjust if you need to. You should try to think of the search terms your visitors are liable to use and adjust your limitations accordingly. A mimimum search term of four-letters would not allow visitors to search for such terms as "SSH", or "CVS" which would be a problem upon sites like this!
To alter the minimum word length you should adjust the file /etc/mysql/my.cnf. In the "[mysqld]" section add the following:
# Full Text Minimum Word Length ft_min_word_len = 3
Once you've restarted your database server with "/etc/init.d/mysql restart" you should be able to index and search against smaller words.
Once you have the fulltext indexes using them to search against is pretty simple, although it is important to note that you must use the indexes you've created exactly.
For example if you have an index upon both the title and body fields of a table then you must search against both of those - searching on one alone is not possible.
A basic search, using the table structure we used earlier, would look like this:
SELECT id,title FROM articles WHERE MATCH (title,body) AGAINST('cvs');
This gives :
+-----+--------------------------------------------------------------+ | id | title | +-----+--------------------------------------------------------------+ | 186 | Permitting anonymous read-only access to your CVS repository | ... | 131 | Debian Admininistration Site Update | +-----+--------------------------------------------------------------+
As you can see we've successfully managed to perform the search; the two results both contain the term CVS - one in the title, and one presumably in the body.
Using the BOOLEAN search mode you can also allow visitors to construct more advanced queries for example:
| -term | Make sure that the term mentioned doesn't appear. |
| +term | Make sure that the term mentioned does appears. |
As an example of this we'll search for text that contains the word "CVS", but doesn't contain "ssh":
select id,title FROM articles WHERE
MATCH(title,body) AGAINST ("+cvs -ssh" IN BOOLEAN MODE);
The results again look as expected:
+-----+--------------------------------------------------------------+ | id | title | +-----+--------------------------------------------------------------+ | 74 | This site's code is now available | ... | 186 | Permitting anonymous read-only access to your CVS repository | | 194 | Generating a ChangeLog from CVS history | +-----+--------------------------------------------------------------+
Finally one last thing that you might wish to do is view a "score" of the matching results, which you can use to order your results by relevence.
Using the following query allows you to view the score as well as the results, note that the search terms are included twice:
SELECT id,title,MATCH (title,body) AGAINST ('cvs') AS score
FROM articles WHERE MATCH (title,body) AGAINST ('cvs')
ORDER by score DESC
The resulting score can be used to display to clients :
+-----+----------------------------------------------------+-----------------+ | id | title | score | +-----+----------------------------------------------------+-----------------+ | 72 | Setting up a secure CVS server with OpenSSH | 4.2276616096497 | | 194 | Generating a ChangeLog from CVS history | 3.7725563049316 | +-----+----------------------------------------------------+-----------------+
This article can be found online at the Debian Administration website at the following bookmarkable URL:
This article is copyright 2005 Steve - please ask for permission to republish or translate.