How to update an htdig search engine database
Question: I use ht://Dig from www.htdig.org to provide a search function on our web site. Every now and then, new pages appear on our site or existing ones get updated. How can the search engine's database be updated?
Answer: You need to run the rundig script after each significant change. You can also add this command to your crontab and schedule it for daily execution.
If you simply run rundig, it will visit all pages it can find from the start page and rebuild the database completely. This process is called 'crawling' and 'indexing'.
The downside is that while crawling and indexing are in progress, the database is not available, so visitors to your web site cannot use the search function.
The solution is the '-a' parameter for the rundig script. This parameter makes rundig use alternate work files during crawling and indexing. The alternate work files have an additional extension, .work, and temporarily appear next to the original files in the /htdig/db folder.
Basically, a second copy of the database is built. The original files stay in place, so htsearch can keep using them in the meantime. After htdig and htmerge are done building the .work database files, rundig moves them into place, replacing the original files.
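To illustrate the idea of building a second copy and then swapping it in, here is a minimal shell sketch. It is not the actual rundig code; the function name promote_work_files and the file name example.db are made up for this example, and the demo runs in a scratch directory rather than in your real /htdig/db folder.

```shell
#!/bin/sh
# Illustrative sketch only, not taken from rundig itself.
# promote_work_files DIR: rename every DIR/<name>.work to DIR/<name>,
# overwriting the old file of the same name.
promote_work_files() {
    dir=$1
    for f in "$dir"/*.work; do
        # If no .work file exists, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        # Strip the trailing ".work" to get the name of the live file.
        mv -f "$f" "${f%.work}"
    done
}

# Demo with a hypothetical file name in a temporary directory.
demo=$(mktemp -d)
echo "old index" > "$demo/example.db"
echo "new index" > "$demo/example.db.work"
promote_work_files "$demo"
cat "$demo/example.db"    # prints "new index"
rm -rf "$demo"
```

The point of the pattern is that the slow part (crawling and indexing) happens on the copy, and only the quick rename touches the files that the search uses.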
Read "How do I set up a cron job?" to see how to schedule rundig -a for daily execution.
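As a rough sketch, a crontab entry for a nightly run could look like the one below. The path /path/to/rundig is a placeholder, since the location of the script depends on your installation, and the time of 3:30 in the morning is only an example. Adjust both for your server.

```shell
# Crontab entry (edit with "crontab -e"), not a shell script.
# Fields: minute hour day-of-month month day-of-week command
# /path/to/rundig is a placeholder for wherever rundig lives on your server.
30 3 * * * /path/to/rundig -a
```

Picking a quiet hour keeps the extra load from crawling away from your busiest times.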