Since the site’s inception, we’ve been amassing large amounts of content on which millions of people have come to depend. We have numerous ways of getting to the content, but the quickest and easiest way to find specific information is to search for it.

AnandTech Search 1.0 (ColdFusion Verity)

The first version of the site used a search server included with ColdFusion named “Verity”. Most people have heard of Verity; it is one of the industry leaders in enterprise search software. The version of Verity that was included with ColdFusion back then was a light version of the full-blown Verity Search server. Although it did quite well at locating content via Boolean searches, it lacked flexibility and wasn’t all that performant.

AnandTech Search 2.0 (Microsoft FullText Search)

After we migrated to Microsoft SQL Server, we decided to use the Full Text search that is built into SQL Server. SQL Server Full Text was introduced in version 7.0, and allows you to create catalogs that can contain multiple indexes on text column types. You can then configure Full Text to index the data in the background, or perform one-time or scheduled indexing of the data.

There are, however, a couple of caveats with Microsoft Full Text search. The first is that it throws errors when your search criteria contain “noise words”. By default, Full Text search is configured with a list of “noise words”. Microsoft (and many other search engines) consider words like “because,been,before,being,between,both,but,by” to be common words that should not be contained in an index. Of course, you can trap this error easily in your application, but realistically, the search engine should just filter the words out of the search phrase itself.
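As a rough illustration of the filtering described above, here is a minimal Python sketch that strips noise words from a search phrase before it is handed to the search engine. The function name `strip_noise_words` and the `NOISE_WORDS` set are hypothetical, and the set holds only the handful of words quoted above, not the full default list. This is not the code we ran on the site.

```python
# Illustrative subset only: the words quoted in the article,
# not the complete default noise-word list.
NOISE_WORDS = frozenset(
    "because,been,before,being,between,both,but,by".split(",")
)


def strip_noise_words(phrase, noise_words=NOISE_WORDS):
    """Return the phrase with noise words removed, keeping term order.

    Matching is case-insensitive; the remaining terms keep their
    original casing.
    """
    terms = phrase.split()  # split on any run of whitespace
    kept = [term for term in terms if term.lower() not in noise_words]
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_noise_words("benchmarks by Intel before launch"))
```

If every term in the phrase is a noise word, the function returns an empty string, so the caller would need to skip the query or prompt the user in that case instead of sending an empty search.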

The second and more important issue is how Full Text handles acronyms and numerical values in search strings. We never really did get to the bottom of the problem, but even with all of the noise words removed from Full Text, certain search phrases that contained acronyms and numerical data wouldn’t return results. Since our data is full of technical acronyms and numerical model numbers, this was a major issue for us.

Along came Google

48 Comments


  • Calin - Tuesday, September 6, 2005 - link

    Pentium III processors are still offered in those 1U servers. The reason would be (probably) cheap price for good performance and lower thermal load than any other competitive Intel processors.
    Low thermal load helps a lot for dual processors servers.
  • flatblastard - Tuesday, September 6, 2005 - link

    Not only a p3 mobo, but PC133 ram labeled for DELL!?!? I guess Google is buying up all the old junk and putting it to good use.
  • bhtooefr - Tuesday, September 6, 2005 - link

    Heck, I've got a Dell PowerEdge 350 (1U, single 850 P3, i440BX chipset) sitting in front of me, and the RAM's not even labelled Dell...
  • flatblastard - Tuesday, September 6, 2005 - link

    making a fortune in the process....
  • brownba - Tuesday, September 6, 2005 - link

    Ok, so I tested it with this query:
    google mini search server
    - it came back with 18700 useless results.
    I also tried the title of the article:
    anandtech search goes google
    - 712 useless results

    how long does it take to crawl?
  • glennpratt - Tuesday, September 6, 2005 - link

    I'm guessing jason clark meant to reply to you.
  • Rock Hydra - Tuesday, September 6, 2005 - link

    I like it. I tested it out and got the returns I was expecting. Very Google-y style. Now if they implemented something this well into the forums search....maybe another day.
  • TheInvincibleMustard - Tuesday, September 6, 2005 - link

    So, naturally, I searched for iram ... returned zero results, but it did suggest i-ram as a possibility. So I clicked that link ...
    doh ... "I" is a very common word and so was not included in my search, meaning all I actually wound up searching for is "RAM" (of which there were several thousand entries, and not one of the top few was actually about the I-RAM product), so perhaps a bit more tweaking is in order ;-)

    Granted, though, the search did only take 0.02 seconds! :-D
  • dvinnen - Tuesday, September 6, 2005 - link

    I was playing around with it too. Did the i-ram search also, the first article presented was an article from 1997 about memory terms (think EDO). The actual i-ram article was actually the fourth result presented. Hell, just a google search gives it as the 4th link in all the internets. Definitely could use some tweaking (give added weight based on the date of the article?) but looks to be a step up from the useless one built into SQL server you were using.
  • glennpratt - Tuesday, September 6, 2005 - link

    Did you read my post?

    Did you click on the link that says 'i is a common word and was excluded' in the search? That would have given you a couple of choices on what to do to fix it.
