Sunday, February 26, 2012

False hits with Contains query

I'lll try this question here too, since I coudn't get an answer in the T-SQL forum

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=166991&SiteID=1

I cant figure out why a query returns false results.
It is basically:
select *
from sometable
where somecolumn=1
and contains(someothercolumn, 'arti-b')
This query takes forever to process and returns about 100 false hits for every row where someothercolumn actually contains the string 'arti-b'.
Note, I don't get the full set of rows where somecolumn=1, so there is some filtering from the contains clause.
If I use the same query, searching for just 'arti', it works fine. 'whatever-b' seems to work as well, as long as 'whatever' != 'arti'.
Does anyone know what causes this?
SQL server Enterprise edition
version 8 (SP4)
Language: US english
Collation: Finnish_swedish.

Full-text search is a word-based tool for natural language searches. It's not designed to handle punctuation.

As suggested in http://support.microsoft.com/kb/200043/EN-US/:

Where non-alphanumeric character must be used in the search critera (primarily the dash '-' character), use the Transact-SQL LIKE clause instead of the FULLTEXT or CONTAINS predicates.

In other words, use this query:

select * from sometable
where somecolumn=1
and someothercolumn like '%arti-b%'

Steve Kass
Drew University
|||

I see. I never found that KB article.

I don't suppose this behaviour of the full text search engine can be configured somehow?

A LIKE-query is not an option for me. Someothercolumn is text and there are over a million rows in the sometable.

How does the search engine handle hyphens? Is it better to filter out any such strings or is it possible to get something acceptable by querying for CONTAINS(someothertable, "arti b") or perhaps "arti*" ?

/Gustav

No comments:

Post a Comment