Hello everyone,
My company is in the middle of coding a complex search function which
takes over most of the features that full text indexing contains (such
as stemming and stopping).
However, from reading about full text indexing, it appears that for a
column containing rows of pure text documents, it indexes the text as
follows:
word     id of row where word can be found
####     #################################
car      0, 10, 256, 654
bike     20, 36, 92
skates   65, 42
If this is the case I can see this improving search speeds for a large
database.
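For readers following along, the word-to-row-ids layout described above is usually called an inverted index. A minimal Python sketch of the idea follows; the `build_inverted_index` function and the sample rows are made up for illustration and say nothing about how SQL Server stores its full text catalog internally.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lower-cased, whitespace-separated word to the sorted ids of the rows containing it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.lower().split():
            index[word].add(row_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample data: row id -> document text.
rows = {
    0: "red car",
    10: "car park",
    20: "bike shed",
    36: "bike and car",
}
index = build_inverted_index(rows)
car_rows = index["car"]    # ids of rows that contain the word "car"
bike_rows = index["bike"]  # ids of rows that contain the word "bike"
```

A lookup is then a single dictionary access rather than a scan of every row, which is where the speed-up on a large table would come from.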
However, we don't want any of the other features that the full text
catalogue has, as we do them ourselves.
My question is: can we turn off all the extra features of full text
indexing (so we are only left with a text index), or is a normal index
sufficient to provide the above example?
Many thanks for any help or advice
Kind Regards
Philip
|||With the Contains/ContainsTable keywords you get a strict match, which
sounds like what you are looking for. In other words, stemming and
wildcarding are disabled.
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html
|||Hello Hilary,
Thank you for your help. Just to make sure: when I enable full text
catalogs, does it index the text without stemming and stopping the
data?
By this I mean that if I had stored "to be or not to be" in the table,
would the full text catalog contain index entries for "to", "be", "or"
and "not", or would it stem and stop the text?
Again, thank you for your help
Kind Regards
Phil
|||The answer is complex. First off, "to", "be", "or", and "not" are all noise
words, and as such they would not be indexed.
But to answer your question: that depends on the word breaker. In general
the words would be indexed as they appear in the content. For some word
breakers, different language rules dictate different indexing patterns. For
instance, with the French word breaker, marie-claire is indexed as two
different words, marie and claire, whereas Marie-Claire is indexed as one
word (MarieClaire). The hyphen and the capitalization together cause it to
be indexed as one word, not two.
It is at query time that the search arguments may be stemmed (when
you are doing a FreeText query or an inflectional query). If you are doing
a Contains query without wildcarding or a FormsOf(Inflectional, ...) type of
query, you won't get stemming (although there are some word-breaker-specific
exceptions).
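To make the index-time versus query-time split concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: the noise word set holds just the four words named above rather than the real noise word file, `index_terms` breaks on whitespace only, and `toy_inflections` is a crude made-up stand-in for inflectional expansion, not the stemmer SQL Server uses.

```python
# Assumed sample: only the four noise words named in this thread.
NOISE_WORDS = {"to", "be", "or", "not"}


def index_terms(text):
    """Index time: lower-case, break on whitespace, drop noise words. No stemming happens here."""
    return [word for word in text.lower().split() if word not in NOISE_WORDS]


def toy_inflections(word):
    """Query time: crude stand-in for inflectional expansion of a search term."""
    return {word, word + "s", word + "ed", word + "ing"}


def matches(query_word, text, inflectional=False):
    """Strict match by default; expand the query term only when asked to."""
    terms = set(index_terms(text))
    query_word = query_word.lower()
    wanted = toy_inflections(query_word) if inflectional else {query_word}
    return bool(terms & wanted)


stored = index_terms("to be or not to be")             # every word is a noise word
strict = matches("walk", "She walked home")            # plain term, no expansion
expanded = matches("walk", "She walked home", True)    # expanded at query time
```

In this sketch the stored text is never stemmed; only the query term is expanded, which mirrors the point that stemming is a property of the query rather than of the index.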
Hilary Cotter
|||Phi!
In addition to what Hilary says, there is another factor here, and that is
the OS platform (Win2K vs. Win2003 or WinXP) that you have SQL Server
installed on. Specifically, the Windows 2000 Server (Win2K) wordbreaker,
infosoft.dll, indexes "7-UP", with its "-" (dash or hyphen), as one token,
i.e., as a single phrase. A workaround for this is to drop and re-create your
FT Catalog and use the Neutral "Language for Word Breaker" for your
FT-enabled column. However, with the Neutral "Language for Word Breaker",
you will lose the formsof(inflectional) function, as the words are "broken"
into tokens based upon the "white space" between words...
However, this is not the case with Windows Server 2003 (Win2003) or Windows
XP (WinXP), as these OS platforms ship with a newer and better wordbreaker,
langwrbk.dll, which gives more expected results and would correctly
break 7 and UP into separate tokens. So, in the long run,
to get both the correct wordbreaking and the use of the
formsof(inflectional) function, if this is the functionality you're looking
for, you should consider upgrading to Win2003...
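As a rough illustration of why the choice of word breaker changes what a search can find, the Python sketch below compares two hypothetical breakers. Neither function reproduces infosoft.dll, langwrbk.dll or the Neutral breaker; they only show that a term can be matched only if the breaker emitted it as its own token.

```python
import re


def break_keep_punctuation(text):
    """Hypothetical breaker: split on whitespace only, so '7-UP' stays one token."""
    return text.lower().split()


def break_on_punctuation(text):
    """Hypothetical breaker: split on any run of characters other than 0-9 and a-z."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def token_match(word, text, breaker):
    """A term is found only if it equals one of the tokens the breaker produced."""
    return word.lower() in breaker(text)


one_token = break_keep_punctuation("7-UP")
two_tokens = break_on_punctuation("7-UP")
missed = token_match("jpg", "filename.jpg", break_keep_punctuation)
found = token_match("jpg", "filename.jpg", break_on_punctuation)
```

With the first breaker a search for "up" or "jpg" has nothing to match against, because those strings exist only inside a longer token.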
Regards,
John
|||Hello Hilary,
After much testing we have decided to go as follows: we will use full
text catalogs and search using the CONTAINS keyword.
However, to ensure SQL Server 2000 does not do anything strange with
the results, we have turned the language setting to neutral. We found
that results were not returned when we had the UK setting on, e.g.
"jpg" would not find "filename.jpg".
For wildcards we will use the LIKE keyword, as CONTAINS does not do
pattern matching with wildcards but tries to find different words
instead.
Therefore we hope we have the best of both worlds, by using
CONTAINS in most cases but the LIKE keyword whenever anyone needs
pattern matching (which should be rare).
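The routing idea can be sketched in Python as follows. This is only an illustration, assuming a search term counts as a pattern when it contains the LIKE wildcards `%` or `_`; the `search` function, the sample rows and the in-memory token scan (standing in for a CONTAINS lookup) are all made up, and LIKE features such as `[ ]` character sets and collation rules are ignored.

```python
import re


def like_to_regex(pattern):
    """Translate LIKE wildcards: % matches any run of characters, _ matches exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def search(rows, term):
    """Whole-word match by default; fall back to a pattern scan only for wildcard terms."""
    if "%" in term or "_" in term:
        pattern = like_to_regex(term)
        return sorted(i for i, text in rows.items() if pattern.fullmatch(text))
    word = term.lower()
    return sorted(
        i for i, text in rows.items()
        if word in re.split(r"[^0-9a-z]+", text.lower())
    )


# Hypothetical sample data: row id -> stored text.
rows = {1: "holiday filename.jpg", 2: "report.doc", 3: "jpg export notes"}
word_hits = search(rows, "jpg")
pattern_hits = search(rows, "%name.j%")
single_char_hits = search(rows, "report.do_")
```

The pattern branch has to examine every row, which is why it makes sense to reserve it for the rare queries that really need wildcards.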
Again, thank you, Hilary, for helping me understand full text
catalogs
Have a good weekend
Phil
|||Phi!,
Just FYI, if you had been able to read my posting to this thread, you would
have understood why "the results were not returned when we had the UK setting
on. i.e. "jpg" would not find 'filename.jpg'": it is a bug in the Win2K
wordbreaker dll, and if in the future you decide to upgrade to Windows Server
2003 (Win2003) you will not encounter this bug.
Regards,
John
|||Hi John!
Thank you for the tip, I will bear it in mind when we upgrade.
I am having trouble getting a newsgroup reader on my desktop at work
(company policy etc.), so I have been limited to Google at the moment and
did not know you had posted until today!
Sorry for any confusion
Phil
Friday, March 23, 2012