Hi All... We have an application where a table has a field of type text that contains readable text, but some of it may have an HTML tag in it - specifically an HTML <img> tag. For example, it may be something like:
BLAH BLAH BLAH <img src='http://pics.10026.com/?src=RetrieveImage.aspx?ImageId=1234'> BLAH BLAH BLAH.
The problem we're seeing is that when searching that field using LIKE, it's returning records that do in fact satisfy the SQL query, but we'd like it not to. That is, we'd like to exclude those HTML tags from the search.
For example:
SELECT ... WHERE TextField LIKE '%retr%'
is returning records that have those HTML tags. Yes, I know SQL is only doing its job... Anyone have any ideas to exclude those tags from the LIKE search? Thanks! -- Curt
Curt, I'm not clear on if you want to filter out the rows that have ANY html tags? or you're saying that part of a single column has a mixture of data AND html and you'd want to search only the non-html part of the column?
So, if you search this string below for the word "White" you want the row, but if you search on the word "Image" you do NOT want the row?
White Horse <img src='http://pics.10026.com/?src=RetrieveImage.aspx?ImageId=1234'>
so, you want to filter out everything between the "<" and ">" ?
You could write a function that parses each line, and searches for your string...
Bruce
|||If I understand you correctly, you want the return to be:
BLAH BLAH BLAH ... BLAH BLAH BLAH
If that is a correct interpretation, it's not going to be either easy or pretty. By that I mean you will have to parse the data fields on the search which is really going to increase the time required to search. Indexes will not be used.
If this is the path you wish to take, you will need to create a User Defined Function that will take the entire field and strip out all characters between paired angle brackets.
|||
Thanks for the replies, Bruce and Arnie. I'm sorry for not making the issue very clear - believe it or not, it took me some time to figure out the wording I did manage to get down...
Using the statement SELECT * From TheTable WHERE TheField LIKE '%RETR%'
I would want the following record to be included:
BLAH RETR BLAH <img src='http://pics.10026.com/?src=RetrieveImage.aspx?ImageId=1234'> BLAH BLAH BLAH
But I would NOT want the following record included:
BLAH BLAH BLAH <img src='http://pics.10026.com/?src=RetrieveImage.aspx?ImageId=1234'> BLAH BLAH BLAH
In any records that are included, I would want the text returned as it appears - that is nothing filtered out.
You both pointed in the direction of writing a function to filter out that "<....>" data before applying a search. And yeah, that just adds to the overhead of the search... And quite frankly, I'm a bit nervous about that anyway - there's gonna be alot of these records and that LIKE just seems expensive. I've also been looking at indexing the text and using CONTAINS (that sound right?). We're also considering a sort of application-specific index of the text before we put it in the table as alot of the searchs are somewhat predictable.
|||If there is only a single tag per row, you could have a WHERE clause that includes BOTH the substring that precedes "<" and the substring that follows ">". That should not require a function.
If there are multiple "<...>" entries per row, this method would not work as desired.
Dan
|||IF, and that is a big IF in my opinion, the data is consistant and your sample correctly reflects the search value, this could work:
Code Snippet
DECLARE @.MyTable table
( RowID int IDENTITY,
Comment varchar(max)
)
INSERT INTO @.MyTable VALUES ( 'BLAH RETR BLAH <img src='http://pics.10026.com/?src='RetrieveImage.aspx?ImageId=1234''> BLAH BLAH BLAH' )
INSERT INTO @.MyTable VALUES ( 'BLAH BLAH BLAH <img src='http://pics.10026.com/?src='RetrieveImage.aspx?ImageId=1234''> BLAH BLAH BLAH' )
SELECT Comment
FROM @.MyTable
WHERE ( Comment LIKE '% RETR %'
AND Comment NOT LIKE '%=''RETR'
)
And of course, you could 'build up' the search values using parameters and constants.
This feels so 'unclean' that now I have to go take a shower...
slightly shorter version.
Code Snippet
SELECT CommentFROM @.MyTable
WHERE (Comment LIKE '%[ ]RETR[ ]%')|||This problem was born for regular expressions:
http://msdn.microsoft.com/msdnmag/issues/07/02/SQLRegex/default.aspx
you could also use charindex:
Code Snippet
SELECT *
From TheTable
where charindex('RETR', TheField)
not between charindex('<', TheField)
and charindex('>', TheField)
|||Spent far too long on this but here goes.The following allows for any number of HTML Tags in the field.
Tested with the following:
Code Snippet
SELECT * into TheTable
From(
select 'BLAH RETR BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH' as TheField
union all
select 'BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH'
union all
select 'BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH RETR BLAH <img src="RetrieveImage.aspx?ImageId=1234"> '
union all
select 'BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> '
union all
select 'BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAHRETRBLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> '
union all
select 'BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> BLAH BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> RETR BLAH BLAH BLAH BLAH <img src="RetrieveImage.aspx?ImageId=1234"> '
) as t1
create a table Numbers with a single field Num populated 1 to x where x = a number larger than the longest field you expect to work with:
Code Snippet
select * into numbers from (
select 1 as Num union all
select 2 as Num union all
select 3 as Num union all
...
select 2999 as Num union all
select 3000 as Num) as T1
Then use this query:
Code Snippet
select distinct TheField
from (
select
TheField,
case
when substring(TheField, num,1) = '>'
or num = 1
then substring(TheField, Num+1, charindex('<',TheField+'<', Num) - Num -1 )
else ''
end as c3
from TheTable, Numbers
) as T1
where c3 like '%RETR%'
No comments:
Post a Comment