Wednesday, March 7, 2012

Fast way to find similar posts

Hello!

I have several posts and one column looks like this:

"0001.11010.101101.00111000.000000000000.011"

I want to find all posts where there are zero or at least one matching 1 in the group.
Each group i separated by .

If we find similar posts to the one above we can say that it will find those for example:

0001 - true since 1 is on the right spot
01000 - true since at least one 1 is on the right spot
000100 - true since at least one 1 is on the right spot
00001000 - true since at least one 1 is on the right spot
000010000000 - true since the whole group is 0 in the string above.
001 - true since at least one 1 is on the right spot

Right now I am creating my query dynamic and use substring() to find the right posts. for the string above my query right now would look like this:

select
t1.rum_profil_id
from
rum_profil t1
where
(substring(t1.nmask,4,1)='1')
and
(substring(t1.nmask,6,1)='1' or substring(t1.nmask,7,1)='1' or substring(t1.nmask,9,1)='1')
and
(substring(t1.nmask,12,1)='1' or substring(t1.nmask,14,1)='1' or substring(t1.nmask,15,1)='1' or substring(t1.nmask,17,1)='1')
and
(substring(t1.nmask,21,1)='1' or substring(t1.nmask,22,1)='1' or substring(t1.nmask,23,1)='1')
and
(substring(t1.nmask,42,1)='1' or substring(t1.nmask,43,1)='1')

Does anyone know of a better way to do this?about the only alternative I could think of would be to store each part of the mask on it's own attribute as an int. you could then use or "|" to test for a pattern inclusion.

you could also use a function to test for a pattern...

drop table #Tmp
CREATE FUNCTION CompareCharPattern (
@.TestPattern varchar(255)
, @.PatternToTest varchar(255))
RETURNS int
AS
BEGIN
declare @.len int, @.Ans int, @.pos int
select @.Ans = 0
, @.len = len(@.TestPattern)
, @.pos = 1
while @.pos <= @.len begin
if substring(@.TestPattern,@.pos,1) = substring(@.PatternToTest,@.pos,1)
set @.Ans = 1
set @.pos = @.pos + 1
end
return @.Ans
END
GO

create table #tmp(RowID int identity,nmask varchar(50))
insert into #tmp (nmask) values('0001.00001.000001.00000001.000000000000.01 1')
insert into #tmp (nmask) values('0001.00100.000101.00000010.000000010000.01 0')
insert into #tmp (nmask) values('0001.00010.001000.00000100.000000001000.10 0')
insert into #tmp (nmask) values('0001.11010.101101.00111000.000000000000.01 1')
go

declare @.TestValue varchar(50)
set @.TestValue = '0001.11010.101101.00111000.000000000000.011'
select *
From #Tmp t1
where dbo.CompareCharPattern(substring(t1.nmask, 1, 4),substring(@.TestValue, 1, 4)) = 1
and dbo.CompareCharPattern(substring(t1.nmask, 6, 5),substring(@.TestValue, 6, 5)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,12, 6),substring(@.TestValue,12, 6)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,21,12),s ubstring(@.TestValue,21,12)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,42, 3),substring(@.TestValue,42, 3)) = 1
go|||How could I use the bitwise OR?

Lets say I make columns of each group. Then the first column would look like this:

0001 = 1

But the first column could also look like this.

0101 = 5

To compare there can be either

0101
0001
0100

anyone would be accepted.

but

0011 = 3 shold not be accepted.|||I don't follow your logic.

In your first post you stated that a matching 1 in any position would = a match. In addition you stated that if the matched pattern were all 0s that would also be a match, I neglected to have this test in my posted function but it would be easy to add.|||Not really in any position.

It has to be on the same spot. Let me show you.

If out pattern is: "0101"

then it will find those:

0100 - Matches at least one 1 on the correct positions
0001 - Matches at least one 1 on the correct positions
0101 - Matches two 1 on the correct positions
1111 - Matches two 1 on the correct, but fails two.

These one will fail:

1000 - Doesnt match at all
1010 - Doesnt match at all
0000 - Doesnt match at all

Following now?|||Yup clear.

CREATE FUNCTION CompareCharPattern (
@.TestPattern varchar(255)
, @.PatternToTest varchar(255))
RETURNS int
AS
BEGIN
declare @.len int, @.Ans int, @.pos int
select @.Ans = 1
, @.len = len(@.TestPattern)
, @.pos = 1
while @.pos <= @.len begin
if substring(@.TestPattern,@.pos,1) = 0 and substring(@.PatternToTest,@.pos,1) = 1
select @.Ans = 0
, @.pos = @.len

set @.pos = @.pos + 1
end
return @.Ans
END
GO

create table #tmp(RowID int identity,nmask varchar(50))
insert into #tmp (nmask) values('0001.00001.000001.00000001.000000000000.01 1')
insert into #tmp (nmask) values('0001.00100.000101.00000010.000000010000.01 0')
insert into #tmp (nmask) values('0001.00010.001000.00000100.000000001000.10 0')
insert into #tmp (nmask) values('0001.11010.101101.00111000.000000000000.01 1')
insert into #tmp (nmask) values('1001.11010.101101.00111001.100000000001.11 1')
go

declare @.TestValue varchar(50)
-- 1111111111222222222233333333334444
-- 1234567890123456789012345678901234567890123
set @.TestValue = '0001.11010.101101.00111000.000000000000.011'
select *
From #Tmp t1
where dbo.CompareCharPattern(substring(t1.nmask, 1, 4),substring(@.TestValue, 1, 4)) = 1
and dbo.CompareCharPattern(substring(t1.nmask, 6, 5),substring(@.TestValue, 6, 5)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,12, 6),substring(@.TestValue,12, 6)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,19, 8),substring(@.TestValue,19, 8)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,28,12),s ubstring(@.TestValue,28,12)) = 1
and dbo.CompareCharPattern(substring(t1.nmask,41, 3),substring(@.TestValue,41, 3)) = 1
go


I think this accounts for everything.|||You have rescued my day :D

I had to do some changes in the logic of CompareCharPattern, but your idea with using select and a function in the wherestatement is super!

Thanks alot!!

No comments:

Post a Comment