Friday, March 23, 2012

help deleting SIMILAR records (not duplicate)

I've read lots of Usenet and Microsoft support articles about how
to remove duplicate rows from a table, but I am trying to modify that
logic to delete "similar" rows. For example, consider the following:
create table t1 (
col1 int,
col2 bit,
col3 bit)
insert into t1 values (1, 0, 0)
insert into t1 values (2, 0, 0)
insert into t1 values (3, 0, 1)
Now, clearly there are no duplicate rows. But what if, for the sake of
logical consistency, I need to remove "similar" rows, defining similar
in this example as rows with duplicate col2 and col3 values? Keep in
mind: I don't care which row gets deleted (deleting the row with col1 = 1
OR col1 = 2 will be fine).
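To see which rows count as "similar" under this definition, group on col2 and col3 and keep only the groups that have more than one row. The sketch below is a minimal example under some assumptions. It uses Python's built-in sqlite3 module with an in-memory SQLite database in place of SQL Server, and it models the bit columns as INTEGER. The GROUP BY / HAVING part is standard SQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption);
# the bit columns are modeled as INTEGER.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(1, 0, 0), (2, 0, 0), (3, 0, 1)])

# A group of "similar" rows: more than one row sharing the same (col2, col3).
similar_groups = conn.execute("""
    SELECT col2, col3, COUNT(*) AS n, MIN(col1) AS lowest_col1
    FROM t1
    GROUP BY col2, col3
    HAVING COUNT(*) > 1
""").fetchall()

print(similar_groups)  # [(0, 0, 2, 1)]
```

Only the (0, 0) pair qualifies. The (0, 1) pair has a single row, so it is not part of any similar group.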
Most of the examples I'm reading involve selecting DISTINCT * into a
temp table, which won't work for me, because the whole row is not
duplicated.
The result I am looking for AFTER the similar-row deletion is as
follows:
select * from t1
col1 col2 col3
---- ---- ----
1    0    0
3    0    1
Thanks for any help!

Delete From t1
Where col1 In (
Select Min(T1.col1)
From t1 As T1
Group By T1.col2, T1.col3
Having Count(*) > 1
)
Thomas
(in reply to "jason" <iaesun@.yahoo.com>,
news:1114800493.591486.311530@.l41g2000cwc.googlegroups.com)
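The sketch below shows how the single-pass MIN(col1) delete behaves when a group holds more than two similar rows. It runs against SQLite through Python's sqlite3 module rather than SQL Server. The extra row (4, 0, 0) is hypothetical and exists only to create a three-row group. The example assumes col1 is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
# Hypothetical data: three rows share (0, 0), so that group has two extras.
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(1, 0, 0), (2, 0, 0), (4, 0, 0), (3, 0, 1)])

# Delete the lowest col1 from every group of similar rows (one pass).
conn.execute("""
    DELETE FROM t1
    WHERE col1 IN (
        SELECT MIN(col1)
        FROM t1
        GROUP BY col2, col3
        HAVING COUNT(*) > 1
    )
""")

remaining = conn.execute("SELECT * FROM t1 ORDER BY col1").fetchall()
print(remaining)  # [(2, 0, 0), (3, 0, 1), (4, 0, 0)]
```

One pass removes only one row per group, so the (0, 0) group still has two rows afterwards.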

This is exactly the kind of logic I need, even though it will only
delete one similar row from each group, whereas I would like to KEEP only
one similar row. Your code satisfies the example completely; however, I
might actually have dozens of "similar" rows, of which I would only want
to keep the rows identified by the SELECT MIN(col1) statement.
Thanks again!

You could run the query several times in succession or put it in a loop :)
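Here is a minimal sketch of the loop suggestion. It again uses SQLite via Python's sqlite3 module as a stand-in for SQL Server, with hypothetical extra rows. The loop repeats the single-pass delete until a pass removes nothing, which it detects from the cursor's rowcount. It assumes col1 is unique.

```python
import sqlite3

# Deletes the lowest col1 from each group of similar rows (one pass).
DELETE_ONE_PER_GROUP = """
    DELETE FROM t1
    WHERE col1 IN (
        SELECT MIN(col1)
        FROM t1
        GROUP BY col2, col3
        HAVING COUNT(*) > 1
    )
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
# Hypothetical data: four rows share (0, 0).
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(1, 0, 0), (2, 0, 0), (4, 0, 0), (5, 0, 0), (3, 0, 1)])

passes = 0  # number of passes that actually deleted something
while True:
    deleted = conn.execute(DELETE_ONE_PER_GROUP).rowcount
    if deleted == 0:  # every (col2, col3) group now has a single row
        break
    passes += 1

remaining = conn.execute("SELECT * FROM t1 ORDER BY col1").fetchall()
print(passes, remaining)  # 3 [(3, 0, 1), (5, 0, 0)]
```

Each pass deletes the lowest col1, so the survivor of each group is the highest col1, not the lowest. The correlated EXISTS statements in the later replies reach a one-row-per-group result in a single statement.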
BTW, once you get this resolved, you might want to make sure that the
client-side logic doesn't allow inserts of "similar" rows; alternatively,
you could enforce this with a trigger.
(in reply to "jason" <iaesun@.yahoo.com>,
news:1114801761.830867.51880@.z14g2000cwz.googlegroups.com)
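The trigger suggestion above could look like the following. This version is written for SQLite through Python's sqlite3 module, because T-SQL trigger syntax differs. The trigger name t1_reject_similar is hypothetical. The trigger aborts any INSERT whose (col2, col3) pair is already present. A UNIQUE constraint on (col2, col3) would enforce essentially the same rule declaratively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")

# Hypothetical trigger (SQLite syntax): reject a row whose (col2, col3)
# pair already exists in t1.
conn.execute("""
    CREATE TRIGGER t1_reject_similar
    BEFORE INSERT ON t1
    WHEN EXISTS (SELECT 1 FROM t1
                 WHERE col2 = NEW.col2 AND col3 = NEW.col3)
    BEGIN
        SELECT RAISE(ABORT, 'similar row already exists');
    END
""")

conn.execute("INSERT INTO t1 VALUES (1, 0, 0)")
conn.execute("INSERT INTO t1 VALUES (3, 0, 1)")

rejected = False
try:
    # Same (col2, col3) as the row with col1 = 1, so the trigger aborts it.
    conn.execute("INSERT INTO t1 VALUES (2, 0, 0)")
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

rows = conn.execute("SELECT * FROM t1 ORDER BY col1").fetchall()
print(rows)  # [(1, 0, 0), (3, 0, 1)]
```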

Try:
delete t1
where exists (select * from t1 as t2
              where t1.col2 = t2.col2
                and t1.col3 = t2.col3
                and t2.col1 > t1.col1)
AMB
(in reply to "jason")
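The sketch below runs AMB's statement once, with the t1.col2 qualifier, against hypothetical data that contains a three-row group. It uses SQLite through Python's sqlite3 module in place of SQL Server. A row is deleted whenever a similar row with a higher col1 exists. One pass therefore leaves exactly one row per group: the row with the highest col1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
# Hypothetical data: three rows share (0, 0).
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)",
                 [(1, 0, 0), (2, 0, 0), (4, 0, 0), (3, 0, 1)])

# Delete a row whenever a similar row with a HIGHER col1 exists,
# so only the highest col1 in each (col2, col3) group survives.
conn.execute("""
    DELETE FROM t1
    WHERE EXISTS (SELECT * FROM t1 AS t2
                  WHERE t1.col2 = t2.col2
                    AND t1.col3 = t2.col3
                    AND t2.col1 > t1.col1)
""")

remaining = conn.execute("SELECT * FROM t1 ORDER BY col1").fetchall()
print(remaining)  # [(3, 0, 1), (4, 0, 0)]
```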

DELETE FROM T1
WHERE EXISTS
(SELECT *
FROM T1 AS T2
WHERE T1.col2 = T2.col2
AND T1.col3 = T2.col3
AND T1.col1 > T2.col1)
David Portas
SQL Server MVP
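David's version reverses the comparison, so the row with the lowest col1 in each group survives. That reproduces the result Jason asked for. The sketch below runs the statement once in SQLite via Python's sqlite3 module, as a stand-in for SQL Server. The data is the original three rows plus two hypothetical extras, (5, 0, 0) and (6, 0, 1), which show that larger groups are also reduced in a single pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (col1 INTEGER, col2 INTEGER, col3 INTEGER)")
# Original sample rows plus two hypothetical extras, (5, 0, 0) and (6, 0, 1).
conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)",
                 [(1, 0, 0), (2, 0, 0), (3, 0, 1), (5, 0, 0), (6, 0, 1)])

# Delete a row whenever a similar row with a LOWER col1 exists,
# so only the lowest col1 in each (col2, col3) group survives.
conn.execute("""
    DELETE FROM T1
    WHERE EXISTS
      (SELECT *
       FROM T1 AS T2
       WHERE T1.col2 = T2.col2
         AND T1.col3 = T2.col3
         AND T1.col1 > T2.col1)
""")

remaining = conn.execute("SELECT * FROM T1 ORDER BY col1").fetchall()
print(remaining)  # [(1, 0, 0), (3, 0, 1)]
```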