I have a table that contains (among other things) start and end range datetimes.  The ranges can overlap.  View these datetimes as segments of coverage on a number line.  Rather than returning each line segment individually, I'd like the query to return the start and the end of the contiguous segments.  For example, the desired output of the query on the below table would be:
/* Start SQL */
/* Desired output:  range_start_datetime        range_end_datetime
                    01/01/2003                  01/06/2003
                    01/08/2003                  01/12/2003
*/  
create table #ranges (range_start_datetime datetime, range_end_datetime datetime)         
/* Range 1:  01/01/2003 to 01/06/2003 */
insert into #ranges(range_start_datetime, range_end_datetime) values ('01/01/2003', '01/02/2003')
insert into #ranges(range_start_datetime, range_end_datetime) values ('01/02/2003', '01/03/2003')
insert into #ranges(range_start_datetime, range_end_datetime) values ('01/02/2003', '01/06/2003')
/* Gap - no data at 01/07/2003 */
/* Range 2:  01/08/2003 to 01/12/2003 */
insert into #ranges(range_start_datetime, range_end_datetime) values ('01/08/2003', '01/09/2003')
insert into #ranges(range_start_datetime, range_end_datetime) values ('01/09/2003', '01/10/2003')
insert into #ranges(range_start_datetime, range_end_datetime) values ('01/10/2003', '01/12/2003')
select * from #ranges
drop table #ranges
/* End SQL */This version will work in Oracle at least:
SQL> select range_start_datetime,
  2         (
  3           select min(range_end_datetime)
  4           from   ranges r3
  5           where not exists
  6           ( select range_start_datetime from ranges r4
  7             where r4.range_start_datetime <= r3.range_end_datetime
  8             and   r4.range_end_datetime > r3.range_end_datetime
  9           )
 10           and r3.range_end_datetime >= r1.range_start_datetime
 11         )
 12  from ranges r1
 13  where not exists
 14  ( select range_end_datetime from ranges r2
 15    where r2.range_end_datetime >= r1.range_start_datetime
 16    and   r2.range_start_datetime < r1.range_start_datetime
 17  );
RANGE_STAR (SELECTMIN
---- ----
01/01/2003 01/06/2003
01/08/2003 01/12/2003|||This works in Sybase as well.  I had to change the first select to distinct when I added a few more test cases, but it works!  I'm impressed with your skills.
If you get a moment, could you explain what your query is doing?  I didn't even consider a 4-join solution.
Gump
Originally posted by andrewst 
This version will work in Oracle at least:
SQL> select range_start_datetime,
  2         (
  3           select min(range_end_datetime)
  4           from   ranges r3
  5           where not exists
  6           ( select range_start_datetime from ranges r4
  7             where r4.range_start_datetime <= r3.range_end_datetime
  8             and   r4.range_end_datetime > r3.range_end_datetime
  9           )
 10           and r3.range_end_datetime >= r1.range_start_datetime
 11         )
 12  from ranges r1
 13  where not exists
 14  ( select range_end_datetime from ranges r2
 15    where r2.range_end_datetime >= r1.range_start_datetime
 16    and   r2.range_start_datetime < r1.range_start_datetime
 17  );
RANGE_STAR (SELECTMIN
---- ----
01/01/2003 01/06/2003
01/08/2003 01/12/2003|||Probably easiest to start from here:
select range_start_datetime,
from ranges r1
where not exists
( select range_end_datetime from ranges r2
  where r2.range_end_datetime >= r1.range_start_datetime
  and   r2.range_start_datetime < r1.range_start_datetime
);
All I have done is remove the second column from the main select (which was a "scalar subquery").
This query gets the start of each contiguous range, by finding all the start dates for which there does not exist a preceding record that meets or overlaps it, i.e.:
RANGE_STAR
----
01/01/2003
01/08/2003
Now let's look at the scalar subquery:
select min(range_end_datetime)
from   ranges r3
where not exists
( select range_start_datetime from ranges r4
  where r4.range_start_datetime <= r3.range_end_datetime
  and   r4.range_end_datetime > r3.range_end_datetime
)
and r3.range_end_datetime >= r1.range_start_datetime
This sort of does the opposite of the query above: it finds the end of each contiguous range, i.e. all the end dates for which there does not exist a following record that meets or overlaps it, i.e. 
select range_end_datetime
from   ranges r3
where not exists
( select range_start_datetime from ranges r4
  where r4.range_start_datetime <= r3.range_end_datetime
  and   r4.range_end_datetime > r3.range_end_datetime
)
RANGE_END_
----
01/06/2003
01/12/2003
It then finds the MIN of those end dates where the end_date is on or after the start date of each record in the main query (i.e. corellated): for start date 01/01/2003 that will be 01/06/2003, and for start date 01/08/2003 that will be 01/12/2003.
Does that make sense?
Monday, February 27, 2012
Subscribe to:
Post Comments (Atom)
 
No comments:
Post a Comment