sql server - Condensing similar rows occuring in groups and keeping order -
i have sql table containing gps coordinates of device, updated every n minutes (the device installed in vehicle). given nature of gps, lots of entries similar, entirely different far server concerned. can approximately match things (within ~3.6' or maybe 36') easy enough cast(lat decimal(7,4))
i'd able take result set , condense approximate duplicate entries, still maintain time-based order. here's example:
row lat lng vel hdg time 01 31.12345 -88.12345 00 00 12-4-21 01:45:00 02 31.12346 -88.12345 00 00 12-4-21 01:46:00 03 31.12455 -88.12410 10 01 12-4-21 01:47:00 04 31.12495 -88.12480 17 01 12-4-21 01:48:00 05 31.12532 -88.12560 22 01 12-4-21 01:49:00 06 31.12567 -88.12608 25 02 12-4-21 01:50:00 07 31.12638 -88.12672 24 02 12-4-21 01:51:00 08 31.12689 -88.12722 19 02 12-4-21 01:52:00 09 31.12345 -88.12345 00 00 12-4-21 01:53:00 10 31.12346 -88.12346 00 00 12-4-21 01:54:00 11 31.12347 -88.12345 00 00 12-4-21 01:55:00 12 31.12346 -88.12346 00 00 12-4-21 01:56:00 13 31.12689 -88.12788 10 40 12-4-21 01:57:00 14 31.12604 -88.12691 13 39 12-4-21 01:58:00 15 31.12572 -88.12603 15 39 12-4-21 01:59:00 my desired end result rows 1 , 2 condensed single row, , rows 9 through 12 condensed single row, containing avg(lat), avg(lng), , min(time).
this result set receive, given above data:
row lat lng vel hdg time 01 31.123455 -88.12345 00 00 12-4-21 01:45:00 02 31.12455 -88.12410 10 01 12-4-21 01:47:00 03 31.12495 -88.12480 17 01 12-4-21 01:48:00 04 31.12532 -88.12560 22 01 12-4-21 01:49:00 05 31.12567 -88.12608 25 02 12-4-21 01:50:00 06 31.12638 -88.12672 24 02 12-4-21 01:51:00 07 31.12689 -88.12722 19 02 12-4-21 01:52:00 08 31.12346 -88.123455 00 00 12-4-21 01:53:00 09 31.12689 -88.12788 10 40 12-4-21 01:57:00 10 31.12604 -88.12691 13 39 12-4-21 01:58:00 11 31.12572 -88.12603 15 39 12-4-21 01:59:00 the boundaries between groupings movement. velocity being > 0, or gps coordinate changing more x amount. in case, x .0001. the problem, described below, multiple stops (at different times) @ given coordinate lumped single stop. if visit coordinate x today @ 4 pm, , tomorrow @ 8 am, , again @ 6 pm, 1 see tomorrow @ 6 pm (in case of max(time)) or today @ 4 pm (in case of min(time)).
it's given if velocity 0, heading 0. is, however, important rows 1 , 2, , 9 through 12 not grouped if coordinates similar enough same (i.e. when rounded 4 decimal places).
i have query that:
select geography::point(avg(dbo.gpsentries.latitude), avg(dbo.gpsentries.longitude), 4326 ) location, dbo.gpsentries.velocity, dbo.gpsentries.heading, max(dbo.gpsentries.time) maxtime, min(dbo.gpsentries.time) mintime, avg(dbo.rfdatas.rssi) avgrssi, count(1) samples dbo.gpsentries inner join dbo.reports on dbo.gpsentries.report_id = dbo.reports.id inner join dbo.rfdatas on dbo.gpsentries.report_id = dbo.rfdatas.report_id group cast(latitude decimal(7,4)), cast(longitude decimal(7,4)), velocity, heading order max(time) in other words, if travel point point b, stay 30 minutes (and 30 reports @ 1 per minute), travel point c, stay 20 minutes, travel point b , stay 20 more minutes before heading point d, able see both separate stops @ point b.
here's actual data db, sanitized protect innocent, or blame in north east alabama.
latitude longitude spd vel max(time) min(time) sig rowcount 34.747420 -86.302580 68 157 2012-06-13 01:31:37.000 2012-06-13 01:31:37.000 -91 1 34.759140 -86.307620 61 134 2012-06-13 01:33:06.000 2012-06-13 01:33:06.000 -91 2 34.763237 -86.307264 0 0 2012-06-13 01:34:36.000 2012-06-12 01:27:21.000 -97 7 34.763288 -86.307280 0 0 2012-06-13 14:30:44.000 2012-06-12 01:30:21.000 -98 527 34.760220 -86.308200 38 110 2012-06-13 14:33:44.000 2012-06-13 14:33:44.000 -98 1 34.750350 -86.305750 5 90 2012-06-13 14:35:13.000 2012-06-13 14:35:13.000 -83 2 34.737160 -86.298040 70 88 2012-06-13 14:36:43.000 2012-06-13 14:36:43.000 -80 1 34.736420 -86.277270 120 33 2012-06-13 14:38:13.000 2012-06-13 14:38:13.000 -87 2 34.747090 -86.248370 120 37 2012-06-13 14:39:43.000 2012-06-13 14:39:43.000 -93 2 34.755620 -86.240640 70 179 2012-06-13 14:41:13.000 2012-06-13 14:41:13.000 -81 1 34.771240 -86.242760 70 0 2012-06-13 14:42:42.000 2012-06-13 14:42:42.000 -88 2 34.785510 -86.245710 70 6 2012-06-13 14:44:12.000 2012-06-13 14:44:12.000 -99 2 34.800220 -86.239400 70 1 2012-06-13 14:45:42.000 2012-06-13 14:45:42.000 -86 1 34.815070 -86.232180 70 16 2012-06-13 14:47:12.000 2012-06-13 14:47:12.000 -98 2 34.824540 -86.226198 0 0 2012-06-13 14:51:41.000 2012-06-13 00:13:48.000 -101 9 34.824579 -86.226171 0 0 2012-06-14 00:26:19.000 2012-06-12 00:46:57.000 -99 168 you'll note 4th , last row have 527 , 168 entries, respectively, , span 2 days. entries 1 device only, , device stopped several hours in same place on multiple occasions.
here's zipped csv data: sample
what done did
some minor modifications aaron bertrand's supplied query shown below:
with d ( select time ,latitude ,longitude ,velocity ,heading ,timern = row_number() on (order [time]) dbo.gpsentries group time, latitude, longitude, velocity, heading ), y ( select begintime = min(time) ,endtime = max(time) ,latitude = avg(latitude) ,longitude = avg(longitude) -- ,[rowcount] = count(*) ,groupnumber ( select time ,latitude ,longitude ,groupnumber = ( select min(d2.timern) d d2 d2.timern >= d.timern , not exists ( select 1 d d3 -- between 250 , 337 feet abs(d2.latitude - d.latitude) <= .0007 , abs(d2.longitude - d.longitude) <= .0007 , d2.velocity = d.velocity ) ) d ) x group groupnumber ) select y.latitude ,y.longitude ,d.velocity ,d.heading ,y.begintime -- ,y.endtime -- ,y.[rowcount] -- ,duration = convert(time(0),dateadd(ss,datediff(ss,y.begintime, y.endtime), '0:00:00'), 108) y inner join d on y.begintime = d.[time] -- stops (5 minute): -- datediff(mi, y.begintime, y.endtime) + 1 > 5 order y.begintime;
here sample data in tempdb:
use tempdb; go create table dbo.gpsentries ( latitude decimal(8,5), longitude decimal(8,5), velocity tinyint, heading tinyint, [time] smalldatetime ); insert dbo.gpsentries values (31.12345,-88.12345,00,00,'2012-04-21 01:45:00'), (31.12346,-88.12345,00,00,'2012-04-21 01:46:00'), (31.12455,-88.12410,10,01,'2012-04-21 01:47:00'), (31.12495,-88.12480,17,01,'2012-04-21 01:48:00'), (31.12532,-88.12560,22,01,'2012-04-21 01:49:00'), (31.12567,-88.12608,25,02,'2012-04-21 01:50:00'), (31.12638,-88.12672,24,02,'2012-04-21 01:51:00'), (31.12689,-88.12722,19,02,'2012-04-21 01:52:00'), (31.12345,-88.12345,00,00,'2012-04-21 01:53:00'), (31.12346,-88.12346,00,00,'2012-04-21 01:54:00'), (31.12347,-88.12345,00,00,'2012-04-21 01:55:00'), (31.12346,-88.12346,00,00,'2012-04-21 01:56:00'), (31.12689,-88.12788,10,40,'2012-04-21 01:57:00'), (31.12604,-88.12691,13,39,'2012-04-21 01:58:00'), (31.12572,-88.12603,15,39,'2012-04-21 01:59:00'); and attempt @ satisfying query:
;with d ( select time, latitude, longitude, velocity, heading, normlat = convert(decimal(7,4), latitude), normlong = convert(decimal(7,4), longitude), timern = row_number() on (order [time]) dbo.gpsentries -- /* want filters: -- deviceid = @somedeviceid -- , [time] >= @somestartdate -- , [time] < dateadd(day, 1, @someenddate) -- /* sample csv file had lots of duplicates, so: group time, latitude, longitude, velocity, heading ), y ( select mintime = min(time), maxtime = max(time), latitude = avg(latitude), longitude = avg(longitude), [rowcount] = count(*) ( select time, latitude, longitude, groupnumber = ( select min(d2.timern) d d2 d2.timern >= d.timern , not exists ( select 1 d d3 d2.normlat = d.normlat , d2.normlong = d.normlong ) ) d ) x group groupnumber ) select [row] = row_number() on (order y.mintime), y.latitude, y.longitude, d.velocity, d.heading, y.mintime, y.maxtime, y.[rowcount] y inner join d on y.mintime = d.[time] order y.mintime; results:
row latitude longitude velocity heading mintime maxtime rowcount ---|---------|----------|--------|-------|----------------|----------------|-------- 1 31.123455 -88.123450 0 0 2012-04-21 01:45 2012-04-21 01:46 2 2 31.124550 -88.124100 10 1 2012-04-21 01:47 2012-04-21 01:47 1 3 31.124950 -88.124800 17 1 2012-04-21 01:48 2012-04-21 01:48 1 4 31.125320 -88.125600 22 1 2012-04-21 01:49 2012-04-21 01:49 1 5 31.125670 -88.126080 25 2 2012-04-21 01:50 2012-04-21 01:50 1 6 31.126380 -88.126720 24 2 2012-04-21 01:51 2012-04-21 01:51 1 7 31.126890 -88.127220 19 2 2012-04-21 01:52 2012-04-21 01:52 1 8 31.123460 -88.123455 0 0 2012-04-21 01:53 2012-04-21 01:56 4 9 31.126890 -88.127880 10 40 2012-04-21 01:57 2012-04-21 01:57 1 10 31.126040 -88.126910 13 39 2012-04-21 01:58 2012-04-21 01:58 1 11 31.125720 -88.126030 15 39 2012-04-21 01:59 2012-04-21 01:59 1
Comments
Post a Comment