Get index scan on zip code radius query - not sure why.. ! ?

  • [font="Courier New"]SELECT t1.*,

    DistanceInMiles = dbo.fnDistance(t1.Latitude, t1.Longitude, t2.Latitude, t2.Longitude)

    FROM dbo.ZipCodes t1

    JOIN dbo.ZipCodes t2

    ON (dbo.fnDistance(t1.Latitude, t1.Longitude, t2.Latitude, t2.Longitude) < 10)

    WHERE t2.ZipCode='10029'

    ORDER BY DistanceInMiles[/font]

    I'm a newbie, and I get a Clustered Index Scan on this zip code radius query, which I've heard is "bad" ! I added every conceivable combination of indexes to the table (zip, latitude, longitude, etc..) . Doesn't seem to make a difference. Any suggestions? tx! (see attached pic)

  • Is the field ZipCode in an index? And I'm curious about the fnDistance logic, what are you using?

    CEWII

  • Yes, Zip code is in an index..

    fnGetDistance (latitude1, long1, lat2, long2):

    ===============

    SELECT @Distance = 3963.0 * ACOS(SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)) + COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)) * COS(RADIANS(@Long2 - @Long1)))

  • an index scan is not really a bad thing; much better than a table scan.

    I think it depends on the selectivity of the index in question as to whether you are going to get an index seek vs an index scan.

    i'm thinking since your results is going to return multiple zipcodes for the given value, the index scan is the best you'll get; you see the index seek on one table, where it's finding one value, but since you want multiples in the other....

    it looks like a good execution plan to me.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • but Lowell - i've read that an index scan == table scan!

    Matt,

    NYC

  • Your clustered index is on Lat/Lon?

    SELECT t1.*,

    DistanceInMiles = dbo.fnDistance(t1.Latitude, t1.Longitude, t2.Latitude, t2.Longitude)

    FROM dbo.ZipCodes t1

    JOIN dbo.ZipCodes t2

    ON (dbo.fnDistance(t1.Latitude, t1.Longitude, t2.Latitude, t2.Longitude) < 10)

    WHERE t1.ZipCode='10029'

    ORDER BY DistanceInMiles

    Changing from t2.ZipCode in the where to t1 made a big difference..

    I have included all my code which is a table, a function and an insert script. The insert script creates ~43K or so zip code records. The ones less the 5 digits don't have leading zeroes..

    Also, I included a screen print of my query plan..

    CEWII

  • thank you very much - will try your code..

    my clustered index is on (zip, long, lat) .. yes, changing to WHERE clause to t2 results in a 'clustered index scan' vs. 'index scan'

    Elliott W (10/19/2009)


    Your clustered index is on Lat/Lon?

    SELECT t1.*,

    DistanceInMiles = dbo.fnDistance(t1.Latitude, t1.Longitude, t2.Latitude, t2.Longitude)

    FROM dbo.ZipCodes t1

    JOIN dbo.ZipCodes t2

    ON (dbo.fnDistance(t1.Latitude, t1.Longitude, t2.Latitude, t2.Longitude) < 10)

    WHERE t1.ZipCode='10029'

    ORDER BY DistanceInMiles

    Changing from t2.ZipCode in the where to t1 made a big difference..

    I have included all my code which is a table, a function and an insert script. The insert script creates ~43K or so zip code records. The ones less the 5 digits don't have leading zeroes..

    Also, I included a screen print of my query plan..

    CEWII

  • Elliot kick *** giving us the sample data to test with!

    don't know how close it is to the original posters, but i got similar results whether i choose t1.ZipCode or t2.ZipCode even though t1 produces a clustered index seek vs an index seek on T2;

    results are within a couple of milliseconds of each other. my execution plan is pretty much the same though, index scan on one table and index seek on the other, same sorting and inner join/nested loop.

    when i did statistics time on, i was running 2400 milliseconds or so on a mediocre machine.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Why did you start a new topic for the same problem?

    http://qa.sqlservercentral.com/Forums/Topic804719-149-1.aspx

    You seem to have ignored the most important point from the prior thread, which was to limit the number of rows that you have to search by setting limits on longitude and latitude.

  • Michael - This topic utilized the same code, but was about a 'clustered index scan' ... i suppose I should have categorized it with the first post.. sorry! - p.s. the F_FIND_SEARCH_LIMITS stuff you posted seems great, but it seems complicated for me (a newbie) .. I am going to try to look at it again.,

  • I found me a site that had the data for free so I loaded it.. But I agree it did not make a huge difference.. I'm wondering what the performance would be if we used the formula right in the query.. Wait, I can test that..

    Formula in the query = 156ms

    Formula in function = 453ms

    Uh, well, pick your poison.. If you are going to do this 10,000 times then the first one should be looked at.. If this is an occasional thing then 156 vs. 453 is not worth dealing with, I prefer the encapsulation of the formula..

    For example.. I recently had a process that calls this sproc ~280K times, by adding an index (a little bitty one no less) I shaved 90% of the processing time off. Little numbers can add up..

    CEWII

  • Wow! I wouldn't have guessed that the formula in the query would be faster!

  • The biggest reason why you're getting a SCAN of any sort instead of a SEEK is because of the forumula itself... it must calculate every zip code to find which ones are less than 10 miles away in order to be able to do the filtering.

    One half of a super optimization is to take the LAT1 (because it's nearly linear in miles compared to Longitude) and calculate the "Narrow Horizontal Corridor". What that means is... the Earth has an approximate circumference of 24,859.82 from pole-to-pole... there are 360 degrees of Latitude in the circumference... Sooooooo.... let's say Lat1 was 45 degrees North and you wanted a radius of 10 miles. You KNOW that none of the zip codes would be any further than 10 miles further North nor would they be any further than 10 miles further South. (45 represents Lat1 in the following examples...)

    @MaxNorthLat = 45 + (10/24,859.82*360)

    @MaxSouthLat = 45 - (10/24,859.82*360)

    For better accuracy, always do multiplication first...

    @MaxNorthLat = 45 + (10*360/24,859.82)

    @MaxSouthLat = 45 - (10*360/24,859.82)

    So, now instead of looking at every Zip Code in the nation to figure out which ones are less than ten miles away, you have a better guess... a 20 mile high "Narrow Horizontal Corridor" to be precise.

    WHERE Lat2 BETWEEN @MaxNorthLat AND @MaxSouthLat

    There's a similar trick you can do with Longitude but it's slightly more complicated because the distance between two lines of Longitude varies. (I'll let you GOOGLE that formula which is based on @MaxNorthLat in the Northern Hemisphere). What you end up with is a spherical "rectangle" of about 20 miles by 20 miles (at the Equator and a bit more trapezoidal as you approach the poles) that greatly limits the area that you need to search. THEN, you can do your radius calculation on that data to fine tune it. The key is that the "super" optimization greatly limits the number of radius calculations that need to be done because the radius calculations will ALWAYS cause a scan of ALL the zip codes. The optimization of the spherical "rectangle" will use SEEKS.

    All of this optimization works even better if you're using the "Donald Elliptic Projection Coordinate System" which some call the "Telcordia V&H Coordinate System". Basically, that's the same system that the airlines and telephone companies have used for years and years and you calculate radial distances using the Pythagorean theorem which is simple. Of course, the optimization on that projection is much simpler because it's built so the vertical axis' are parallel just like the horizontal ones are. It flatten out the Earth's coordinate system. Give it a Google…

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Wow, Jeff, you are the man! I will try to implement the "narrow corridor".. so i should just calculate the Max vars and add "WHERE Lat2 BETWEEN @MaxNorthLat AND @MaxSouthLat" to my sql? Just fyi, below is the personal hell, I mean sql, that I have been working on:

    [font="Courier New"]SELECT * FROM

    (

    SELECT

    TotalRows = COUNT(*) OVER(),

    RowNum = ROW_NUMBER()OVER (ORDER BY dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude)),

    P.ProviderID, P.LastName, P.FirstName, P.City, z1.ZipCode, z1.State, DistanceInMiles = dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude)

    FROM dbo.Providers P WITH (NOLOCK)

    INNER JOIN dbo.ZipCodes z1 ON z1.ZipCode = P.Zip

    INNER JOIN dbo.ZipCodes z2 ON dbo.fnDistance(z1.Latitude, z1.Longitude, z2.Latitude, z2.Longitude) < 20

    WHERE z2.ZipCode = @zip_code

    )

    AS XYZ -- you need this AS XYZ

    WHERE RowNum BETWEEN @x AND @y

    ORDER BY RowNum ASC

    [ fnDistance: SELECT @Distance = 3963.0 * ACOS(SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)) + COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)) * COS(RADIANS(@Long2 - @Long1))) ][/font]

  • matt6749 (10/19/2009)


    Wow, Jeff, you are the man! I will try to implement the "narrow corridor".. so i should just calculate the Max vars and add "WHERE Lat2 BETWEEN @MaxNorthLat AND @MaxSouthLat" to my sql? Just fyi, below is the personal hell, I mean sql, that I have been working on:

    Basically, Yes. Try that first... we may have to do a little "Divide'n'Conquer" if that doesn't bring the performance way up...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply