Quick question on zip code searches: U.S. and Canada

Question

Quick question on zip code searches: U.S. and Canada

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

p.s. That code uses 2 Index Seeks. One for the first single row lookup and the other to find the 1,333 ZipCodes in the Latitude Optimization for a 15 Mile Radius.

I do not have any Canadian information but it shouldn't matter much because of the Clustered Index Seeks and the Clustered Index being on Lat/Lon. You'll get a very nice seek to find the first row and a high speed sorted scan of only the data you need after that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

Oh yeah... I almost forgot. Part of the problem with the original code and the reason why it could only do a Clustered Index Scan was because of the Scalar Function being used as JOIN criteria. There's just no way to get a meaningful Seek out of something like that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

matt6749 Ten Centuries Points: 1204 More actions · Answer 3

Jeff - Thank you SO much for your help. Your brain power is really impressive. Now I'm reminded why I can't get programming jobs very easily! lol. I'm going to digest what you wrote, and try and hopefully get it working for myself over the next few days and integrate it. I may reply to this post again if I run into a brick wall.

Thanks again,

Matt

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

matt6749 (6/25/2011)
Jeff - Thank you SO much for your help. Your brain power is really impressive. Now I'm reminded why I can't get programming jobs very easily! lol. I'm going to digest what you wrote, and try and hopefully get it working for myself over the next few days and integrate it. I may reply to this post again if I run into a brick wall.
Thanks again,
Matt

Thanks for the compliments :blush:, Matt, but it's not any special form of brain power. Just a bit of experience in doing these things... peel one potato at a time. 😀 You had the one thing that most people miss and that's the Latitude optimization. You just needed a little help pushing that optimization over the edge. 🙂

There's also a Longitude optimization out there on the web, but I've not implemented it anywhere. If my postal code lookups where global, then I'd look into it more.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

matt6749 Ten Centuries Points: 1204 More actions · Answer 5

Hi Jeff:

- I have the indexes you recommended on my ZipCode table

I implemented your code and it reduced the query times:

OLD TIMES:

~850ms for '10023'

~2600ms for 'V6H 1E2'

NEW TIMES:

~300ms for '48326'

~1100ms for 'V6H 1E2' -- for some reason searching on Canadian zip codes takes longer, even though U.S. and Canadian zip codes are in same table.

Notes:

a) i think it would be even quicker if I didn't join with my existing Providers and Occupations table

b) query execution plan shows there is an index scan in one part:

Query 1 (24% relative to cost of batch):

-----------------------------------------------

Index Scan (NonClustered)

[ZipCodes].[IX_ZipCodes_ZipCode]

Cost 96%

SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth

@StartLat = Latitude,

@StartLon = Longitude,

@MaxNorthLat = Latitude + (@pRadius*@DegreesPerMile),

@MaxSouthLat = Latitude - (@pRadius*@DegreesPerMile)

FROM dbo.ZipCodes

WHERE ZipCode = @pZip

The Stored Procedure itself:

---------------------------------------------------

-- Create CTE

WITH

cteFindDistances AS

( --=== Find the optimized band of ZipCodes and calculate their distance using the hi-speed iTVF.

SELECT d.Distance, z.ZipCode, z.State

FROM dbo.ZipCodesTest z

CROSS APPLY dbo.fnDistanceNEW(@StartLat, @StartLon, z.Latitude, z.Longitude) d

WHERE z.Latitude >= @MaxSouthLat

AND z.Latitude <= @MaxNorthLat

) --=== Return only those within the desired radius

-- Use the CTE

SELECT Distance, ZipCode

INTO #TempZip

FROM cteFindDistances

WHERE Distance <= @radius

--CREATE INDEX IDX_TempZip ON #TempZip(ZipCode) --didn't seem to increase speed any..

SELECT * FROM

(

SELECT

TotalRows = COUNT(*) OVER(),

RowNum =

CASE

WHEN @sortColumn = 'OccupationName_ASC' THEN ROW_NUMBER()OVER (ORDER BY O.OccupationName)

WHEN @sortColumn = 'City_ASC' THEN ROW_NUMBER()OVER (ORDER BY P.City)

WHEN @sortColumn = 'Distance_ASC' THEN ROW_NUMBER()OVER (ORDER BY T.Distance)

WHEN @sortColumn = 'OccupationName_DESC' THEN ROW_NUMBER()OVER (ORDER BY O.OccupationName DESC)

WHEN @sortColumn = 'City_DESC' THEN ROW_NUMBER()OVER (ORDER BY P.City DESC)

WHEN @sortColumn = 'Distance_DESC' THEN ROW_NUMBER()OVER (ORDER BY T.Distance DESC)

ELSE ROW_NUMBER()OVER (ORDER BY NewID()) -- note this returns random order, and each page's results may not be unique.

END,

P.ProviderID, P.OccupationID, P.LastName, P.FirstName, P.Designations, P.City, P.State, P.Zip, P.Phone, O.OccupationName

, T.ZipCode, T.Distance, P.NumArticles, P.Headline1, P.Headline2, P.PracticeName, P.PracticeInfo

FROM dbo.Providers P WITH (NOLOCK)

INNER JOIN dbo.Occupations O ON O.OccupationID = P.OccupationID

INNER JOIN #TempZip T ON T.ZipCode = P.Zip

WHERE (@occupationID IS NULL OR P.OccupationID = @occupationID)

)

AS XYZ -- you need this AS XYZ

WHERE RowNum BETWEEN @startRow AND (@startRow + @rowsPerPage) - 1

ORDER BY RowNum ASC

SO... THANK YOU JEFF! It has also simplified my code quite a bit. If you have any other suggestions on this I much appreciate it!

Matt

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 6

matt6749 (6/27/2011)
...for some reason searching on Canadian zip codes takes longer, even though U.S. and Canadian zip codes are in same table.

It's because the 2nd triad of characters in the Canadian Postal code make the equivalent of Zip+4 in the U.S. and there are a whole lot more of those in the "Latitude Optimization" than if you were just using the first 3 characters.

Notes:
a) i think it would be even quicker if I didn't join with my existing Providers and Occupations table

IIRC, I made such a statement in the comments in the code. Do the lookup and push the results to a Temp Table. Then, join the Providers and Occupations tables to the Temp Table. The code should be quite a bit faster because the join has been narrowed a whole lot.

I'm looking at the rest of the stuff you posted right now.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

matt6749 (6/27/2011)
Notes:
a) i think it would be even quicker if I didn't join with my existing Providers and Occupations table

I notice you do column sorts based on that information. My recommendation would be to filter @SortColumns for injection and use dynamic SQL to do the sort. I believe it would be faster.

b) query execution plan shows there is an index scan in one part:
Query 1 (24% relative to cost of batch):
-----------------------------------------------
Index Scan (NonClustered)
[ZipCodes].[IX_ZipCodes_ZipCode]
Cost 96%
SELECT @DegreesPerMile = 360/24859.82, --24859.82 = Circumference of the Earth
@StartLat = Latitude,
@StartLon = Longitude,
@MaxNorthLat = Latitude + (@pRadius*@DegreesPerMile),
@MaxSouthLat = Latitude - (@pRadius*@DegreesPerMile)
FROM dbo.ZipCodes
WHERE ZipCode = @pZip

I have no clue why your box decided to do an Index Scan on that with the indexes I gave you unless there are really that many rows in the Latitude Optimization where the Optimizer decided a scan would be faster. That's going to take a bit of troubleshooting by you. It's definitely worth reducing.

FROM dbo.Providers P WITH (NOLOCK)

INNER JOIN dbo.Occupations O ON O.OccupationID = P.OccupationID

INNER JOIN #TempZip T ON T.ZipCode = P.Zip

WHERE (@occupationID IS NULL OR P.OccupationID = @occupationID)

Hmmm... my recommendation in the above code would be to list #TempZip first in the FROM clause because it's likely the smallest table. Then pick the next largest. It an optimization that doesn't always work but we have to remember that the answer is "NO" unless we ask. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

matt6749 Ten Centuries Points: 1204 More actions · Answer 8

Oh my gosh Jeff, after being puzzled about why the join would affect the execution time so much (Providers table only has 2500 records, Occupations table only 10 records)

.. I spent hours whittling down the query to see where the speed differed from yours.

.. I found the culprit: I had declared the @zip parameter at the top of the procedure as nvarchar(100)

.. changed @zip param to char(5) (as your example showed)

.. this reduced execution time from 300ms to 40ms !!! Oh boy.

.. and no index scans in execution plan anymore – all index seeks.

Thanks again Jeff for your help and patience and expertise!

Just FYI - my web site is TherapyNext.com. (I should make a note on the site: "This site powered by Jeff Moden.") Now if it would only make me some money someday... lol.

If you don't mind, I thanked you here:

http://therapynext.com/About-Us.aspx

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

Absolutely outstanding, Matt! I didn't even think to ask you if you had matched datatypes because I included it all in the code example. VERY happy you sussed it. As you just found out, correctly matching datatypes can make or break just about any code.

CHAR(5) isn't going to get it with the Canadian postal codes, though. If you're using the official format of 3-space-3, then you need to change the CHAR(5) to either a CHAR(7) or a VARCHAR(7) for @Zip.

Shifting gears, you've got the beginnings of an awesome website and, after reading your "mission statement" on the "About Us" page, I'm very happy I was able to help such a worthy site. It wasn't necessary for you to mention my name on that page because I just like helping people but I'm truly honored and humbled that you took the time to make such a notation. Thanks, Matt.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions