Query improvement

  • Books Online explains this very well!

    The clustered index actually has the data in its leaf level!

    A non-clustered index (NCI) is built separately from the table, is based on the table, and cannot exist without the table.

    If an NCI is defined on a heap table, the leaf-level pages of the index contain RIDs (row IDs) pointing to the data rows in the heap.

    If an NCI is defined on a clustered table, data access is performed using the uniquified key(s) of the clustered index.

    The B-tree is the basis for all classic indexes.
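
    For illustration, a minimal sketch of the two cases (table and index names are made up):

    -- Heap: no clustered index, so the NCI leaf level stores RIDs (file:page:slot).
    CREATE TABLE dbo.HeapDemo (Id INT NOT NULL, Payload VARCHAR(50) NULL);
    CREATE NONCLUSTERED INDEX IX_HeapDemo_Id ON dbo.HeapDemo (Id);

    -- Clustered table: the NCI leaf level stores the clustering key instead of a RID.
    CREATE TABLE dbo.ClusteredDemo (Id INT NOT NULL, Payload VARCHAR(50) NULL);
    CREATE CLUSTERED INDEX CIX_ClusteredDemo_Id ON dbo.ClusteredDemo (Id);
    CREATE NONCLUSTERED INDEX IX_ClusteredDemo_Payload ON dbo.ClusteredDemo (Payload);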

    Johan


  • What do you mean by this statement?

    "If an NCI is defined on a clustered table, data access is performed using the uniquified key(s) of the clustered index"??

  • If the clustering key(s) is not unique, SQL Server will make it unique for you (by adding a hidden "uniquifier" value to duplicate keys).

    All access to the data pages of a clustered table uses the B-tree structure of the clustering index (except for a full scan).

    NCIs point to the unique key for each row in the data pages, i.e. the uniquified clustered index key.

    Books Online has more info.
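
    A rough sketch of what that looks like (made-up names): the clustered index below is not unique, so SQL Server silently uniquifies duplicate OrderDate values, and the NCI carries that uniquified clustering key in its leaf level.

    CREATE TABLE dbo.Orders (OrderDate DATETIME NOT NULL, CustomerId INT NOT NULL);
    CREATE CLUSTERED INDEX CIX_Orders_OrderDate ON dbo.Orders (OrderDate);      -- not unique
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);  -- leaf rows carry OrderDate + uniquifier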

    Johan


  • Thank you, ALZDBA...

    What about a covering index? Is it stored separately from the base table?

  • yep
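
    A minimal sketch (made-up names, reusing the dbo.Orders table above): a covering index is just a non-clustered index that happens to contain every column a query needs, so it lives apart from the base table like any other NCI.

    CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
        ON dbo.Orders (OrderDate)
        INCLUDE (CustomerId);

    -- This query can be answered entirely from the index leaf level,
    -- without ever touching the base table's data pages.
    SELECT OrderDate, CustomerId
    FROM dbo.Orders
    WHERE OrderDate >= '20080101'
      AND OrderDate <  '20080201';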

    Johan


  • GilaMonster (3/5/2008)


    No, because using functions on the column in the where clause means that SQL cannot use an index seek to locate the rows, and it will have to scan.

    OR or union allow SQL to seek an index on the data column (if it exists) to retrieve the applicable rows.

    Well, mostly... It seems that ISNULL will frequently allow an Index SEEK to occur.
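
    A small sketch of the kind of comparison being discussed (dbo.Orders and its columns are made-up names); the two predicates below are logically equivalent, and comparing the actual execution plans shows whether the ISNULL form still gets a seek:

    -- Function wrapped around the column in the WHERE clause.
    SELECT OrderDate
    FROM dbo.Orders
    WHERE ISNULL(CustomerId, -1) = 42;

    -- Sargable rewrite: a NULL CustomerId can never equal 42 anyway.
    SELECT OrderDate
    FROM dbo.Orders
    WHERE CustomerId = 42;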

    --Jeff Moden



  • Interesting.

    I seem to recall a presentation where one of the devs of the execution engine spoke about collapsing functions (I don't remember the technical name). Maybe ISNULL is one of the functions that can be collapsed.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • There have been a lot of wise things said regarding the optimization of the query “SELECT * FROM table WHERE MONTH(datefield)=’mm’”, and I have to admit there were several issues brought up that I did not notice until someone mentioned them. But as I began to dig into this problem, looking for what my answer would be, I discovered that everything said so far is incorrect to one degree or another. Or rather, while there was much good advice given, the ultimate solution was never offered.

    I submit that the absolute best optimization of the statement is: “SELECT column1, column2, column3 FROM table WHERE MONTH(datefield)=1”. In other words, with the exception of the use of an asterisk as a bad idea, the query as originally submitted is 100% optimized.

    Now, I know this goes directly against most of what’s been said, so let me explain.

    At first I missed the use of the asterisk, but when it was pointed out I immediately knew the person who pointed it out was right. However, I can’t think of any circumstance other than a covered index where the selected columns will make a difference in the choice between a seek and a scan; I believe the choice is almost always based on the contents of the WHERE clause and statistics. But I do agree with the advice of not using asterisks in production code. Someone wise once told me that the reason for this is that if the columns on the table are changed, this can have unexpected effects on objects such as stored procs that have "SELECT *" hardcoded.

    A more direct problem is network bandwidth: pulling unneeded data across the network slows down the transfer and robs bandwidth from processes that may actually need the network to do useful work. Also, any intermediate tables that SQL Server creates to run the query may include columns that were not needed and use up precious SQL Server resources such as memory. All in all, it’s just better to explicitly specify the columns you actually need rather than returning the “kitchen sink”. But I still find myself using it more often than not, since I am mostly running tests via shared memory rather than across a network and I don’t write stored procs (etc.) very often, and old habits are hard to break.

    Now on to the real issue: the WHERE clause. I assume that the person who originally wrote this knew what they wanted. That’s a big assumption, because as someone pointed out, this query asks for ALL records for a given month regardless of the year, and that’s likely to be an ugly query no matter what you do. Also, it's questionable whether the original developer really meant to hard code one specific month rather than allowing the month to be defined at run-time in a stored proc. And as mentioned, the use of the asterisk brings the developer's wisdom into question.

    If the intent is truly to have high-performance SELECT statements that return one hard-coded month over and over again, I might suggest creating a trigger to copy the records that belong to this month into their own separate table. The trigger will slow down inserts into the table, but if that's not a concern, the SELECT against a table that only contains records for this month will be faster than even a clustered index seek. But assuming that they won't go for this solution, or that the cost of slowing INSERT statements down far outweighs the advantage of a quick SELECT, I'll continue.
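
    A very rough sketch of that trigger idea (dbo.Sales, dbo.JanuarySales, and their columns are all made-up names for illustration):

    -- Side table holding only the hot month's rows.
    CREATE TABLE dbo.JanuarySales (SalesOrder INT NOT NULL, datefield DATETIME NOT NULL);
    GO
    -- Copy qualifying rows as they arrive; this slows INSERTs but keeps the SELECT tiny.
    CREATE TRIGGER trg_Sales_CopyJanuary ON dbo.Sales
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.JanuarySales (SalesOrder, datefield)
        SELECT SalesOrder, datefield
        FROM inserted
        WHERE MONTH(datefield) = 1;
    END;
    GO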

    Creating an entirely new WHERE clause based on something else which allows an index to be used would be far better than this query - which can never be effectively indexed other than by devoting a clustered index to it. For example, searching by Sales Order Number on a table clustered on Sales Order Number. Using some form of BETWEEN and then a UNION statement somewhat falls into this category; or rather, the assumption that we can limit the query to only a few years rather than all years is essentially redefining the WHERE clause. In reality, the original developer may have really intended to return only a few years of data, and it would be wise to explore this avenue. But I am going on the assumption that they actually wanted “all” years, which means you can’t hardcode which years to look for; you have to leave it variable, and you quickly end up creating more of a problem than you had in the original code as you try to concatenate a variable number of years together. So, assuming that that is “truly” what they want, this query is as good as it’s ever going to get.

    I originally started exploring the avenue of creating a computed column for MONTH(datefield) and then creating an index on the computed column. As an alternate solution, a friend suggested using triggers to populate a TINYINT column representing the month rather than a computed column. But as I began to explore this, I noticed that SQL Server was ignoring my index no matter what I did. That’s when I started asking why, and the answer quickly became obvious: there are only 12 possible month values, and for that reason you can NEVER EVER create an effective index on a month.
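
    The computed-column experiment described above, roughly (again using the made-up dbo.Sales table):

    -- MONTH() is deterministic, so the computed column can be indexed without being persisted.
    ALTER TABLE dbo.Sales ADD SaleMonth AS MONTH(datefield);
    CREATE NONCLUSTERED INDEX IX_Sales_SaleMonth ON dbo.Sales (SaleMonth);

    -- In my tests the optimizer still preferred a scan here, since roughly 1 row in 12 qualifies.
    SELECT *
    FROM dbo.Sales
    WHERE SaleMonth = 1;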

    The reason for this is explained on page 882 of Kalen Delaney’s book “Inside SQL Server 2000”, where she says “...If the index is used, it can eliminate most of the rows in the table from consideration and greatly reduce the work that must be performed. If the ratio of qualifying rows to total rows is high, the index has poor selectivity and will not be useful. A nonclustered index is most useful when the ratio is around 5 percent or less – that is, if the index can’t eliminate at least 95 percent of the rows from consideration, it probably will not be used; either a different index will be chosen or the table will be scanned...”

    Since there are only 12 months on our calendar, the best density you can realistically expect is about 1 out of 12, which is 1/12 ≈ 0.0833, or roughly 8%. As Kalen points out, SQL Server will gladly ignore any index with a density greater than about 5%. So, there is absolutely no way to get this query to use an index without writing a totally different WHERE clause based on totally different criteria. In short, 1 out of 12 is just not selective enough.

    And SQL Server is making the correct decision when it decides never to use indexes based on such a high density value. (I would say that selectivity is the reciprocal of density and that it’s actually density that should be lower than 5%, not selectivity, but that just confuses the whole issue. The bottom line is that less than 5% of the data should match the index criteria, or the non-clustered index should not be used under any circumstances.)

    First of all, it’s better to do a table scan or index scan when there are only a few pages of data involved, because it requires more effort/time to pull index pages into memory and then seek through them than it does just to load a page or two for the table and scan straight through it. In fact, my experience is that until you have about a million rows or more in a table, you can make all sorts of design mistakes and the database will still run like a Ferrari. So, the table has to reach a certain size before the issue of indexes is really an issue at all.

    Furthermore, since this is a non-clustered index, it’s not really designed to do a range scan; it will have to travel through the B-tree branches for pretty much every record, which is much more I/O-intensive than just loading the table and scanning it from beginning to end when it has to return 1/12th of the data. In fact, I believe what it would do is a non-clustered index leaf-level range scan to gather up all the matching row IDs or clustered index keys, then sort the page numbers needed and pull the necessary pages into memory. But it seems that when we are talking about pulling more than 5% of the index pages and then sorting and gathering all the necessary heap pages – or, worse yet, making multiple passes through the clustered index B-tree – it’s just easier to pull in the whole heap table and scan from beginning to end for the required records.

    I considered using the sole clustered index for this query, but I think it would be a rare case indeed where that would be the best choice. However, if it makes sense to give this one query priority over all others, clustering on the computed column “MONTH(datefield)” does work... I tested it, and it seeks rather than scans. A better solution would be to look for another column that could filter out a large portion of the data, such as making use of the pre-existing clustered index. For example, WHERE SalesOrder = 4298 AND MONTH(datefield) = 11.
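
    What that clustering experiment might look like (made-up names, reusing the SaleMonth computed column from the sketch above; if the table already has a clustered index it would have to be dropped first, and every other query then pays for this ordering choice):

    CREATE CLUSTERED INDEX CIX_Sales_SaleMonth ON dbo.Sales (SaleMonth);

    SELECT *
    FROM dbo.Sales
    WHERE SaleMonth = 1;     -- now resolved with a seek over the January range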

    Oh. I should also mention that while I am in favor of explicit conversions rather than letting SQL Server make the conversion, my reasoning is that an explicit conversion is clearer to other developers who are reading your code. Although I don’t have any evidence to offer here, I believe that SQL Server is smart enough to make the conversion only one time before it really runs the query rather than on every row, so the cost of conversions is probably a handful of CPU cycles in total, and therefore it will be over before your mind can actually contemplate whether it’s over or not. (If you lose 2 nanoseconds out of your server’s day in order to make the code easier to read, it’s probably worth it.) And for that reason, I won’t talk about how SQL Server is using variables rather than constants, and how that’s inefficient when what you actually want is a constant to have one point of control over a constant value. Or point out that a conversion wastes CPU cycles unnecessarily (albeit probably not enough to ever notice in this case). While the Assembly Language programmer in me wants to make use of every CPU cycle, sometimes it really is worth losing a CPU cycle or two just to make the code easier to read. While I personally would be tempted to hardcode the month rather than use a “constant” (that isn’t really a constant), I do see the wisdom in not hardcoding it. Besides, if this query were truly going to be used in the real world, I can hardly imagine them not making the month a parameter in a stored procedure, in which case it would not be hardcoded; this is another one of those areas where you should probably question what the original developer was actually trying to accomplish.

    Using a “between” range from the beginning of the month to the end of the month doesn’t improve the selectivity beyond 1 in 12 on average. So, I would say that’s not a good approach unless you change the requirements by changing the WHERE clause to be more specific and/or put it into a clustered index. The suggestion of doing this and using a UNION clause was a brilliant one. In fact, I didn’t even realize that it needed a UNION clause until that was brought up. But once I saw that it needed a UNION clause, I quickly realized you have to change the selection criteria to include specific years, or wind up running additional queries to find the MIN() and MAX() years and then looping through a cursor to UNION all the data together. But in the end, on a non-clustered index, it still won’t be selective enough to actually cause it to use the index that you create.
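
    The BETWEEN-plus-UNION idea in rough form (made-up names; UNION ALL is safe here since the ranges cannot overlap, and the years must be hardcoded or generated, which is exactly the problem described above):

    SELECT SalesOrder, datefield
    FROM dbo.Sales
    WHERE datefield >= '20060101' AND datefield < '20060201'
    UNION ALL
    SELECT SalesOrder, datefield
    FROM dbo.Sales
    WHERE datefield >= '20070101' AND datefield < '20070201'
    UNION ALL
    SELECT SalesOrder, datefield
    FROM dbo.Sales
    WHERE datefield >= '20080101' AND datefield < '20080201';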

    So, to sum it all up: a table scan is the most efficient way to gather one twelfth of the records or more, and without using a different search criteria, one twelfth of the data is the least amount you are ever going to get from searching with the month as the only criteria. And therefore, it is impossible to get SQL Server to use a non-clustered index if this is the result set desired.

    But I do agree with the person who suggested that the interviewer is probably looking for how you answer this question (your thought process and confidence) rather than what your answer actually is, because 9 out of 10 interviewers probably have not thought this whole thing completely through in this much detail. On the other hand, you might have that 1 out of 10 interviewers who has faced this issue first hand and is wondering if you've been around long enough to experience it yourself. But even then, I can't imagine the interviewer counting much against someone who answered this question technically incorrectly; like I said, there's a good chance the interviewer doesn't know the correct answer. I say this because I've only met a handful of DBAs in my life who have enough knowledge of SQL Server internals to consider the things I've mentioned here. I definitely consider this a VERY advanced question for a DBA, an advanced question for a T-SQL developer, and a semi-advanced question for a database architect (who, in my opinion, should always be more advanced than a DBA or developer in ALL areas related to performance.)

    Oh. One more thing: a covered index is not comparable to a heap, clustered index, or non-clustered index. A covered index is a non-clustered index that includes ALL the columns being returned, so it is not a separate kind of data structure. Heap tables, clustered indexes, and non-clustered indexes are the three separate data structures (the only three when we are talking about indexing) that SQL Server uses. The idea of a covered index is that all the data being retrieved is in the non-clustered index itself, and therefore there is no need to look at the actual table in order to get the data needed. Because it doesn’t have to load and look at the actual table, covered index seeks are very fast. So, a covered index is just a special type of non-clustered index. (A clustered index contains the whole table, so in a sense all clustered tables are “sort of” like a covering index.)

    And I would say that SQL Server can’t use indexes when the WHERE clause contains certain functions because it has to compute every value for every row before it can begin to search for what it needs, making it faster just to scan the whole table. An ISNULL, on the other hand, would be looking for NULL values that are stored “permanently” in the table and therefore do not need to be computed before the search can begin. So, even though it’s a function, it should still be compatible with indexes.

  • a-brbeck (3/6/2008)


    I submit that the absolute best optimization of the statement is: “SELECT column1, column2, column3 FROM table WHERE MONTH(datefield)=1”. In other words, with the exception of the use of an asterisk as a bad idea, the query as originally submitted is 100% optimized.

    So, to sum it all up: a table scan is the most efficient way to gather one twelfth of the records or more, and without using a different search criteria, one twelfth of the data is the least amount you are ever going to get from searching with the month as the only criteria. And therefore, it is impossible to get SQL Server to use a non-clustered index if this is the result set desired.

    Heh... Nice write up, but the hypothesis is wrong and so are the conclusions. Here's the proof...

    --drop table jbmtest

    --===== Create and populate a 1,000,000 row test table.
    --      Column "RowNum" has a range of 1 to 1,000,000 unique numbers
    --      Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
    --      Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
    --      Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
    --      Column "SomeDate" has a range of >=01/01/2000 and <01/01/2001 non-unique date/times
    --      Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
    --      for all rows.
    --      Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
    --      Jeff Moden
    SELECT TOP 1000000
           RowNum       = IDENTITY(INT,1,1),
           SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
           SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                        + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
           SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
           SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
           SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*365.0+36524.0 AS DATETIME),
           SomeHex12    = RIGHT(NEWID(),12)
      INTO dbo.JBMTest
      FROM Master.dbo.SysColumns t1,
           Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
    ALTER TABLE dbo.JBMTest
      ADD CONSTRAINT PK_JBMTest_RowNum PRIMARY KEY CLUSTERED (RowNum)

    --===== Add the necessary covering index to show that we can get Index Seeks
    --      even when 1/12th of the table is returned.
    CREATE INDEX IX_JBMTest_Composite1 ON dbo.JBMTest(SomeDate, RowNum, SomeInt, SomeLetters2)
    GO

    --===== This query uses an index SEEK on the non-clustered index.
    SELECT RowNum, SomeInt, SomeLetters2
      FROM dbo.JBMTest
     WHERE SomeDate >= '20000101'
       AND SomeDate <  '20000201'

    ... and, you might wanna take the code home with you because you used up all of your interview time 🙂

    --Jeff Moden



  • You're right, I'm REALLY bad about using up all of my interview time. 😉

    Thanks for your reply. I loaded up the query and ran it. But then I realized that the query does not entirely disprove what I said.

    At a couple of points I said things like, "Creating an entirely new WHERE clause based on something else which allows an index to be used would be far better than this query..." It's an ugly query, and the best thing to do, in all honesty, might be to throw it out and get a new query. :)

    And that's exactly what you've done in the example: you've created new search criteria. What I mean is that the original example would return a result set for ALL rows where the date is from a given month regardless of year. So, by specifying specific years you have further limited the result set than what the original problem specified.

    Your query filters out any year that is not the year 2000, and that's not what the original query does. The following query does not filter out data for years other than 2000. The data you created for this test only includes year 2000 data, which is essentially why your version works. You can't assume from the original problem that the database will not contain data from other years, and if it does, your SELECT statement will return a different number of rows than this statement:

    SELECT RowNum, SomeInt, SomeLetters2
      FROM dbo.JBMTest
     WHERE MONTH(SomeDate) = 1

    This is why it was mentioned that you need a UNION and multiple SELECT statements.

    In effect, you've created test data that just happens to coincide with your point. But there was nothing in the original problem that suggested that only data from specific years will be in the database; you have to assume that any year that is a valid SQL Server DATETIME can be used. AND, your query has to return all years.

    So, in order to prove me wrong, you have to write a query that does not pre-anticipate which years will be involved, but instead captures data for a given month for ALL years that can possibly be stored in the database. AND it has to run faster than a table scan on the same data. Otherwise, you are just re-writing the search criteria, and I have already said that re-writing the search criteria not only can make it work, but it is probably advisable.

    Now, you may point out that even regardless of what I just said: your SELECT "seeks" on a month value, which I said is not possible. Ok. Point well taken. 🙂

    This one REALLY made me think. I would even say it baffled me for a moment, until I figured out what's going on with it. And I concede here that you've "partly" proven me "somewhat" wrong. And you've definitely taught me something here. I suppose we can chalk this one up to the voice of "Theory" (me) vs. the voice of "Experience" (you), because you've obviously seen this problem before, whereas I have to admit that it is essentially new to me. Anyone reading this discussion thread would be well advised to carefully consider both sides of what is being said.

    First of all, you are right: even though your query doesn't return all years, it DOES seek for a single month on a date range. BUT, before you break out the champagne 😉 take a look at this:

    Create this index instead:

    CREATE INDEX IX_JBMTest_Composite1 ON dbo.JBMTest(SomeDate)

    When I use this index rather than the one you submitted, it does NOT seek, but scans as predicted.

    Your index was:

    CREATE INDEX IX_JBMTest_Composite1 ON dbo.JBMTest(SomeDate, RowNum, SomeInt, SomeLetters2)

    and your query was:

    SELECT RowNum, SomeInt, SomeLetters2
      FROM dbo.JBMTest
     WHERE SomeDate >= '20000101'
       AND SomeDate <  '20000201'

    And I must concede that this results in an Index Seek. But take a second look at that: an "Index" Seek. This query is against a clustered table, yet the query never reaches the table! If it did, you would have a "Clustered Index" Scan or Seek, I believe. 100% of the data is gathered without either a Clustered Index Scan or a Clustered Index Seek, even though this is clustered data.

    What you have done here is a wonderful example of a Covered Index. Which is kind of cool since the subject of Covered Indexes was brought up.

    Notice, that all three of the columns you are selecting on, are included (covered) in your index. Because of this, SQL Server gathers all of the data it needs by "seeking" through the non-clustered index; it never has to go to the actual table at all for the data.

    So, you've proven that Covered Indexes trump non-clustered indexes and that they are very effective in this situation if you can guarantee that the data you need is confined to a single year.

    BUT, going back to the original discussion, the original problem specified that all columns were to be returned (the asterisk), so you've changed the question rather than solving the problem. To prove my point, run this query on the same data:

    SELECT RowNum, SomeInt, SomeLetters2, SomeMoney
      FROM dbo.JBMTest
     WHERE SomeDate >= '20000101'
       AND SomeDate <  '20000201'

    On my computer it has gone back to scanning and is no longer seeking. All I've done is add one column that is not covered in the index. And you probably already know that you can't cover ALL columns in an index or you defeat the purpose.

    Essentially, by writing a Covered Index and matching the query to it, you've found a very effective way of returning this data quickly, which results in an Index Seek. So, you've made a good point. And I have to admit that I did not see how a Covered Index might be applicable here until you sent this.

    But the original query, as you know, was:

    SELECT * FROM table WHERE MONTH(datefield)='mm'

    This query does not filter out any years and it returns all rows, which implies that all rows are needed in the result set, thereby excluding the use of a covered index in this specific case.

    I definitely concede that you have some great ideas for solving this problem, and in a real-world situation where you can change the result set that is being returned, you've offered some great ideas that are extremely useful. But, since this is a "test" question, I think we have to assume that a successful answer has to be one that returns the EXACT same result set as the original query... just faster. And I hold to my original statements that it can't be done. The examples that you've submitted so far change the question in order to reach the desired answer.

    Care to take another crack at it? 🙂

  • Care to take another crack at it? 🙂

    Why shore'nuf! 😀

    You know... you're absolutely right... if the original requirement was to return ALL columns and ALL rows for ALL Januarys, there's no way to optimize the original code. It has to remain the same, and no index except maybe a clustered index would help... and even then, we'd need some luck.

    But, it depends... you're assuming that it's existing code... I'd be assuming that it's new code because the interviewers are gonna want to hear all the good stuff, right? 😉 (Wait for it... it's coming...)

    You're almost correct about the SELECT list... one of the most important optimizations is to get rid of the asterisk and return only the columns that need to be returned. If, after some research, it was determined that we only needed the columns I used in my changed code, that change alone would shave 200 milliseconds off the process. Adding the covering non-clustered index would shave off another 200.

    But, those aren't the only optimizations we can make if we're allowed to change the SELECT list.... We can still do better than that and still maintain the "all Januarys" filtering of the original code. If we make the following changes to the table...

    ALTER TABLE dbo.JBMTest
      ADD MyMonth AS MONTH(SomeDate)

    CREATE INDEX IX_JBMTest_Composite2 ON dbo.JBMTest(MyMonth, RowNum, SomeInt, SomeLetters2)

    ... and we change the original code to the following, using an agreed-upon SELECT list...

    SELECT RowNum, SomeInt, SomeLetters2 FROM dbo.JBMTest WHERE MyMonth = 1

    ... the run time for the code drops to only 47 milliseconds or so, and we've not changed the functionality you were concerned with.

    So, although my first attempt left quite a bit to be desired (proves that knee-jerk code is a bad thing 😉 ), my second attempt shows that the hypothesis is still wrong and that the conclusion is still wrong. HOWEVER, that doesn't mean we should necessarily change the code. And that's where you and I thoroughly agree, even if it is for different reasons...

    Consider that the original code did its job in only 1,000 milliseconds and returned ~85k out of a million rows. We should have stopped right there. Instead, we changed the SELECT list of some code and added a column, calculated or not, to a table.

    Now comes the hard part... we have to spend a pot wad of time doing regression testing to make sure that none of those changes broke any GUI code, any stored procedure, any view, or any Web code across how many applications? This is where a CIO would blow coffee out of his/her nose and start screaming some insane things, like how stupid we are to want to spend 4 FTEs for a week's time to do a regression test to recover 953 milliseconds on a batch run that runs once a week... never mind the 4 hours we spent trying to optimize the code already... right?

    I'd have never taken it that far... 😉 I'd have quit when I found out the code was doing its job on a million rows in only 1,000 milliseconds.

    I believe I'd tell the interviewer(s) that, too... 😉

    --Jeff Moden



  • Still, I think your interviewer would have fallen asleep by about the third paragraph of your extended explanation. :w00t:

    In the most optimal situation, your interviewer would have had a checklist and just ticked the keywords he needed to hear in a fluent response.

    As you can see, one of the questions to ask yourself with any optimisation is: is it needed?

    If the answer is yes: how much is needed, or when is enough?

    But then again, when building / optimising a system, one should strive for implementing best practices.

    Johan


  • Heh... it wouldn't take as long to explain in person as it did here, Johan... I'd have just pointed out the couple of optimizations that everyone expects, explained that there's an optimization that preserves the original intent, and then asked them if the testing was worth the minuscule savings... you're correct in that they're just looking for a little discussion with some key points in it. The other stuff would be a tie breaker if two folks interviewed equally well.

    --Jeff Moden


