Blog Post

Rolling your own index lookups

,

Over the past few weeks, I’ve been training developers in what they really need to know about SQL Server to produce efficient and robust SQL code.  One of the areas we looked at was the “Index Lookup” aka “Bookmark Lookup”.  An index lookup occurs when a non-clustered index seek is used to find the required rows, but to fully resolve the query data that is not held in the index is needed, and so a “Lookup” of the row in the table data is instigated.  If the table has a clustered index then the clustered key is used to drive the lookup, else a rowid is used.

If you execute this :

select LastName,BusinessEntityID
from Person.Person
where LastName = 'Krane'

Then the execution plan will be

image

The query has been fully resolved using the non-clustered index “IX_Person_LastName_FirstName_MiddleName”.  BusinessEntityId is the clustering key,  this means that it will be implicitly included within any non-clustered index on Person.Person.

If we add “FirstName” to our selected columns , the plan will be

 image

Exactly the same as before, the FirstName column has been returned by using the data within the index.  However, if we add the column ‘RowGuid’ into our query:

select LastName,Firstname,BusinessEntityID,rowguid
from Person.Person
where LastName = 'Krane'

image

We have an additional Key Lookup Operator added.  The data for RowGuid is not contained within the index itself and so the engine has had to find and pull it from the base table data.

The question that I was asked about this is “Why is the Lookup so slow ?”.  The answer to that is, “Its not, you just have to appreciate the amount of work that is required”.  The point is that by looking up in the non-clustered index,  you may well find that all the required data is on a single 8k page.  Let us assume that you have 100 rows of data on this one single page.  Now, for each of those rows the engine will have to instigate a lookup.  So, as there are 100 rows thats 100 lookups,  the engine will now have to find and read 100 pages of table data.  So, ‘slow’ it is not, doing a lot more work it is.

The way to mitigate this lookup overhead is to create a “covering” index.  This means adding the required data, in the case above rowguid, explicitly into the index and so avoiding the lookup.  In 2005, the INCLUDE clause was added to SQLServer, so that although the data is held within the index,  it does not form part of the key and so the overhead of maintaining the column with the key itself is avoided.

There are some scenarios where you may be prevented from doing this for operational reason,  DML performance or schema ownership issues , are just two but , if the existing indexes are ‘aligned’ then we can still avoid the overhead of reading the base data.  We will still have to lookup *some* data, but as the clustered index is the widest set of data we can still improve performance by lookup up in a different non-clustered index.

Lets first build some test data and define some indexes.

Create Table Lookup
(
Id integer identity Primary Key,
Status integer not null,
ChildId integer not null,
ReadData integer not null,
Filler char(512) not null
)
go

with cteData
as(
Select top(3000000)
ROW_NUMBER() over (Order by (select null)) as Rown
from sys.columns a
cross join sys.columns b
cross join sys.columns c
)
insert into Lookup(Status,ChildId,ReadData,Filler)
select Rown%100000,
abs(CHECKSUM(newid())),
Rown%10000,
'djdjdjdjdjd'
from cteData

go
create index idxLookupStatus on Lookup(status) include (childid)
go
create index idxChild on Lookup(childid) include (ReadData)
go
Our starting query is
 
select ReadData
from Lookup
where Status = 10000
which produces a plan of
 
image

And running with 'set statistics io on’ , we can see that a total of 133 logical reads have been issued.
 
Interestingly, running the query
 
select KeyLookup.ReadData
from Lookup
join Lookup KeyLookup
on Lookup.Id = KeyLookup.Id
where Lookup.Status = 10000

produces the plan of

image

and again 133 logical reads are used,  so there really is not much difference between a key lookup and clustered index seek.  They are pretty much synonymous here.  In our table we have the data ‘Filler’ which is not required to satisfy our query, but is still begin read by both the above two methods,  is there a way not to read that ?
 
If you look at the index definitions above, you will see that idxchild could be used to satisfy  our query rather than doing the lookup.  We know the childid we require as that is INCLUDE’d into idxLookupStatus which we are already seeking upon.  We could use that to seek inside the idxChild index in which readdata is included.  So what if we formulated to exploit this relationship in the data ?
 
select KeyLookup.ReadData
from Lookup
join Lookup KeyLookup
on Lookup.Id = KeyLookup.Id
and Lookup.ChildId = KeyLookup.ChildId
where Lookup.Status = 10000

image

103 Logical IO’s a nice drop of 25%.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating