Any idea to make it faster?

Question

Any idea to make it faster?

guostong

SSC Journeyman

Points: 93
More actions
May 2, 2008 at 9:01 am

#185047

Here is my problem:
I have a table with 100 millions rows, table's structure is:
t_all_combination
(
id bigin primary key,
gen_date datetime ,
num1 int,
num2 int,
...
num20 int
)
I want find out the rows contain some spesific group of number, for example: I want the rows which contain 12 and 35 in num1,num2 ... num20
select * from t_all_combination
where 12 in ( num1, num2 ... num20 )
and 35 in (num1, num2 ... num20 )
from num1 to num20 the stored number is sorted
is there any way to make the query faster?

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Jack Corbett SSC Guru Points: 184388 More actions · Answer 1

I'm not sure that there is a good way to write this query. It is hard to determine if there is a way to make it faster as you have not given us anything to compare it to.

I created the table with 10000 records and get the same execution plan and io stats with your query and one using num1 in (12,35), etc... The or'ing required would even make indexing not worthwhile as well. My only recommendation is to not use select * and see if you can limit rows by the date.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

Sergiy SSC Guru Points: 110192 More actions · Answer 2

Effective query starts from CREATE TABLE statement.

The way the table is built there is no way to get fast response on a query like yours.

Convert it into something like this:

CREATE TABLE t_all_combination

(

id bigin ,

gen_date datetime ,

Number tinyint,

Value int,

PRIMARY KEY (id, Number)

)

Don't forget about

CREATE INDEX IX_t_all_combination_Value ON t_all_combination(Value)

Then you may run

SELECT * FROM t_all_combination

WHERE Value IN (12, 35)

and it will be terribly fast.

Also, if you often query records for specified time period I would suggest to make PK nonclustered and set up clustered index on gen_date

_____________
Code for TallyGenerator

RBarryYoung SSC Guru Points: 143329 More actions · Answer 3

Sergiy (5/2/2008)
Then you may run
SELECT * FROM t_all_combination
WHERE Value IN (12, 35)

Sergiy: Shouldn't that be something like this?:

SELECT * FROM t_all_combination

WHERE Id IN (Select t2.Id From t_all_combination t2 Where t2.Value = 35)

And Id IN (Select t2.Id From t_all_combination t2 Where t2.Value = 12)

Order By Id, Number

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Sergiy SSC Guru Points: 110192 More actions · Answer 4

rbarryyoung (5/2/2008)
Sergiy: Shouldn't that be something like this?:

I guess (can only guess here) OP is interested in id and date values only.

If it's true then my query will give it as it is.

If OP needs all other values for selected id's then it would be easier to join my query back to the table:

SELECT *

FROM t_all_combination T0

WHERE EXISTS (select 1 from t_all_combination t2

where T2.id = T0.id and t2.Value IN (12, 35)

)

And forgot to mention for OP:

column Number to contain numbers 1..20 to identify which parameter is stored in the record.

INSERT INTO t_all_combination_NEW

(id, gen_date, Number, Value)

SELECT id, gen_date, 1, num1

FROM t_all_combination_OLD

UNION ALL

SELECT id, gen_date, 2, num2

FROM t_all_combination_OLD

UNION ALL

SELECT id, gen_date, 3, num3

FROM t_all_combination_OLD

UNION ALL

....

SELECT id, gen_date, 20, num20

FROM t_all_combination_OLD

_____________
Code for TallyGenerator

guostong SSC Journeyman Points: 93 More actions · Answer 5

Thank you guys for reply, I think some poeple misunderstand me:

what I need is: to determine which records in my table contain a combination.

each record has 20 numbers: num1, num2 .... num20

I want find out those records which contain ( 12,35 ) in (num1, num2 .... num20)

Sergiy SSC Guru Points: 110192 More actions · Answer 6

Query from my version of the table will look like this:

SELECT *

FROM t_all_combination T0

WHERE EXISTS (

select 1

from t_all_combination t1

inner join t_all_combination t2 ON T1.id = T2.id

where T2.id = T0.id and t1.Value = 12 and t2.Value = 35

)

_____________
Code for TallyGenerator

souLTower SSCarpal Tunnel Points: 4026 More actions · Answer 7

I'll take a stab at this. I figure that the way the table is designed you don't have, and likely couldn't have the indexes needed to speed this up so the goal is to only go through the data once.

SELECT yourField from yourTable WHERE

(

num1 = 12 OR num2 = 12 OR num3 = 12 OR num4 = 12.....

) AND

(

num1 = 35 OR num2 = 35 OR num3 = 35 OR num4 = 35....

)

This may not be an option to you but I think that this exposes a bad table design. I'm guessing from your post that this series of 20 numbers is a combination of some kind. Therefore I would suggest the following structure:

create table mainRecord

(

unqid int identity(1, 1),

information varchar(50)

)

create table combination

(

mainRecID int, -- foreign key to mainRecord

combinationIndex int, -- 1 - 20

combinationValue -- Value entered

)

With this structure you would be able to create indexes that would speed the query. With these indexes you could write the query like this:

-- This query will get any record which contains both 12 and 35

-- in the combination. The logic being that the count would

-- be greater than 1 if both values were found. This assumes that

-- you can not repeat a value

(

SELECT mainRecordID from Combinations

WHERE combinationValue IN (12, 35)

GROUP BY mainRecordID

HAVING COUNT(mainRecordID) > 1

)

ST

RBarryYoung SSC Guru Points: 143329 More actions · Answer 8

Frankly, guostong, with that table design, I would probably just go with your original query. Unless you change the table, you probably cannot get anything significantly faster anyway.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

GSquared SSC Guru Points: 260824 More actions · Answer 9

I created a table using your structure, then populated it with 1-million rows of random numbers between 0 and 999.

I tried a couple of different query methods:

(The table name is "T", just because I was too lazy to type out your table name for my tests. I picked a pair of random numbers for my tests.)

Query 1:

select *

from dbo.t

where 887 in (num1, num2, num3, num4, num5, num6, num7, num8, num9, num10, num11,

num12, num13, num14, num15, num16, num17, num18, num19, num20)

and 161 in (num1, num2, num3, num4, num5, num6, num7, num8, num9, num10, num11,

num12, num13, num14, num15, num16, num17, num18, num19, num20)

With various numbers, took an average run time of 8 seconds.

Query 2:

select distinct t.*

from dbo.t

inner join

(select id as s1id

from dbo.t

where num1 = 887

union all

select id

from dbo.t

where num2 = 887

union all

select id

from dbo.t

where num3 = 887

union all

select id

from dbo.t

where num4 = 887

union all

select id

from dbo.t

where num5 = 887

union all

select id

from dbo.t

where num6 = 887

union all

select id

from dbo.t

where num7 = 887

union all

select id

from dbo.t

where num8 = 887

union all

select id

from dbo.t

where num9 = 887

union all

select id

from dbo.t

where num10 = 887

union all

select id

from dbo.t

where num11 = 887

union all

select id

from dbo.t

where num12 = 887

union all

select id

from dbo.t

where num13 = 887

union all

select id

from dbo.t

where num14 = 887

union all

select id

from dbo.t

where num15 = 887

union all

select id

from dbo.t

where num16 = 887

union all

select id

from dbo.t

where num17 = 887

union all

select id

from dbo.t

where num18 = 887

union all

select id

from dbo.t

where num19 = 887

union all

select id

from dbo.t

where num20 = 887) Sub1

on t.id = sub1.s1id

inner join

(select id as s2id

from dbo.t

where num1 = 161

union all

select id

from dbo.t

where num2 = 161

union all

select id

from dbo.t

where num3 = 161

union all

select id

from dbo.t

where num4 = 161

union all

select id

from dbo.t

where num5 = 161

union all

select id

from dbo.t

where num6 = 161

union all

select id

from dbo.t

where num7 = 161

union all

select id

from dbo.t

where num8 = 161

union all

select id

from dbo.t

where num9 = 161

union all

select id

from dbo.t

where num10 = 161

union all

select id

from dbo.t

where num11 = 161

union all

select id

from dbo.t

where num12 = 161

union all

select id

from dbo.t

where num13 = 161

union all

select id

from dbo.t

where num14 = 161

union all

select id

from dbo.t

where num15 = 161

union all

select id

from dbo.t

where num16 = 161

union all

select id

from dbo.t

where num17 = 161

union all

select id

from dbo.t

where num18 = 161

union all

select id

from dbo.t

where num19 = 161

union all

select id

from dbo.t

where num20 = 161) Sub2

on t.id = sub2.s2id

Average run time of 4 seconds.

Then I added some indexes:

create index IDX_T_Num1 on dbo.t(num1)

create index IDX_T_Num2 on dbo.t(num2)

create index IDX_T_Num3 on dbo.t(num3)

create index IDX_T_Num4 on dbo.t(num4)

create index IDX_T_Num5 on dbo.t(num5)

create index IDX_T_Num6 on dbo.t(num6)

create index IDX_T_Num7 on dbo.t(num7)

create index IDX_T_Num8 on dbo.t(num8)

create index IDX_T_Num9 on dbo.t(num9)

create index IDX_T_Num10 on dbo.t(num10)

create index IDX_T_Num11 on dbo.t(num11)

create index IDX_T_Num12 on dbo.t(num12)

create index IDX_T_Num13 on dbo.t(num13)

create index IDX_T_Num14 on dbo.t(num14)

create index IDX_T_Num15 on dbo.t(num15)

create index IDX_T_Num16 on dbo.t(num16)

create index IDX_T_Num17 on dbo.t(num17)

create index IDX_T_Num18 on dbo.t(num18)

create index IDX_T_Num19 on dbo.t(num19)

create index IDX_T_Num20 on dbo.t(num20)

This cut both of the above selects down significantly. After the indexes were created, the first query (with variation in the numbers) took an average of 1.6 seconds elapsed, and just over 3 seconds of CPU time.

The second one took .3 seconds elapsed, .3 seconds CPU.

The first one, however, has better IO characteristics (3 scans vs 41 scans).

In summary: Creating indexes on the columns does add to the disk space, and so on, but if you can afford it, it does very nice things to the select speed on the table. Either select solution will be faster because of those indexes.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 10

In case it matters, the table I created was 104 Meg, while the indexes were 350 Meg.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Sergiy SSC Guru Points: 110192 More actions · Answer 11

Now try to estimate performance of updates ad inserts on this table.

Take into account locks to be applied not only on table pages but on index pages as well.

_____________
Code for TallyGenerator

GSquared SSC Guru Points: 260824 More actions · Answer 12

Sergiy (5/6/2008)
Now try to estimate performance of updates ad inserts on this table.
Take into account locks to be applied not only on table pages but on index pages as well.

I tested inserting 1000 rows of data both with and without the indexes.

With:

insert into dbo.t (num1, num2, num3, num4, num5, num6, num7, num8, num9, num10,

num11, num12, num13, num14, num15, num16, num17, num18, num19, num20)

select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1

from dbo.numbers

where number between 1 and 1000

Results:

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'T'. Scan count 0, logical reads 83932, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 20, logical reads 2391, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Numbers'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 297 ms, elapsed time = 296 ms.

Without indexes (same insert):

SQL Server parse and compile time:
CPU time = 5 ms, elapsed time = 5 ms.
Table 'T'. Scan count 0, logical reads 3317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Numbers'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 17 ms.
(1000 row(s) affected)

20 times as long with the indexes, but still less than half a second for 1000 rows. It'll depend on the traffic on the table (is it mostly insert/update/delete or mostly select), and where the speed is needed (same comparison), but from the original post, I'm thinking just over a quarter second to add 1000 rows is probably less of a problem than over 8 seconds for the kind of select he's doing.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon