Any idea to make it faster?

  • Here is my problem:

    I have a table with 100 millions rows, table's structure is:

    t_all_combination

    (

    id bigin primary key,

    gen_date datetime ,

    num1 int,

    num2 int,

    ...

    num20 int

    )

    I want find out the rows contain some spesific group of number, for example: I want the rows which contain 12 and 35 in num1,num2 ... num20

    select * from t_all_combination

    where 12 in ( num1, num2 ... num20 )

    and 35 in (num1, num2 ... num20 )

    from num1 to num20 the stored number is sorted

    is there any way to make the query faster?

  • I'm not sure that there is a good way to write this query. It is hard to determine if there is a way to make it faster as you have not given us anything to compare it to.

    I created the table with 10000 records and get the same execution plan and io stats with your query and one using num1 in (12,35), etc... The or'ing required would even make indexing not worthwhile as well. My only recommendation is to not use select * and see if you can limit rows by the date.

  • Effective query starts from CREATE TABLE statement.

    The way the table is built there is no way to get fast response on a query like yours.

    Convert it into something like this:

    CREATE TABLE t_all_combination

    (

    id bigin ,

    gen_date datetime ,

    Number tinyint,

    Value int,

    PRIMARY KEY (id, Number)

    )

    Don't forget about

    CREATE INDEX IX_t_all_combination_Value ON t_all_combination(Value)

    Then you may run

    SELECT * FROM t_all_combination

    WHERE Value IN (12, 35)

    and it will be terribly fast.

    Also, if you often query records for specified time period I would suggest to make PK nonclustered and set up clustered index on gen_date

    _____________
    Code for TallyGenerator

  • Sergiy (5/2/2008)


    Then you may run

    SELECT * FROM t_all_combination

    WHERE Value IN (12, 35)

    Sergiy: Shouldn't that be something like this?:

    SELECT * FROM t_all_combination

    WHERE Id IN (Select t2.Id From t_all_combination t2 Where t2.Value = 35)

    And Id IN (Select t2.Id From t_all_combination t2 Where t2.Value = 12)

    Order By Id, Number

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (5/2/2008)

    Sergiy: Shouldn't that be something like this?:

    I guess (can only guess here) OP is interested in id and date values only.

    If it's true then my query will give it as it is.

    If OP needs all other values for selected id's then it would be easier to join my query back to the table:

    SELECT *

    FROM t_all_combination T0

    WHERE EXISTS (select 1 from t_all_combination t2

    where T2.id = T0.id and t2.Value IN (12, 35)

    )

    And forgot to mention for OP:

    column Number to contain numbers 1..20 to identify which parameter is stored in the record.

    INSERT INTO t_all_combination_NEW

    (id, gen_date, Number, Value)

    SELECT id, gen_date, 1, num1

    FROM t_all_combination_OLD

    UNION ALL

    SELECT id, gen_date, 2, num2

    FROM t_all_combination_OLD

    UNION ALL

    SELECT id, gen_date, 3, num3

    FROM t_all_combination_OLD

    UNION ALL

    ....

    SELECT id, gen_date, 20, num20

    FROM t_all_combination_OLD

    _____________
    Code for TallyGenerator

  • Thank you guys for reply, I think some poeple misunderstand me:

    what I need is: to determine which records in my table contain a combination.

    each record has 20 numbers: num1, num2 .... num20

    I want find out those records which contain ( 12,35 ) in (num1, num2 .... num20)

  • Query from my version of the table will look like this:

    SELECT *

    FROM t_all_combination T0

    WHERE EXISTS (

    select 1

    from t_all_combination t1

    inner join t_all_combination t2 ON T1.id = T2.id

    where T2.id = T0.id and t1.Value = 12 and t2.Value = 35

    )

    _____________
    Code for TallyGenerator

  • I'll take a stab at this. I figure that the way the table is designed you don't have, and likely couldn't have the indexes needed to speed this up so the goal is to only go through the data once.

    SELECT yourField from yourTable WHERE

    (

    num1 = 12 OR num2 = 12 OR num3 = 12 OR num4 = 12.....

    ) AND

    (

    num1 = 35 OR num2 = 35 OR num3 = 35 OR num4 = 35....

    )

    This may not be an option to you but I think that this exposes a bad table design. I'm guessing from your post that this series of 20 numbers is a combination of some kind. Therefore I would suggest the following structure:

    create table mainRecord

    (

    unqid int identity(1, 1),

    information varchar(50)

    )

    create table combination

    (

    mainRecID int, -- foreign key to mainRecord

    combinationIndex int, -- 1 - 20

    combinationValue -- Value entered

    )

    With this structure you would be able to create indexes that would speed the query. With these indexes you could write the query like this:

    -- This query will get any record which contains both 12 and 35

    -- in the combination. The logic being that the count would

    -- be greater than 1 if both values were found. This assumes that

    -- you can not repeat a value

    (

    SELECT mainRecordID from Combinations

    WHERE combinationValue IN (12, 35)

    GROUP BY mainRecordID

    HAVING COUNT(mainRecordID) > 1

    )

    ST

  • Frankly, guostong, with that table design, I would probably just go with your original query. Unless you change the table, you probably cannot get anything significantly faster anyway.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I created a table using your structure, then populated it with 1-million rows of random numbers between 0 and 999.

    I tried a couple of different query methods:

    (The table name is "T", just because I was too lazy to type out your table name for my tests. I picked a pair of random numbers for my tests.)

    Query 1:

    select *

    from dbo.t

    where 887 in (num1, num2, num3, num4, num5, num6, num7, num8, num9, num10, num11,

    num12, num13, num14, num15, num16, num17, num18, num19, num20)

    and 161 in (num1, num2, num3, num4, num5, num6, num7, num8, num9, num10, num11,

    num12, num13, num14, num15, num16, num17, num18, num19, num20)

    With various numbers, took an average run time of 8 seconds.

    Query 2:

    select distinct t.*

    from dbo.t

    inner join

    (select id as s1id

    from dbo.t

    where num1 = 887

    union all

    select id

    from dbo.t

    where num2 = 887

    union all

    select id

    from dbo.t

    where num3 = 887

    union all

    select id

    from dbo.t

    where num4 = 887

    union all

    select id

    from dbo.t

    where num5 = 887

    union all

    select id

    from dbo.t

    where num6 = 887

    union all

    select id

    from dbo.t

    where num7 = 887

    union all

    select id

    from dbo.t

    where num8 = 887

    union all

    select id

    from dbo.t

    where num9 = 887

    union all

    select id

    from dbo.t

    where num10 = 887

    union all

    select id

    from dbo.t

    where num11 = 887

    union all

    select id

    from dbo.t

    where num12 = 887

    union all

    select id

    from dbo.t

    where num13 = 887

    union all

    select id

    from dbo.t

    where num14 = 887

    union all

    select id

    from dbo.t

    where num15 = 887

    union all

    select id

    from dbo.t

    where num16 = 887

    union all

    select id

    from dbo.t

    where num17 = 887

    union all

    select id

    from dbo.t

    where num18 = 887

    union all

    select id

    from dbo.t

    where num19 = 887

    union all

    select id

    from dbo.t

    where num20 = 887) Sub1

    on t.id = sub1.s1id

    inner join

    (select id as s2id

    from dbo.t

    where num1 = 161

    union all

    select id

    from dbo.t

    where num2 = 161

    union all

    select id

    from dbo.t

    where num3 = 161

    union all

    select id

    from dbo.t

    where num4 = 161

    union all

    select id

    from dbo.t

    where num5 = 161

    union all

    select id

    from dbo.t

    where num6 = 161

    union all

    select id

    from dbo.t

    where num7 = 161

    union all

    select id

    from dbo.t

    where num8 = 161

    union all

    select id

    from dbo.t

    where num9 = 161

    union all

    select id

    from dbo.t

    where num10 = 161

    union all

    select id

    from dbo.t

    where num11 = 161

    union all

    select id

    from dbo.t

    where num12 = 161

    union all

    select id

    from dbo.t

    where num13 = 161

    union all

    select id

    from dbo.t

    where num14 = 161

    union all

    select id

    from dbo.t

    where num15 = 161

    union all

    select id

    from dbo.t

    where num16 = 161

    union all

    select id

    from dbo.t

    where num17 = 161

    union all

    select id

    from dbo.t

    where num18 = 161

    union all

    select id

    from dbo.t

    where num19 = 161

    union all

    select id

    from dbo.t

    where num20 = 161) Sub2

    on t.id = sub2.s2id

    Average run time of 4 seconds.

    Then I added some indexes:

    create index IDX_T_Num1 on dbo.t(num1)

    create index IDX_T_Num2 on dbo.t(num2)

    create index IDX_T_Num3 on dbo.t(num3)

    create index IDX_T_Num4 on dbo.t(num4)

    create index IDX_T_Num5 on dbo.t(num5)

    create index IDX_T_Num6 on dbo.t(num6)

    create index IDX_T_Num7 on dbo.t(num7)

    create index IDX_T_Num8 on dbo.t(num8)

    create index IDX_T_Num9 on dbo.t(num9)

    create index IDX_T_Num10 on dbo.t(num10)

    create index IDX_T_Num11 on dbo.t(num11)

    create index IDX_T_Num12 on dbo.t(num12)

    create index IDX_T_Num13 on dbo.t(num13)

    create index IDX_T_Num14 on dbo.t(num14)

    create index IDX_T_Num15 on dbo.t(num15)

    create index IDX_T_Num16 on dbo.t(num16)

    create index IDX_T_Num17 on dbo.t(num17)

    create index IDX_T_Num18 on dbo.t(num18)

    create index IDX_T_Num19 on dbo.t(num19)

    create index IDX_T_Num20 on dbo.t(num20)

    This cut both of the above selects down significantly. After the indexes were created, the first query (with variation in the numbers) took an average of 1.6 seconds elapsed, and just over 3 seconds of CPU time.

    The second one took .3 seconds elapsed, .3 seconds CPU.

    The first one, however, has better IO characteristics (3 scans vs 41 scans).

    In summary: Creating indexes on the columns does add to the disk space, and so on, but if you can afford it, it does very nice things to the select speed on the table. Either select solution will be faster because of those indexes.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • In case it matters, the table I created was 104 Meg, while the indexes were 350 Meg.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Now try to estimate performance of updates ad inserts on this table.

    Take into account locks to be applied not only on table pages but on index pages as well.

    _____________
    Code for TallyGenerator

  • Sergiy (5/6/2008)


    Now try to estimate performance of updates ad inserts on this table.

    Take into account locks to be applied not only on table pages but on index pages as well.

    I tested inserting 1000 rows of data both with and without the indexes.

    With:

    insert into dbo.t (num1, num2, num3, num4, num5, num6, num7, num8, num9, num10,

    num11, num12, num13, num14, num15, num16, num17, num18, num19, num20)

    select 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1

    from dbo.numbers

    where number between 1 and 1000

    Results:

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'T'. Scan count 0, logical reads 83932, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 20, logical reads 2391, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Numbers'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 297 ms, elapsed time = 296 ms.

    Without indexes (same insert):

    SQL Server parse and compile time:

    CPU time = 5 ms, elapsed time = 5 ms.

    Table 'T'. Scan count 0, logical reads 3317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Numbers'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 15 ms, elapsed time = 17 ms.

    (1000 row(s) affected)

    20 times as long with the indexes, but still less than half a second for 1000 rows. It'll depend on the traffic on the table (is it mostly insert/update/delete or mostly select), and where the speed is needed (same comparison), but from the original post, I'm thinking just over a quarter second to add 1000 rows is probably less of a problem than over 8 seconds for the kind of select he's doing.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply