How to implement the threading in storeprocedures without duplicate

Question

How to implement the threading in storeprocedures without duplicate

polo.csit

SSCrazy

Points: 2574
More actions
July 26, 2018 at 3:20 am

#372524

Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 245813 More actions · Answer 1

polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols

Why do you need to do any sort of threading when there are only 10,000 rows?

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

polo.csit SSCrazy Points: 2574 More actions · Answer 2

my table not 10000 records it is 400000 records...

sgmunson SSC Guru Points: 110588 More actions · Answer 3

polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols

Part of the problem with speed here is the use of a function on a column in your WHERE clause. Change the @date parameter to be of the same data type as the date column in your table, and then be sure there's an index on that table with the date column as the first column, and possibly the name column as the 2nd one, and you will probably run fast enough that 400,000 rows won't be a problem.

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 4

There's no way to 'thread' within a stored procedure. You can have multiple procedures that each work on a subset of the data, run them from jobs or different connections.

That said, what are you trying to achieve? What's the root problem that you're trying to solve here?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 5

From a functional point of view - the way you're trying to bucket into multiple groups isn't "stable": if you ever changed the data size, then the bucket assignments will start shifting from one bucket into another.

Example: if you had 1000 rows with id's 1-1000, using NTILE(5)
rows
1-200 -> bucket 1
201-400-> bucket 2
401-600 -> bucket 3
601-800->bucket 4
801-1000-> bucket 5
once you add ANOTHER 100 to your mix, then
1-400-> bucket 1
401-800-> bucket 2
801-1200 -> bucket 3
1201-1600->bucket 4
1601-2000-> bucket 5
Notice that your "new" bucket 1 now holds all of buckets 1 and 2 from the first run, etc.... If you're trying to split work up this way - that's NOT the right splitting strategy. Try
id % 5
instead (that % represents MODULO here)

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Phil Parkin SSC Guru Points: 245813 More actions · Answer 6

Matt Miller (4) - Thursday, July 26, 2018 8:03 AM
From a functional point of view - the way you're trying to bucket into multiple groups isn't "stable": if you ever changed the data size, then the bucket assignments will start shifting from one bucket into another.
Example: if you had 1000 rows with id's 1-1000, using NTILE(5)
rows
1-200 -> bucket 1
201-400-> bucket 2
401-600 -> bucket 3
601-800->bucket 4
801-1000-> bucket 5
once you add ANOTHER 100 to your mix, then
1-400-> bucket 1
401-800-> bucket 2
801-1200 -> bucket 3
1201-1600->bucket 4
1601-2000-> bucket 5
Notice that your "new" bucket 1 now holds all of buckets 1 and 2 from the first run, etc.... If you're trying to split work up this way - that's NOT the right splitting strategy. Try
id % 5
instead (that % represents MODULO here)

Indeed. Better to have a semi-static BucketList (heh!) table, which is updated from time to time outside of the 'multi-threaded' process:

1-400 -> bucket 1
401-800 -> bucket 2
801-1200 -> bucket 3
1201-1600 ->bucket 4
1601-âˆž -> bucket 5

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

polo.csit SSCrazy Points: 2574 More actions · Answer 7

sgmunson - Thursday, July 26, 2018 6:40 AM
polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols
Part of the problem with speed here is the use of a function on a column in your WHERE clause. Change the @date parameter to be of the same data type as the date column in your table, and then be sure there's an index on that table with the date column as the first column, and possibly the name column as the 2nd one, and you will probably run fast enough that 400,000 rows won't be a problem.

here speed is not a problem..the problem is duplicate loading in threads.

Phil Parkin SSC Guru Points: 245813 More actions · Answer 8

polo.csit - Thursday, July 26, 2018 9:44 AM
There speed is not a problem..the problem is duplicate loading in threads.

Please answer Gail's question.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

sgmunson SSC Guru Points: 110588 More actions · Answer 9

polo.csit - Thursday, July 26, 2018 9:44 AM
sgmunson - Thursday, July 26, 2018 6:40 AM
polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols
Part of the problem with speed here is the use of a function on a column in your WHERE clause. Change the @date parameter to be of the same data type as the date column in your table, and then be sure there's an index on that table with the date column as the first column, and possibly the name column as the 2nd one, and you will probably run fast enough that 400,000 rows won't be a problem.
here speed is not a problem..the problem is duplicate loading in threads.

Really? Then why are you needing to split it up into batches?

Steve?(aka sgmunson)?:) 🙂 :)?
Health & Nutrition
Rent Servers for Income (picks and shovels strategy)

Jonathan AC Roberts SSCoach Points: 17601 More actions · Answer 10

polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols

I had to do something similar once when sending data to a CMS system over the internet. To get enough rows sent I had to create multiple processes to read the rows as sending each row over the internet to the CMS system had a long wait for the response but if sent in parallel I could increase the rate at which they were sent. So if you are doing what I think you are doing then you can split the rows using the T-SQL mod operator: '%' using some code similar to this:
DECLARE @ThreadToFind as int SET @ThreadToFind=1

;with P as ( select *,ROW_NUMber() over(order by id) as rownum, id%5 as thread from [Records] where not exists(select name from [Data] where Convert(varchar(10),date,103) = @date and [Records].name = [Data].name ) select name from P where thread = @ThreadToFind
This will split the rows into 5 different threads (0 to 4) and you won't get duplicates as it only relies on the value of the Id column to split the rows.

polo.csit SSCrazy Points: 2574 More actions · Answer 11

polo.csit

SSCrazy

Points: 2574

July 27, 2018 at 4:30 am

#1999213

duplicate records are loaded in threads...

Phil Parkin SSC Guru Points: 245813 More actions · Answer 12

polo.csit - Friday, July 27, 2018 4:30 AM
duplicate records are loaded in threads...

You've said this before. Why are you refusing to answer our questions?

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

polo.csit SSCrazy Points: 2574 More actions · Answer 13

Jonathan AC Roberts - Thursday, July 26, 2018 3:25 PM
polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols
I had to do something similar once when sending data to a CMS system over the internet. To get enough rows sent I had to create multiple processes to read the rows as sending each row over the internet to the CMS system had a long wait for the response but if sent in parallel I could increase the rate at which they were sent. So if you are doing what I think you are doing then you can split the rows using the T-SQL mod operator: '%' using some code similar to this:
DECLARE @ThreadToFind as int SET @ThreadToFind=1
;with P as ( select *,ROW_NUMber() over(order by id) as rownum, id%5 as thread from [Records] where not exists(select name from [Data] where Convert(varchar(10),date,103) = @date       and [Records].name = [Data].name ) select name from P where thread = @ThreadToFind
This will split the rows into 5 different threads (0 to 4) and you won't get duplicates as it only relies on the value of the Id column to split the rows.

Hi Jonathan AC Roberts,

i am implement the id%5 but some records are not load the in thread(storeprocedure).

Jonathan AC Roberts SSCoach Points: 17601 More actions · Answer 14

polo.csit - Friday, July 27, 2018 5:59 AM
Jonathan AC Roberts - Thursday, July 26, 2018 3:25 PM
polo.csit - Thursday, July 26, 2018 3:20 AM
Hi All,
    My table have 10000 records and i create the 5 store procedures and my 5 applications exec the same time and each application load the records 2000 .it is working fine but the problem is when run 1 application it load the 2000 records and take 5 min then i run the 2 application it load 1900 records .Here the problem is 1 application are working with duplication records.How to load the 2 application 10000-2000=8000 in threading application.my sp like this below
;with P as
(
select *,ROW_NUMber() over(order by id) as rownum,
ntile(5) over(order by id) as thread
from [Records]
where not exists(
                select name from
                [Data]
                where
                Convert(varchar(10),date,103) = @date
                and
                [Records].name = [Data].name
)
select name from P where thread = 1
Regards
Pols
I had to do something similar once when sending data to a CMS system over the internet. To get enough rows sent I had to create multiple processes to read the rows as sending each row over the internet to the CMS system had a long wait for the response but if sent in parallel I could increase the rate at which they were sent. So if you are doing what I think you are doing then you can split the rows using the T-SQL mod operator: '%' using some code similar to this:
DECLARE @ThreadToFind as int SET @ThreadToFind=1
;with P as ( select *,ROW_NUMber() over(order by id) as rownum, id%5 as thread from [Records] where not exists(select name from [Data] where Convert(varchar(10),date,103) = @date       and [Records].name = [Data].name ) select name from P where thread = @ThreadToFind
This will split the rows into 5 different threads (0 to 4) and you won't get duplicates as it only relies on the value of the Id column to split the rows.
Hi Jonathan AC Roberts,
i am implement the id%5 but some records are not load the in thread(storeprocedure).

I would create one stored procedure that takes a parameter @ThreadToFind then each process that calls it would call the stored procedure with the thread it wants (0 to 4).
I think all data should be returned as long as you are calling the stored procedure with all the threads: 0, 1, 2, 3, 4
You need to provide a more information on what it's not returning and ideally a sample script to illustrate if you want a more specific answer..