SSIS Percentage Sampling Transformation does not select consistent amounts

  • I am new to using this transformation. Initially, I liked the idea of it: simply drop this into your data flow to grab 20% of your rows and route those rows along a different path.

    However, in initial testing on a 50 row dataset, I set this transformation to select 20% of the 50 rows. The first time I executed the data flow task, the transformation selected 8 rows. The next time I ran the task, the transformation selected 13 rows. Why would it do this, when 20% of 50 is *always* 10?

    Thanks!

  • Direct from SQL Server 2005 BOL:

    Note: In addition to the specified percentage, the Percentage Sampling transformation uses an algorithm to determine whether a row should be included in the sample output. This means that the number of rows in the sample output may not exactly reflect the specified percentage. For example, specifying 10 percent for an input data set that has 25,000 rows may not generate a sample with 2,500 rows; the sample may have a few more or a few less rows.

  • And my original question still stands, which is Why Would It Do This?

    Can you explain why? Else, I am not able to see the value of a component that behaves this way.

    Thanks!

  • Based on what I have read in BOL the behaviour you are seeing is by design. Not working for Microsoft or the SSIS team, I can't give you any more of an answer then that, sorry.

  • Random sampling versus quota sampling? Seems you could be looking for a quota rather than a random sample. The linked pdf (see notes #8 in article) shows minimum sample sizes for continuous and categorical data.

    Steve.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply