Incremental Update doubling up data

  • I have a large cube. When I built it, it was unable to count the members in the dimensions, so I manually entered an estimate.

    I think the dimensions are private.

    To be honest, I am not sure how to make them shared... I'm pretty new at Analysis Services...

    When I run an Incremental Update, it doubles up the data. Any ideas why?

    A Full Process works fine, but unfortunately it takes approx. 12 hours, which is unrealistic to run every day.

    Many Thanks

  • With the incremental processing, are you using a different view/table that contains only the new records?

    Re: the processing time, have you got your processing logging turned on? If yes, have you reviewed it to determine where the time is being used? I.e. can the views/queries used to source the data be optimized? Are your aggregations taking too long to build (i.e. they may need some review)? It sounds like some of your dimensions are built off the fact table - can you pre-build these in your DB prior to processing the cube?
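
    For example, a dimension currently built straight off the fact table could be materialized as its own table in the source database with something like the following (a rough sketch; the FactSales table and customer columns are placeholder names, not from this thread):

        -- Materialize the distinct customer attributes into a dimension table
        SELECT DISTINCT CustomerKey, CustomerName, Region
        INTO dbo.DimCustomer
        FROM dbo.FactSales;

        -- Index the key so dimension processing can read it quickly
        CREATE UNIQUE CLUSTERED INDEX IX_DimCustomer_Key
            ON dbo.DimCustomer (CustomerKey);

    Pointing the cube's dimension at dbo.DimCustomer instead of the fact table saves Analysis Services a scan of the full fact table for that dimension during processing.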

    Steve.

  • Are you using SSIS 2005 to perform the incremental update?  If so, are you using the "Slowly Changing Dimension" transformation task?  If so, you may want to check whether you are using Type 1 or Type 2 updates.  Basically, Type 1 updates the existing record in the dimension, while Type 2 inserts a new record and updates the old one with a timestamp.
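
    In SQL terms, the difference is roughly the following (a sketch against a hypothetical DimCustomer table with RowStartDate/RowEndDate columns; none of these names come from the thread):

        -- Type 1: overwrite the attribute on the existing row (no history kept)
        UPDATE dbo.DimCustomer
           SET City = 'Boston'
         WHERE CustomerBusinessKey = 42;

        -- Type 2: close off the current row with a timestamp...
        UPDATE dbo.DimCustomer
           SET RowEndDate = GETDATE()
         WHERE CustomerBusinessKey = 42 AND RowEndDate IS NULL;
        -- ...then insert a new version of the member
        INSERT INTO dbo.DimCustomer (CustomerBusinessKey, City, RowStartDate, RowEndDate)
        VALUES (42, 'Boston', GETDATE(), NULL);

    With Type 2 the same business key ends up with several dimension rows, which can look like doubled data if the facts are joined on the business key rather than a surrogate key.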

    Another area you may want to check is whether you are using multiple partitions for the table, since you have a large fact table.  I recall reading in Microsoft's eLearning on SSAS 2005 that ...

    ... "when the same fact table or named query is used in multiple partitions, the same rows of data might get used in multiple partitions. This may lead to incorrect display of data when you process or query the cube. To avoid the duplication of data, you can use partition filters that specify which row of the fact table is used in the partition. However, you need to ensure that the filters of all the partitions of a cube extract mutually exclusive datasets from the fact table."

    I'm still learning SS2005, but I hope this helps.

  • Hi.

    If you want to perform an incremental update on an existing partition, you must set the SourceTableFilter property correctly if the source table is the same as the original data.

    In Books Online you may find more information, as in the text below (the information relates to SQL 2000 Analysis Services):

    "...Incremental updates allow you to keep the contents of a cube current without requiring you to reprocess the cube in full when you add new data. An incremental update involves creating a temporary partition, filling it with updated source data, processing the temporary partition, and then merging it into another partition in the cube.

    Data to be added to a cube can come from the original fact table or from a separate fact table with a structure identical to the original. If you add data from the original fact table, take care not to duplicate data that already exists in the cube. Set the SourceTableFilter property before processing the created temporary partition to restrict the data that is imported from the fact table. Temporary partitions created for this purpose are indicated by a tilde (~) character preceding the name of the temporary partition.

    If data to be added to a cube comes from the fact table from which the cube was originally created, a risk of duplicate aggregation occurs. The cube uses the SourceTableFilter property to screen incoming data from a fact table; when performing an incremental update, it adds the aggregations computed from the fact table to the aggregations stored by the cube. If the same table is run twice, once to construct the original aggregations, and again as part of an incremental update, you will receive the same data twice, added together in the cube. The SourceTableFilter property can be used to screen out existing, already aggregated data in the fact table, preventing duplicate aggregation.

    If you add data from a fact table that includes new members of a dimension, you must also reprocess the affected dimension using the processRefreshData setting for the Process method of the dimension object..."

    There are other examples in Books Online about how to process a partition.
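
    As a minimal sketch: the SourceTableFilter is just a SQL WHERE-clause fragment applied to the partition's source table. Assuming a hypothetical LoadDate column on a FactSales fact table, it might look like:

        -- Filter for the temporary partition of an incremental update:
        -- pick up only the rows added since the last successful process.
        "dbo"."FactSales"."LoadDate" > '2005-06-01'

    The cut-off value has to be advanced on every run; if the same filter (or no filter) is used twice, the same rows are aggregated into the cube again.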

    Hope this helps.




    Fernando Ponte
    factdata.com.br

  • Hi,

    If you are doing an incremental update, then you have to process your dimensions first.

    Then go for the cube incremental process.

    from

    killer


  • Hi, thanks for your help so far. I am not using a different table/view for the data. I have one view in the cube, which is the fact table. Hmm, at the time it seemed like the simplest way...

    If I break the view up into tables within the cube and then use the source table filter, is that the best way?


  • Hello.

    You don't need to split your fact table into two or more tables to use the SourceTableFilter and perform an Incremental Update.

    Here you may find useful information on how to proceed with an incremental update using Analysis Manager.

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/olapdmad/aghtupdate_686d.asp

    If you plan to use DTS to perform the same task, the following link will be useful too.

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/olapdmad/aghtauto_5bfv.asp

    I think you only need to split your fact table into two or more tables if you are building a huge OLAP cube. I guess this is your case.

    How many rows does your fact table have?

    Best Regards.




    Fernando Ponte
    factdata.com.br

  • Fernando is correct, you can use the source table filter when using a single source table, but keep in mind that you will need to modify this filter each time you process to ensure that you're getting only the incremental records.  Also, if you need to perform a full rebuild, you need to remove the filter completely, or else your full process will only load the data that satisfies the filter.

    Another way you could facilitate this is to create separate partitions in the cube, say one for each year (or quarter or month, depending on your data set sizing), and then set the source table and filter for each of these.  Then instead of doing an incremental, just process the appropriate partition (e.g. process 2004 once only, then process the 2005 partition each day/week).  Depending on what level you set your partitioning at, this could result in time savings, since you only process the 'current' partition's data.
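
    For example, with year-based partitions the source filters might look like this (hypothetical FactSales table and OrderDate column):

        -- Partition [Sales 2004], processed once only:
        YEAR("dbo"."FactSales"."OrderDate") = 2004

        -- Partition [Sales 2005], reprocessed each day/week:
        YEAR("dbo"."FactSales"."OrderDate") = 2005

    Because every fact row matches exactly one filter, the partitions stay mutually exclusive and nothing is counted twice.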

    Lastly, have you checked your logs to determine what is taking the time in processing?  I have seen cubes with ~9M source rows that take a short enough time to read the rows, but where building more than 15 aggregations can take 3+ hrs.

    Steve.

  • Hi,

    An incremental process always appends the data available in the fact table to the cube/partition. Hence we need to make sure that already processed (loaded into the cube) data is NOT picked up again.

    There are many ways to achieve this. We have the same situation in our environment, and we have been doing the incremental update based on a column in the fact table. Our fact table/view has a status flag which is set to 'P' once the process completes successfully, and the cube/partition filter is set to process only the records with status_flag <> 'P'; see the sketch below.

    Analysis Server WILL NOT handle these duplication issues on its own, hence we need to ensure data correctness before processing the cube incrementally.
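
    A minimal sketch of that pattern, assuming a status_flag column on a hypothetical FactSales table:

        -- Partition/source filter: only rows not yet loaded into the cube
        "dbo"."FactSales"."status_flag" <> 'P'

        -- After the incremental process completes successfully,
        -- mark those rows as processed so the next run skips them:
        UPDATE dbo.FactSales
           SET status_flag = 'P'
         WHERE status_flag <> 'P';

    If new rows can keep arriving while the cube is processing, it is safer to stamp the candidate rows (e.g. with a batch id) before processing and update only that batch afterwards.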

    Hope it helps.

    Regards,

    Elanchezhian.

