Star Schema Design for Headcount with Goals

  • I am cutting my teeth on star schema design. I have a simple star schema I am building for Headounct analysis at work. I have a factless fact table where a row represents a head in the company. Each head is toed to a particulat week in a Date dimension tabel. There are additional dimensions for things like gender, ethnicity, marital status, age, etc. Now in my department dimension - it's hierarchical. In the DimDepartmnet there is a department which belongs to a company. Comapnies belong to divisions. Now the fun part. Each division has a headcount target for each year. Up to this point I am in a perfect star schema (no snow flaking). How would I integrate in this concept of a headcount target for each division for a given year?

    We are using cognos on top of this star schema to provide reporting and analysis services if that is relevant. From the Star Schema design stand point... any thoughts?

    Christian Loris

    http://www.implementsivillage.net

  • So I assume you have a Date dimension table, Company table, what about gender, ethnicity, marital status, does each one have its own dimension table or all these belong to a customer table?

    I like to use surrogate key to build my fact table.

    Date table

    DateID INT IDENTITY(1,1),

    DateTime DATETIME,

    DateYear VARCHAR,

    DateMonth VARCHAR,

    DateDay VARCHAR...

    You can add all kind of date related columns in this table.

    Company table

    CompanyID INT IDENTITY(1,1),

    CompanyName VARCHAR

    Division VARCHAR

    Department VARCHAR

    Aside from gender, ethnicity, the fact table should be

    HeadcountFact

    DateID

    CompanyID

    This is factless table which will work find. If you want to find out how many headcount in particular year in particular division.

    SELECT COUNT(*)

    FROM HeadcountFact h

    INNER JOIN Date d ON h.DateID = d.DateID

    INNER JOIN Company c ON h.companyid = c.companyID

    WHERE h.YEAR = '2006' AND c.DIVISOIN = 'IT'

    However I am not familiar with cognos, so I don't know if it can use factless table.

    If it can't use factless table, then you have to create a headcount measure in the fact table.

    Headcounttable

    DateID

    CompanyID

    Headcount

    my 2 cents

    Am I missing something?

  • Loner:

    You've got pretty much what I have got as far as design. The thing that is missing is that I have a headcount target for each division for each year. So for example Division A has a headcount target of 2000 for 2005 and a traget of 2100 for 2006... and so on. How does this figure into the picture?

    Chris

  • Chris, one way to do this would be to have another fact table - headcount target and join it with date and division dimensions which you already have.

    If I understand your problem correctly, you have a headcount target value for each division for each year. That makes headcount target a measure with a different granularity than your headcount fact table. So it needs to go in fact table of its own.

  • Ok. That seems simplistic. I am not too familiar with the AS stack and cubing at this point - but I am assuming if I wanted to report or conduct analysis, I would just have to point to both cube/data sources. I could then view, for instance, an analysis table listing the headcount at a division for each week. And then using the other fact table, correlate the division and year, and show the delta for that week from the annual target. I just have to ensure I can map the business keys for the divisions between the two fact tables.

    Chris

  • Hi Chris,

    I go along with Loner's solution except where a table occurs that stores hierarchical information, then I would always associate it with its lowest level of granularity and hence name it accordingly. The lowest level of granularity here is Department. Therefore, in my design, it would be known as DEPARTMENT_DIMENSION rather than COMPANY.

    You seem to be objecting to snowflaking although I'm unclear why. If you can accept it, then I suggest you create a DIVISION_DIMENSION containing DIVISION_ID (foreign key on DEPARTMENT_DIMENSION), YEAR (enables you to build up headcount history) and HEADCOUNT.

    Personally, I like to keep things as simple as possible and having multiple fact tables strikes me as over engineering.

    If you need some convincing about snowflaking then consider this point: In a data warehouse to be used for marketing purposes, then typically the CUSTOMER_DIMENSION will be a slowly changing dimension. If the address key is not on the CUSTOMER_DIMENSION (snow flaking), but perhaps on the fact table, then how do you ensure you have selected the most recent address for a customer for a direct mail marketing campaign - especially if the criteria for selecting the customer was a match on a non-current row of the CUSTOMER_DIMENSION?

    I hope the above helps.

  • When you build up the fact table, you always ended up with the count of the lowest level unless you put it under Analyzer Service and it will do the count for you for all levels. Otherwise what you do when you want to get the upper level is the set the query with group by and where clause.

    SELECT c.Department, SUM(Headcount)

    FROM HeadcountTable h

    INNER JOIN Companytable c ON h.companyid = c.companyid

    WHERE c.Department = 'IT'

    GROUP BY c.Department

  • I don't think snowflake schema is necessary for this design. This is just a typical company dimension.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply