Aggregation over multiple columns

  • Hi,

    I have a table with 33 columns of Skill types e.g. Skill1, Skill2, ..etc. A particular skill, say, "C#" can be found in any of the columns. Basically this table captures employee skill details. I would like to have a query that will list down the entire list of skills along with count of each type across all employees. i.e. "How many resources do I have with C# skills? How many do I have with C++ skills and so on.

    I am a newbie to SQL and still groping the dark. Any help will be greatly appreciated

    John

  • Hi John,

    Ideally you need to redesign so that the skills , and SkillsPerPerson are stored in separate tables...

    Person

    --------

    PersonID

    PersonName....

    Skills

    -----

    SkillID

    SkillName....

    PersonSkills

    ----------

    PersonID

    SkillID

    Designing the tables in this way will mean the database design will not need to be updated when a new skill comes along. It will also make query design VERY easy. This is best practice and is called Normalisation.

    But, as I dont know your environment I may be jumping to conclusions. Here is one way to solve your problem using your current design:

    Table design based on your information and some dummy data....

    create table #Skills (

    EmployeeID int identity(1,1),

    skillA varchar(128),

    skillB varchar(128),

    skillC varchar(128),

    skillD varchar(128) )

    truncate table #skills

    insert #Skills(skillA, skillB, skillC, skillD)

    select 'C#', 'VB', null, null

    union select 'SQL', 'AJAX', 'VB', null

    union select 'C++', 'C#', 'VB', null

    union select 'ASP', 'ASP.NET', 'C++', 'Javascript'

    And the query....

    with cteSkills(Skill)

    as

    (select distinct skillA as Skill from #skills

    union

    select distinct skillB as Skill from #skills

    union

    select distinct skillC as Skill from #skills

    union

    select distinct skillD as Skill from #skills)

    select Skill , count(Skill) as SkilledPeople

    from cteSkills inner join #skills on (Skill=SkillA) or (Skill=SkillB) or (Skill=SkillC) or (Skill=SkillD)

    where Skill is not null group by Skill

    order by count(Skill) desc

    You will need to add more columns to the query for each database column.

    B

  • You might want to look up UNPIVOT in BOL, which could be used similarly to the following:

    ;WITH CTE_SKILLS AS (

    SELECT DISTINCT EMPLOYEE_ID, SKILL

    FROM SKILL_TABLE

    UNPIVOT(SKILL FOR SKILL_TYPE IN (

    SKILL1,SKILL2,SKILL3,SKILL4,SKILL5,SKILL6,SKILL7,SKILL8,

    SKILL9,SKILL10,SKILL11,SKILL12,SKILL13,SKILL14,SKILL15,SKILL16,

    SKILL17,SKILL18,SKILL19,SKILL20,SKILL21,SKILL22,SKILL23,SKILL24,

    SKILL25,SKILL26,SKILL27,SKILL28,SKILL29,SKILL30,SKILL31,SKILL32,

    SKILL33)) AS UPVT

    )

    SELECT SKILL, COUNT(EMPLOYEE_ID) AS EMP_COUNT

    FROM CTE_SKILLS

    GROUP BY SKILL

    While this will get past the poor database design, how it performs will be a question mark until you test it. I have to agree that this database needs to be re-designed.

    Steve

    (aka smunson)

    :):):)

  • What I would do is write some code that normalizes the data so that each person/skill set is in a separate row. Then deal with the data in this normalized fashion.

    It's very risky to write a lot of code that un-pivots the the data on the fly. You can write a considerable amount of SQL code that assumes - let's say - that there are only 4 skill sets that can be populated for the person. People can tell you that there will never, under any circumstances, ever be more than 4. Well, 6 months or a year later someone will come up with a good reason for there to be 5th and then a 6th skill set. I've dealt with this type of thing for years and it almost always happens.

    If the majority of your code deals with normalized data, then the only code you have to change when the new column is added is that code that pumps the data into the normalized tables - everything else is the same.

    Todd Fifield

  • technicx101 (11/16/2008)


    Hi,

    I have a table with 33 columns of Skill types e.g. Skill1, Skill2, ..etc. A particular skill, say, "C#" can be found in any of the columns. Basically this table captures employee skill details. I would like to have a query that will list down the entire list of skills along with count of each type across all employees. i.e. "How many resources do I have with C# skills? How many do I have with C++ skills and so on.

    I am a newbie to SQL and still groping the dark. Any help will be greatly appreciated

    John

    You've been dealt a great dis-service with a denormalized table like that. An "unpivot" is certainly in order, as others have suggested. BUT, it's impossible to help because you've neither provided table structure nor example data. Please see the link in my signature below.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply