Design advice for an education data warehouse

  • I work in a county education department. Each year we create/update a set of reports and procedures which provide schools with online access to many of the statistics for their school compared to district wide and to national targets, etc.

    After this year’s, which is my third at creating these report/data, I can’t help but think that the whole thing is screaming out for a Data Warehouse and Cube data to report on. I may be wrong on that, as I’m still only doing the learning regarding AS2005 and haven’t touched a previous version of it. I’m currently working through these books: Microsoft SQL Server 2005 Analysis Services Step by Step, The Microsoft Data Warehosue Toolkit, and The Data Warehouse Toolkit. But I’m willing to bet that this should be the way forward. Not only for the actual schools viewing the data, but also for all those staff who work at the county level and have to inspect/report on these schools and the county’s performance. The only thing is that most examples are stated within companies which have sales as a primary measure, however I’m not sure education grades, targets, etc fall into the same scheme. So I can’t use an example structure to help get me started.

    The student grades are not always numeric, in many of the subjects they are graded A-G etc. They all have points equivalent, but it would be good to look at the grades too.

    Common requests about the data are how many students achieved 5 A-C grades or 5 A-G etc. And other bits of analysis that are done around grade. OR what a school's pass percentage above C grade is, compared to the average of all schools.

    So some of the data needs to be analysed as stated here and some by numeric points. Can anyone help me as to how I could structure this part of it or achieve such a result?

    This isn’t something that I’ve got management backing for yet, as it’s still pretty much just my own and a colleagues’ thoughts at the moment. Also until I know more about it myself then I wouldn’t feel comfortable taking my case to management. I feel I will be better to create a simple version to demonstrate and show the benefit and power of such a system, to those who will give the backing to the project.

    The main 3rd party central application, which is used in the county, contains most of the information needed. And several other systems also contain additional data which would be useful. All based around a Unique_Pupil_ID.

    Current transactional database

    The main transaction database I would be taking data from has a main student table, then it has a student_results table. The student table also has many lookup type tables related to ethnic origins, deprivation codes relating to income/student address, etc. The kind of things that would be useful when doing analysis on students. As well as at a more broad level, being able to compare schools to each other at various subjects/overall/etc.

    So the student table contains the information about the student, but not results data.

    The results table data is similar to below, but obviously with more to it than just these fields:

    Student_ID

    Subject_ID

    Grade

    GradePointsEquivalent

    ExamYear

    Then there is obviously a lookup for the subject name.

    There are also things like average/points/score per student/year which aren’t a subject score, but more of a calculated field. So where ought that data to be located table-wise?

    There would be a lot more to a total DW solution for this environment, but I wanted to start with the basics so I have somethign to demonstrate.

    Can anyone offer my any advice regarding this? And a possible design structure for the warehouse with regards to which ought to be fact or dimension tables.

    Or do you think I'm barking up completely the wrong tree thinking that this would be a good solution?

    Here are my initial design thoughts.

    The othe thing I've condsidered is to make it more of a star schema instead of a snowflake schema. I would achieve this by combining the Student dim with the properties of the student, such as language, ethnicity, etc.

    Thanks for taking the time to read this less than short post.

    Kind regards,

    David

  • Hi

    I am also working in similar environment.

    Regarding # of students in particular grade Or achieving particular grade, I think Fact table should have student ID as one of the measure & from it, we could derive count(school_id).

    If that can work, then Student_Id should not be PK in dim table and hence will not be FK in Fact table. Subsequently, it can be used for aggregations.

    One thing I noticed in fact table-I don't see any FK named TIME_ID.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply