How to Perform a Data Makeover

The true value of your district’s reservoir of information depends largely upon two things—the quality of the data and the tools you have to analyze it.

Aside from your dedicated and talented staff, your school system’s data, locked away in silos and repositories ranging from student information systems to HR and payroll, can be the most valuable asset you have. But if the data you’re storing isn’t the right data, or is inconsistent, erroneous, or incomplete, the best analytic tools in the world won’t help you make any sense of it. Therefore, the very first step in building your repository is to make sure you’re starting with good data and have processes in place to “keep it clean.”

GARBAGE IN, GARBAGE OUT

A basic definition of data quality is simply the actual value of your data in terms of what you want to do with it. With a classroom phone list, for example, the quality of the data refers to how useful that list is in performing the necessary business function—presumably looking up and contacting students, parents, and teachers. If phone numbers are wrong, if people are missing from the list, or if there are students on the list not actually in the class, then the data quality is low. If the basic information is correct, but it’s not arranged in any particular order or is hard to read, that also affects the value of the data. A more subtle issue is formatting—for instance, if parents want to upload a phone list into their computerized address books but the application they’re using can only accept numbers separated by dashes (e.g., 111-222-3333), and your district phone list uses a different format (e.g., (111) 222 3333). The regularity of data in a database and its degree of error or accuracy is sometimes also referred to as data hygiene.

A classroom phone list is a fairly simple example, but issues of data quality can get significantly more complicated as the size and complexity of the repositories increase. This comes to the fore when you start having unanticipated uses for data. Case in point: NCLB reporting requirements might compel you to cross-reference data in your student information system with data in your HR system. If both systems use different formats for teacher ID, you may be blocked from exploiting data in this new way until you can improve your data quality and bring the two systems in synch with one another. Not surprisingly, many organizations donâ€™t find these kinds of inconsistencies within and between data systems until changing usage requirements and the need to combine different repositories force them into the spotlight.

Organizations around the world have identical issues largely originating from the same set of circumstances: a wide variety of data repositories, created at different times, for different reasons, with sometimes widely varying tools, techniques, and technologies. Often these repositories may do a good job of meeting the particular narrow purpose for which they were originally built, but prove to be difficult to integrate with other systems. And itâ€™s inescapable that we are in an era where stovepipe systems will increasingly need to connect to one another (for more information on application integration issues, see the May issue of School CIO). There are proactive steps you can take, and the good news is that increasing interconnectivity should result in a more thorough and accurate view of whatâ€™s really going on in your school system.

MEASURE TWICE, CUT ONCE

Having established the need for data quality, how does one go about creating such a thing? There are four broad steps to analyzing and implementing an effective, flexible data infrastructure that meets your needs.

STEP 1: DOCUMENT YOUR PROCESSES
This is a systematic analysis of your business processes and needsâ€”what functions do you need to perform, and what data do you need to accomplish those goals? For this step you can either develop expertise in-house, or bring in consultants knowledgeable in enterprise architecture and business process modeling techniques to document your existing processes and also potentially suggest areas where streamlining could occur. Depending on the size of your school system, this step does not necessarily require a complex, prolonged analysisâ€”you may already have a very specific, concrete idea of what you do and how you need to do it. The more detail you can provide here, though, the better, and historically, many organizations find this process of systematically analyzing and documenting their business processes to be incredibly valuable, whether or not they ultimately implement a data warehouse.

STEP 2: WHAT DO YOU HAVE, AND WHERE DO YOU HAVE IT?
Next, analyze where your data resides, and how that data supports the processes and needs identified in step 1. How many data repositories do you currently have, what do they consist of, and where do they need to interact with one another and/or be collected and crossreferenced into a central source? This typically involves taking an inventory of all currently used IT systems and ensuring you have an accurate record of the data dictionariesâ€”the formats and types of all data stored, any validation rules on that data, and so on. For example, are your methods of storing employee and student ID information consistent in format and use across all of your systems? Your work here may indicate the need to create a full-blown data warehouse, or, alternately, your needs may be met by a few simple queries into existing databases.

STEP 3: IF YOU BUILD ITâ€¦
Now comes the creation of any new data repositories indicated in step 2, or any needed application and/or data source integration work. Some business process reengineering or application improvement may be required at this point in order to gather and collect the necessary data and keep it of a consistently high quality. For instance, you may find that your existing grade book system does not adequately enforce your standard format for student IDs (perhaps it employs a free-form text entry field), and to remedy this you need to implement validation rules or mandatory lookup tables for that data. This is a basic example, but the bottom line is that you need to identify the simplest and easiest way to collect and maintain the data you need to keep things humming along smoothly.

STEP 4: BUILD YOUR TOOL KIT
Finally, put in place any analytic tools, such as OLAP, dashboards, canned reports, and other query tools you need to have available on an ongoing basis (see sidebars for more information). Having spent the time and effort to analyze what you need, and built the infrastructure required to get there, this is the big payoff. If youâ€™ve planned carefully and implemented well, the result will be drastically improved data quality and the ability to perform ongoing, accurate analyses of your operations. This will increase your capacity to make proactive changes to your entire organization. SC

Richard Hoffman, former Web technologies coordinator for Fairfax County Public Schools in Virginia, is a technical architect based in New Hampshire and site editor of CMP Mediaâ€™s Database Pipeline.

Data Central

Mine the latest data-driven decision making resources offered by our parent company, CMP Media.

Searchable product guide to analytics products and other data-related tools: productfinder.databasepipeline.com
"Data Mining for the Masses," Intelligent Enterprise, June 9, 2004
"Data: Mining with a Mission," Technology & Learning, March 2004
"Eight Buying Tips: Data Warehouses," Technology & Learning, August 2004

TOPICS