Getting Started with Data Warehousing

The first in a series on managing data efficiently in your district.

These days, "data-driven decision making" is on every school district's buzzword bingo game board. Accountability pressures and lean budgets make translating data into information is a major focus of school systems trying to improve district outcomes in all areas. As such, data warehousing has become an essential district tool. Historically available only to large corporations due to price and complexity, data warehousing is now an affordable and manageable option for gathering, manipulating and incorporating district data. While the inclusion of a data warehouse can be a boon for districts, the process of its development and support needs careful planning and management.

Data Warehousing Defined

In its simplest form, a data warehouse is "the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system," (Wikipedia). Whether it is a collection of simple relational databases or massive containers with ongoing transactional data, data warehouses are structured to facilitate data collection, management, querying and reporting for decision making.

How to Start

A successful data warehouse project will begin with a detailed review and analysis of your existing data. The old adage of "garbage in-garbage out" is especially true when populating your data warehouse. Questions to ask internally might include:

  • What data do we have now?
  • What is the quality of our data?
  • Is the current data format appropriate for analysis?
  • How often is the data collected and by whom?
  • How do we currently store and access the data?
  • What data do we need that is not on our list?

This analysis should include a careful review of your existing database systems and any legacy systems that are in use that the data warehouse might allow for replacement or discontinuation. Examining the data you have should also include an analysis of your informational, operational and strategic needs and goals so that the development of the warehouse can address these. While this process can be lengthy and time-consuming, it is important to thoroughly evaluate your current and future data needs as changes later in the warehousing project can be costly and complex.

Choosing and installing a data warehouse is not a quick and simple exercise so a reasonable and flexible timeline for everyone to reference will aid in minimizing unrealistic expectations and deadlines. Some basic steps to consider to ensure a good fit between your goals and needs and a product include conversations with existing school districts using the products and a discussion of the vendor's previous experience with school system installations and delivery. Unlike most software available for purchase in a school system, data warehouses are customized to your data and district so demonstration versions are not an option. Other basic questions to consider are:

  • Costs: How is cost of product influenced by district factors such as size, data preparation needs, current infrastructure or need for customization? Is training included in the cost of the product? What are the on-going licensing fees and maintenance costs for the software, hardware and connectivity? Any additional hardware or infrastructure needs? Are additional personnel needed to manage and support the solution?
  • Functionality: Are querying and reporting tools easy to use and customizable for users? How difficult will it be to prepare data for use with their extraction, transformation and loading tools? Is the product compatible with any existing data or tools that conform to the Schools Interoperability Framework? Are the reports and tools really helpful in making important decisions or monitoring key processes or indicators in your district? How is security handled for the application both at the user and data levels?

Using your existing data to predict and explain performance in all areas is critical for schools. Data warehouse implementations are not simple solutions, but rather require a critical look at the complex issues faced by a district. However, a successful and sustainable data warehouse can be an important contributor to a district's ongoing success.

Helpful links:

The Data Warehousing Institute (TDWI)
eScholar
Microsoft SQL Server 2005
Oracle
Tetradata

Lane B. Mills, Ph.D., is Associate Professor in the Educational Leadership program at East Carolina University. A former Assistant Superintendent for Accountability and Technology for a North Carolina school system, Dr. Mills was a 2004 Technology and Learning Ed Tech Leader of the Year finalist.