Data Glossary

Dashboard is a simplified front-end to a data repository, offering easy-to-use tools for doing high-level data analysis. Many vendors offer dashboards for performing basic queries on their data repositories.

Data Mart refers to a scaled-down version of a complete data warehouse, tailored for a specific business function or domain, and often intended to help key staff make strategic decisions. For example, if senior administrative staff have access to an EIS (executive information system) that draws on part of a district’s data repository to offer high-level reports on budgeting, financials, and other data of interest, that system could be considered a data mart. A true data mart is more than just a set of canned reports; generally it includes query tools that can be used to search the subset of data in a flexible, ad hoc manner.
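As a rough illustration of the idea above, the sketch below treats a data mart as a restricted view over a larger warehouse table and runs an ad hoc query against it. It uses Python's built-in sqlite3 module; the table, view, column names, and figures (warehouse, finance_mart, and so on) are hypothetical and chosen only for the example. Real data marts are often separate physical stores rather than views.

```python
import sqlite3

# In-memory database standing in for a district's full repository (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (department TEXT, category TEXT, year INTEGER, amount REAL);
    INSERT INTO warehouse VALUES
        ('finance',   'salaries', 2005, 1200.0),
        ('finance',   'supplies', 2005,  300.0),
        ('transport', 'fuel',     2005,  450.0),
        ('finance',   'salaries', 2004, 1100.0);
    -- The "mart": only the rows and columns relevant to one business function.
    CREATE VIEW finance_mart AS
        SELECT category, year, amount FROM warehouse WHERE department = 'finance';
""")

def ad_hoc(sql, params=()):
    """Run an arbitrary query, the kind a flexible query tool would issue."""
    return conn.execute(sql, params).fetchall()

# An ad hoc question that no canned report anticipated: total spending per year.
totals = ad_hoc(
    "SELECT year, SUM(amount) FROM finance_mart GROUP BY year ORDER BY year"
)
print(totals)
```

The point of the view is that users of the mart see only the finance subset, yet can still ask new questions of it rather than being limited to fixed reports.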

Data Mining refers to the process of slicing, dicing, and analyzing data in a repository, trying to find patterns, relationships, and trends. This analysis can be as basic as having a skilled analyst pore manually over pages of columns and tables, or as detailed as the use of dedicated data mining tools against a carefully engineered data warehouse. Often simple data mining can be accomplished through ad hoc queries, or via a dashboard that’s linked to a preconfigured data mart. More complex tasks may require “data miners,” staff specialized in performing data mining, statistics and analysis, and teasing out relationships between different fields, columns, tables, and data sources in order to produce the desired results.

Data Profiling is an organized analysis of data in an enterprise, and of how well that data fits expected formats and values. This can be a one-time snapshot of data quality, but ideally it’s part of an ongoing effort to track and maintain a consistently high level of data quality. For a detailed description of the steps and tools involved in data profiling, see www.databasepipeline.com/techwatch/dataprofile.jhtml.
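A minimal sketch of what checking data against expected formats and values can look like is shown below. The sample records, field names, and format rules are assumptions made up for the example, not drawn from any particular product; a real profiling effort would cover many more checks (ranges, uniqueness, cross-field consistency) and would be rerun over time.

```python
import re
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"student_id": "S1001", "grade": "7",  "zip": "60614"},
    {"student_id": "S1002", "grade": "",   "zip": "6061"},
    {"student_id": "1003",  "grade": "13", "zip": "60615"},
    {"student_id": "S1004", "grade": "8",  "zip": "60616"},
]

# Assumed format rules, one regular expression per field.
rules = {
    "student_id": re.compile(r"S\d{4}"),         # "S" followed by four digits
    "grade": re.compile(r"(?:[1-9]|1[0-2])"),    # grades 1 through 12
    "zip": re.compile(r"\d{5}"),                 # five-digit ZIP code
}

def profile(rows, rules):
    """Count missing, invalid, and valid values for each field."""
    report = {field: Counter() for field in rules}
    for row in rows:
        for field, pattern in rules.items():
            value = row.get(field, "")
            if value == "":
                report[field]["missing"] += 1
            elif pattern.fullmatch(value):
                report[field]["valid"] += 1
            else:
                report[field]["invalid"] += 1
    return report

report = profile(records, rules)
for field, counts in report.items():
    print(field, dict(counts))
```

Running the same checks on a schedule and comparing the counts from one run to the next is one simple way to turn a one-time snapshot into ongoing tracking.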

Online Analytical Processing or OLAP is specialized software designed to perform queries and analysis on large amounts of information. In the most common scenario, this information has been extracted and processed into a “cube” of values ahead of time, allowing swift analysis of large amounts of data. The downside is that the data is rarely real-time. In addition, ad hoc queries, or those that rely on data not originally anticipated to be necessary, cannot be easily accomplished via traditional OLAP methods. Newer Relational OLAP (ROLAP) tools are not as efficient as traditional optimized OLAP, but do offer potentially greater flexibility for analysis of a wider range of data. An increasing number of database products include OLAP functionality; even Microsoft Excel is slated to include basic OLAP features as part of the Office Solution Accelerators program.
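The precomputed “cube” described above can be sketched in a few lines. The example below is a toy model only, with hypothetical fact rows and dimension names: it aggregates a measure for every combination of dimension values ahead of time, so that a later question becomes a dictionary lookup instead of a scan. Commercial OLAP engines use far more compact storage and do not necessarily materialize every cell.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (school, year, subject, score).
facts = [
    ("North", 2004, "math", 80),
    ("North", 2004, "reading", 70),
    ("North", 2005, "math", 90),
    ("South", 2004, "math", 60),
    ("South", 2005, "reading", 75),
]
DIMENSIONS = ("school", "year", "subject")

def build_cube(rows):
    """Precompute [sum, count] for every combination of dimension values.

    A dimension that has been rolled up ("all values") is stored as None.
    """
    cube = defaultdict(lambda: [0, 0])
    n = len(DIMENSIONS)
    for *dims, measure in rows:
        for k in range(n + 1):
            for keep in combinations(range(n), k):
                key = tuple(dims[i] if i in keep else None for i in range(n))
                cell = cube[key]
                cell[0] += measure
                cell[1] += 1
    return cube

def average(cube, school=None, year=None, subject=None):
    """Answer a query by lookup; returns None if no facts match."""
    total, count = cube.get((school, year, subject), (0, 0))
    return total / count if count else None

cube = build_cube(facts)
print(average(cube, school="North"))             # average across all years and subjects
print(average(cube, year=2004, subject="math"))  # average across all schools
```

The sketch also shows both limits mentioned above: the cube reflects only the facts present when it was built, and a question involving a field that was never made a dimension cannot be answered from it.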

More definitions are available at www.databasepipeline.com/glossary.jhtml.