Dimensions Data Model™

An Open Data Model for the Market Research Industry

1. Introduction

Historically, the survey research industry and its software suppliers have had to accept barriers to the interchange of survey data between software systems and operating environments. These barriers arise in part from disparate formats for handling the unique requirements of survey data. They have been handled primarily through import and export programs, usually of limited effectiveness, and substantial production delays and costs are often incurred as a result. In addition, each software supplier or division often builds the same core set of common functionality in parallel, which raises software costs and reduces the ability to adapt quickly to change.

While standards groups such as the Triple-S Committee have made significant headway against the basic interchange problem, the challenge has meanwhile continued to evolve. The survey research industry is beginning to see growing client demand for ever more complex analysis and data mining, for customer relationship management, for web-based publishing of interactive databases, and for executive information systems and marketing information portals that integrate multiple data streams. As we rise to meet these needs, we encounter fundamental inefficiencies arising from the difficulty of integrating applications from multiple vendors to create comprehensive solutions for clients.

The challenges include both logistical and industry issues: on the one hand, the need to handle the requirements of survey data, as well as practical issues that include hierarchical data structures and the tracking of changes over time in longitudinal survey instruments; and, on the other, the need to achieve consensus among industry players and software providers on the benefits to all of adopting a "lingua franca" that will allow applications from competing suppliers to share information simply and openly. The fundamental need is not only for a medium of interchange for study information and respondent data, but also for a means of simplifying the creation and linking together of application programs across a diverse range of functionality. This paper provides an overview of an effort by SPSS to create an open data model, which provides an interchange medium, a published API (Application Programming Interface), and extensive functionality to simplify the development and integration of applications for survey research. It reviews some of the challenges; the potential benefits for agencies and their clients, systems suppliers, and the industry as a whole; and the lessons learned to date from this extensive undertaking.

Design Goals and Benefits

With these challenges at hand, and with a perceived industry need for a shared, open solution to alleviate the frustrations described above—and to facilitate the creation of larger, more complex next-generation applications—the need seemed clear for a technical solution that would meet at least the following design goals:

To do this, the Dimensions Data Model must be more than just a repository for survey data. It must provide functionality that lets applications handle the unique aspects of market research (MR) data, including multi-response questions, hierarchical datasets, and tracking of study versions, simply and quickly, so that these wheels need not be reinvented for each study. And it must provide an API that simplifies application programming.

The key benefit for the survey research community would lie beyond simple interchange of data, in customization of systems, products and business processes:

An early, simple real-world example might help to illustrate this last point. The construction of paper questionnaires for scanning often involves some fairly extensive formatting, as well as a time-consuming step in which fields on the paper are mapped into the scanning software. We've combined third-party software from multiple vendors with our data model and a proprietary application to create a solution that automates much of the formatting—and reduces the mapping process from one that often requires a day or more, depending on the questionnaire, to a matter of seconds. Further, changes to the questionnaire are automatically handled, both in the formatting and in the mapping to the scanning software. This saves multiple iterations of the manual mapping process, and allows agencies to be far more flexible and responsive.

A research agency, using these standard tools, could do the same thing.

2. Challenges

In addressing this ambitious agenda, the design team faced some key challenges in meeting its design goals while taking into consideration the following:

3. Meeting the Challenge: The Data Model

An open platform for common access

Fundamentally, the data model is a set of data/metadata API's to enable applications to:

…in a survey research environment.

The data model hides the implementation details of underlying, package-specific storage architectures, so that programmers of application programs need not even be aware of them. Each user program is written using a standard interface. Thus, the application can be insulated from any changes to the underlying data storage format, software environment, and even hardware platform. Further, the data model provides functions to handle common tasks such as simple aggregation and the comparison and manipulation of multi-response data, freeing the programmer to focus on the unique aspects of the application.

The data model API's are based on Microsoft technology, and make it straightforward to create custom applications in the Windows/ASP environment.

Case Data

Data Source Interfaces

Case (respondent) data is read and written using the standard ADO interface, using a subset of standard SQL commands, extended with a set of functions which provide such features as the ability to access and manipulate multi-response ("multipunch") category variables. Lower-level OleDB and ODBC interfaces are also planned.
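As a rough illustration of what handling multi-response data involves, the Python sketch below models each category answer as a set of category codes and defines two helper predicates over those sets. The helper names (`contains_any`, `contains_all`) and the sample rows are hypothetical and chosen only for illustration; they are not the names of the data model's actual SQL extension functions.

```python
# Hypothetical sketch: multi-response ("multipunch") answers as sets of codes.
# Function names and sample data are illustrative assumptions, not the real API.

def contains_any(value, codes):
    """True if the respondent chose at least one of the given category codes."""
    return bool(set(value) & set(codes))


def contains_all(value, codes):
    """True if the respondent chose every one of the given category codes."""
    return set(codes) <= set(value)


# Each row is one respondent; each column holds the set of chosen category codes.
rows = [
    {"tested": {2}, "color": {3}},
    {"tested": {3}, "color": {1, 2}},
    {"tested": {3}, "color": {3, 1}},
]

# Respondents who picked color 1, alone or together with other colors.
picked_1 = [r for r in rows if contains_any(r["color"], {1})]

# Respondents who picked both color 1 and color 2.
picked_1_and_2 = [r for r in rows if contains_all(r["color"], {1, 2})]

print(len(picked_1), len(picked_1_and_2))
```

Treating every category answer as a set, even when only one code was chosen, lets single-response and multi-response questions share the same comparison logic.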

Regardless of the underlying data storage format, the ADO provider exposes case data to the consumer application in a very simple virtual relational table. (Additional tables are used when representing hierarchical, or 'levels' data.) The columns presented in this 'VDATA' table correspond to the variables (e.g. questions) requested in the application's SQL query, and the rows correspond to respondent records. The results of a query are presented by the provider as a rowset of the selected data. For example, the following query would produce the table shown below:

SELECT tested, color, advert, advert_other, product FROM VDATA

tested   color   advert   advert_other   product
{2}      {3}     {1}                     {2}
{3}      {2}     {1}                     {2}
{3}      {1,2}   {1}                     {1}
{1}      {2}     {1}                     {2}
{2}      {2}     {1}                     {2}
{3}      {3,1}   {2}                     {10}
{1}      {4}     {7}      Poster         {10}

In this example:

Metadata

The comprehensive metadata object model is designed to accommodate all required project information other than the case data; that is, information about the study rather than the respondents.

This information is stored in an XML file. The data model interacts with the XML document through the Document Object Model (DOM), another standard supported by Microsoft's tools, and provides a straightforward API for application programs.
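To give a feel for DOM-style access to XML metadata, the Python sketch below parses a small XML document and lists its variable names. The element and attribute names (`metadata`, `variable`, `name`, `type`) are invented for illustration, since the actual metadata schema is not shown in this paper; the data model's own API is meant to spare applications from touching the raw XML at all.

```python
# Hypothetical sketch: reading variable names from an XML metadata document
# through the DOM. The element and attribute names below are invented for
# illustration; the real metadata schema is not shown in this paper.
from xml.dom.minidom import parseString

SAMPLE_XML = """<metadata>
  <variable name="tested" type="categorical"/>
  <variable name="color" type="categorical"/>
  <variable name="advert_other" type="text"/>
</metadata>"""


def variable_names(xml_text):
    """Return the name attribute of every <variable> element, in document order."""
    dom = parseString(xml_text)
    return [node.getAttribute("name")
            for node in dom.getElementsByTagName("variable")]


print(variable_names(SAMPLE_XML))
```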

Metadata Object Model

In the chart above…

Information on questionnaire versions is also maintained by the metadata model. Once a version number is specified by the end-user application, the current view shown to the application will be of that version alone. This permits applications that must be aware of changes to questionnaires over time, such as tabulation databases and OLAP environments, to access exactly what was asked in each wave, for instance, of a tracking study. A synchronization process, initiated as each version is finalized and "locked in", allows each version to store new, unique information separately, while incorporating information common to multiple versions by reference.
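One way to picture this versioning scheme is sketched below in Python: each locked version stores only what it adds or changes, and a view of a given version is assembled by resolving everything else from earlier versions. The class and method names are hypothetical, and the paper does not describe how the synchronization process is actually implemented, so this is an illustration of the idea rather than the data model's mechanism.

```python
# Hypothetical sketch of versioned metadata: each locked version stores only
# what changed and relies on earlier versions for everything else.
# Class and method names are illustrative assumptions, not the real API.

class VersionedMetadata:
    def __init__(self):
        self._versions = []          # one dict of changes per locked version

    def lock_version(self, changes):
        """Store the questions added or changed in a new version; return its number."""
        self._versions.append(dict(changes))
        return len(self._versions)   # version numbers start at 1

    def view(self, version):
        """Build the questionnaire as it stood at the given version."""
        merged = {}
        for changes in self._versions[:version]:
            for name, text in changes.items():
                if text is None:
                    merged.pop(name, None)   # None marks a dropped question
                else:
                    merged[name] = text
        return merged


study = VersionedMetadata()
study.lock_version({"tested": "Have you tested the product?",
                    "color": "Which colors do you like?"})
study.lock_version({"color": "Which colors did you see?",   # reworded in wave 2
                    "advert": "Where did you see the advert?"})

print(study.view(1))
print(study.view(2))
```

A tabulation tool working on wave 1 would call `study.view(1)` and see the original wording of `color`, with no trace of the `advert` question added later.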

How "Easy to Use" Is It, Really?

The box below shows the six lines of Visual Basic code required to open a metadata object, query it and print out the names of the question fields. Compare this with, say, the time required to write a program to extract the same information from a Triple-S metadata file, or from a closed architecture such as Quanvert, where it can't be done at all.

Accessing Metadata

' VB snippet to print variable names from an MDM Document
Dim Document As New MDMLib.Document
Dim V As MDMLib.VariableInstance

Document.Load "MyMetadata.mdd"
For Each V In Document.Variables
    Print V.Name   ' print names of VariableInstances
Next

Handling Diverse Data Formats

The "abstraction" layer, which makes diverse data and metadata storage formats accessible through the data model API's in real time, at the individual transaction-level, is achieved through the creation of "data source components", or DSC's. These are programs (COM objects), individually written for each storage format, which serve as adapters or translators. Where the high level of integration provided by data source components may not be required – where batch conversions are sufficient, for instance – simpler programs using the API can be written to import studies from proprietary formats into the Dimensions Data Model, and thus to the many data formats supported by the Data Model.
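The adapter role of a data source component can be sketched in a few lines of Python. Real DSC's are COM objects; the interface, class names, and the two toy storage formats below are hypothetical stand-ins used only to show how application code can stay the same while the storage format changes.

```python
# Hypothetical sketch of the adapter idea behind data source components.
# The interface, class names, and file layouts are illustrative assumptions.
import csv
import io
import json
from abc import ABC, abstractmethod


class DataSourceComponent(ABC):
    """Common interface the application sees, whatever the storage format."""

    @abstractmethod
    def read_cases(self):
        """Yield one dict per respondent record."""


class CsvSource(DataSourceComponent):
    def __init__(self, text):
        self._text = text

    def read_cases(self):
        yield from csv.DictReader(io.StringIO(self._text))


class JsonSource(DataSourceComponent):
    def __init__(self, text):
        self._text = text

    def read_cases(self):
        yield from json.loads(self._text)


def count_cases(source):
    """Application code: depends only on the common interface."""
    return sum(1 for _ in source.read_cases())


csv_source = CsvSource("id,tested\n1,2\n2,3\n")
json_source = JsonSource('[{"id": "1", "tested": "2"}]')
print(count_cases(csv_source), count_cases(json_source))
```

Supporting a new storage format then means writing one more adapter class, with no change to `count_cases` or any other application code.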

Within the SPSS product line, DSC's have been or will be made available for Quanvert data, SPSS .SAV files, mrInterview's relational database, and so on. A software developer's library is being made available to assist other software suppliers, research agencies with proprietary databases, and others in creating these components and with using the API. Once a DSC is complete, the agency or supplier will have the opportunity, over time, to integrate proprietary applications at the transaction level with the full line of SPSS products and with those of other suppliers who have written DSC's – and import and export programs for these packages should be a thing of the past.

4. Open Architecture

The data model is "open" on two levels: first, it is built around open, industry-standard technologies; and second, it provides an open, published interface (API) for the use of one and all. The first allows DM-enabled programs to interact smoothly with a large body of other standards-based software, allowing them to be combined easily as components of more complex systems. The second allows others outside SPSS—clients, partner companies and even competitors—to easily take advantage of the benefits offered by the data model.

Component Technology

The data model is built around industry-standard, open software technologies. From a practical standpoint, the data model's functionality is provided by a set of COM components, implemented as dynamically linked libraries (DLL's). COM is Microsoft's Component Object Model, which allows programs to expose a program interface to other programs. This enables Windows applications to open and control sessions in other programs such as Excel, Word and PowerPoint. Many survey research agencies use this functionality in MS Office to automate tasks such as report and graphics production. The COM standard facilitates the combining of "component" programs such as the data model, the Office products and many others to build applications.

Your proprietary applications can control the entire process in a straightforward way, using Visual Basic or any of the languages that work with these standards. Thus, our products and others can be combined and extended to create your own proprietary application—you can add your own unique value, and your own look and feel, combining and adding onto these building blocks. You need not go to the expense and trouble of recreating core functionality—your investment can be focused on those specific areas where your products break new ground.

The use of open standards provides a tremendous efficiency benefit at every level—whether you're simply loading a dataset into Excel with a few lines of VBA, or you're creating a complex tracking environment or a marketing information portal. Moreover, because you can work with survey data independently of the underlying storage format, you are not tied to any one product or software supplier. The result is greater flexibility and improved business processes.

Publication

In order to encourage adoption of the data model, the API's are being published and a developer's kit for the data model is being made available to the industry at no charge. The date of this conference happens to coincide exactly with the public release of the Dimensions Development Library. Only if the data model, or something like it, is adopted broadly, however, will it yield all of its potential benefits.

To this end, SPSS is supporting industry standards efforts wherever possible. In the MR area, our director of development has been working with the Triple-S committee, and will contribute in any practical way to the converging of our data model with this published standard.

On another front, the Object Management Group (OMG) has published the Common Warehouse Metamodel (CWM), a specification for metadata interchange among data warehousing, business intelligence, knowledge management and portal technologies.

This is a comprehensive standard. It includes relational database structures, OLAP storage, XML structures, data transformation specifications, application of analytic techniques, specification of presentation formats for information, and operational parameters and metrics for data warehouse operation. [1] SPSS has business intelligence products in areas such as analytical CRM, business metrics, and data mining, which currently utilize proprietary, XML-based metadata formats. SPSS is currently working on implementation of a metadata repository that supports CWM-based XMI interchange, which would bridge existing internal standards. While much existing metadata maps fairly easily to the CWM standard, there are places where the standard's definition of variables would have to be extended to support required functionality in the SPSS products. The tentative plan is to adopt this CWM-based repository when it becomes available. If this does in fact happen, then the MR division intends either to adopt the same metadata model or to link with it transparently.

5. Conclusion: Strategic Thinking

Our goal at SPSS is to build on the deep, long-term partnerships we have with our clients. We must earn and sustain their trust—they must know that we as a company are out to help them deliver the most effective possible solutions for their customers. SPSS has a clear-cut commitment to open standards and systems, so as to allow clients maximum flexibility to extend and combine our products to create differentiated solutions that add value in turn for their customers. More and more, SPSS products are offered both as turnkey solutions and as components, developers' tools that support our clients in this endeavor.

This isn't the traditional thinking in this industry. Companies have felt that protecting their secrets meant protecting their client bases. The problems that result have become painfully clear to SPSS as, following our acquisition of three MR software companies, we've had to rationalize, integrate and maintain within our own product line three parallel lines of traditional MR software that didn't really talk with one another, and somehow to integrate them with our wide array of analytic, business intelligence, graphics, and other applications. One of these companies, Quantime, was well known for closed, proprietary systems such as Quanvert.

Today, though, it is important to leave this kind of tactical thinking behind. As the MR industry changes, as it faces threats from consultancies and from the do-it-yourselfers, as demand emerges for more and more sophisticated data warehousing, analytic and marketing information applications, the stakes are far too high. Our clients will be—and should be—intolerant of company policy that "traps" them within a product line.

It is important to all of us, as an industry, to think strategically about open systems and to support industry standards. By adopting standard technologies, by publishing our API's, and by working with industry standards organizations, we can as an industry help our clients to compete effectively in this changing marketplace. And when our clients win, we win.


[1] The CWM was developed by a group of companies that include industry leaders such as IBM, Oracle and Hyperion. This standard may be merged with another standard from the Meta Data Group, until recently a separate coalition that included Informatica, Microsoft, and SAS. If and when the work of merging these standards is complete, the resulting specification will be issued by the OMG as the next version of the CWM. A single standard should allow users to exchange metadata between different products from different vendors with relative freedom.
