A number of factors should be considered when deciding which SF survey to use for a particular application. The decision hinges in large part on making a tradeoff between respondent burden and score precision. These and other considerations – such as whether to switch forms midway through a longitudinal study and when to consider using computerized adaptive testing – are addressed below.

Respondent Burden

Shorter surveys can be completed more quickly and are associated with the least respondent burden. The SF-8™ can be completed in 1 to 2 minutes on average, while the SF-12® requires roughly 2 to 3 minutes, and the SF-36® requires between 5 and 10 minutes. Longer surveys also take up more space when used in printed form. Survey length and respondent burden may be an issue in some clinical settings or when the survey is administered with a large battery of other items.

Precision

Precision comes at the cost of respondent burden: it increases with survey length. The SF-8 scales are the most "coarse," offering the least amount of precision and generally covering a narrower range for each of the eight domains of health. The longer SF-36v2 offers a greater degree of precision than the SF-12v2™. However, Version 2.0 improvements significantly increased the precision of both surveys, so that the difference between the updated surveys is much smaller than the difference between the original versions of the SF-36 and SF-12. As a result, when used in population studies, the SF-12v2 yields results that are comparable to those that would be obtained with the original SF-36. Those surveying populations can now take advantage of the brevity of the SF-12v2, confident that it will generally detect group differences and changes in health status as well as the original SF-36.

Matching a Form to an Application

Because of improvements incorporated into the SF-36v2 and SF-12v2, the updated surveys are generally recommended over the original versions (SF-36 and SF-12). The updated surveys are most often considered the "tools of choice" for fixed-length, short-form questionnaires that require maximum efficiency, and are recommended for use in clinical trials, outcomes and effectiveness research, and clinical practice applications. The SF-36v2, SF-12v2, and SF-8 can all be considered for population surveys and studies involving large samples.

Detecting Small Group Differences and Classifying Individuals
The SF-36v2 and SF-12v2 are recommended for efforts focused on detecting small group differences and classifying individuals (e.g., screening individual patients). In these situations, a high standard of score reliability (between .90 and .95) is necessary to achieve satisfactory statistical power, and single-item measures are likely to be inadequate or to detect only very large differences. The improved precision afforded by the longer measures can be observed through narrower confidence intervals around score estimates.
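
The link between reliability and the width of a confidence interval can be illustrated with the standard error of measurement from classical test theory, SEM = SD × sqrt(1 − reliability). The following is a minimal sketch, not QualityMetric's scoring software; the function names are hypothetical, and the SD of 10 and the reliability values are assumptions chosen for illustration.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple:
    """Approximate 95% interval around one respondent's observed score."""
    half_width = z * standard_error_of_measurement(sd, reliability)
    return (score - half_width, score + half_width)

# Assumed values: a score of 50 on a metric with SD 10, at three
# illustrative reliability levels.
for r in (0.70, 0.90, 0.95):
    low, high = confidence_interval(50.0, 10.0, r)
    print(f"reliability={r:.2f}  95% CI = {low:.1f} to {high:.1f}")
```

Under these assumptions, the interval narrows as reliability rises, which is why the longer forms are better suited to decisions about individual respondents.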

Large Population Surveys and Samples
The SF-36v2, SF-12v2, and SF-8 may all be considered for the largest population surveys and for studies involving large samples (e.g., more than 500) and group-level comparisons. Single-item measures such as those in the SF-8 work well in these situations because precision and the statistical power of hypothesis testing are achieved much more by drawing a larger representative sample than by increasing measurement reliability. While some concerns have been expressed in the past about single-item measures, a number of these are addressed by the scoring algorithms, making the SF-8 a valid option in selected situations.
Because items in the SF-8 are not a subset of those in the SF-36 and SF-12, using the SF-8 may be a disadvantage in certain situations.
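
The sample-size argument in this section can be sketched with the standard error of a group mean. Under classical test theory, lower reliability inflates the observed standard deviation, but the standard error still shrinks with the square root of the sample size. The function name, the true-score SD of 9, and the reliability and sample-size values below are hypothetical and are not taken from any SF manual.

```python
import math

def se_of_group_mean(true_sd: float, reliability: float, n: int) -> float:
    """Standard error of a group mean under classical test theory.

    Observed variance = true-score variance / reliability, so a less
    reliable measure has a larger observed SD; dividing by sqrt(n)
    gives the standard error of the mean for a sample of size n.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    observed_sd = true_sd / math.sqrt(reliability)
    return observed_sd / math.sqrt(n)

# Assumed scenario: a less reliable short measure in a large sample
# versus a more reliable long measure in a small sample.
short_form_large_sample = se_of_group_mean(9.0, 0.70, 500)
long_form_small_sample = se_of_group_mean(9.0, 0.90, 100)
print(short_form_large_sample < long_form_small_sample)
```

In this assumed scenario the larger sample more than offsets the lower reliability, so the group mean is estimated more precisely with the shorter measure.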

Scoring and Interpretation

Norm-based scoring (NBS) can now be used to score all SF surveys. NBS offers a number of advantages over the 0 to 100 scoring previously used with the SF-36. Through NBS, scale and summary scores are standardized to a mean of 50 and a standard deviation (SD) of 10 in the general US population, allowing scores to be compared within and across the different SF surveys. Updated normative data, available for each of these surveys, also assist interpretation by allowing comparisons to the general US population.
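
Standardizing to a mean of 50 and an SD of 10 is a linear transformation of a z-score. The sketch below shows only that final step; it is not the published SF scoring algorithm, which depends on item scoring rules and normative values not reproduced here. The function name and the norm mean and SD passed to it are placeholders for illustration.

```python
def norm_based_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw scale score so the norm group has mean 50 and SD 10."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return 50.0 + 10.0 * z

# Placeholder norms (NOT actual US general population values):
# a 0 to 100 scale with an assumed norm mean of 80 and SD of 20.
print(norm_based_score(80.0, 80.0, 20.0))  # at the norm mean
print(norm_based_score(60.0, 80.0, 20.0))  # one SD below the norm mean
```

A score below 50 on any scale then indicates health below the general population average, regardless of which SF survey produced it.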

Switching Versions "Midstream"
Prior to 1998, the SF-36 used a different scoring algorithm yielding scale scores that ranged between 0 and 100. Although norm-based scoring algorithms became available for SF-36 and SF-12 in 1998 and provide the link required for making meaningful comparisons of results between versions of the same survey, we recommend caution in adopting an updated version midway through a longitudinal study. Unless the number of years remaining in a longitudinal study is large, real threats to validity and threats perceived by others may be too great to justify that change. In such cases, parallel administrations of items from both versions (as in our psychometric evaluations comparing the SF-12 and SF-12v2) may provide the data needed to determine whether conclusions are robust across alternate versions.

Computerized Adaptive Testing (CAT)

For the most demanding applications of health status surveys, we no longer look to static short-form tools to achieve the most practical and precise measures. Ongoing research is demonstrating that software based on computerized adaptive test (CAT) logic delivers the "best of both worlds": more practical and more precise measures that cover the very wide range of levels of health and well-being required to monitor and compare generic health outcomes across diverse populations.
By matching questions to each respondent's health level, CAT is able to estimate scores much more efficiently than do static surveys. QualityMetric's dynamic health assessment (DYNHA®) software uses item response theory (IRT) models and norm-based score estimates (for items in the SF-36 and other widely used questionnaires) to calibrate item pools to an individual. The resulting CAT survey is very accurate and can be quickly administered on the Internet at greatly reduced costs. This approach to survey administration offers efficiency, the ability to compare results to norms, and use of other interpretation guidelines based on the SF tools.
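
The idea of matching questions to a respondent's health level can be sketched with a generic adaptive loop: estimate the respondent's level, ask the unused item that is most informative at that level, and re-estimate. This is a textbook illustration using a two-parameter logistic IRT model with dichotomous items and is not the DYNHA algorithm. The item pool, its parameters, and all function names are hypothetical.

```python
import math

# Hypothetical item pool: (discrimination a, location b) for a 2PL model.
ITEM_POOL = [(1.2, -2.0), (1.0, -1.0), (1.5, 0.0), (1.1, 1.0), (1.3, 2.0)]
GRID = [g / 10.0 for g in range(-40, 41)]  # candidate health levels, -4 to 4

def prob_endorse(theta, a, b):
    """2PL probability of a positive response at health level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, used):
    """Pick the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(ITEM_POOL)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *ITEM_POOL[i]))

def eap_estimate(responses):
    """Posterior mean of theta on the grid, with a standard normal prior."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalized prior density
        for idx, resp in responses:
            p = prob_endorse(theta, *ITEM_POOL[idx])
            w *= p if resp == 1 else (1.0 - p)
        weights.append(w)
    return sum(t * w for t, w in zip(GRID, weights)) / sum(weights)

def run_cat(answer_fn, max_items=3):
    """Administer up to max_items items; answer_fn(item_index) returns 0 or 1."""
    theta, used, responses = 0.0, set(), []
    for _ in range(max_items):
        idx = next_item(theta, used)
        used.add(idx)
        responses.append((idx, answer_fn(idx)))
        theta = eap_estimate(responses)
    return theta, [idx for idx, _ in responses]

# Two simulated respondents who answer every item the same way.
print(run_cat(lambda i: 1))
print(run_cat(lambda i: 0))
```

Both simulated respondents start with the same item, but their later items diverge because each answer moves the estimate, and the next question is chosen to be informative near that new estimate.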
