Bayliss EA, Bayliss MS, Ware JE, Jr., Steiner JF. Predicting Declines in Physical Function in Persons with Multiple Chronic Medical Conditions: What We Can Learn From the Medical Problem List? Health and Quality of Life Outcomes 2004;2(1):47.
Abstract:
BACKGROUND: Primary care physicians are caring for increasing numbers of persons with comorbid chronic illness. Longitudinal information on health outcomes associated with specific chronic conditions may be particularly relevant in caring for these populations. Our objective was to assess the effect of certain comorbid conditions on physical well-being over time in a population of persons with chronic medical conditions, and to compare these effects with that of hypertension alone. METHODS: We conducted a secondary analysis of 4-year longitudinal data from the Medical Outcomes Study. A heterogeneous population of 1574 patients with either hypertension alone (referent) or one or more of the following conditions: diabetes, coronary artery disease, congestive heart failure (CHF), respiratory illness, musculoskeletal conditions and/or depression was recruited from primary and specialty (endocrinology, cardiology or mental health) practices within HMO and fee-for-service settings in three U.S. cities. We measured categorical change (worse vs. same/better) in the SF-36® Health Survey physical component summary score (PCS) over 4 years. We used logistic regression analysis to determine significant differences in longitudinal change in PCS between patients with hypertension alone and those with other comorbid conditions, and linear regression analysis to assess the contribution of the explanatory variables. RESULTS: Specific diagnoses of CHF, diabetes and/or chronic respiratory disease, or 4 or more chronic conditions, were predictive of a clinically significant decline in PCS. CONCLUSIONS: Clinical recognition of these specific chronic conditions, or of 4 or more conditions from a list of chronic conditions, may provide an opportunity for proactive clinical decision making to maximize physical functioning in these populations.
Bjorner JB, Ware JE, Jr. Using Modern Psychometric Methods to Measure Health Outcomes. Medical Outcomes Trust Monitor 1998;3(2):11-6. [SF-36]
Abstract: Abstract unavailable
Bjorner JB, Ware JE, Jr., Kosinski M. The Potential Synergy Between Cognitive Models and Modern Psychometric Models. Quality of Life Research 2003;12:261-74. [SF-36]
Abstract:
Analyses of cognitive aspects of survey methodology (CASM) and psychometric analysis are two methods that are able to complement each other. We use concrete examples to illustrate how psychometric analyses can test hypotheses from CASM. The psychometrics framework recognizes that survey responses are affected by factors other than the concept being assessed, for example by cognitive factors and processes. Such factors are subsumed under the concept of measurement error. Possible sources of measurement error can be tested, e.g. by randomized experiments. A standard way to reduce measurement error is to ask several questions about the same concept and combine the answers into a multi-item scale that is more precise than the individual items. Techniques like structural equation models use the item correlations to assess the magnitude of measurement error and to test the assumptions behind the multi-item scale, e.g. the effect of common response choices and item time frames. A central problem in modern psychometrics is how to model the mapping of the continuous latent variable onto the item response choice categories. This is achieved by threshold models (e.g. item response models and structural equation models for categorical data). These models can, for example, analyze the impact of mode of administration, test whether the items function in the same way for all people (measurement invariance/differential item functioning) and examine the consistency of responses from any single person. Such analyses provide new possibilities for combining psychometrics and cognitive methods.
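Editorial illustration (not from the paper): one widely used threshold model is Samejima's graded response model, in which ordered thresholds cut the continuous latent trait into response categories. The sketch below assumes a single logistic discrimination parameter and hypothetical threshold values; it is not the specific model fitted by the authors.

```python
import math

def graded_response_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta: value of the continuous latent trait.
    a: item discrimination (hypothetical value in the example below).
    thresholds: increasing b_k values separating adjacent response categories.
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 (lowest) and 0 (beyond highest).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the gap between adjacent cumulative curves.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical 4-category item evaluated for a respondent at the trait mean.
probs = graded_response_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Moving theta upward shifts probability toward higher categories; differential item functioning, in this framing, would appear as group differences in a or the thresholds at the same theta.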
Bjorner JB, Wallenstein GV, Martin MC et al. Interpreting Score Differences in the SF-36 Vitality Scale: Using Clinical Conditions and Functional Outcomes to Define the Minimally Important Difference. 2007;23:731-9.
Abstract:
OBJECTIVE: To propose the minimally important difference (MID) for the SF-36 Vitality (VT) scale by evaluating the association of score differences with clinical conditions and functional outcomes. METHODS: Analyses were performed on data from the Medical Outcomes Study (n = 3445). The first analyses regressed VT scores (0-100 scale) on chronic conditions that cause fatigue in order to determine the impact of each condition on VT. The second set of analyses examined the relationship between baseline VT scores and other outcomes at baseline, 1-year, and 7-year follow-up. RESULTS: VT scores were significantly reduced in patients with anemia [5 points (95% CI 2-9 points)], CHF [6 (3-9) points], and COPD [6 (3-9) points]. Decreases in VT score were significantly associated with increased odds of negative outcomes, including inability to work due to health at baseline [OR (5 points) = 1.27 (95% CI 1.24-1.31), OR (10 points) = 1.62 (1.54-1.71)], job loss at 1 year [OR (5) = 1.13 (1.08-1.19), OR (10) = 1.28 (1.17-1.41)], hospitalization at 1 year [OR (5) = 1.08 (1.05-1.11), OR (10) = 1.17 (1.10-1.23)], short-term mortality [0-18 months: hazard ratio (HR) (5) = 1.10-1.71, HR (10) = 1.21-2.39, depending on VT level] and long-term mortality [19+ months: HR (5) = 1.05-1.31, HR (10) = 1.10-1.54]. The mortality risk increase was largest at low VT levels. CONCLUSIONS: VT decrements of 5-10 points were seen for diseases known to cause fatigue. Further, differences of 5-10 points in the VT score were associated with significantly increased risk of negative outcomes. We recommend an MID of 5 points for analyses of groups with VT scores below average. For follow-up of individual patients, we recommend a 10-point difference as important.
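A quick consistency check on the odds ratios above, as an editorial sketch: if VT enters a logistic model as a single linear term (an assumption; the abstract does not state the model form), the log odds ratio scales with the size of the score difference, so OR(10) should be roughly OR(5) squared. This shortcut does not apply to the mortality hazard ratios, which the abstract reports as ranges that depend on VT level.

```python
import math

def rescale_odds_ratio(or_small, d_small, d_large):
    """Convert an odds ratio for a d_small-point difference into one for d_large points.

    Assumes the score enters the logistic model as a single linear term,
    so log(OR) is proportional to the size of the score difference.
    """
    return math.exp(math.log(or_small) * d_large / d_small)

# Per-5-point odds ratios as reported in the abstract above.
reported_or5 = {
    "inability to work (baseline)": 1.27,
    "job loss (1 year)": 1.13,
    "hospitalization (1 year)": 1.08,
}
for outcome, or5 in reported_or5.items():
    print(f"{outcome}: OR(10) ~ {rescale_odds_ratio(or5, 5, 10):.2f}")
```

This gives about 1.61, 1.28 and 1.17, against the reported 1.62, 1.28 and 1.17; the small gap in the first pair is consistent with rounding of the 5-point estimate.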
Bullinger M, Alonso J, Apolone G et al. Translating Health Status Questionnaires and Evaluating Their Quality: The IQOLA Project Approach. International Quality of Life Assessment. Journal of Clinical Epidemiology 1998;51(11):913-23.
Abstract: This article describes the methods adopted by the International Quality of Life Assessment (IQOLA) project to translate the SF-36 Health Survey. Translation methods included the production of forward and backward translations, use of difficulty and quality ratings, pilot testing, and cross-cultural comparison of the translation work. Experience to date suggests that the SF-36 can be adapted for use in other countries with relatively minor changes to the content of the form, providing support for the use of these translations in multinational clinical trials and other studies. The most difficult items to translate were the physical functioning items, which use examples of activities and distances that are uncommon outside the United States; items that use colloquial expressions such as "pep" or "blue"; and the social functioning items. Quality ratings were uniformly high across countries. While the IQOLA approach to translation and validation was developed for use with the SF-36, it is applicable to other translation efforts.
Cella D, Yount S, Rothrock N et al. The Patient-Reported Outcomes Measurement Information System (PROMIS): Progress of an NIH Roadmap Cooperative Group During its First Two Years. Med Care 2007 May;45(5 Suppl 1):S3-S11.
Abstract:
BACKGROUND: The National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) Roadmap initiative (www.nihpromis.org) is a 5-year cooperative group program of research designed to develop, validate, and standardize item banks to measure patient-reported outcomes (PROs) relevant across common medical conditions. In this article, we summarize the organization and scientific activity of the PROMIS network during its first 2 years. DESIGN: The network consists of 6 primary research sites (PRSs), a statistical coordinating center (SCC), and NIH research scientists. Governed by a steering committee, the network is organized into functional subcommittees and working groups. In the first year, we created an item library and activated 3 interacting protocols: Domain Mapping, Archival Data Analysis, and Qualitative Item Review (QIR). In the second year, we developed and initiated testing of item banks covering 5 broad domains of self-reported health. RESULTS: The domain mapping process is built on the World Health Organization (WHO) framework of physical, mental, and social health. From this framework, pain, fatigue, emotional distress, physical functioning, social role participation, and global health perceptions were selected for the first wave of testing. Item response theory (IRT)-based analysis of 11 large datasets supplemented and informed item-level qualitative review of nearly 7000 items from available PRO measures in the item library. Items were selected for rewriting or creation, with further detailed review before the first round of testing in the general population and target patient populations. CONCLUSIONS: The NIH PROMIS network derived a consensus-based framework for self-reported health and systematically reviewed available instruments and datasets that address the initial PROMIS domains. Qualitative item research led to the first wave of network testing, which began in the second year.
Chakravarty EF, Bjorner JB, Fries JF. Improving Patient Reported Outcomes Using Item Response Theory and Computerized Adaptive Testing. Journal of Rheumatology 2007;34(6):1426-31.
Abstract: OBJECTIVE: Patient-reported outcomes (PRO) are considered central outcome measures for both clinical trials and observational studies in rheumatology. More sophisticated statistical models, including item response theory (IRT) and computerized adaptive testing (CAT), will enable critical evaluation and reconstruction of currently utilized PRO instruments to improve measurement precision while reducing item burden on the individual patient. METHODS: We developed a domain hierarchy encompassing the latent trait of physical function/disability, from the most general to the most specific. Items collected from 165 English-language instruments were evaluated by a structured process including trained raters, modified Delphi expert consensus, and then patient evaluation. Each item in the refined data bank will undergo extensive analysis using IRT to evaluate response functions and measurement precision. CAT will allow real-time questionnaires with potentially fewer questions, tailored directly to each individual's level of physical function. RESULTS: The physical function/disability domain comprises 4 subdomains: upper extremity, trunk, lower extremity, and complex activities. Expert and patient review led to consensus favoring present-tense "capability" questions using a 4- or 5-item Likert response construct over past-tense "performance" items. Floor and ceiling effects, attribution of disability, and standardization of response categories were also addressed. CONCLUSION: By applying IRT through CAT, existing PRO instruments may be improved to reduce questionnaire burden on individual patients while increasing measurement precision, which may ultimately reduce sample size requirements for costly clinical trials. [SF-36]
Fleishman JA, Cohen JW, Manning WG, Kosinski M. Using the SF-12 Health Status Measure to Improve Predictions of Medical Expenditures. Medical Care 2006 May;44(5 suppl):I54-I63.
Abstract:
BACKGROUND: Relatively few studies have used self-reported health status in models to predict medical expenditures, and many of these have used the SF-36. OBJECTIVES: We sought to examine the ability of the briefer SF-12 measure of health status to predict medical expenditures in a nationally representative sample. METHODS: We used data from the 2000-2001 panel of the Medical Expenditure Panel Survey. Respondents (n = 5542) completed the SF-12 in a questionnaire. Interviews obtained data on demographics and selected chronic conditions. Data on expenditures incurred subsequent to the interview were obtained in part from provider records. We examined different regression model specifications and compared different statistical estimation techniques. RESULTS: Adding the SF-12 to a regression model improved the prediction of subsequent medical expenditures. In a model with only age and gender, adding the SF-12 increased R² from 0.06 to 0.13. The coefficients for the Physical Component Summary (PCS) and the Mental Component Summary (MCS) of the SF-12 for this model were -0.045 (P < 0.01) and -0.012 (P < 0.01), respectively. In a model including demographic characteristics, chronic conditions, and previous expenditures, adding the SF-12 increased R² from 0.26 to 0.29. The coefficients for the PCS and the MCS for this model were -0.025 (P < 0.001) and -0.005 (P = 0.15), respectively. A single general health status question performed almost as well as the full SF-12. Models estimated using ordinary least squares had undesirable properties. In terms of R², a generalized linear model (GLM) with a Poisson variance function was consistently superior to a GLM with a gamma variance function. CONCLUSIONS: Information on self-reported health status is useful in predicting medical expenditures. The extent to which the SF-12 adds predictive power over a comprehensive array of diagnostic data remains to be examined.
Gandek B, Ware JE, Jr., Aaronson NK et al. Cross-validation of Item Selection and Scoring for the SF-12 Health Survey in Nine Countries: Results from the IQOLA Project. International Quality of Life Assessment. Journal of Clinical Epidemiology 1998;51(11):1171-8.
Abstract: Data from general population surveys (n = 1483 to 9151) in nine European countries (Denmark, France, Germany, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) were analyzed to cross-validate the selection of questionnaire items for the SF-12 Health Survey and scoring algorithms for 12-item physical and mental component summary measures. In each country, multiple regression methods were used to select 12 SF-36 items that best reproduced the physical and mental health summary scores for the SF-36 Health Survey. Summary scores then were estimated with 12 items in three ways: using standard (U.S.-derived) SF-12 items and scoring algorithms, standard items and country-specific scoring, and country-specific sets of 12 items and scoring. Replication of the 36-item summary measures by the 12-item summary measures was then evaluated through comparison of mean scores and the strength of product-moment correlations. Product-moment correlations between SF-36 summary measures and SF-12 summary measures (standard and country-specific) were very high, ranging from 0.94 to 0.96 and from 0.94 to 0.97 for the physical and mental summary measures, respectively. Mean 36-item summary measures and comparable 12-item summary measures were within 0.0 to 1.5 points (median = 0.5 points) in each country and were comparable across age groups. Because of the high degree of correspondence between summary physical and mental health measures estimated using the SF-12 and SF-36, it appears that the SF-12 will prove to be a practical alternative to the SF-36 in these countries, for purposes of large group comparisons in which the focus is on overall physical and mental health outcomes.
Gandek B, Sinclair SJ, Kosinski M, Ware JE, Jr. Psychometric Evaluation of the SF-36 Health Survey in Medicare Managed Care. Health Care Financing Review 2004;25(4):5-25.
Abstract:
Data quality and scoring assumptions for the SF-36 Health Survey were evaluated among the elderly and disabled, using 1998 Cohort I baseline Medicare HOS data (n=177,714). Missing data rates were low, and scoring assumptions were met. Internal consistency reliability was 0.83 to 0.93 for the eight scales and 0.94 and 0.89, respectively, for the physical (PCS) and mental (MCS) component summary measures. Reliability declined as risk factors increased (e.g., older age, more chronic conditions), but remained well above accepted standards for all subgroups. These findings support using standard algorithms for scoring the SF-36 in the HOS and subgroup analyses of HOS data.
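Internal consistency reliability of a multi-item scale is commonly estimated with Cronbach's alpha; the abstract does not name the coefficient, so alpha is an assumption here. A minimal sketch on hypothetical response data:

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for one multi-item scale.

    item_scores: one list per item, each holding the same respondents'
    scores in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Summated scale score for each respondent.
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Hypothetical responses: three items answered by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 4, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Population variances are used throughout; because the same divisor appears in numerator and denominator, sample variances would give the same alpha.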
Gandek B, Ware JE, Jr., Aaronson NK et al. Tests of Data Quality, Scaling Assumptions, and Reliability of the SF-36 in Eleven Countries: Results from the IQOLA Project. International Quality of Life Assessment. Journal of Clinical Epidemiology 1998;51(11):1149-58.
Abstract: Data from general population samples in 11 countries (n = 1483 to 9151) were used to assess data quality and test the assumptions underlying the construction and scoring of multi-item scales from the SF-36 Health Survey. Across all countries, the rate of item-level missing data generally was low, although slightly higher for items printed in the grid format. In each country, item means generally were clustered as hypothesized within scales. Correlations between items and hypothesized scales were greater than 0.40 with one exception, supporting item internal consistency. Items generally correlated significantly higher with their own scale than with competing scales, supporting item discriminant validity. Scales could be constructed for 93-100% of respondents. Internal consistency reliability of the eight SF-36 scales was above 0.70 for all scales, with two exceptions. Floor effects were low for all except the two role functioning scales; ceiling effects were high for both role functioning scales and also were noteworthy for the Physical Functioning, Bodily Pain, and Social Functioning scales in some countries. These results support the construction and scoring of the SF-36 translations in these 11 countries using the method of summated ratings.
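The method of summated ratings referred to above can be sketched as: recode reversed items so higher always means better health, sum the item responses, and linearly transform the raw sum onto 0-100. This is a hypothetical helper, not the official SF-36 scoring software; the half-of-items missing-data rule is an assumption included for illustration.

```python
def summated_rating_score(responses, n_categories, reverse_items=()):
    """Score one respondent's multi-item scale on 0-100 by summated ratings.

    responses: item responses coded 1..n_categories, with None for missing.
    reverse_items: indices of items to recode so higher = better health.
    Hypothetical helper for illustration only.
    """
    recoded = []
    for i, r in enumerate(responses):
        if r is None:
            recoded.append(None)
        elif i in reverse_items:
            recoded.append(n_categories + 1 - r)  # reverse the item's direction
        else:
            recoded.append(r)
    answered = [r for r in recoded if r is not None]
    # Assumed rule: score only if at least half the items are answered,
    # substituting the respondent's own item mean for missing items.
    if not answered or 2 * len(answered) < len(recoded):
        return None
    person_mean = sum(answered) / len(answered)
    raw = sum(answered) + (len(recoded) - len(answered)) * person_mean
    lowest, highest = len(recoded), len(recoded) * n_categories
    # Linear transformation of the raw sum onto the 0-100 range.
    return 100 * (raw - lowest) / (highest - lowest)

print(summated_rating_score([3, None, 3], n_categories=5))
```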
Keller SD, Bayliss MS, Ware JE, Jr., Hsu MA, Damiano AM, Goss TF. Comparison of Responses to SF-36 Health Survey Questions with One-Week and Four-Week Recall Periods. Health Services Research 1997;32(3):367-84.
Abstract:
OBJECTIVE: To compare the measurement properties of acute (one-week recall) and standard (four-week recall) versions of SF-36 Health Survey (SF-36) scale scores. DATA SOURCES: SF-36 data collected from 142 participants (60% female, average age 39) in a clinical trial of an asthma medication: 74 patients randomized to the acute form and 68 to the standard. DATA COLLECTION: The SF-36 was self-administered at the time of a clinic visit (before clinical examination) to synchronize with clinical measures of disease severity at three different time points during the clinical trial: -2 weeks (two weeks before randomization to treatment), baseline (week 0 or randomization), and +4 weeks (four weeks after baseline). PRINCIPAL FINDINGS: The acute form yielded high-quality data; scales conformed to the assumptions of the summated ratings method used to score the standard SF-36; and scales had good distributional properties, were reliable, and had a factor content similar to the standard. The data indicated that while the acute form was more sensitive than the standard to change in health status associated with changes in acute symptoms, acute scale scores may not be comparable to national norms based on the standard, particularly for those scales that assess frequency of health events during a specified time period. CONCLUSIONS: Results support the use of the acute form in its intended applications; however, further research is required to document the generalizability of greater sensitivity of the acute form to recent changes in health and to explore whether norms based on the standard can be used to interpret the acute scale scores.
Kosinski M, Martin R, Henkenius S, Wanke LA, Buatti M. Determining Clinically Meaningful Improvement in SF-36 Scale Scores for Treatment Studies in Rheumatoid Arthritis. Arthritis and Rheumatism 2000;43(9):439.
Abstract: Abstract unavailable
Reeve BB, Hays RD, Bjorner JB et al. Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care 2007 May;45(5 Suppl 1):S22-S31.
Abstract:
BACKGROUND: The construction and evaluation of item banks to measure unidimensional constructs of health-related quality of life (HRQOL) is a fundamental objective of the Patient-Reported Outcomes Measurement Information System (PROMIS) project. OBJECTIVES: Item banks will be used as the foundation for developing short-form instruments and enabling computerized adaptive testing. The PROMIS Steering Committee selected 5 HRQOL domains for initial focus: physical functioning, fatigue, pain, emotional distress, and social role participation. This report provides an overview of the methods used in the PROMIS item analyses and proposed calibration of item banks. ANALYSES: Analyses include evaluation of data quality (eg, logic and range checking, spread of response distribution within an item), descriptive statistics (eg, frequencies, means), item response theory model assumptions (unidimensionality, local independence, monotonicity), model fit, differential item functioning, and item calibration for banking. RECOMMENDATIONS: Summarized are key analytic issues; recommendations are provided for future evaluations of item banks in HRQOL assessment [SF-36].
Revicki DA, Hays RD, Cella D, Sloan J. Recommended Methods for Determining Responsiveness and Minimally Important Differences for Patient-Reported Outcomes. J Clin Epidemiol 2008 Feb;61(2):102-9.
Abstract:
OBJECTIVE: The objective of this review is to summarize recommendations on methods for evaluating responsiveness and minimal important difference (MID) for patient-reported outcome (PRO) measures. STUDY DESIGN AND SETTING: We review, summarize, and integrate information on issues and methods for evaluating responsiveness and determining MID estimates for PRO measures. Recommendations are made on best-practice methods for evaluating responsiveness and MID. RESULTS: The MID for a PRO instrument is not an immutable characteristic, but may vary by population and context, and no one MID may be valid for all study applications. MID estimates should be based on multiple approaches and triangulation of methods. Anchor-based methods applying various relevant patient-rated, clinician-rated, and disease-specific variables provide primary and meaningful estimates of an instrument's MID. Results for the PRO measures from clinical trials can also provide insight into observed effects based on treatment comparisons and should be used to help determine MID. Distribution-based methods can support estimates from anchor-based approaches and can be used in situations where anchor-based estimates are unavailable. CONCLUSION: We recommend that the MID be based primarily on relevant patient-based and clinical anchors, with clinical trial experience used to further inform understanding of MID.
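The abstract does not list specific distribution-based methods; two widely used ones are one half of a standard deviation and one standard error of measurement, SEM = SD × sqrt(1 − reliability). A minimal sketch with hypothetical inputs; in line with the recommendations above, such values would support, not replace, anchor-based estimates.

```python
import math

def distribution_based_benchmarks(sd, reliability):
    """Two common distribution-based reference values for a PRO score.

    sd: standard deviation of the score in the study sample.
    reliability: reliability coefficient of the score (e.g., internal consistency).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return {
        "half_sd": 0.5 * sd,
        # Standard error of measurement.
        "sem": sd * math.sqrt(1.0 - reliability),
    }

# Hypothetical inputs: a T-score metric (SD = 10) with reliability 0.90.
print(distribution_based_benchmarks(sd=10.0, reliability=0.90))
```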
Saris-Baglama RN, Bjorner JB, Kosinski M, Ware JE, Jr. A Unified Framework for Scoring and Missing Data Estimation for the SF-36, SF-12, and SF-8. Paper Presented at the International Society for Quality of Life Research Symposium "Stating the Art: Advancing Outcomes Research and Methodology and Clinical Applications." Boston, MA. 2004.
Abstract: Abstract unavailable
Stewart AL, Greenfield S, Hays RD et al. Functional Status and Well-Being of Patients with Chronic Conditions. Results from the Medical Outcomes Study. Journal of the American Medical Association 1989;262(7):907-13.
Abstract: Enhancing daily functioning and well-being is an increasingly advocated goal in the treatment of patients with chronic conditions. We evaluated the functioning and well-being of 9385 adults at the time of office visits to 362 physicians in three US cities, using brief surveys completed by both patients and physicians. For eight of nine common chronic medical conditions, patients with the condition showed markedly worse physical, role, and social functioning; mental health; health perceptions; and/or bodily pain compared with patients with no chronic conditions. Each condition had a unique profile among the various health components. Hypertension had the least overall impact; heart disease and patient-reported gastrointestinal disorders had the greatest impact. Patients with multiple conditions showed greater decrements in functioning and well-being than those with only one condition. Substantial variations in functioning and well-being within each chronic condition group remain to be explained. [SF-36]
Tarlov AR, Ware JE, Jr., Greenfield S, Nelson EC, Perrin E, Zubkoff M. The Medical Outcomes Study. An Application of Methods for Monitoring the Results of Medical Care. Journal of the American Medical Association 1989;262(7):925-30.
Abstract: The Medical Outcomes Study was designed to (1) determine whether variations in patient outcomes are explained by differences in system of care, clinician specialty, and clinicians' technical and interpersonal styles and (2) develop more practical tools for the routine monitoring of patient outcomes in medical practice. Outcomes included clinical end points; physical, social, and role functioning in everyday living; patients' perceptions of their general health and well-being; and satisfaction with treatment. Populations of clinicians (n = 523) were randomly sampled from different health care settings in Boston, Mass; Chicago, Ill; and Los Angeles, Calif. In the cross-sectional study, adult patients (n = 22,462) evaluated their health status and treatment. A sample of these patients (n = 2349) with diabetes, hypertension, coronary heart disease, and/or depression were selected for the longitudinal study. Their hospitalizations and other treatments were monitored and they periodically reported outcomes of care. At the beginning and end of the longitudinal study, Medical Outcomes Study staff performed physical examinations and laboratory tests. Results will be reported serially, primarily in The Journal.
Thissen D, Reeve BB, Bjorner JB, Chang CH. Methodological Issues for Building Item Banks and Computerized Adaptive Scales. Quality of Life Research 2007 Feb 10.
Abstract: This paper reviews important methodological considerations for developing item banks and computerized adaptive scales (commonly called computerized adaptive tests in the educational measurement literature, yielding the acronym CAT), including issues of the reference population, dimensionality, dichotomous versus polytomous response scales, differential item functioning (DIF) and conditional scoring, mode effects, the impact of local dependence, and innovative approaches to assessment using CATs in health outcomes research [SF-36]
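One core CAT step, choosing the next item, can be sketched under a dichotomous two-parameter logistic (2PL) IRT model by picking the unadministered item with maximum Fisher information at the current trait estimate. This is an editorial sketch, not the paper's procedure: item names and (a, b) parameters are hypothetical, and a full CAT would also re-estimate theta after each response and apply a stopping rule.

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, bank, administered):
    """Return the unadministered item with maximum information at theta_hat."""
    candidates = [name for name in bank if name not in administered]
    if not candidates:
        return None
    return max(candidates, key=lambda name: item_information(theta_hat, *bank[name]))

# Hypothetical item bank: name -> (discrimination a, location b).
bank = {"item_easy": (1.2, -2.0), "item_mid": (1.0, 0.0), "item_hard": (1.5, 2.0)}
print(select_next_item(0.0, bank, administered=set()))
```

Selecting by information at the current estimate is what lets a CAT reach a target precision with fewer items than a fixed form.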
Turner-Bowker DM, Bayliss MS, Ware JE, Jr., Kosinski M. Usefulness of the SF-8 Health Survey for Comparing the Impact of Migraine and Other Conditions. Quality of Life Research 2003;12:1003-12.
Abstract:
BACKGROUND: Migraine headaches have been shown to have substantial personal and societal implications. Health-related quality of life (HRQOL) assessments of migraineurs have been used to monitor and evaluate patient- and population-based outcomes, and to evaluate effectiveness and responsiveness to treatment. In this paper, we test a new, even shorter generic health survey, the SF-8 Health Survey (SF-8), an alternate form that uses one question to measure each of the eight SF-36 Health Survey (SF-36) domains, in a sub-sample of migraine sufferers. METHODS: Data from 7557 participants surveyed via the Internet and mail were used to document the burden of migraine on HRQOL and to compare the relative burden of migraine with other chronic conditions using the SF-8. RESULTS: Migraineurs' HRQOL is similar to that of people with congestive heart failure, hypertension and diabetes, and is better than that of people with depression. Migraine sufferers experience better physical health and worse mental health (MH) than those with osteoarthritis. Results support prior research indicating that the burden of migraine on functional health and well-being is considerable and comparable to other chronic conditions known to have substantial impact on HRQOL. CONCLUSIONS: The SF-8 may provide a more practical and efficient method to describe the burden of migraine in population studies.
Ware JE, Jr., Sherbourne CD. The MOS 36-Item Short-Form Health Survey (SF-36). I. Conceptual Framework and Item Selection. Medical Care 1992;30(6):473-83.
Abstract: A 36-item short-form (SF-36) was constructed to survey health status in the Medical Outcomes Study. The SF-36 was designed for use in clinical practice and research, health policy evaluations, and general population surveys. The SF-36 includes a multi-item scale for each of eight health concepts: 1) limitations in physical activities because of health problems; 2) limitations in social activities because of physical or emotional problems; 3) limitations in usual role activities because of physical health problems; 4) bodily pain; 5) general mental health (psychological distress and well-being); 6) limitations in usual role activities because of emotional problems; 7) vitality (energy and fatigue); and 8) general health perceptions. The survey was constructed for self-administration by persons 14 years of age and older, and for administration by a trained interviewer in person or by telephone. The history of the development of the SF-36, the origin of specific items, and the logic underlying their selection are summarized. The content and features of the SF-36 are compared with the 20-item Medical Outcomes Study short-form.
Ware JE, Jr. Measuring Patients' Views: The Optimum Outcome Measure. British Medical Journal 1993;306:1429-30.
Abstract: This editorial provides commentary on the increased need for a standardized, reliable and valid outcome measure, and encourages health care policy makers, clinical investigators, and providers to become more involved in the process. This editorial accompanies articles about the SF-36. Advances in the standardized self-report survey, which has emerged as the best method of measuring outcomes from the patient's point of view, are reviewed. Advances, represented in the SF-36, include ease of administration due to brevity, ease of adaptation in other English-speaking countries such as the United Kingdom, and the establishment of SF-36 as a trademark protected by the Medical Outcomes Trust. The Trust promotes comparability across studies and countries, and guarantees widespread availability of forms without charge. An international team of investigators is currently developing authorized translations of the SF-36 through the International Quality of Life Assessment (IQOLA) project. Ongoing research in the field of health status measurement should prove useful in determining why patient outcomes vary and how to improve them.
Ware JE, Jr. Scoring the SF-36 Health Survey. Medical Outcomes Trust Bulletin 1993;1(1):4.
Abstract: Abstract unavailable
Ware JE, Jr., Gandek B, IQOLA Project Group. The SF-36 Health Survey: Development and Use in Mental Health Research and IQOLA Project. International Journal of Mental Health 1994;23(2):49-73.
Abstract: This article focuses on the SF-36 Health Survey, and the usefulness of standardized, psychometrically sound, cross-cultural questionnaires. The article summarizes the development of the SF-36, including work on translations, methods of validation, and norming studies in 15 countries through the International Quality of Life Assessment (IQOLA) Project. The IQOLA Project is a four-year project to translate and adapt the SF-36 in 15 countries, with the objectives of validating, norming, and documenting the new questionnaire in as many countries as possible. The translation process includes forward and backward translation, English-language adaptation, quantitative evaluation, assessment of face validity, and numerous other examinations. The IQOLA Project is testing the assumption that an HRQOL questionnaire that was originally developed within the USA can be successfully translated, validated, and normed in other countries. The SF-36 has also been evaluated for content, construct, and criterion validity, and has been shown to be valid in numerous studies. The SF-36 is responsive to change and has shown the ability to discriminate between conditions. The article calls attention to examples of published applications of the SF-36 in mental health research in which, among the SF-36 measures, the 5-item mental health scale (MHI-5) has consistently performed the best.
Ware JE, Jr., Gandek B, Keller SD, The IQOLA Project Group. Evaluating Instruments Used Cross-Nationally: Methods From the IQOLA Project. Quality of Life and Pharmacoeconomics in Clinical Trials. 2nd ed. New York: Raven Press; 1995.
Abstract:
INTRODUCTION: As interest has grown in evaluating health status across countries, particularly in multinational clinical trials, increased attention has been paid to assessment methods. This effort has benefited from the fields of cross-cultural psychology and sociology, which have a long tradition of developing criteria for the adaptation, translation, and psychometric testing of measures across languages and cultures. Two objectives must be addressed simultaneously. On the one hand, to make comparisons across countries, questionnaires and scoring methods must be standardized and there must be proof that the same health attributes are being measured in each country. On the other hand, questionnaires must be meaningful within each country's culture. As others have noted, the International Quality of Life Assessment (IQOLA) Project is one of the few cross-cultural efforts distinguished by a comprehensive methodological approach to meet both objectives.
Ware JE, Jr. The Status of Health Assessment 1994. Annual Review of Public Health 1995;16:327-54.
Abstract: General health status and a broader concept of quality-of-life are discussed and methods of widely used surveys are reviewed. A consensus regarding the inclusion of measures of physical, mental, social, and role functioning and general health perceptions is noted for comprehensive assessments of health. A schematic of relationships among condition-specific and generic measures is presented along with results expected for objective and subjective measures of physical and mental dimensions of health. Suggestions are offered for the labeling of disease-specific and generic measures and ways to avoid confounding of content. Applications of health surveys in general population monitoring, health policy evaluation, clinical trials of alternative treatments, monitoring and improving health care outcomes, and everyday clinical practice are exemplified and discussed. A unified measurement strategy is proposed and arguments in favor of standardizing the content of health surveys across applications are offered.
Ware JE, Jr., Gandek B. Methods for Testing Data Quality, Scaling Assumptions, and Reliability: The IQOLA Project Approach. International Quality of Life Assessment. Journal of Clinical Epidemiology 1998;51(11):945-52.
Abstract: Following the translation development stage, the second research stage of the IQOLA Project tests the assumptions underlying item scoring and scale construction. This article provides detailed information on the research methods used by the IQOLA Project to evaluate data quality, scaling and scoring assumptions, and the reliability of the SF-36 scales. Tests include evaluation of item and scale-level descriptive statistics; examination of the equality of item-scale correlations, item internal consistency and item discriminant validity; and estimation of scale score reliability using internal consistency and test-retest methods. Results from these tests are used to determine if standard algorithms for the construction and scoring of the eight SF-36 scales can be used in each country and to provide information that can be used in translation improvement.
Ware JE, Jr., Bjorner JB, Kosinski M. Dynamic Health Assessments: The Search for More Practical and More Precise Outcomes Measures. Quality of Life Newsletter 1999;21.
Abstract: Abstract unavailable
Ware JE, Jr. John E. Ware, Jr. on Health Status and Quality of Life Assessment and the Next Generation of Outcomes Measurement. Interview by Marcia Stevic and Katie Berry. Journal for Healthcare Quality 1999;21(5):12-7.
Abstract: John E. Ware, Jr., PhD, is a founder of QualityMetric, Inc., as well as its president and chief scientific officer. He also is executive director of the Health Assessment Lab at the Health Institute, New England Medical Center, and holds professorships at Harvard University and at Tufts University School of Medicine. For 14 years, he was senior research psychologist at RAND, where he developed health status and patient satisfaction measures used in the Health Insurance Experiment. He also was principal investigator for the Medical Outcomes Study (MOS), which developed the SF-36 Health Survey and other tools widely used in monitoring patient outcomes. A coauthor of papers from the MOS that received the Association for Health Services Research (AHSR) Article of the Year Award in 1993, Dr. Ware has received numerous awards for work in the field of outcomes research. He now is principal investigator of the International Quality of Life Assessment Project, which is translating and validating the SF-36 Health Survey for use in 45 countries. Dr. Ware also is developing the next generation of patient-based assessments that use advances in computer technology to provide very brief measures, yet meet the standard of precision necessary for use on an individual patient basis.