The Life Functioning Scale: A Measurement Tool Developed to Assess the Physical Functioning Abilities of Community-Dwelling Adults Aged 50 Years or Older
Abstract
Background
This study aimed to develop an instrument for assessing physical functioning among adults aged 50 years or older living in the community.
Methods
Based on a review of various national health surveys and cohort studies, a 144-item bank was constructed for assessing physical functioning. Focus group interviews were conducted among adults aged 50 years or older to investigate their level of understanding of 60 selected items, followed by a pretest of the items on a nationally representative sample (n=508). The final 25-item questionnaire was tested on an independent sample (n=259) for validity and reliability based on classical test and item response theories. Predictive validity at the 6-month follow-up was tested in a separate sample (n=263).
Results
The newly developed Life Functioning (LF) scale assessed the dimensions of functional limitations, disabilities, and social activities. The scale satisfied a one-dimensionality assumption with good item fit and demonstrated criterion validity, construct validity, high internal consistency (Cronbach’s alpha=0.93), and test-retest reliability (intra-class correlation coefficient=0.84; 95% confidence interval, 0.76–0.89). The LF scale comprised 25 items with a total score ranging from 0 to 100. Higher scores indicated higher levels of functioning. The LF score was significantly associated with the Physical Functioning score at 6 months.
Conclusion
The LF scale was developed to assess the physical functioning of people in their late midlife or older. Future studies should test the instrument on a national sample and evaluate its application in diverse population subgroups.
INTRODUCTION
With the rapidly aging population, healthy aging has gained increasing attention, even as the health span has lagged behind an increasing life expectancy. The high prevalence of functional limitations and physical disabilities in later years,1) brought on by risks accumulated over a lifetime, poses a serious threat to healthy aging. The increasing costs of healthcare and long-term care highlight the need to promote healthy aging.
The goal of healthy aging is to enhance and maintain functional ability,2) based on an assessment of the physical domain of functioning. Nagi’s seminal work on the disability process3) formed a conceptual basis for the subsequent development of tools to measure physical functioning. Nagi’s model states that physical functioning progresses from pathology to impairment, functional limitations, and disability according to the respective cellular, body system, individual, and societal levels affected.
The early development of measuring tools mainly focused on assessing functional disability, which included basic activities of daily living (ADLs), such as eating and dressing, and instrumental ADLs (IADLs), such as using the phone or public transportation. In the ensuing years, instruments were developed that assessed functional limitations in task-specific activities, such as rising from a chair and reaching overhead. Such activities have been recognized as essential building blocks of physical functioning with ramifications for disability prevention.4) The World Health Organization has since proposed the International Classification of Functioning, Disability, and Health (ICF) that expanded on Nagi’s model, conceptualizing “activities” and “participation” that correspond to Nagi’s concepts of functional limitations and disability, respectively.5) One widely used instrument, the 36-item short-form developed as part of the Medical Outcomes Study, includes 10 items on physical functioning that disproportionately assess functional limitations over disability.6) A more balanced tool, the Physical Functioning (PF) scale, has been validated for use in older Koreans and encompasses a broader spectrum of functioning, with five items pertaining to functional limitations and five items to disability.7)
Over the years, the measurement of physical functioning has evolved to incorporate social engagement in various activities and to use advanced psychometric methods to evaluate individual item performance. Recent scales of physical functioning, such as the Late-Life Function and Disability Index (LLFDI), encompass social activities such as maintaining social contacts or participating in leisure activities.8,9) Both the LLFDI and the Patient-Reported Outcomes Measurement Information System-Physical Function (PROMIS-PF) instrument were developed based on item response theory (IRT), which involves rigorous item analysis.10)
Currently, however, the tools available to assess the physical function of older Korean adults are limited, because the most widely used instruments are restricted in scope and precision. The Korean versions of the ADL and IADL indices (K-ADL, K-IADL) focus on the far end of the disability spectrum.11) The PF scale covers both functional limitations and disability but does not include social activities in its items.7) Moreover, the measurement validity of these scales has only been confirmed with classical test theory (CTT).
This study aimed to develop a new survey instrument for assessing the physical function of people aged 50 years or older living in the community. The value of the newly developed scale would be in covering a broader range of daily and social activities in its assessment of functional ability while maintaining robust psychometric properties. This study was initiated as part of a project to develop a physical functioning assessment tool that could be incorporated into the Korea National Health and Nutrition Examination Survey (KNHANES).
MATERIALS AND METHODS
Research Procedure
The primer created by Boateng et al.12) for best practices in scale development was employed to develop this new scale for the evaluation of physical functioning in aging populations. In the first phase, a 144-item bank was constructed by reviewing literature and assessing existing physical function scales and questionnaires, followed by an evaluation of each of the items by content experts. In the second phase, we selected 60 pretesting items based on content evaluation of the items by experts and the target population. The 60 items were then pretested on a nationally representative sample of 508 individuals and subsequently reduced, using multiple-item reduction techniques, to a final set of 25 items. In the last phase, the psychometric properties of the final 25-item scale were examined for concurrent and construct validity, internal consistency, and test-retest reliability in an independent sample of 259 individuals (Supplementary Fig. S1). A detailed description of item and scale development is shown in Supplement A.
In selecting the best set of items, item-statistic information obtained through item analyses, appropriateness to the theoretical structure, and item content were considered. During selection, 23 functional-limitation items and 13 disability items were eliminated due to item misfit, local dependence, poor discriminating power, or low measurement precision. Considering the distribution of item difficulties, we added an item, “standing for 2 hours.” The result was the Life Functioning (LF) scale, designed to assess physical functioning across a broad range of daily activities in community-dwelling middle-aged and older adults. The LF scale consisted of 25 items, including 10 items on lower body function, six items on upper extremity function, six items on IADLs and ADLs, and three items on social functioning.
Scale Evaluation
In the scale evaluation phase, the psychometric properties of the LF scale were examined. The LF scale was administered to a sample of 259 adults aged 50 years or older from May to June of 2021. Stratified random sampling by age, gender, and geographic region was used to select samples. To test intra-individual response consistency across time, a subset of participants (n=100), proportionately sampled by age and gender, was reassessed with the LF scale within 10–14 days after the initial interview. Confirmatory factor analysis (CFA) was conducted in April 2022 using a different random sample (n=519). Predictive validity was tested using a subset of this sample (n=263) who were followed up at 5 to 6 months (Supplementary Fig. S1). These participants were examined with the LF-25 scale at baseline (April 2022), and levels of physical function at follow-up (October 2022) were assessed using the PF scale7) in order to investigate the extent to which the LF scores reliably predict future physical functioning.
In both the pilot-testing and scale-evaluation phases, trained interviewers conducted face-to-face interviews with respondents using Computer-Assisted Personal Interviewing. The interview questionnaires consisted of LF scale items, sociodemographics (e.g., age, gender, education levels, marital status, employment status, household income levels), and physical and mental health items. Physical health status was measured by a self-rated health question and the number of self-reported physician-diagnosed chronic conditions. Cognitive function was assessed by one item on subjective cognitive decline. The Korean version of the Patient Health Questionnaire-9 was used to assess mental health conditions.13) The PF scale developed by Lee et al.7) served as our criterion measure against which to compare the LF scale scores.
The LF scale items were phrased “How much difficulty do you have carrying out a particular activity without the help of someone else or the use of assistive devices?” and rated on a 5-point Likert scale: 4 (none), 3 (a little), 2 (some), 1 (a lot), and 0 (cannot do). The total LF score was calculated by summing the individual raw item scores and ranged from 0 to 100 (Supplements B, C). Higher scale scores indicated higher levels of physical functioning. The study was approved by the Institutional Review Board of Ajou University Hospital (No. AJOUIRB-SUR-2020-438, AJOUIRB-SUR-2021-677), and written informed consent was obtained from all participants before data collection. This study complied with the ethical guidelines for authorship and publishing in the Annals of Geriatric Medicine and Research.14)
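The scoring rule described above can be illustrated with a minimal sketch. The function name and the decision to reject incomplete or out-of-range responses are assumptions made for illustration; the scoring instructions in Supplements B and C remain the authoritative source.

```python
from typing import Sequence

N_ITEMS = 25                    # number of LF items, as stated in the text
MIN_SCORE, MAX_SCORE = 0, 4     # 0 = cannot do ... 4 = no difficulty


def lf_total_score(responses: Sequence[int]) -> int:
    """Sum 25 raw item scores (each 0-4) into a total between 0 and 100.

    Hypothetical helper: rejecting incomplete questionnaires is an
    assumption of this sketch, not a rule taken from the article.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(responses)}")
    for score in responses:
        if not isinstance(score, int) or not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"item score out of range: {score!r}")
    return sum(responses)


# A respondent with no difficulty on 20 items and "some" difficulty on 5 items
example_total = lf_total_score([4] * 20 + [2] * 5)
```

Because each of the 25 items contributes at most 4 points, the maximum total is 100 without any rescaling.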
Statistical Analysis
Different statistical methods were employed to assess the psychometric properties of the LF scale. First, data screening was performed to identify missing data and outliers. Thirty-one cases (12%) had at least one missing value. Missing values were evaluated using the expectation maximization procedure to test whether they were missing completely at random or whether there were patterns in the missing data. No pattern was identified in the missing values. The preliminary analysis included testing the assumptions of the statistical models. Descriptive data were generated for all study variables (e.g., sociodemographic information, physical and mental health status). An exploratory factor analysis was used to examine the latent structure of the LF scale items. Principal component analysis (PCA) was conducted on the 25 items with oblique rotation (direct oblimin). The number of factors was determined based on Kaiser’s eigenvalue criterion (eigenvalue≥1) and the scree plot test. CFA was conducted to assess the fit between the observed variables and the theoretically grounded factor structure. Given that Likert-scaled items were unevenly distributed (i.e., the presence of ceiling effects), the weighted least squares with mean and variance adjusted estimator was used.15)
In addition, IRT-based methods were adopted to assess the psychometric performance of the 25 items on the LF scale. The benefit of this approach was that the item and ability parameters were identical across groups or measurement conditions (i.e., parameter invariance). Graded response model analysis was utilized to calibrate item parameters (e.g., item difficulty parameters and item discrimination parameters) and to examine model-data fit, item misfit, and local independence. Differential item functioning (DIF) was used to examine measurement invariance between men and women. DIF was tested using the Wald chi-square test to detect items with DIF for gender. For concurrent criterion validity, the Pearson bivariate correlation coefficient was calculated to test the association between the LF scale and a criterion test (i.e., the PF scale). Independent sample t-tests and chi-square tests were carried out to test whether the LF scale scores discriminated between groups known to differ in physical functioning levels (e.g., gender, education level, morbidity). To test predictive validity, PF scores at 6 months were regressed on the LF score at baseline. The internal consistency reliability of the LF scale was assessed using Cronbach’s alpha. Weighted Kappa, agreement analysis, and intraclass correlation coefficient (ICC) were used to evaluate test-retest reliability in a sample of 100 respondents who completed the LF scale twice within a 2-week interval. Statistical analyses were performed using SPSS version 26.0 (IBM Corp., Armonk, NY, USA). For item response analysis, IRTPRO version 5.0 (Vector Psychometric Group, Chapel Hill, NC, USA) was used to examine model fit, local independence, and DIF. CFA was performed using Mplus version 8.10 (Muthén & Muthén, Los Angeles, CA, USA).
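For readers less familiar with the graded response model named above, the following sketch shows how category probabilities follow from one discrimination parameter and a set of ordered thresholds for a 5-category item. It is an illustration only: the function name and the parameter values are hypothetical, not the estimates reported for the LF items, and the calibration itself was performed in IRTPRO.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Return P(X = 0..m | theta) under Samejima's graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (must be positive)
    thresholds -- m ordered difficulty thresholds for an item with m + 1 categories
    """
    if a <= 0:
        raise ValueError("discrimination must be positive")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Cumulative curves P(X >= k); by definition P(X >= 0) = 1 and P(X >= m + 1) = 0
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference between adjacent cumulative curves
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical item: a = 2.0 with four thresholds, evaluated at the average trait level
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-3.0, -2.5, -2.0, -1.0])
```

With all thresholds well below zero, as in this hypothetical item, a respondent at the average trait level is most likely to choose the highest category, which is the pattern that produces ceiling effects.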
RESULTS
In the item development phase, experts evaluated the 144 items from the constructed item bank, leading to the selection of 60 items with high content validity. As part of the scale development, the items were refined based on focus group interviews of laypersons. In the pretest of 508 individuals aged 50 years and older, multiple-item reduction analyses resulted in the final set of 25 items. For a comprehensive summary of the item reduction results, please refer to Supplementary Tables S1 and S2. Detailed information on the item and scale development can be found in Supplement A.
For scale evaluation, a sample of 259 individuals completed the interview questionnaire (Table 1), ranging in age from 50 to 83 years (63.2±8.1 years). Over half (53.3%) of the participants were women. Most (72%) reported very good or good health, and a small share of the participants (5.4%) showed mild or moderate depressive symptoms. Approximately 16% reported cognitive decline over the last year, and 56.8% had at least one diagnosed chronic disease.
Item Analysis
IRT assumptions were tested and found satisfactory (Supplement A). Descriptive results for the LF scale items are presented in Table 2. The discrimination parameters ranged from 1.55 to 4.59, which indicates very good discrimination power,16) and the item difficulty parameters ranged from –4.19 to –0.41 (Table 3). Item difficulty parameters under the generalized partial credit model showed that 84% of items were very easy (b≤–2) or easy (–2<b<–0.5), and four items had medium difficulty (–0.5≤b≤0.5).15) The test information function curve of the LF scale primarily covered the negative spectrum of the latent trait scale (θ). Specifically, the test information function curve peaked at –1.6 of the latent trait scale (information=53.49) and fell off with increasing latent ability levels, which suggested that the LF scale could more precisely estimate the ability of respondents whose physical functioning levels fell near the peak (θ=–1.6). Given that negative numbers of the latent trait scale indicated low physical functioning levels, the measurement precision of the LF scale would be best for individuals below the mean physical functioning level or those with functional limitations. The test information function and standard errors for the LF scale are presented in Supplementary Fig. S2.
The results of DIF for gender in the LF scale items showed that “lifting 5 kg” and “preparing meals” functioned differently for men and women. For “lifting 5 kg,” women had significantly higher item difficulty than men across the latent trait. In contrast, for “preparing meals,” men had higher item difficulty than women (Supplementary Tables S1, S2).
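The difficulty bands quoted above can be written as a small lookup. This is a sketch for illustration: the function name is hypothetical, and because the text defines no band above b=0.5, values beyond that cut-off are labeled as unspecified rather than guessed.

```python
def difficulty_band(b: float) -> str:
    """Map an item difficulty parameter to the bands quoted in the text."""
    if b <= -2:
        return "very easy"      # b <= -2
    if b < -0.5:
        return "easy"           # -2 < b < -0.5
    if b <= 0.5:
        return "medium"         # -0.5 <= b <= 0.5
    return "above medium (band not defined in the text)"


# The two ends of the reported difficulty range (-4.19 to -0.41)
lowest_band = difficulty_band(-4.19)
highest_band = difficulty_band(-0.41)
```

Applied to the reported range, the easiest parameter falls in the "very easy" band and the hardest in the "medium" band, consistent with the observation that the scale targets the lower end of the latent trait.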
Tests of Validity
Criterion validity was assessed by examining the correlation of the LF scale with the PF scale scores (the criterion test). There was a positive correlation between the two scale scores (r=0.84, p<0.001). The proportion of variance explained by the LF score was 70.6%.
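The proportion of variance explained follows from squaring the correlation coefficient. Using the rounded value reported above:

```latex
\[
r^{2} = 0.84^{2} = 0.7056 \approx 70.6\%
\]
```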
To evaluate construct validity, we examined item homogeneity, factor structure, and known-group validity. Strong evidence of item homogeneity was found in that individual item scores had significantly positive correlations with the total LF scale scores (p<0.001), which ranged from r=0.34 (getting in and out of bed) to r=0.81 (standing on tiptoes).
In the factor analysis, an initial analysis was run to obtain eigenvalues for each component in the data. Five components with eigenvalues >1 accounted for 66.6% of the total variance. The scree plot showed that the slope of the line changed at component numbers 2 and 5, indicating the need to retain either one component or four components (Supplementary Fig. S3). Combining the results of the Kaiser criterion and the scree plot test, two PCA models were identified. When one dominant factor was extracted, the underlying factor explained 41.8% of the total variance. The amount of variance in each item explained by the underlying factor ranged from 15.3% to 63.2%. Factor loadings ranged from 0.39 to 0.80, and only one item had a factor loading below 0.4 (reaching up overhead) (Table 4). In the four-factor PCA model, the amount of variance in each item accounted for by four factors ranged from 41.7% to 81.4%. We organized the items that clustered on the same factor and then analyzed the content of clustered items to identify common themes. This resulted in a four-factor structure: lower extremity (nine items), upper extremity (six items), IADL/ADL (six items), and indoor activity (four items).
Based on prior evidence and theories pertaining to the disablement model,3,5) a three-factor model was specified, with items Q1–Q10, Q11–Q16, and Q17–Q25 loaded onto the latent variables of lower extremity functioning, upper extremity functioning, and IADL/ADL, respectively (Fig. 1). The overall goodness-of-fit indices revealed that the three-factor model fit the data well (χ2(272)=404.615, p<0.001, standardized root mean square residual=0.039, root mean square error of approximation=0.032, Tucker-Lewis index=0.995, and comparative fit index=0.995). In addition, all freely estimated unstandardized parameters were statistically significant (p<0.001).
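The 272 degrees of freedom can be reproduced with a simple count, offered here as a plausibility check rather than as the software's exact computation. The count assumes a simple-structure model (each of the 25 items loads on one factor, with no cross-loadings or correlated residuals), ordinal indicators whose thresholds are saturated, and factor variances fixed for identification; these identification details are assumptions, as the text does not state them. Under those assumptions, the 25 items yield 300 distinct correlations, and the model estimates 25 loadings and 3 factor covariances:

```latex
\[
df = \frac{25 \times 24}{2} - (25 + 3) = 300 - 28 = 272
\]
```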
Next, we evaluated whether the LF scale scores differed between distinct groups that had been known to vary in terms of physical functioning. The LF scale scores significantly differed by gender, age, education level, monthly household income, self-rated health conditions, subjective cognitive decline, depressive symptoms, and number of chronic conditions (Supplementary Table S3). Women, older age groups, those with poor physical and mental health conditions, and low socioeconomic status tended to exhibit significantly lower LF scores. The LF score at baseline was strongly associated with the PF score at the 6-month follow-up (β=0.80, p<0.001).
Tests of Reliability
The results of reliability tests demonstrated good internal consistency among the 25 LF scale items (Cronbach’s alpha=0.93). In addition, Cronbach’s alpha for the four sub-components was 0.915 for the lower extremity, 0.800 for the upper extremity, 0.808 for IADL/ADL, and 0.786 for indoor activity (Table 4). Across the four sub-components, no items would improve the internal consistency if they were deleted. Regarding test-retest reliability, the ICC was 0.84 (95% confidence interval, 0.76–0.89) based on a mean-rating (k=17), absolute agreement, and a two-way mixed effects model. Regarding item-level measurement agreement, the weighted Kappa of the LF scale items ranged from –0.01 to 0.61 (Supplementary Table S4). The majority of LF scale items had a weighted Kappa>0.20, with the observed agreement of the LF scale items ranging from 62% to 96%.
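As an illustration of the internal consistency statistic reported above, Cronbach's alpha can be computed from a respondents-by-items score matrix as follows. This is a minimal sketch using toy data, not the study data, and the function name is hypothetical; the reported values were obtained with SPSS.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a matrix with one row of item scores per respondent."""
    if len(rows) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(rows[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in rows):
        raise ValueError("all respondents must have the same number of items")
    # Sample variance of each item across respondents
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Sample variance of the summed (total) score
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, two items
toy_alpha = cronbach_alpha([[1, 1], [2, 3], [3, 2]])
```

Alpha approaches 1 when items vary together across respondents and falls as item scores become less related to one another.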
DISCUSSION
The 25-item LF scale was developed for use with KNHANES participants aged ≥50 years. CTT- and IRT-based analyses demonstrated that the LF questionnaire had adequate psychometric properties in mid- to late-life Koreans living in the community.
The LF scale had several strengths. The development of the LF scale was theory-based as its construction was based on Nagi’s disablement model,3) incorporating components of functional limitations and disability. The premise of this conception is that disability progresses, with difficulty performing specific tasks that involve upper and lower body functions occurring prior to the inability to participate in socially demanding roles. In addition, the content of the scale was expanded to include social activities so that it reflected the importance of social participation in daily activities, as stressed by the ICF model5) and other recently developed scales. The LLFDI, for instance, contains social activity items, such as going out with others to public places, as part of its disability component.8,9) The U.S. National Health and Nutrition Examination Survey (NHANES) Physical Function Questionnaire (PFQ) uses 20 items to assess functional limitations and disability, including participation in social activities.17) The PROMIS-PF instrument developed by the U.S. National Institutes of Health covers a wide range of physical functioning from self-care to strenuous activities.10) The inclusion of specific social components in the LF scale can be regarded as an enhancement of the PF scale developed in 2002 for use among older Korean adults.7)
The LF scale demonstrated good psychometric performance. Expert consensus helped ensure its content validity. Regarding its contents, the LF scale resembles the NHANES-PFQ in that a variety of body functions, mobility, and activity restrictions are assessed.18) Difficulties with lower body functions, such as stooping/crouching/kneeling and standing for 2 hours, were the most reported. The LF scores were significantly correlated with scores from the PF scale, demonstrating the scale’s concurrent validity. Since the LF scale distinguished groups among various sociodemographic and health-related characteristics, known-group validity was supported. CFA confirmed the latent structure of the LF items: lower and upper body functions and disability. The LF scale exhibited high internal consistency (Cronbach’s alpha=0.93). Unlike the NHANES-PFQ, which was not tested for test-retest reliability,17) the LF scale displayed high agreement over the 2-week interval (ICC=0.84).
The LF scale did have several limitations. The LF scale’s total score was highly skewed with potential ceiling effects. Like the NHANES-PFQ, the LF scale lacked the ability to discriminate individuals with lower levels of disability.18) This is reflected in the fact that most community-dwelling respondents reported minimal difficulty in performing specified tasks. However, there was much variability in the reported difficulty of individual items, with lower-body tasks exhibiting more limitations than upper-body functions and daily living activities. Identifying the limitations in specific individual tasks would help target specific areas of functioning that need close monitoring. The total scores were still able to distinguish subgroups of different sociodemographic and health-related characteristics.
Notably, differential item functioning by gender was found for the items “lifting 5 kg” and “preparing meals,” for which women and men, respectively, reported more difficulty than the other gender at the same latent trait level. It is conceivable that older women are less likely than older men to encounter situations where they have to lift heavy objects. Nevertheless, by consensus, the experts considered lifting weights a central item for assessing functional ability, and suggested using various gender-neutral examples to reduce the gender-related differences in attitude toward such tasks. For example, “lifting two 2-L bottles of water” could help reduce DIF since both men and women are likely to engage in such activity on a daily basis. It has been reported that culture-based gender norms prevent men from performing certain household chores, such as preparing meals.19) In this study, nine (3.6%) respondents reported not doing this activity. This item, however, was retained considering the changing gender roles.
Several areas merit attention in future research. Although LF scores predicted the PF scores at 6 months, future studies are warranted to test the predictive validity of long-term outcomes, such as disability and mortality. Because the LF scale was based on self-reports, validation against physical performance tests would help to support its convergent validity.20) Sensitivity to change and identifying meaningful change would also contribute to assessing functioning over time, which is particularly useful in cohort studies and clinical trials. Performance in various subgroups, as well as replication in other samples, would confirm the scale’s validity. Because the LF scale consists of 25 items, the development of a short form that retains its validity would be practical in a national survey. In the future, this scale might be improved by including items that reflect not only the higher functioning levels of adults experiencing aging but also changes in lifestyle activities and the effects of using technology to aid functional ability. The LF scale was developed and validated among community-dwelling Korean adults, demonstrating its appropriateness for use in the KNHANES. The scale will be useful in the assessment and monitoring of physical function in mid- and late-life populations. Its application could support community-based health promotion initiatives and aid in the development of national plans for healthy population aging.
Notes
The authors would like to acknowledge the following members of the expert committee: Hyunsuk Jeong (Catholic University), Jae Woo Park (Korea Counseling Graduate University), So Young Moon (Ajou University), Sang Joon Son (Ajou University), Kwang-il Kim (Seoul National University), Hee-Won Jung (University of Ulsan), Ki Young Son (University of Ulsan), Ji Eun Lee (Seoul National University).
CONFLICT OF INTEREST
The researchers claim no conflicts of interest.
FUNDING
This work was supported by funding from the Korea Disease Control and Prevention Agency (KDCA).
AUTHOR CONTRIBUTIONS
Conceptualization, YL; Study design, YL, EK, SJ; Data curation and analysis, EK, JC, JinK; Funding acquisition, YL; Methodology, YL, EK, SJ; Project administration, YL, EK, JY, JiK, KO, JiK; Critical appraisal, CWW, MK; Writing–original draft/Writing–review & editing, YL, EK.
SUPPLEMENTARY MATERIALS
Supplementary materials can be found via https://doi.org/10.4235/agmr.24.0087.