Standardized Measurement of Muscle Strength and Physical Performance for Sarcopenia: An Expert-Based Delphi Consensus
Abstract
Background
Despite updated sarcopenia guidelines, inconsistent protocols still cause clinical confusion and may compromise diagnostic and outcome accuracy. This Delphi study aimed to establish expert consensus to support the standardization of muscle strength and physical performance assessments for sarcopenia.
Methods
A two-round modified Delphi study was conducted with 26 experts in geriatrics and sarcopenia. Participants completed two rounds of anonymous questionnaires evaluating 39 items across seven domains using a nine-point Likert scale or choice-based questions. Consensus was defined as ≥75% agreement.
Results
In total, 27 of 38 statements (71.1%) reached consensus across two rounds. Experts supported further standardization of assessments in alignment with the Asian and Korean Working Group on Sarcopenia (AWGS and KWGS) guidelines. For handgrip strength, consensus was achieved on using both mechanical and hydraulic dynamometers, hydraulic protocols, value selection, measurement time, and positioning, but not on mechanical protocols, repetitions, recovery intervals, or unified cutoff values. For calf circumference, consensus was reached on measurement position, method, and value selection, but not on guideline application. In gait speed assessment, agreement was reached on speed, repetitions, assistive device use, and equipment type, but not on value selection, distance, acceleration/deceleration phases, or device interchangeability. For the 400-m walk test, the KWGS guideline and speed were endorsed. The chair stand test (CST) and timed up-and-go (TUG) test reached consensus on armrest use, value selection, and repetitions, but not on seat height (CST) or speed (TUG).
Conclusion
This study highlights areas of agreement and ongoing uncertainty, supporting future efforts to standardize sarcopenia assessment methods.
INTRODUCTION
Sarcopenia is a progressive skeletal muscle disorder characterized by the loss of muscle mass and strength, leading to impaired physical performance and increased risks of falls, functional decline, frailty, and mortality.1,2) The definition of sarcopenia has evolved through global expert consensus, with the current guidelines from the European Working Group on Sarcopenia in Older People (EWGSOP2) and the Asian Working Group on Sarcopenia 2019 (AWGS 2019) requiring assessments of muscle mass, strength, and physical performance to diagnose and grade the severity of sarcopenia.3,4) Despite standardized protocols, definitions for sarcopenia still vary due to differences in assessment methods and population characteristics. For example, EWGSOP2 and AWGS 2019 guidelines recommend different thresholds. To reflect population-specific needs, the Korean Working Group on Sarcopenia (KWGS) recently introduced guidelines tailored to the Korean population.5)
However, a universally accepted definition of sarcopenia has yet to be established. This lack of standardization has contributed to a substantial variation in the reported prevalence, incidence, and treatment outcomes for sarcopenia across studies. The absence of a unified definition has likely impeded effective identification and management of sarcopenia in both the clinical and research settings.1) To address this issue, the Global Leadership Initiative in Sarcopenia (GLIS) was recently established to develop a globally applicable definition.1) Through a Delphi-format consensus process, GLIS has proposed a definition that includes reduced muscle mass and strength, including muscle-specific strength. Importantly, physical performance is not considered part of the diagnostic criteria but rather as an outcome measure.
Despite advances in sarcopenia diagnostic guidelines, substantial variability remains in measurement procedures and cutoff values. Dual-energy X-ray absorptiometry and bioelectrical impedance analysis are widely used to assess muscle mass, but inconsistencies persist due to differences in calibration, software algorithms, and scanning protocols.6,7) The AWGS 2019 guidelines provide method-specific cutoff points,3) underscoring the need for cautious interpretation and cross-method compatibility. Unlike instrument-based assessments of muscle mass, muscle strength—particularly handgrip strength (HGS)—is assessed manually and is more vulnerable to variability due to differences in protocols, device types, and operator technique. Discrepancies between hydraulic and mechanical dynamometers have also been reported,8-10) but no consensus exists on whether device-specific cutoffs are needed or if universal thresholds with calibration adjustments suffice. HGS protocols also lack standardization,11) with variations in the number of trials,3,5,12,13) grip duration,11-14) and recovery intervals.3,5,11,13,15)
Similar challenges also affect physical performance assessments. Standardized, accurate evaluation of functional outcomes is essential for monitoring intervention efficacy. However, heterogeneous protocols that were reported across studies and became part of guidelines may create confusion for clinicians and hinder an appropriate selection of assessment tools. Procedural variability can affect diagnostic accuracy and prevalence estimates, complicating risk identification and longitudinal monitoring. Although KWGS has recently introduced sarcopenia guidelines adapted to the Korean population, implementation gaps remain due to limited clinician familiarity and uncertainty. This Delphi study aimed to establish an expert consensus on standardized protocols for assessing muscle strength and physical performance to enhance diagnostic accuracy, consistency, and the clinical relevance of sarcopenia outcome evaluations.
MATERIALS AND METHODS
Study Design
We conducted a two-round modified Delphi study, following the Conducting and Reporting Delphi studies guidelines.16) Our process for achieving consensus employed structured questionnaires, with a pre-defined first-round questionnaire rather than open-ended items.17) Institutional Review Board approval was not required, as the present study collected expert opinions to inform clinical practice rather than new data from human participants.
Delphi Panel
Healthcare experts were recruited from the Korean Society of Sarcopenia, the Korean Geriatrics Society, and the Korean Society for Bone and Mineral Research. All panelists had recognized expertise in the care and research of older adults and sarcopenia. The panel included both clinical and non-clinical professionals, such as orthopedic surgeons, geriatricians, rehabilitation physicians, exercise physiologists, and endocrinologists. An invitation package—including an overview of the study objectives, the Delphi process adopted, and the study timeline—was sent to 58 experts. Of these, 26 experts (44.8%) agreed to participate in the full Delphi process.
Delphi Questionnaire Domains/Statements
A targeted literature review was conducted on HGS and physical performance assessments, along with the existing sarcopenia-related guidelines and protocols. Relevant studies, reviews, and guidelines were examined to identify inconsistencies, omissions, and areas needing further discussion. Based on this literature review, measurement items were developed across seven domains: (1) general aspects of standardizing physical performance assessments, (2) HGS, (3) calf circumference (CC), (4) gait speed, (5) 400-m walk test, (6) chair stand test (CST), and (7) timed up-and-go (TUG) test. Two structured rounds of online questionnaires (in Korean language) were used to achieve an expert consensus. A steering committee comprising five rehabilitation clinicians with extensive experience in sarcopenia research and clinical care was established to guide the development of the questionnaire. The committee oversaw the entire process, including domain selection, item formulation, and interpretation of results. An initial draft of 31 questions was developed by two principal investigators (S.K.L. and J.Y.L.) and subsequently refined through iterative discussions within the committee. During the initial group meeting, all items were reviewed for clarity and conciseness. Based on feedback, the initial questionnaire was revised and expanded to include 39 items.
Delphi Survey Method Process and Administration
A two-round modified Delphi process was conducted, as most studies have reported the achievement of consensus within two rounds.17) The first round occurred from November 1 to 21, 2023, and the second from January 15 to February 5, 2025. The experts received study materials via email and accessed questionnaires through a secure, controlled-access link. All responses were anonymous. In the first round, the participants provided background information and answered 39 questions: eight 9-point Likert-scale items (1=total disagreement, 9=total agreement), one multiple-choice question, and 30 single-choice items. Based on feedback and results, 28 questions were selected for the second round. To support decision-making, each question included relevant protocols, guidelines, and research summaries. The second round consisted of two Likert-scale statements and 26 single-choice questions. The revised versions of questionnaire items that had not reached consensus in the first round were reassessed by all experts participating in the study. For those still lacking consensus, the study team reviewed and summarized the reasons for disagreement.
Statistical Analysis
Descriptive statistics were used to analyze data from both rounds of the Delphi survey. For Likert-scale items, we calculated the mean value, standard deviation, interquartile range, consensus, and content validity ratio (CVR). A consensus was defined as strong agreement when the consensus value exceeded 0.75, using the formula: Consensus = 1 – [(Q3 – Q1) / median]. CVR was calculated as CVR = [Ne – (N/2)] / (N/2), where Ne is the number of experts providing a positive response (score ≥7), and N is the total number of experts. A CVR above 0.37 in Round 1 (N=26) and 0.42 in Round 2 (N=21) indicated sufficient convergence.18) Likert-scale items meeting both consensus and CVR thresholds were considered to have reached agreement. For single-choice items, consensus was defined as an agreement rate ≥75%. Items not meeting these criteria were classified as non-consensus. All analyses were performed using R software version 4.3.1 (R Foundation for Statistical Computing, Vienna, Austria).
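The two Likert-scale criteria and the single-choice rule described above can be sketched in a few lines. The following is a minimal Python illustration (the study itself used R), assuming ratings are held in a plain list. The function names and the sample ratings are hypothetical and are not taken from the study data, and the quartile method (the standard library default) is an assumption that may differ from the one the authors used.

```python
from statistics import median, quantiles


def consensus_index(ratings):
    """Consensus = 1 - (Q3 - Q1) / median, as defined in the text."""
    # Assumption: quartiles use the library's default "exclusive" method.
    q1, _, q3 = quantiles(ratings, n=4)
    return 1 - (q3 - q1) / median(ratings)


def content_validity_ratio(ratings, positive_cutoff=7):
    """CVR = (Ne - N/2) / (N/2), where Ne counts scores >= positive_cutoff."""
    n = len(ratings)
    n_positive = sum(1 for r in ratings if r >= positive_cutoff)
    return (n_positive - n / 2) / (n / 2)


def likert_item_agreed(ratings, cvr_threshold):
    """A Likert item reaches agreement only if both criteria are met."""
    return (consensus_index(ratings) > 0.75
            and content_validity_ratio(ratings) > cvr_threshold)


def single_choice_agreed(votes_for_option, n_respondents):
    """A single-choice item reaches consensus at an agreement rate >= 75%."""
    return votes_for_option / n_respondents >= 0.75


# Hypothetical Round 1 ratings from 26 panelists (illustrative only).
example_ratings = [9] * 12 + [8] * 10 + [7] * 2 + [4] * 2
print(consensus_index(example_ratings))         # 0.875
print(content_validity_ratio(example_ratings))  # about 0.846
print(likert_item_agreed(example_ratings, cvr_threshold=0.37))
```

With a different quartile convention, borderline items could fall on the other side of the 0.75 threshold, so the convention would need to be fixed before comparing results across analyses.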
RESULTS
Study Participants
Of the 58 experts invited to participate in this Delphi study, the majority were in their 40s or 50s. A total of 26 physicians agreed to participate and completed Round 1 (44.8%), while 21 completed Round 2 (80.8%). Most participants specialized in rehabilitation medicine (53.8%, n=14), and the majority had more than 10 years of professional experience (76.9%, n=20) (Table 1).
Agreement Ratings in the Round 1 Delphi Survey
The results of the Round 1 and 2 Delphi surveys are presented in Tables 2 and 3. In Round 1, consensus was reached for 10 of 38 statements (26.3%), while 28 statements (73.7%) did not reach consensus. The Round 1 questionnaire is provided in Supplement A. An agreement was reached on the need for further standardization of physical performance assessments, as outlined in the AWGS and KWGS guidelines (75%, CVR=0.69). The experts identified multiple assessments requiring standardization, with HGS being cited most frequently, followed by gait speed (62.5%) and CST (50%).
For HGS, a consensus was reached on using both mechanical and hydraulic dynamometers (80.8%) and adhering to the existing protocols for hydraulic types (87.5%, CVR=0.69). However, no consensus was reached regarding protocol adherence for mechanical devices, cutoff values across device types, positioning, repetitions, measurement timing, or recovery intervals.
For CC, there was no agreement on applying AWGS and KWGS guidelines, measurement position, laterality, or value criteria. For gait speed, a consensus was reached on conducting two repetitions (76.9%), allowing assistive devices (88.5%), and using both manual and automated devices (80.7%). However, there was no agreement on measurement distance, acceleration/deceleration zones, speed, value selection, or the interchangeability of devices. In the 400-m walk test, consensus supported adherence to the KWGS guidelines (96.4%, CVR=0.62), but no consensus was reached on the appropriate walking speed.
For the CST, an agreement was reached on following the KWGS guidelines (87.5%, CVR=0.77) and the cutoff value (85.7%, CVR=0.62), but not on chair type, end position, measurement value, or seat height. For the TUG test, a consensus was reached on adherence to the KWGS guidelines (85.7%, CVR=0.69), but not on chair type, pace, seat/armrest height, repetitions, or values.
Agreement Ratings in the Round 2 Delphi Survey
Following Round 1 analysis, the Delphi questionnaire was revised for clarity, with modified statements addressing non-consensus items. The second-round survey included 28 statements, of which 17 (60.7%) reached consensus and 11 did not. The full questionnaire is provided in Supplement B.
For HGS, a consensus was reached on protocol adherence for mechanical dynamometers (75.0%, CVR=0.87), use of the sitting position with hydraulic types (76.2%), recording the maximum value (90.5%), and a minimum measurement time of 3 seconds (76.2%). However, no agreement was reached on cutoff values by device type, positioning with mechanical types, number of repetitions, or recovery intervals.
For CC, a consensus was achieved on applying the AWGS and KWGS guidelines (75.0%, CVR=0.73), measuring in a standing position (90.5%), bilateral measurement (95.2%), and recording the maximum value from both sides (50.5%). For gait speed, an agreement was reached on using a 4-m walk test (76.2%) and measuring the usual walking speed (90.5%), but not on acceleration/deceleration phases, value selection, or device interchangeability. For the 400-m walk test, a consensus supported the use of the fastest possible walking speed.
For the CST, a consensus was reached on using a straight-back chair without armrests (95.2%) and selecting the fastest value from the 5-repetitions test or the maximum value from the 30-second test (81.0%). No agreement was reached on end position (sitting or standing) or whether seat height should be based on the mean Korean popliteal height value in the sitting position or individual popliteal height plus the mean heel height. For the TUG test, a consensus was reached on using a straight-back chair without armrests (90.5%), no need for armrests (85.7%), performing two repetitions (100%), and recording the maximum value. No consensus was reached on walking speed or seat height.
In total, 27 of 38 statements (71.1%) reached consensus across both rounds, while 11 (28.9%) did not. The final results are presented in Tables 4 and 5. Based on the results of the present Delphi study, the proposed standardized measurement protocols for physical performance and muscle strength are summarized in Table 6.
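The item counts reported for the two rounds can be cross-checked with simple arithmetic. The short Python sketch below uses only the counts stated in the text; the helper name `pct` is hypothetical.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


round1_consensus, round1_total = 10, 38   # Round 1: consensus items / all statements
round2_consensus, round2_total = 17, 28   # Round 2: consensus items / re-rated statements

overall_consensus = round1_consensus + round2_consensus   # 27
overall_unresolved = round1_total - overall_consensus     # 11

print(pct(round1_consensus, round1_total))    # Round 1 consensus rate
print(pct(round2_consensus, round2_total))    # 60.7
print(pct(overall_consensus, round1_total))   # 71.1
print(pct(overall_unresolved, round1_total))  # 28.9
```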
DISCUSSION
The diagnosis of sarcopenia and evaluation of related outcomes depend on standardized assessments of muscle strength and physical performance. However, significant variation in assessment protocols and measurement tools poses ongoing challenges. These inconsistencies complicate clinical implementation, introduce measurement variability, and may affect diagnostic accuracy and reported prevalence. To address these issues, this study used a modified Delphi method to build expert consensus on the muscle strength and physical performance assessments used for sarcopenia. Although agreement was reached on adherence to the AWGS and KWGS guidelines, further standardization is needed. A persistent lack of consensus on several items underscores the need for ongoing research and discussion.
Handgrip Strength
Both mechanical (e.g., Smedley) and hydraulic (e.g., Jamar) dynamometers are used to assess HGS, each with specific standardized protocols.11-14) Hydraulic devices are typically used in a seated position with the elbow flexed at 90°,11-13) while mechanical devices are generally used standing with the elbow extended,14) although a seated position with an extended elbow is recommended when standing is not feasible.5) Despite these guidelines, no consensus has been reached on the optimal posture for mechanical devices. This likely reflects differing expert perspectives—some prioritize the accommodation of older or frail adults with a seated posture, while others support permitting both positions. Given that HGS values vary between sitting and standing with mechanical devices,19) a consistent positioning based on patient condition is essential.
In conformity with KWGS guidelines,5) most experts did not agree on the interchangeability of measurements between hydraulic and mechanical dynamometers. Although calibration between device types was recognized as an important issue, no consensus was achieved. Studies have shown that hydraulic devices generally yield higher values than mechanical ones,8-10) yet current guidelines lack device-specific cutoffs or calibration-adjusted values—highlighting a need for further research in this area.
The appropriate method to be employed for estimating HGS remains inconsistent. Although some advocate using the mean of multiple trials for greater accuracy,13,20) others argue that frail individuals may fatigue quickly, leading to underestimated mean values compared to their true maximal grip strength.21) As most standard protocols11,12) and studies use the highest value,22) this approach is generally considered more practical and appropriate. Adherence to laterality is also inconsistent: although standard protocols recommend assessing both hands,11-14) and AWGS and KWGS suggest using either both arms or the dominant arm,3,5) many studies have measured only the dominant hand. Since the dominant hand is typically stronger due to muscle hypertrophy,23) while right-hand dominance in tools and activities can affect strength regardless of handedness,23,24) measuring both hands and using the maximum value is likely to yield the most accurate assessment.
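The difference between the value-selection approaches discussed above (highest reading versus mean of trials) can be made concrete with a small example. This is a minimal Python sketch under stated assumptions: the function name, the data layout, and the readings in kilograms are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean


def hgs_summary(trials_by_hand):
    """Summarize handgrip readings (kg) given as {hand: [trial readings]}."""
    all_readings = [v for readings in trials_by_hand.values() for v in readings]
    return {
        "max_overall": max(all_readings),    # highest value across both hands
        "mean_overall": mean(all_readings),  # mean across all trials
        "max_by_hand": {hand: max(r) for hand, r in trials_by_hand.items()},
    }


# Hypothetical readings: two trials per hand.
readings = {"right": [24.0, 25.5], "left": [22.5, 23.0]}
summary = hgs_summary(readings)
print(summary["max_overall"])   # 25.5
print(summary["mean_overall"])  # 23.75
```

In this illustration the mean sits almost 2 kg below the maximum, which is the kind of gap that could move a borderline patient across a cutoff depending on the value-selection rule.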
Standard protocols recommend three HGS measurements,11-14) while the AWGS and KWGS guidelines recommend at least two trials.3,5) Although HGS tends to increase gradually with repeated trials,10,25) studies have shown that only the difference between the first and second trial measurements is clinically meaningful in older adults, with minimal change thereafter, supporting the sufficiency of two trials for this group.25) Additionally, the Korea National Health and Nutrition Examination Survey switched from three measurements to two starting in 2022.26) Similarly, most respondents in this study preferred two trials; however, because consensus was not reached, further discussion on the optimal number of repetitions is needed.
The recommended duration for HGS assessment varies across protocols, ranging from 3–5 seconds,13) to at least 3 seconds11) or instructions like “squeeze until the needle stops rising” 12) or “until you cannot squeeze any harder.” 14) However, prolonged contraction can elevate blood pressure and heart rate,27) increasing the risk of fatigue or tendon injury in frail older adults.28) Although no consensus exists on the optimal duration for isometric tension, 3–10 seconds is generally effective,29) with a maximal effort of 3–5 seconds recommended to minimize energy depletion,30) and 3 seconds considered appropriate for older adults to reduce fatigue.27) While the AWGS and KWGS guidelines do not specify a fixed duration,3,5) a 3–5 second measurement is considered suitable.
Recovery intervals vary across protocols. Some recommend at least 15 seconds for alternating-hand measurements,13) others suggest 60 seconds11,21) or a range of 15 seconds to 1 minute15) to prevent fatigue. Generally, short to moderate rest periods (60–120 seconds) are sufficient for maximizing muscular strength gains.31) The KWGS guidelines do not specify a time limit,5) whereas the AWGS guidelines suggest avoiding a fixed acquisition time.3) An interval of around 60 seconds between trials may be appropriate, although further discussion is needed.
Calf Circumference
The KWGS guidelines recommend measuring CC in a standing position using a non-elastic tape, with cutoff values of <34 cm for men and <33 cm for women. However, they do not specify laterality or whether to use the maximum or mean value.5) Sitting measurements can overestimate CC and may lead to underdiagnosis of sarcopenia, while right-side standing measurements show the strongest correlation with muscle mass and function, enhancing diagnostic accuracy.32) In this Delphi study, a consensus supported the use of the maximum value from bilateral standing measurements as the most appropriate approach.
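The approach supported by the panel (maximum of bilateral standing measurements, compared against the KWGS cutoffs quoted above) can be sketched as follows. This is an illustrative Python fragment only; the function name, the sex labels, and the sample measurements are hypothetical.

```python
# KWGS cutoffs as quoted in the text: <34 cm for men, <33 cm for women.
KWGS_CC_CUTOFF_CM = {"male": 34.0, "female": 33.0}


def calf_circumference_low(left_cm, right_cm, sex):
    """Return the recorded value (maximum of both sides) and whether it is below the cutoff."""
    value = max(left_cm, right_cm)
    return value, value < KWGS_CC_CUTOFF_CM[sex]


# Hypothetical standing measurements.
print(calf_circumference_low(33.2, 33.8, "male"))    # (33.8, True)
print(calf_circumference_low(33.2, 33.8, "female"))  # (33.8, False)
```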
Gait Speed
The AWGS 2019 guidelines recommend measuring the time taken to walk 6 m at a normal pace from a moving start, excluding deceleration, and computing the mean value of at least two trials.3) The KWGS guidelines permit both 4 m and 6 m tests with 1–1.5 m acceleration and deceleration phases, but do not define how values should be estimated.5) In this Delphi study, a consensus supported the 4-m test with consideration of acceleration and deceleration, although no agreement was reached on their exact length—likely due to variations in clinical settings. Since frail older adults may not reach a steady gait until around 2.5 m, a proper accounting of these phases is essential for accuracy and repeatability.33)
Regarding value estimation, most respondents favored using the mean value; however, a consensus was not reached. The maximal value may better reflect true capacity and account for variability in initial attempts, while the mean value reduces measurement error.33) Further discussion is needed to determine the optimal approach.
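How the choice between the mean and the fastest trial changes the recorded gait speed can be shown with a short calculation. The Python sketch below assumes that the timed distance excludes the acceleration and deceleration zones; the function name and the trial times are hypothetical.

```python
from statistics import mean


def gait_speeds(timed_distance_m, trial_times_s):
    """Speed (m/s) for each trial over the timed distance only."""
    return [timed_distance_m / t for t in trial_times_s]


# Hypothetical example: 4-m timed section, two trials at usual pace.
speeds = gait_speeds(4.0, [5.0, 4.0])  # 0.8 m/s and 1.0 m/s
mean_speed = mean(speeds)              # about 0.9 m/s
fastest_speed = max(speeds)            # 1.0 m/s
print(speeds, mean_speed, fastest_speed)
```

Here the two rules differ by about 0.1 m/s, which illustrates why the value-selection question matters when results are compared against a fixed threshold.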
While manual stopwatches are more accessible and commonly used in clinical settings, automatic timing devices are being increasingly adopted,3) and both are considered appropriate for assessing gait speed. Studies have reported comparable results between the two methods over various distances, with minimal error margins.33) However, discrepancies exist—for instance, slower gait speeds have been recorded with stopwatches compared to automatic timers,34) and manual static-start protocols may overestimate slowness compared to dynamic-start protocols using automatic timers.35) Additionally, manual moving-start measurements tend to yield faster speeds than both manual standing-start and automatic methods.36) Despite a general agreement on the use of both methods, a consensus was not reached on whether their results are interchangeable, warranting further investigation.
400-m Walk Test
The KWGS guidelines recommend the 400-m walk test to assess physical performance, with completion times over 6 minutes indicating reduced function, in line with EWGSOP2 criteria.4,5) The test involves walking 20 laps of 20 m along a marked corridor after a 2-minute warm-up. Participants are instructed to walk as fast as possible without running, receiving standardized encouragement at each lap. Rest is allowed without pausing the timer.4,37,38)
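The test layout and the time criterion described above can be expressed as a small calculation. This is a minimal Python sketch using only the figures given in the text (20 laps of 20 m, completion time over 6 minutes); the function name and the sample times are hypothetical, and handling of participants who cannot complete the test is deliberately left out.

```python
LAP_LENGTH_M = 20
LAPS = 20
CUTOFF_S = 6 * 60  # completion times over 6 minutes indicate reduced function


def walk_400m_result(completion_time_s):
    """Summarize a completed 400-m walk test from its total time in seconds."""
    distance_m = LAP_LENGTH_M * LAPS  # 400 m
    return {
        "distance_m": distance_m,
        "mean_speed_m_s": distance_m / completion_time_s,
        "reduced_function": completion_time_s > CUTOFF_S,
    }


print(walk_400m_result(400))  # mean speed 1.0 m/s, reduced_function True
print(walk_400m_result(320))  # mean speed 1.25 m/s, reduced_function False
```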
Chair Stand Test and Timed Up-and-Go Test
Both the CST (30 s/5 repetitions) and TUG test require a chair, but the optimal chair type varies, with inconsistencies in seat height and armrest use being prevalent across studies.39,40) A straight-back chair with a seat height of 43–46 cm is generally recommended.38,41,42) For the CST, arm use is typically restricted, with participants crossing their arms over the chest.38,41,42) However, this may not be suitable for older adults with reduced function, as the absence of armrests can increase the risk of falls.39) The TUG test, involving standing, walking 3 m, turning, and returning, usually recommends a chair with armrests38,43) and permits walking aids.38,40) While both tests assess physical performance, the CST focuses on lower body strength, whereas the TUG test assesses overall mobility. In this study, experts favored a straight-back chair without armrests for both tests. Given the variability in clinical settings, assessments should be adapted to the condition of the patient and the environment.
Seat height significantly impacts test performance: lower seats increase difficulty, while higher ones reduce hip and knee effort.44) In the 30-second CST, participants performed best with chairs at 120% and 110% of their lower leg length, while performance did not differ significantly between the standard 43 cm chair and chairs at 90% or 80% of this value.45) In the 5-repetition CST, performance improved at 115% of knee height compared to 100%, with slower, though not significantly different, times achieved at 85% of knee height.44) Among Koreans aged 60 and older, the mean knee height value is approximately 38 cm,46) and accounting for 3 cm of footwear height, the effective seat height is 41 cm, which corresponds to approximately 89%–95% of the standard chair height (43–46 cm). Thus, current protocols may overestimate performance by using chairs that are approximately 105%–112% of this effective knee height. Given population-specific anthropometry, further research is needed to identify the optimal seat height for Koreans.
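The seat-height ratios in the preceding paragraph follow from the figures quoted there and can be reproduced directly. The Python sketch below uses only those figures (38 cm mean knee height, 3 cm footwear allowance, 43–46 cm standard chair height); the variable names are hypothetical, and percentages are rounded to the nearest whole number.

```python
knee_height_cm = 38.0              # mean knee height cited in the text
footwear_cm = 3.0                  # footwear allowance cited in the text
standard_chair_cm = (43.0, 46.0)   # standard chair height range cited in the text

effective_cm = knee_height_cm + footwear_cm  # 41 cm

# Effective knee height as a percentage of each standard chair height.
effective_vs_chair = [round(100 * effective_cm / h) for h in standard_chair_cm]

# Each standard chair height as a percentage of the effective knee height.
chair_vs_effective = [round(100 * h / effective_cm) for h in standard_chair_cm]

print(effective_vs_chair)  # [95, 89]
print(chair_vs_effective)  # [105, 112]
```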
The CST ending position differs between guidelines: EWGSOP2 specifies a standing end,4) while KWGS allows both a sitting end (>11 seconds) and a standing end (>10 seconds).5) For the TUG test, EWGSOP2 guidelines recommend walking at a “comfortable, fast, and secure pace,”4) whereas KWGS advises a usual (comfortable) pace.5) Both usual and maximum effort TUG protocols are accepted, although a maximum effort regimen is preferred because of its faster mean time, lower between-study variance, and greater reliability.40) This Delphi study did not reach a consensus on either point, highlighting the need for further discussion.
This study has several limitations. Although the expert panel consisted of specialists in the care and research of older adults and sarcopenia, most were from rehabilitation medicine, potentially limiting the generalizability of the findings. The lack of face-to-face meetings may have limited nuanced discussion and deeper consensus-building. The Delphi questionnaire had a limited scope; although based on prior studies and guidelines to address key gaps, it could not cover all aspects of physical performance assessment, and some issues may have been overlooked. The use of predominantly closed-ended questions may also have influenced responses. Further, the small sample size and lack of consensus on several items reflect ongoing debate or limited evidence in certain areas. These limitations underscore the need for further research, including larger expert panels and clinical trials, to support the standardization of muscle strength and physical performance assessments in sarcopenia.
In conclusion, despite updates to sarcopenia guidelines and numerous studies on muscle strength and physical performance assessments, measurement variability persists due to differences in tools and protocols. The results of this Delphi study highlight the ongoing need for standardization: while some components have been clarified, inconsistencies remain. Accurate and consistent assessments are critical for reliable diagnosis, risk identification, and outcome evaluation. However, efforts to improve precision must be balanced with practicality to ensure accessibility and ease of use. In addition to technical issues such as calibration and cutoff values, practical challenges, particularly variability across clinical and community settings, must be addressed. Continued efforts are warranted to standardize assessment protocols and enhance sarcopenia diagnosis and management.
Notes
The authors would like to thank all the participants for their valuable contributions to this Delphi study.
CONFLICT OF INTEREST
The authors declare no conflicts of interest.
FUNDING
This research was supported by a grant of Patient-Centered Clinical Research Coordinating Center (PACEN) funded by the Ministry of Health & Welfare, Republic of Korea (Grant No. RS-2020-KH095863).
AUTHOR CONTRIBUTIONS
Conceptualization, SKL, JYL; Formal analysis, SKL; Funding acquisition, JYL; Investigation & validation, SKL, JWB, SYL, KHM, JYL, SEB, YHC, JHC, JYC, JYH, HCJ, HWJ, KIK, YJK, YSK, JHL, JIL, SYL, KBL, BJO, SJP, GYS, WS, CWW, JIY, SDY; Methodology, SKL, JWB, SYL, KHM, JYL; Project administration, JYL; Supervision, SKL, JYL; Writing—original draft, SKL; Writing—review & editing, SKL, JWB, SYL, KHM, JYL. All authors read and agreed to the published version of the manuscript.
SUPPLEMENTARY MATERIALS
Supplementary materials can be found via https://doi.org/10.4235/agmr.25.0070.
Delphi Round 1 Questionnaire
Delphi Round 2 Questionnaire
