Methods matter: the relationship between strength and hypertrophy depends on methods of measurement and analysis

Andrew D. Vigotsky; Brad J. Schoenfeld; Christian Than; J. Mark Brown

doi:10.7717/peerj.5071

Andrew D. Vigotsky¹, Brad J. Schoenfeld², Christian Than³, J. Mark Brown³

¹Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States of America

²Department of Health Sciences, City University of New York, Herbert H. Lehman College, Bronx, NY, United States of America

³School of Biomedical Sciences, University of Queensland, St. Lucia, Queensland, Australia

Published: 2018-06-27
Accepted: 2018-06-04
Received: 2018-02-27

Academic Editor: Scotty Butcher

Subject Areas: Kinesiology, Statistics
Keywords: Hierarchical linear models, Repeated measures, Strength, Hypertrophy, Analysis of covariance, Regression

Copyright: © 2018 Vigotsky et al.
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Vigotsky AD, Schoenfeld BJ, Than C, Brown JM. 2018. Methods matter: the relationship between strength and hypertrophy depends on methods of measurement and analysis. PeerJ 6:e5071 https://doi.org/10.7717/peerj.5071

The authors have chosen to make the review history of this article public.

Abstract

Purpose

The relationship between changes in muscle size and strength may be affected by both measurement and statistical approaches, but their effects have not been fully considered or quantified. Therefore, the purpose of this investigation was to explore how different methods of measurement and analysis can affect inferences surrounding the relationship between hypertrophy and strength gain.

Methods

Data from a previous study—in which participants performed eight weeks of elbow flexor training, followed by an eight-week period of detraining—were reanalyzed using different statistical models, including standard between-subject correlations, analysis of covariance, and hierarchical linear modeling.

Results

The associative relationship between strength and hypertrophy is highly dependent upon both the method/site of measurement and the method of analysis; large differences in variance accounted for (VAF) by the statistical models were observed (VAF = 0–24.1%). Different sites and measurements of muscle size showed a range of correlation coefficients with one another (r = 0.326–0.945). Finally, exploratory analyses revealed moderate-to-strong relationships between within-individual strength-hypertrophy relationships and strength gained over the training period (ρ = 0.36–0.55).

Conclusions

Methods of measurement and analysis greatly influence the conclusions that may be drawn from a given dataset. Analyses that do not account for inter-individual differences may underestimate the relationship between hypertrophy and strength gain, and different methods of assessing muscle size will produce different results. It is suggested that robust experimental designs and analysis techniques, which control for different mechanistic sources of strength gain and inter-individual differences (e.g., muscle moment arms, muscle architecture, activation, and normalized muscle force), be employed in future investigations.

Introduction

The combined actions of neural input, muscles, and the joint(s) about which those muscles act serve to produce sufficient endpoint force for physical function, allowing the performance of activities of daily living, as well as the spectrum of athletic endeavors. Due to the complexity of the neuromuscular and musculoskeletal systems, many factors can influence strength, including, but not limited to, muscle moment arm, muscle size, activation, muscle architecture, and normalized muscle force (or specific tension) (Vigotsky, Contreras & Beardsley, 2015). Muscle size is of particular interest, as (1) it is highly plastic (Fluck & Hoppeler, 2003) and (2) a clear positive relationship exists between baseline muscle cross-sectional area (CSA) and strength, with greater CSAs correlating with greater strength capacities (Maughan & Nimmo, 1984; Maughan, Watson & Weir, 1984; Schantz et al., 1983). However, this relationship is not necessarily linear, as several additional factors interactively influence strength capacity (Vigotsky, Contreras & Beardsley, 2015); studying the role of and relationship between muscle size and strength is therefore less straightforward under longitudinal contexts.

While the cross-sectional correlation between muscle mass and strength remains well established, some researchers have recently challenged the belief that resistance training (RT)-induced hypertrophy significantly impacts the ability to produce force, claiming improvements in these outcomes are separate and unrelated adaptations (Buckner et al., 2016a). Indeed, data remain somewhat equivocal on the relationship between changes in size and changes in strength resulting from regimented RT: a considerable range of correlation coefficients has been observed, from ∼0 to ∼0.6 (Ahtiainen et al., 2016; Appleby, Newton & Cormie, 2012; Baker, Wilson & Carlyon, 1994; Balshaw et al., 2017; Cribb et al., 2007; Erskine, Fletcher & Folland, 2014; Erskine et al., 2010; Loenneke et al., 2017; Maeo et al., 2018; Pope et al., 2016; Rasch & Morehouse, 1957; Watanabe et al., 2018). The discrepancies in findings between studies may be related, in part, to the statistical measures employed to analyze relationships between muscle hypertrophy and strength gain. For instance, analyses in a majority of studies are based on between-subject data using only two time points, but within-subject analyses are more appropriate for the question at hand. Inferentially, drawing individual-level conclusions from group-level data is a statistical fallacy, known as the ecological fallacy (Robinson, 1950). Pragmatically, this problem can be better understood by differentiating between the questions that each analysis addresses. Between-subject analyses answer the question, “Do those who grow more also get stronger than those who grow less?” Conversely, within-subject analyses answer the question, “Is the growth of one’s muscle related to their increases in strength?” Due to individual differences, the former (between-subject) may not necessarily map to the latter (within-subject).
For example, if subject A has a 30% larger muscle moment arm than subject B, then one may expect subject A to have a 30% greater slope between increases in muscular strength (force) and externally-measured strength (moment), all else being equal. To address the ecological fallacy and answer the within-subject question, more sophisticated statistical approaches are needed (Goldstein, 2011; Jackson, Best & Richardson, 2006; Robinson, 1950).
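The moment-arm example above can be illustrated numerically. The following is a minimal sketch that assumes the multiplicative joint-moment model M = α⋅PCSA⋅NMF⋅cosθ_p⋅MA; the function names and all numerical values (activation, normalized muscle force, pennation angle, moment arms) are hypothetical and chosen only for illustration.

```python
import math


def joint_moment(activation, pcsa_cm2, nmf_n_per_cm2, pennation_deg, moment_arm_m):
    """Muscle contribution to a joint moment: M = a * PCSA * NMF * cos(theta_p) * MA."""
    fiber_force = activation * pcsa_cm2 * nmf_n_per_cm2
    tendon_force = fiber_force * math.cos(math.radians(pennation_deg))
    return tendon_force * moment_arm_m


def moment_per_unit_pcsa(moment_arm_m, activation=0.95, nmf=30.0, pennation_deg=10.0):
    """Slope of joint moment versus PCSA when every other factor is held constant.

    The default values are hypothetical placeholders, not measured data.
    """
    return joint_moment(activation, 1.0, nmf, pennation_deg, moment_arm_m)


# Hypothetical subjects: A has a 30% larger moment arm than B, all else equal.
slope_b = moment_per_unit_pcsa(0.040)
slope_a = moment_per_unit_pcsa(0.040 * 1.30)
slope_ratio = slope_a / slope_b

print(f"Moment gained per cm^2 of PCSA, subject A / subject B: {slope_ratio:.2f}")
```

Under these assumptions, the same amount of hypertrophy yields a proportionally larger change in measured moment for the subject with the larger moment arm, which is why a single slope shared by all participants may not describe any individual well.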

A hierarchical approach can assist in avoiding the pitfall of the ecological fallacy (Goldstein, 2011; Jackson, Best & Richardson, 2006). Traditionally, each participant’s change in strength and change in size, from pre- to post-intervention, are calculated and regressed against one another (Ahtiainen et al., 2016; Appleby, Newton & Cormie, 2012; Baker, Wilson & Carlyon, 1994; Balshaw et al., 2017; Cribb et al., 2007; Erskine, Fletcher & Folland, 2014; Erskine et al., 2010; Maeo et al., 2018; Pope et al., 2016; Rasch & Morehouse, 1957; Watanabe et al., 2018). However, a hierarchical modeling approach allows one to examine time points nested within participants, such that each participant’s points are kept “separate” from other participants (Gelman & Hill, 2007; Goldstein, 2011; Raudenbush & Bryk, 2002). Within the hierarchical model, each participant can receive varying intercepts and/or varying slopes, which allows inter-individual differences to be appropriately accounted for (Gelman & Hill, 2007; Goldstein, 2011; Raudenbush & Bryk, 2002). To carry out hierarchical modeling with varying slopes and intercepts, multiple (≥3) time points are required (i.e., to quantify model variance), so most training datasets cannot be used to answer this question, as a majority only collect data at two time points (pre- and post-intervention). To date, only one study has employed a within-subject analysis: Loenneke et al. (2017) used analysis of covariance (ANCOVA) (Bland & Altman, 1995a) and found appreciably greater coefficients of determination in within- relative to between-subject models for the same muscle and strength test (e.g., R² = 0.004 vs. 0.35). However, in contrast to hierarchical linear models, ANCOVA has an affine assumption; participants receive different intercepts, but all are constrained to the same slope (Bland & Altman, 1995a).
Therefore, further work is needed to understand how model choice affects the strength of the relationship between hypertrophy and changes in strength.

The purpose of this study was to investigate the relationship between changes in muscle size and strength in the elbow flexors using a variety of statistical and measurement approaches, while also employing both between- and within-subject analyses over multiple time-points during periods of both training and detraining. It was hypothesized that different statistical models would produce different outcomes, with between-subject correlations showing the weakest relationships and hierarchical linear modeling showing the strongest.

Methods

Participants

The study reanalyzed data from a previously published study, the methods of which have been described (Than et al., 2016). In brief, young, recreationally active individuals (mean ± SD, age = 24 ± 3 years, BMI = 22 ± 2, n = 19) were recruited for participation in the study. Participants reported exercising at least three times per week via various sporting activities but did not perform resistance training for the elbow flexors. Informed consent was obtained for all participants. The original study was approved by the University of Queensland Medical Research Ethics Committee (no. 2014001416).

Muscle size

Measures of muscle thickness were obtained via B-mode ultrasound imaging (Mindray DP-50) using a 7.5 MHz linear transducer probe. Images were taken at baseline and after each week of training throughout the 16-week study period. Scanning was carried out by a trained sonographer on both the dominant and non-dominant elbow flexors at 30, 50, and 70% of total length of the biceps brachii whilst participants were seated with the antebrachium in a neutral position. After Weeks 4, 8, and 16, CSA scans were acquired for both upper limbs via panoramic B-mode ultrasound (S3000 Siemens/Acuson system) using a 4–9 MHz linear transducer operating at 9 MHz. Imaging for CSA was obtained via lateral acquisition at 50% width of the biceps brachii. Values for both muscle thickness and CSA were determined using ImageJ (version 1.48; National Institutes of Health, Bethesda, MD, USA). Muscle thickness was not assessed for Week 4 due to a conflict in scheduling with CSA ultrasounds. All ultrasound measures were completed by a paid qualified professional, and not by the researchers of the paper. If the probe lost contact at any point during the measurement, the measurement was retaken. Test-retest intraclass correlation coefficients (ICC; model 2,1) of 0.99 and 0.97 for CSA and muscle thickness, respectively, have been previously reported (Jenkins et al., 2015). Because an ICC(2,1) model was used, these results are generalizable to the experienced rater in this study (Koo & Li, 2016).

Resistance training protocol

Resistance training for the non-dominant brachium was carried out five days per week for the initial eight weeks of the study, followed by a subsequent eight-week detraining period. Training consisted of unilateral dumbbell elbow flexion performed with a supinated forearm. During each session, participants performed nine sets of 12 repetitions with a 90-second rest interval afforded between sets. Loads were based on maximal voluntary isometric contraction (MVIC) values that were obtained each week using a Sundoo SN Analogue Force Gauge (model number SN-500) at 90° elbow flexion. Subjects began each workout using 70% of that week’s MVIC recording. If the full number of target repetitions (i.e., 12) was not achieved on a given set, the load was lowered to the next level of load until completion—e.g., if a participant achieved 8 repetitions at 70%, the load was decreased to 50% so that all 12 repetitions could be performed. Loads were progressively lowered on successive sets to 50% and 30% of MVIC as needed so that subjects could complete the target repetition range with proper form. The dominant brachium of each subject served as the control for the study throughout the training and detraining periods. Subjects were instructed to refrain from exercise involving the elbow flexors, other than activities of daily living, throughout the 16-week study period.

Statistical analysis

Several statistical analyses were carried out to investigate how methods of both measurement and analysis may affect the conclusions drawn from a study investigating the relationship between strength and hypertrophy. All analyses were carried out in R (version 3.4.3) (R Core Development Team, 2017). First, standard bivariate linear regression analyses of pre- and post-measures were utilized to investigate the relationship between muscle size (thickness or CSA) and strength, using a between-subject model. This was done for two different conditions: training and detraining. For each condition, a data point (Δ_size, Δ_strength) was calculated for each participant, with Δ = post − pre, where pre and post are the values before and after a given condition (training or detraining), respectively, as has been done in a number of previous investigations (Ahtiainen et al., 2016; Erskine, Fletcher & Folland, 2014; Loenneke et al., 2017). Second, an ANCOVA was utilized to replicate the method of analysis used by Loenneke et al. (2017). In this analysis, strength was treated as a dependent variable, participants were treated as a categorical factor (dummy-coded), and size was treated as a covariate. Variance accounted for (VAF) was calculated using the formula $VAF = \frac{SS_{size}}{SS_{size} + SS_{residual}}$, where SS is the type III sum of squares (Bland & Altman, 1995a). This is equivalent to a partial η² for the size covariate. Lastly, because the ANCOVA method has a number of assumptions and does not allow for varying slopes, a more robust hierarchical linear model was used for the final analysis (Quené & Van den Bergh, 2004). In this analysis, the outcome measure (y_ij) was the net joint moment during MVIC, and muscle size was used as a level-one predictor variable (x_ij), which was group-mean centered for analyses. Subject was treated as a level-two variable.
Finally, varied slopes and intercepts were permitted, creating the final model:

Level 1: $y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij}$

Level 2: $\beta_{0j} = \gamma_{00} + r_{0j}$, $\beta_{1j} = \gamma_{10} + r_{1j}$
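To make the ANCOVA-based VAF described above concrete, the sketch below computes SS_size / (SS_size + SS_residual) for a least-squares model with one intercept per participant and a single common slope. With only these two terms, the sum of squares for size adjusted for participant reduces to within-participant sums of squares and cross-products, so no matrix algebra is needed. The function name and the demonstration data are hypothetical; this is not the authors' R code, and it does not fit the hierarchical model.

```python
def within_subject_vaf(data):
    """Partial eta-squared for size in a model with participant-specific
    intercepts and one shared slope.

    data: dict mapping participant id -> list of (size, strength) pairs.
    """
    sxx = sxy = syy = 0.0
    for pairs in data.values():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        # Centering on each participant's own means removes the intercepts.
        for x, y in pairs:
            sxx += (x - mean_x) ** 2
            sxy += (x - mean_x) * (y - mean_y)
            syy += (y - mean_y) ** 2
    if sxx == 0.0:
        raise ValueError("size does not vary within any participant")
    ss_size = sxy ** 2 / sxx        # explained by the common slope
    ss_residual = syy - ss_size     # left over after intercepts and slope
    return ss_size / (ss_size + ss_residual)


# Hypothetical data: similar within-person slopes, very different intercepts.
demo = {
    "A": [(2.0, 10.0), (2.2, 11.2), (2.4, 11.8)],
    "B": [(3.0, 8.0), (3.2, 8.9), (3.4, 10.1)],
    "C": [(2.5, 15.0), (2.7, 16.1), (2.9, 16.9)],
}
demo_vaf = within_subject_vaf(demo)
print(f"Within-subject VAF: {100 * demo_vaf:.1f}%")
```

Because every participant is forced onto the same slope, this quantity can understate the relationship when true slopes differ between individuals, which is the limitation the hierarchical model is meant to address.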

The model was fit using restricted maximum likelihood in the lme4 package (Bates et al., 2015). The sample variance of the residuals (s²) was used to calculate VAF (or R²) using the following formula: $VAF = 1 - \frac{s^{2}}{s_{uncond}^{2}}$, where $s_{uncond}^{2}$ is the sample variance of the residuals in the unconditional model, which contained only varied intercepts and no fixed effects (i.e., the same model, but with β_1j = 0). This approach is mathematically equivalent to the VAF found for the ANCOVA using type III sums of squares (see Appendix A). Intraclass correlation coefficients (ICC) were calculated on the unconditional models to estimate the proportion of original variance explained by subject. To estimate 95% confidence intervals (CI) of the VAFs, each model was bootstrapped 2,000 times with replacement. The 0.025 and 0.975 quantiles of the VAF estimates were calculated as the lower and upper bounds of each estimate’s 95% CI.
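The percentile bootstrap described above can be sketched for the simplest case, the between-subject model, where VAF is the squared Pearson correlation between Δ_size and Δ_strength. The text does not state the resampling unit, so resampling whole participants is an assumption here, as are the function names, the quantile interpolation rule, the handling of degenerate resamples, and the demonstration data.

```python
import random


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # assumption: a degenerate resample counts as no association
    return sxy / (sxx * syy) ** 0.5


def quantile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_vaf_ci(d_size, d_strength, n_boot=2000, seed=1):
    """95% percentile CI of r^2, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(d_size)
    vafs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([d_size[i] for i in idx], [d_strength[i] for i in idx])
        vafs.append(r * r)
    vafs.sort()
    return quantile(vafs, 0.025), quantile(vafs, 0.975)


# Hypothetical change scores (post - pre) for eight participants.
d_size = [0.10, 0.30, 0.20, 0.40, 0.00, 0.25, 0.15, 0.35]
d_strength = [3.0, 2.0, 5.0, 4.5, 1.0, 2.5, 4.0, 3.5]
ci_low, ci_high = bootstrap_vaf_ci(d_size, d_strength)
print(f"95% CI for VAF: {100 * ci_low:.1f}% to {100 * ci_high:.1f}%")
```

Because VAF cannot be negative, percentile intervals from small samples tend to be wide and to pile up against zero, which is consistent with the lower bounds of 0 reported for several models in Table 2.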

To understand how the different measures of hypertrophy relate to one another, within- and between-subject correlation matrices were constructed using the different thickness measures and CSA. The between-subject analysis included all thickness and CSA measures, across all subjects, for any time point at which both CSA and thickness were measured. The within-subject correlation matrix was constructed in a similar manner: (1) a correlation coefficient was calculated for each participant (r_i); (2) using a Fisher z-transformation, r_i was transformed to a z-score (z_i); (3) a weighted average was obtained using the number of points (n_i) from each participant ( $\bar{z} = \frac{\sum z_{i} (n_{i} - 3)}{\sum (n_{i} - 3)}$ , for i participants); and (4) $\bar{z}$ was transformed back to Pearson’s r (Borenstein et al., 2009; Cooper, Hedges & Valentine, 2009; Corey, Dunlap & Burke, 1998; Hedges & Olkin, 1985). Because CSA measures were only taken with thickness at two time points, within-subject correlation coefficients could not be estimated between CSA and muscle thickness.
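Steps (2) to (4) of the weighted within-subject correlation can be sketched as follows. The function name is hypothetical, and two behaviors are assumptions rather than details from the text: participants with three or fewer points are skipped (their weight n_i − 3 would be zero or negative), and a correlation of exactly ±1 raises an error because its z-transform is infinite.

```python
import math


def fisher_weighted_mean_r(rs, ns):
    """Average per-participant correlations on the Fisher z scale.

    rs: per-participant Pearson correlations (r_i)
    ns: number of paired observations behind each correlation (n_i)
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in zip(rs, ns):
        if n <= 3:
            continue  # assumption: weight n - 3 must be positive
        if abs(r) >= 1.0:
            raise ValueError("Fisher z is undefined for |r| = 1")
        z = math.atanh(r)               # step 2: r_i -> z_i
        weighted_sum += z * (n - 3)     # step 3: weight by n_i - 3
        total_weight += n - 3
    if total_weight == 0.0:
        raise ValueError("no participant has more than three data points")
    return math.tanh(weighted_sum / total_weight)  # step 4: back to r


# Hypothetical per-participant correlations and sample sizes.
pooled_r = fisher_weighted_mean_r([0.20, 0.65, 0.40], [15, 15, 12])
print(f"Weighted within-subject r: {pooled_r:.3f}")
```

The weight n_i − 3 is the inverse of the sampling variance of z_i, so a participant measured at only two time points contributes no usable information, in line with the remark above about CSA and thickness.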

Further exploratory analyses were performed to investigate if those with stronger strength-hypertrophy relationships also got stronger. To do this, Pearson correlation coefficients were calculated for each individual across the entire study (i.e., including both training and detraining periods). The resulting correlation coefficients were then correlated with Δ_strength from the training period using Spearman’s rank-order correlations (ρ). Spearman’s ρ was used due to the heteroscedastic nature of the residuals. Qualitative interpretations of correlation coefficients and VAFs can be found in Table 1, which are in accordance with Hopkins (2002). R code for all procedures can be found in the Supplemental Files.
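The exploratory analysis above amounts to ranking two per-participant quantities and correlating the ranks. A minimal sketch, assuming ties receive average ranks; the function names and the demonstration values are hypothetical and are not the study data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(xs, ys):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical inputs: each participant's strength-size correlation across the
# whole study, and that participant's strength change over the training period.
individual_r = [0.10, 0.55, 0.72, 0.31, 0.64, 0.48]
delta_strength = [1.5, 4.0, 6.5, 5.0, 3.0, 3.5]
rho = spearman_rho(individual_r, delta_strength)
print(f"Spearman's rho: {rho:.3f}")
```

Working on ranks makes the statistic insensitive to the scale and spread of either variable, which is why it is a common choice when residuals are unevenly scattered.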

Table 1:

Correlation coefficient and variance accounted for interpretations.

Interpretation	Correlation coefficient (r or ρ)	Variance accounted for (%)
Trivial	[0, 0.1)	[0, 1)
Small	[0.1, 0.3)	[1, 9)
Moderate	[0.3, 0.5)	[9, 25)
Large/strong	[0.5, 0.7)	[25, 49)
Very large/strong	[0.7, 0.9)	[49, 81)
Nearly perfect	[0.9, 1)	[81, 100)
Perfect	1	100

DOI: 10.7717/peerj.5071/table-1

Notes:

Adapted from Hopkins (2002). Note that all intervals are of the form x_low ≤ x_o < x_high.
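The intervals in Table 1 can be encoded directly, which avoids ambiguity at the boundaries. This is a minimal sketch; the function names are hypothetical, and applying the thresholds to the absolute value of a negative coefficient is an assumption, since the table lists only non-negative values.

```python
R_BOUNDS = [
    (0.1, "Trivial"),
    (0.3, "Small"),
    (0.5, "Moderate"),
    (0.7, "Large/strong"),
    (0.9, "Very large/strong"),
    (1.0, "Nearly perfect"),
]
VAF_BOUNDS = [
    (1.0, "Trivial"),
    (9.0, "Small"),
    (25.0, "Moderate"),
    (49.0, "Large/strong"),
    (81.0, "Very large/strong"),
    (100.0, "Nearly perfect"),
]


def _classify(value, bounds, maximum):
    if value < 0 or value > maximum:
        raise ValueError(f"value must lie in [0, {maximum}]")
    # Intervals are closed below and open above: low <= value < high.
    for upper, label in bounds:
        if value < upper:
            return label
    return "Perfect"


def interpret_r(r):
    """Label a correlation coefficient (r or rho) using Table 1."""
    return _classify(abs(r), R_BOUNDS, 1.0)  # assumption: sign is ignored


def interpret_vaf(percent):
    """Label a variance accounted for, expressed in percent, using Table 1."""
    return _classify(percent, VAF_BOUNDS, 100.0)


print(interpret_r(0.356), "|", interpret_vaf(24.1))
```

The two columns are related by VAF = 100⋅r², but the thresholds are stored separately so that floating-point rounding of r² cannot move a boundary value into the wrong category.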

Results

For a given statistical model, differences in VAFs between measures of muscle size ranged from zero to an order of magnitude (Table 2). Similar differences were also observed between different statistical models for a given measure (Table 2). Intraclass correlation coefficients from the hierarchical linear models suggest that most of the original variance could be accounted for by including a level for subject (ICC = 0.89–0.91). Heterogeneity in correlation coefficients was observed when comparing different measures of muscle thickness, which ranged from r = 0.503 to r = 0.945 for between-subject correlations and from r = 0.326 to r = 0.875 for weighted within-subject correlations (Table 3). Finally, Pearson’s r of each individual’s strength-hypertrophy relationship was a moderate-to-strong predictor of strength gained during training for all measurements (US_30% ρ = 0.644; US_50% ρ = 0.356; US_70% ρ = 0.413; US_avg ρ = 0.480; CSA ρ = 0.449).

Table 2:

Percent (%) variance accounted for (95% CI) using different types of models.

Measure	Between-subjects (training)	Between-subjects (detraining)	Within-subjects (ANCOVA)	Within-subjects (HLM)
Thickness (30%)	3.6 (0–61.9)	1.0 (0–45.1)	0.2 (0–6.1)	7.4 (0.8–16.0)
Thickness (50%)	0.8 (0–21.6)	0.0 (0–23.7)	0.3 (0–9.7)	24.1 (6.7–42.0)
Thickness (70%)	1.4 (0–39.1)	1.6 (0–38.0)	2.2 (0–10.9)	7.5 (2.1–23.7)
Thickness (Average)	0.4 (0–21.1)	0.0 (0–26.4)	1.2 (0–12.9)	18.1 (6.6–30.4)
Cross-sectional area	0.4 (0–32.2)	1.2 (0–35.4)	11.7 (1.1–34.2)	12.1 (2.0–69.5)

DOI: 10.7717/peerj.5071/table-2

Notes:

30%, 50%, and 70% represent the position of the ultrasound probe on the brachium. Average represents the average of all three of the measured thicknesses at a given time point. Cross-sectional area was measured at 50%.

ANCOVA: analysis of covariance
HLM: hierarchical linear model

Table 3:

Correlation matrix of measures of muscle size.

	Thickness (30%)	Thickness (50%)	Thickness (70%)	Thickness (Average)	Cross-sectional area
Thickness (30%)		0.503^a	0.618^a	0.778^a	0.557^a
Thickness (50%)	0.344^b		0.869^a	0.916^a	0.742^a
Thickness (70%)	0.326^b	0.687^b		0.945^a	0.730^a
Thickness (Average)	0.659^b	0.875^b	0.871^b		0.773^a
Cross-sectional area

DOI: 10.7717/peerj.5071/table-3

Notes:

^a Between-subject correlation.

^b Weighted within-subject correlation.

Discussion

To the authors’ knowledge, this is the first study to investigate the relationship between hypertrophy and changes in muscle strength using hierarchical linear modeling, which allows for robust within-individual analysis, in addition to the use of multiple types of measures of muscle size. Our results demonstrate that not only does measurement approach substantially affect outcomes, but so does the type of statistical model employed. These findings have important methodological implications for improving our understanding of the associative relationship between hypertrophy and changes in strength.

Previous literature has approached the question of how changes in muscle size relate to changes in strength from a between-subject perspective. However, it can be argued that a repeated-measures design allows for a more direct evaluation of the strength-hypertrophy relationship. Individual differences in muscle moment arms (MA), normalized muscle force (NMF), pennation angles (θ_p), voluntary activation (α), et cetera will greatly confound the relative relationship between changes in strength and muscle size (in this case, physiological CSA (PCSA)). All of the aforementioned components are multipliers in the formula used to calculate a muscle’s contribution to a joint moment (M = α⋅PCSA⋅NMF⋅cosθ_p⋅MA) (Vigotsky, Contreras & Beardsley, 2015). To date, only one previous investigation has utilized a quantitative within-subject approach to investigate the relationship between hypertrophy and changes in strength (Loenneke et al., 2017), although qualitative within-subject changes are depicted in a classic study by DeLorme (1945). Specifically, Loenneke et al. (2017) employed an ANCOVA with subject as a factor and muscle size as a covariate; from the resulting sum of squares, VAF could be calculated (Bland & Altman, 1995a). In its basic form, however, ANCOVA is limited in that it assumes parallelism between all relationships, relies on several assumptions that may confound results (e.g., sphericity, compound symmetry, and homoscedasticity), and is not robust to missing data points (Bland & Altman, 1995a; Bland & Altman, 1995b; Quené & Van den Bergh, 2004). The parallel or affine assumption is of particular interest because there are several heterogeneities that confound this assumption (i.e., α, MA, NMF, and θ_p). Repeated-measures hierarchical models are a robust way to investigate longitudinal relationships within a group or person (Gelman & Hill, 2007; Raudenbush & Bryk, 2002). By comparing these statistical models, a clear difference is apparent (Table 2).
For all measurements, the hierarchical linear model resulted in greater VAFs than the ANCOVA (Table 2). These differences may be due to the hierarchical linear model allowing for varying slopes or, alternatively, some of the inherent assumptions and limitations of ANCOVAs (Quené & Van den Bergh, 2004). Interestingly, the VAFs found in this present study are much lower than those found by Loenneke et al. (2017). It is unclear from where these differences arise; that is, if they are due to measurement technique, differences in mechanisms of strength gain, differences in upper vs. lower extremities, or some other factor. However, our data provide a methodological proof of principle by delineating how different statistical models may drastically affect the conclusions formed from a given dataset, even when performed on the same set of regressors. Due to the robustness of hierarchical linear models, it is recommended that such analyses are used over ANCOVAs for future investigations with similar methods.

How muscle size is assessed will likely affect the strength of the relationship between changes in muscle size and strength. The measurement techniques utilized by previous and present investigations (Ahtiainen et al., 2016; Appleby, Newton & Cormie, 2012; Baker, Wilson & Carlyon, 1994; Balshaw et al., 2017; Buckner et al., 2016a; Cribb et al., 2007; Erskine, Fletcher & Folland, 2014; Loenneke et al., 2017; Pope et al., 2016) have been limited in that they do not account for changes in architectural characteristics (Lieber & Ward, 2011). There are several ways to measure muscle size, including limb circumference (DeLorme, 1945), estimates of total and segmental muscle mass (dual-energy X-ray absorptiometry and bioelectrical impedance analysis) (Karelis et al., 2013), muscle thickness (Than et al., 2016), anatomical CSA (Erskine, Fletcher & Folland, 2014; Trezise, Collier & Blazevich, 2016), muscle volume (Balshaw et al., 2017; Erskine, Fletcher & Folland, 2014; Erskine et al., 2010), and PCSA (Erskine et al., 2010). There are strong physiological and mechanical rationales with basic science evidence to suggest that not all of these measures are equal, even when accounting for measurement error (Lieber & Ward, 2011; Powell et al., 1984). For example, although muscle volume appears to be a strong predictor of strength in some contexts (even greater than anatomical CSA) (Akagi et al., 2009; Fukunaga et al., 2001), it does not perform as well in others (Baxter & Piazza, 2014), perhaps at least partly due to inter- and intra-muscular variation in architecture (Blazevich, Gill & Zhou, 2006; Lieber & Ward, 2011; Ward et al., 2009) and adaptation (Earp et al., 2015; Ema et al., 2013; Franchi et al., 2017; Narici et al., 1996; Wakahara et al., 2013; Wakahara et al., 2012). Muscle volume is not only sensitive to changes in sarcomeres in parallel (PCSA), but also sarcomeres in series (fiber length). 
Sarcomeres in parallel will contribute to the magnitude of force production, while sarcomeres in series will affect the shapes of the force-length and force-velocity curves. Functionally speaking, not all muscle volume is equal (Lieber & Ward, 2011). Importantly, in series hypertrophy appears to be limited to the initial weeks of commencing resistance training, further reinforcing potential issues when extrapolating correlative findings from novice to trained individuals (Blazevich et al., 2007). Similarly, thickness and anatomical CSA, as measured in this study, are also limited, as they only represent one part of the muscle and do not account for the intricacies of muscle architecture. This is further evidenced by Franchi et al. (2017), who found that, cross-sectionally, muscle thickness, anatomical CSA, and muscle volume are related, but the relative changes between muscle thickness and muscle volume did not strongly correlate following a training period. This is important when considering the formula for PCSA, in that the volume of the entire muscle must be taken into account (Lieber & Ward, 2011); not just thickness or anatomical CSA. Moreover, the variability in correlation coefficients between these measures may be a cause for concern (Table 3), in that it suggests not all measures of muscle size are necessarily capturing the same effects, which is elucidated further by the statistical models (Table 2).

Since PCSA has been shown to be a strong predictor of force production both in vivo (Fukunaga et al., 1996) and in vitro (Powell et al., 1984), it is considered the gold standard for relating muscle form (architecture) to function (force production) (Lieber & Ward, 2011). PCSA is, in essence, the “effective” CSA, as it is the average CSA perpendicular to the fibers’ line of action. Thus, PCSA controls for pennation and is representative of the number of sarcomeres in parallel, making it highly indicative of a muscle’s potential to generate force through the tendon (Lieber & Ward, 2011). It is imperative to consider these differences in measurement techniques in the context of this study and similar investigations (Ahtiainen et al., 2016; Erskine, Fletcher & Folland, 2014; Erskine et al., 2010; Loenneke et al., 2017). Although this study (Table 2) and others (Loenneke et al., 2017) have observed what is analogous to a strong correlation (r ≥ 0.5) (Hopkins, 2002) with repeated-measures designs, substandard measurements of muscle size were used in the present study. Therefore, it is likely that PCSA measurements would produce different results (Aagaard et al., 2001). While PCSA is expensive to obtain and typically relies on MRI, newer technologies, such as 3D ultrasound, show promise as valid, affordable alternatives to MRI for estimating muscle volume and PCSA (Barber, Barrett & Lichtwark, 2009; Barber et al., 2011; Haberfehlner et al., 2016). Moving forward, it seems prudent that investigators utilize PCSA rather than other measures of muscle size, as the theory that hypertrophy leads to strength gains is predicated on this measure rather than other measures of muscle size.

The question of how changes in strength and changes in muscle size are related is one with broad clinical implications, ranging from the treatment and prevention of sarcopenia and dynapenia to exercise prescription for strength athletes. Clinically, if changes in muscle size are not important for strength, then exercise programs need not focus on variables that are more important for hypertrophy than strength, such as volume (Ralston et al., 2017; Schoenfeld, Ogborn & Krieger, 2017). Changes in strength do indeed arise from non-hypertrophic factors (Folland & Williams, 2007), including a myriad of neural adaptations (Enoka, 1988), in addition to changes in muscle moment arms (Sugisaki et al., 2015; Vigotsky, Contreras & Beardsley, 2015) and normalized muscle force production (Erskine et al., 2010), in which lateral force transmission has been suggested to play a role (Jones, Rutherford & Parker, 1989). This implies that changes in strength are interactive rather than linear. As such, how this relationship is investigated and modeled should reflect such complexities. First, with more reductionist strength testing (i.e., single-joint isometric testing), it can be argued that the “skill” component of strength is less relevant (as opposed to one-repetition maximum tests (Buckner et al., 2016b)), since little coordination is necessary and even untrained individuals see little-to-no changes in voluntary activation and co-contraction (Behm, 1995; Erskine, Fletcher & Folland, 2014; Erskine et al., 2010; Noorkoiv, Nosaka & Blazevich, 2014). Moreover, neural measures, such as voluntary activation, can be more accurately assessed during isometric efforts than during dynamic efforts (Farina, 2006; Vigotsky et al., 2017) and thus can more easily be incorporated into a final model. Second, measures of muscle size should reflect those in the model (i.e., using PCSA). 
While this is expensive and time consuming, it will provide more appropriate biomechanical insight (Lieber & Ward, 2011). Third, moment arm measures should be subject-specific and occur over the duration of an experiment, as moment arms may change with training (Sugisaki et al., 2015; Vigotsky, Contreras & Beardsley, 2015). Finally, longer duration studies may be more appropriate for several reasons: (1) individual response trajectories will vary, as evidenced by the high ICCs in this present investigation and the heterogeneous rank orders between time points in previous work (Churchward-Venne et al., 2015); (2) edema can greatly confound gross imaging measures of muscle size, depending on when the measurements are performed (Damas et al., 2016); (3) the magnitude of the difference between measurement points will be greater, which in turn will decrease the relative role of measurement error in parameter and VAF estimates (Fuller, 1987); and (4) to understand the extent to which contributions may or may not change over time. While this present study did not incorporate these recommendations, since it was based on previously collected data (Than et al., 2016), future studies should do so to properly isolate the associative contribution of muscle size (PCSA) to strength increases.

Thus far, our discussion has primarily focused on the associative, rather than causal, relationship between hypertrophy and strength gain. A productive discussion of the causal nature of this relationship requires an operational definition of causality. In formal logic, causality is often broken down into two conditions: (1) necessary conditions, which state that B will not occur without A (“if not A, then not B”); and (2) sufficient conditions, which state that A will result in B (“if A, then B”) (Epp, 2011; Hall, 1987). However, a less formal concept of causality is also possible without these conditions having been met, in the form of contributory causality. A contributory cause is neither necessary nor sufficient (Hall, 1987; Riegelman, 1979). Those who experience an effect need not experience its putative cause, and those who experience the putative cause need not experience its effect (Riegelman, 1979). For instance, although smoking causes lung cancer, not all who smoke develop lung cancer (i.e., it is not sufficient), and not all who develop lung cancer are smokers (i.e., it is not necessary); therefore, smoking may be viewed as a contributory cause of lung cancer (Riegelman, 1979). The arguments put forth by Buckner et al. (2016a), Dankel et al. (2018) and Mattocks et al. (2017) do indeed rule out hypertrophy as being a necessary or sufficient cause for strength gain, but we suggest that the contributory nature of hypertrophy to strength should not be dismissed on this basis. In other words, changes in strength can occur without changes in muscle size and vice versa, but this does not preclude muscle size from contributing to strength.
Experimentally, it is important to consider the emergent, nonlinear, and interactive properties of strength; there are many moving parts that should be accounted for when attempting to understand such a complex system, and these may concurrently change in different directions (e.g., an increase in size alongside a decrease in agonist activation). Indeed, a systems approach, rather than a reductionist one, may be most appropriate for understanding strength emergence. In studying this system, it is necessary to measure all factors (confounders) that may contribute to strength to truly understand the role of hypertrophy, especially because different protocols may elicit differential adaptations (Jenkins et al., 2017). Thus, longitudinal, within-subject studies that incorporate all of the measures included in the formula to determine strength (PCSA, MA, activation and co-contraction, synergist characteristics, and NMF) are likely needed to better understand the emergent properties of strength. Finally, because the problem is so complex, it may not be possible to fully establish the contributory role of hypertrophy in strength gain from a single study or line of evidence. Instead, a body of literature consisting of many forms of evidence, ranging from animal and agent-based models to observational and experimental human studies, may be required to elucidate the contributory role of hypertrophy in strength gain.
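To illustrate how determinants of strength can interact and move in opposite directions, the sketch below uses a deliberately simplified, hypothetical single-muscle product model (torque as PCSA × NMF × moment arm × activation). This form, the function name, and all numerical values are assumptions made for illustration; the sketch ignores co-contraction and synergists and does not reproduce the formula referred to above.

```python
def joint_torque(pcsa_cm2: float, nmf_n_per_cm2: float,
                 moment_arm_m: float, activation: float) -> float:
    """Hypothetical single-muscle torque in N·m.

    Simplifying assumptions (not taken from the study): one agonist,
    no antagonist co-contraction, no synergists, and activation
    expressed as a fraction between 0 and 1.
    """
    if not 0.0 <= activation <= 1.0:
        raise ValueError("activation must lie between 0 and 1")
    muscle_force_n = pcsa_cm2 * nmf_n_per_cm2 * activation
    return muscle_force_n * moment_arm_m


# Illustrative values only.
baseline = joint_torque(pcsa_cm2=50.0, nmf_n_per_cm2=30.0,
                        moment_arm_m=0.04, activation=0.90)

# PCSA rises by 10% while agonist activation falls by 10%.
post_training = joint_torque(pcsa_cm2=55.0, nmf_n_per_cm2=30.0,
                             moment_arm_m=0.04, activation=0.81)

print(baseline, post_training)
```

Under these assumed values, torque falls slightly (from roughly 54 N·m to roughly 53.5 N·m) despite a 10% increase in PCSA, because the concurrent drop in activation more than offsets it. In a product model of this kind, the effect of a change in size cannot be read off without knowing the other factors.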

This study and its discussion have focused primarily on single muscle group hypertrophy and single-joint isometric strength gain. The larger question of multi-joint and dynamic strength gain is perhaps more relevant, but unfortunately much more complex (Vigotsky et al., 2018). Starting with simpler systems and research questions may bear more fruit, while also providing a conceptual basis that can be used when studying more complex systems and research questions.

This is the first study to utilize repeated-measures hierarchical linear modeling to investigate the relationship between muscle size and strength. We herein demonstrate that repeated-measures hierarchical linear models produce different results than other within-subject models (ANCOVA), in addition to between-subject models, which is in line with previous work by Loenneke et al. (2017). Moreover, it was found that different measures of muscle size can produce vastly different results. As such, we have advocated for more rigorous and reductionist experimental designs to better understand the mechanistic origins of single-joint strength following exercise programs, by suggesting that researchers measure PCSA and single-joint isometric strength, in addition to potential confounding variables.¹ These findings are important for the interpretation of previous studies, in addition to the design of future studies, on this same topic.
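The distinction between within-subject and between-subject relationships can be seen in a small toy example. The sketch below fits a separate ordinary least squares slope to each subject's repeated measures and compares those slopes with the slope across subject means. It is a two-stage illustration on invented data, not a hierarchical linear model (there is no partial pooling of estimates) and not the analysis reported here; all names and values are hypothetical.

```python
from statistics import mean


def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Invented repeated measures: subject -> (muscle size, strength) at three
# time points. Each subject gets stronger as size increases, but subjects
# with larger muscles happen to be weaker on average.
data = {
    "s1": ([10.0, 11.0, 12.0], [100.0, 110.0, 120.0]),
    "s2": ([14.0, 15.0, 16.0], [92.0, 100.0, 108.0]),
    "s3": ([18.0, 19.0, 20.0], [78.0, 90.0, 102.0]),
}

# Within-subject slopes: one slope (and, implicitly, one intercept) per subject.
within = {subject: ols_slope(x, y) for subject, (x, y) in data.items()}

# Between-subject slope: regression across the subject means only.
between = ols_slope(
    [mean(x) for x, _ in data.values()],
    [mean(y) for _, y in data.values()],
)

print(within, between)
```

In this toy data set every within-subject slope is positive (10, 8, and 12 strength units per unit of size), yet the between-subject slope is negative (-2.5). A model that allows slopes and intercepts to vary by subject can separate these two sources of information, whereas a between-subject analysis captures only the second.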

Conclusions

The strength of the associational relationship between muscle hypertrophy and strength gain is highly dependent upon the statistical model employed. We have demonstrated that hierarchical linear modeling, which allows for varying slopes and intercepts, provides greater estimates of the strength of the relationship between muscle hypertrophy and strength gain. Moreover, different assessments of muscle size do not perfectly correlate, and therefore, different methods of assessment may lead to different conclusions. These findings should be taken into consideration when planning and interpreting studies on the relationship between muscle hypertrophy and strength gain.

¹ Note that these recommendations only apply to studies that are investigating the strength-hypertrophy relationship with a reductionist approach. We are in no way suggesting that PCSA and single-joint isometric measures be used for all resistance training studies.

[1] Aagaard P, Andersen JL, Dyhre-Poulsen P, Leffers AM, Wagner A, Magnusson SP, Halkjaer-Kristensen J, Simonsen EB. 2001. A mechanism for increased contractile strength of human pennate muscle in response to strength training: changes in muscle architecture. The Journal of Physiology 534:613-623

[2] Ahtiainen JP, Walker S, Peltonen H, Holviala J, Sillanpaa E, Karavirta L, Sallinen J, Mikkola J, Valkeinen H, Mero A, Hulmi JJ, Hakkinen K. 2016. Heterogeneity in resistance training-induced muscle strength and mass responses in men and women of different ages. Age 38(1):10

[3] Akagi R, Takai Y, Ohta M, Kanehisa H, Kawakami Y, Fukunaga T. 2009. Muscle volume compared to cross-sectional area is more appropriate for evaluating muscle strength in young and elderly individuals. Age and Ageing 38:564-569

[4] Appleby B, Newton RU, Cormie P. 2012. Changes in strength over a 2-year period in professional rugby union players. Journal of Strength and Conditioning Research 26:2538-2546

[5] Baker D, Wilson G, Carlyon R. 1994. Periodization: the effect on strength of manipulating volume and intensity. Journal of Strength and Conditioning Research 8:235-242

[6] Balshaw TG, Massey GJ, Maden-Wilkinson TM, Morales-Artacho AJ, McKeown A, Appleby CL, Folland JP. 2017. Changes in agonist neural drive, hypertrophy and pre-training strength all contribute to the individual strength gains after resistance training. European Journal of Applied Physiology 117:631-640

[7] Barber L, Barrett R, Lichtwark G. 2009. Validation of a freehand 3D ultrasound system for morphological measures of the medial gastrocnemius muscle. Journal of Biomechanics 42:1313-1319

[8] Barber L, Hastings-Ison T, Baker R, Barrett R, Lichtwark G. 2011. Medial gastrocnemius muscle volume and fascicle length in children aged 2 to 5 years with cerebral palsy. Developmental Medicine and Child Neurology 53:543-548

[9] Bates D, Mächler M, Bolker B, Walker S. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67:1-48

[10] Baxter JR, Piazza SJ. 2014. Plantar flexor moment arm and muscle volume predict torque-generating capacity in young men. Journal of Applied Physiology 116:538-544

[11] Behm DG. 1995. Neuromuscular implications and applications of resistance training. Journal of Strength and Conditioning Research 9:264-274

[12] Bland JM, Altman DG. 1995a. Calculating correlation coefficients with repeated observations: part 1–correlation within subjects. BMJ 310:446

[13] Bland JM, Altman DG. 1995b. Calculating correlation coefficients with repeated observations: part 2–correlation between subjects. BMJ 310:633

[14] Blazevich AJ, Cannavan D, Coleman DR, Horne S. 2007. Influence of concentric and eccentric resistance training on architectural adaptation in human quadriceps muscles. Journal of Applied Physiology 103:1565-1575

[15] Blazevich AJ, Gill ND, Zhou S. 2006. Intra- and intermuscular variation in human quadriceps femoris architecture assessed in vivo. Journal of Anatomy 209:289-310

[16] Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. 2009. Introduction to meta-analysis. Chichester: John Wiley & Sons.

[17] Buckner SL, Dankel SJ, Mattocks KT, Jessee MB, Mouser JG, Counts BR, Loenneke JP. 2016a. The problem of muscle hypertrophy: revisited. Muscle and Nerve 54:1012-1014

[18] Buckner SL, Jessee MB, Mattocks KT, Mouser JG, Counts BR, Dankel SJ, Loenneke JP. 2016b. Determining strength: a case for multiple methods of measurement. Sports Medicine 47(2):193-195

[19] Churchward-Venne TA, Tieland M, Verdijk LB, Leenders M, Dirks ML, De Groot LC, Van Loon LJ. 2015. There are no nonresponders to resistance-type exercise training in older men and women. Journal of the American Medical Directors Association 16:400-411

[20] Cooper HM, Hedges LV, Valentine JC. 2009. The handbook of research synthesis and meta-analysis. New York: Russell Sage Foundation.

[21] Corey DM, Dunlap WP, Burke MJ. 1998. Averaging correlations: expected values and bias in combined Pearson rs and Fisher’s z transformations. The Journal of General Psychology 125:245-261

[22] Cribb PJ, Williams AD, Stathis CG, Carey MF, Hayes A. 2007. Effects of whey isolate, creatine, and resistance training on muscle hypertrophy. Medicine and Science in Sports and Exercise 39:298-307

[23] Damas F, Phillips SM, Lixandrao ME, Vechin FC, Libardi CA, Roschel H, Tricoli V, Ugrinowitsch C. 2016. Early resistance training-induced increases in muscle cross-sectional area are concomitant with edema-induced muscle swelling. European Journal of Applied Physiology 116:49-56

[24] Dankel SJ, Buckner SL, Jessee MB, Grant Mouser J, Mattocks KT, Abe T, Loenneke JP. 2018. Correlations do not show cause and effect: not even for changes in muscle size and strength. Sports Medicine 48:1-6

[25] DeLorme TL. 1945. Restoration of muscle power by heavy-resistance exercises. JBJS 27:645-667

[26] Earp JE, Newton RU, Cormie P, Blazevich AJ. 2015. Inhomogeneous quadriceps femoris hypertrophy in response to strength and power training. Medicine and Science in Sports and Exercise 47:2389-2397

[27] Ema R, Wakahara T, Miyamoto N, Kanehisa H, Kawakami Y. 2013. Inhomogeneous architectural changes of the quadriceps femoris induced by resistance training. European Journal of Applied Physiology 113:2691-2703

[28] Enoka RM. 1988. Muscle strength and its development. New perspectives. Sports Medicine 6:146-168

[29] Epp SS. 2011. Discrete mathematics with applications. Boston: Brooks/Cole, Cengage Learning.

[30] Erskine RM, Fletcher G, Folland JP. 2014. The contribution of muscle hypertrophy to strength changes following resistance training. European Journal of Applied Physiology 114:1239-1249

[31] Erskine RM, Jones DA, Williams AG, Stewart CE, Degens H. 2010. Inter-individual variability in the adaptation of human muscle specific tension to progressive resistance training. European Journal of Applied Physiology 110:1117-1125

[32] Farina D. 2006. Interpretation of the surface electromyogram in dynamic contractions. Exercise and Sport Sciences Reviews 34:121-127

[33] Fluck M, Hoppeler H. 2003. Molecular basis of skeletal muscle plasticity–from gene to form and function. Reviews of Physiology Biochemistry and Pharmacology 146:159-216

[34] Folland JP, Williams AG. 2007. The adaptations to strength training: morphological and neurological contributions to increased strength. Sports Medicine 37:145-168

[35] Franchi MV, Longo S, Mallinson J, Quinlan JI, Taylor T, Greenhaff PL, Narici MV. 2017. Muscle thickness correlates to muscle cross sectional area in the assessment of strength training induced hypertrophy. Scandinavian Journal of Medicine and Science in Sports Epub ahead of print

[36] Fukunaga T, Miyatani M, Tachi M, Kouzaki M, Kawakami Y, Kanehisa H. 2001. Muscle volume is a major determinant of joint torque in humans. Acta Physiologica Scandinavica 172:249-255

[37] Fukunaga T, Roy RR, Shellock FG, Hodgson JA, Edgerton VR. 1996. Specific tension of human plantar flexors and dorsiflexors. Journal of Applied Physiology 80:158-165

[38] Fuller WA. 1987. Measurement error models. New York: Wiley.

[39] Gelman A, Hill J. 2007. Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.

[40] Goldstein H. 2011. Multilevel statistical models. Chichester: Wiley.

[41] Haberfehlner H, Maas H, Harlaar J, Becher JG, Buizer AI, Jaspers RT. 2016. Freehand three-dimensional ultrasound to assess semitendinosus muscle morphology. Journal of Anatomy 229:591-599

[42] Hall W. 1987. A simplified logic of causal inference. Australian and New Zealand Journal of Psychiatry 21:507-513

[43] Hedges LV, Olkin I. 1985. Statistical methods for meta-analysis. Orlando: Academic Press.

[44] Hopkins WG. 2002. A scale of magnitudes for effect statistics. A new view of statistics, vol. 502.

[45] Jackson C, Best N, Richardson S. 2006. Improving ecological inference using individual-level data. Statistics in Medicine 25:2136-2159

[46] Jenkins ND, Miller JM, Buckner SL, Cochrane KC, Bergstrom HC, Hill EC, Smith CM, Housh TJ, Cramer JT. 2015. Test-retest reliability of single transverse versus panoramic ultrasound imaging for muscle size and echo intensity of the biceps brachii. Ultrasound in Medicine and Biology 41:1584-1591

[47] Jenkins NDM, Miramonti AA, Hill EC, Smith CM, Cochrane-Snyman KC, Housh TJ, Cramer JT. 2017. Greater neural adaptations following high- vs. low-load resistance training. Frontiers in Physiology 8 Article 331

[48] Jones DA, Rutherford OM, Parker DF. 1989. Physiological changes in skeletal muscle as a result of strength training. Quarterly Journal of Experimental Physiology 74:233-256

[49] Karelis AD, Chamberland G, Aubertin-Leheudre M, Duval C, Ecological mobility in Aging and Parkinson (EMAP) Group. 2013. Validation of a portable bioelectrical impedance analyzer for the assessment of body composition. Applied Physiology, Nutrition, and Metabolism Physiologie Appliquée, Nutrition et Métabolisme 38:27-32

[50] Koo TK, Li MY. 2016. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15:155-163

[51] Lieber RL, Ward SR. 2011. Skeletal muscle design to meet functional demands. Philosophical Transactions of the Royal Society B: Biological Sciences 366:1466-1476

[52] Loenneke JP, Rossow LM, Fahs CA, Thiebaud RS, Grant Mouser J, Bemben MG. 2017. Time-course of muscle growth, and its relationship with muscle strength in both young and older women. Geriatrics & Gerontology International 17(11):2000-2007

[53] Maeo S, Shan X, Otsuka S, Kanehisa H, Kawakami Y. 2018. Neuromuscular adaptations to work-matched maximal eccentric vs concentric training. Medicine and Science in Sports and Exercise Epub ahead of print

[54] Mattocks KT, Buckner SL, Jessee MB, Dankel SJ, Mouser JG, Loenneke JP. 2017. Practicing the test produces strength equivalent to higher volume training. Medicine and Science in Sports and Exercise 49(9):1945-1954

[55] Maughan RJ, Nimmo MA. 1984. The influence of variations in muscle fibre composition on muscle strength and cross-sectional area in untrained males. Journal of Physiology 351:299-311

[56] Maughan RJ, Watson JS, Weir J. 1984. Muscle strength and cross-sectional area in man: a comparison of strength-trained and untrained subjects. British Journal of Sports Medicine 18:149-157

[57] Narici MV, Hoppeler H, Kayser B, Landoni L, Claassen H, Gavardi C, Conti M, Cerretelli P. 1996. Human quadriceps cross-sectional area, torque and neural activation during 6 months strength training. Acta Physiologica Scandinavica 157:175-186

[58] Noorkoiv M, Nosaka K, Blazevich AJ. 2014. Neuromuscular adaptations associated with knee joint angle-specific force change. Medicine and Science in Sports and Exercise 46:1525-1537

[59] Pope ZK, Hester GM, Benik FM, DeFreitas JM. 2016. Action potential amplitude as a noninvasive indicator of motor unit-specific hypertrophy. Journal of Neurophysiology 115:2608-2614

[60] Powell PL, Roy RR, Kanim P, Bello MA, Edgerton VR. 1984. Predictability of skeletal muscle tension from architectural determinations in guinea pig hindlimbs. Journal of Applied Physiology: Respiratory, Environmental and Exercise Physiology 57:1715-1721

[61] Quené H, Van den Bergh H. 2004. On multi-level modeling of data from repeated measures designs: a tutorial. Speech Communication 43:103-121

[62] R Core Development Team. 2017. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing.

[63] Ralston GW, Kilgore L, Wyatt FB, Baker JS. 2017. The effect of weekly set volume on strength gain: a meta-analysis. Sports Medicine 47(12):2585-2601

[64] Rasch PJ, Morehouse LE. 1957. Effect of static and dynamic exercises on muscular strength and hypertrophy. Journal of Applied Physiology 11:29-34

[65] Raudenbush SW, Bryk AS. 2002. Hierarchical linear models: applications and data analysis methods. Thousand Oaks: Sage Publications.

[66] Riegelman R. 1979. Contributory cause: unnecessary and insufficient. Postgraduate Medicine 66:177-179

[67] Robinson WS. 1950. Ecological correlations and the behavior of individuals. American Sociological Review 15:351-357

[68] Schantz P, Randall-Fox E, Hutchison W, Tyden A, Astrand PO. 1983. Muscle fibre type distribution, muscle cross-sectional area and maximal voluntary strength in humans. Acta Physiologica Scandinavica 117:219-226

[69] Schoenfeld BJ, Ogborn D, Krieger JW. 2017. Dose-response relationship between weekly resistance training volume and increases in muscle mass: a systematic review and meta-analysis. Journal of Sports Sciences 35:1073-1082

[70] Sugisaki N, Wakahara T, Murata K, Miyamoto N, Kawakami Y, Kanehisa H, Fukunaga T. 2015. Influence of muscle hypertrophy on the moment arm of the triceps brachii muscle. Journal of Applied Biomechanics 31:111-116

[71] Than C, Tosovic D, Seidl L, Mark Brown J. 2016. The effect of exercise hypertrophy and disuse atrophy on muscle contractile properties: a mechanomyographic analysis. European Journal of Applied Physiology 116:2155-2165

[72] Trezise J, Collier N, Blazevich AJ. 2016. Anatomical and neuromuscular variables strongly predict maximum knee extension torque in healthy men. European Journal of Applied Physiology 116:1159-1177

[73] Vigotsky AD, Bryanton MA, Nuckols G, Beardsley C, Contreras B, Evans J, Schoenfeld BJ. 2018. Biomechanical, anthropometric, and psychological determinants of barbell back squat strength. Journal of Strength and Conditioning Research Epub ahead of print

[74] Vigotsky AD, Contreras B, Beardsley C. 2015. Biomechanical implications of skeletal muscle hypertrophy and atrophy: a musculoskeletal model. PeerJ 3:e1462

[75] Vigotsky AD, Halperin I, Lehman GJ, Trajano GS, Vieira TM. 2017. Interpreting signal amplitudes in surface electromyography studies in sport and rehabilitation sciences. Frontiers in Physiology 8:985