Self-reports vs. physical measures of spinal stiffness

Background Objectively measured reduction in lumbar posterior-to-anterior (PA) stiffness is associated with pain relief in some, but not all, persons with low back pain. Unfortunately, these measurements can be time consuming to perform. In comparison, the Lumbar Spine Instability Questionnaire (LSIQ) is intended to measure spinal instability, and the Lumbar Stiffness Disability Index (LSDI) was created for self-reporting functional disability due to increased spinal stiffness. Given the above, the aim of this study was to compare measures of the LSIQ and LSDI with objective measures of lumbar PA stiffness obtained with a mechanical device, the Vertetrack (VT), in patients with persistent non-specific low back pain (nsLBP).
Methods Twenty-nine patients with nsLBP completed the LSIQ and LSDI at baseline and after two weeks. On these same occasions, PA spinal stiffness was measured using the VT. Between measurements, patients received four sessions of spinal manipulation. The resulting data were analyzed to determine the correlation between the self-report and objective measures of stiffness at both time points. Further, the patients were categorized into responders and non-responders based on pre-established cut points for the VT values, and these categories were compared with the self-report measures to determine whether the LSIQ and the LSDI were sensitive to change.
Results Twenty-nine participants completed the study. Measures from the LSIQ and LSDI correlated poorly with objectively measured lumbar PA stiffness at baseline and also with the change scores. The change in objectively measured lumbar PA stiffness following spinal manipulation did not differ between those who improved and those who did not improve according to the pre-specified cut points. Finally, a reduction in lumbar PA stiffness following intervention was not associated with improvement in LSIQ and LSDI outcomes.
Conclusions The current data indicate that the LSIQ and LSDI questionnaires do not correlate with measures obtained objectively by VT. Our results suggest that these objective and self-reported measures represent different domains and, as such, cannot stand in place of one another.


INTRODUCTION
Low back pain (LBP) is the primary cause of years lived with disability globally (Haldeman et al., 2012; Global Burden of Disease Study 2013 Collaborators, 2015). No specific nociceptive source can be identified in the majority of these cases, and they are therefore classified as non-specific (ns) (Hartvigsen et al., 2018). Theoretically, classifying patients with nsLBP into subgroups based on clinical characteristics may generate better treatment outcomes (Wong & Kawchuk, 2016; Flynn et al., 2002). Clinical assessment of segmental spinal stiffness is one way of subdividing patients, which is often used by practitioners of spinal manipulation (SM) to decide where to apply treatment (Fritz, Whitman & Childs, 2005; Tuttle, 2009). However, manual assessment of segmental spinal stiffness has relatively poor intra- and interrater reliability. Numerous devices have therefore been developed for obtaining quantified measures of spinal stiffness, albeit mainly for research use, and the reported test-retest reliability of these devices is generally high (Wong & Kawchuk, 2016). One such device is the Vertetrack (VT) (Brown et al., 2017). The VT has produced reliable measurements quantifying the load-displacement values for within-session and between-session assessments in asymptomatic patients (Hadizadeh, Kawchuk & Parent, 2019; Wong et al., 2013), and it has demonstrated a high level of accuracy in a recent validation study (Young, 2019).
Patients with nsLBP display greater average lumbar posterior-to-anterior (PA) stiffness as measured by mechanical devices than asymptomatic people (Kawchuk et al., 2015). Objectively measured reduction in lumbar PA stiffness is associated with pain relief in some, but not all persons with nsLBP (Wong & Kawchuk, 2016). A decrease in stiffness has not previously been shown to occur in an asymptomatic cohort following spine mobilisation (Allison et al., 2001).
Spinal manipulation (SM) has been shown to alter lumbar PA stiffness measures, and a reduction in stiffness is related to self-reported measures of disability (Stanton et al., 2017; Wong et al., 2015). Feeling stiff in the lower back is reported to be a predictor of disability (Thakral et al., 2014) and a primary target in interventions for many musculoskeletal conditions including LBP (Stanton et al., 2017). Further studies are needed to obtain better insight into how measures of spinal stiffness relate to clinical practice. Specifically, there are now objective measures of lumbar spinal stiffness available, but they are time consuming to perform and it is not yet clear how such measures relate to a number of clinically relevant self-reported outcomes (Wong & Kawchuk, 2016). Two such potentially relevant self-reported outcome instruments are the Lumbar Spine Instability Questionnaire (LSIQ) and the Lumbar Stiffness Disability Index (LSDI).
Recently, the clinimetric properties of the LSIQ were assessed in a sample of patients with nsLBP (Saragiotto et al., 2018). The authors found that the LSIQ has acceptable test-retest reliability, but also concluded that it remains unclear whether the LSIQ measures clinical instability or some other construct in nsLBP. Furthermore, the LSIQ was also reported to show poor internal consistency and unclear construct validity. Additional clarification of the underlying concepts of the LSIQ has therefore been advocated. Still, the measure has been used in clinical studies. Individuals who report feeling more unstable (compared with those who feel less unstable), and who have higher LSIQ scores, have reported better outcomes from a motor control exercise intervention compared to those completing a graded activity program (Macedo et al., 2014). Therefore, while the LSIQ is intended to measure instability, it has not been validated against direct measures of instability, which opens the possibility that it may in fact reflect other biomechanical measures such as stiffness.
The LSDI has demonstrated acceptable internal consistency, retest reliability, and external validity when assessed in a group of 32 adult lumbar arthrodesis patients (Hart et al., 2013a). Also, increased patient-reported difficulty in performing activities of daily living (ADL), as indicated by a higher LSDI score, is correlated strongly with decreased lumbar range of motion as measured on flexion-extension lateral radiographs (Hart et al., 2013a).
Given the above, the aims of this study were to examine:
1. How measures from the LSIQ, LSDI and the VT change following the provision of SM in patients with persistent nsLBP.
2. How baseline measures from the LSIQ and LSDI correlate with baseline values of lumbar PA stiffness as measured by the VT.
3. How changes in LSIQ and LSDI measures correlate with potential changes in VT measures following SM intervention.
4. Whether VT change scores following SM intervention differed between those who responded and those who did not respond according to LSIQ and LSDI measures.
5. Whether a reduction in lumbar PA stiffness was associated with improvement in LSIQ and LSDI measures following SM intervention.

Participants
Patients with persistent low back pain were recruited from the Spine Centre of Southern Denmark, a large regional hospital department with specialist focus on spinal pain syndromes, located in Middelfart, Denmark. Patients were referred to the department from primary practice (general medical practitioners, chiropractors, medical consultants) and other hospitals in the region. See Table 1 for inclusion and exclusion criteria. Research design: A clinical trial with repeated measures.

Questionnaires
The LSIQ is a self-report measure consisting of 15 items where higher scores are assumed to indicate greater clinical lumbar instability (Fig. 1). A single point is given for every "yes" answer, thus the score of the LSIQ ranges from 0-15 (Cook, Brismée & Sizer, 2006). The LSDI consists of 10 items assessing the impact of low back stiffness on ADL such as dressing, hygiene, mobility, and sexual activity (Fig. 2). Responses to each item are scored from 0 ("No effect at all") to 4 ("I cannot do this at all"). The raw score of the LSDI ranges from 0-40, and a percentage score is calculated from the raw score. Higher scores indicate greater disability due to stiffness (Hart et al., 2013a). An English-Danish bilingual clinician at the Spine Centre, experienced in the translation of questionnaires, translated both questionnaires into Danish. Demographic data were obtained from the SpineData questionnaire (Kent et al., 2015). This questionnaire is used at the hospital and has general questions about demographics, pain intensity, duration etc.

Table 1 Inclusion and exclusion criteria.
To be enrolled in the study, the participant had to:
• Provide informed written consent.
• Have the ability to speak and read Danish.
• Be between the age of 18 and 60.
• Have a body mass index <35.
• Have had LBP >3 months, defined as pain on the posterior aspect of the body between the 12th thoracic vertebra and the gluteal folds.
• Have no previous back surgery and not have had surgery in general in the last 4 months.
• Have received no spinal manipulation in the last month.
• Take no other pain medication than paracetamol, NSAIDs or weak synthetic opioids.
• Have no competing diagnoses which could (a) confound the diagnosis of nsLBP, e.g., osteoporosis, cancer, fibromyalgia etc., or (b) interfere with the allocated treatment.
Participants were excluded during the study if they:
• Did not complete the allocated intervention (minimum 75% of scheduled treatments).
• Did not fill out the questionnaires.
• Received other treatment than that administered as part of the study.
• Deviated, within the treatment period, from the medication agreed upon at baseline.
• Were unable to hold their breath for 10 s.
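The questionnaire scoring described in this section can be sketched in a few lines of Python. The function names are hypothetical, and the percentage formula (raw score divided by the maximum of 40, times 100) is an assumption, since the text only states that a percentage is calculated from the raw score.

```python
def score_lsiq(answers):
    """Return the LSIQ score: one point per 'yes' among the 15 items."""
    if len(answers) != 15:
        raise ValueError("LSIQ has 15 items")
    return sum(1 for a in answers if a)


def score_lsdi(item_scores):
    """Return (raw score 0-40, percentage score) for the 10 LSDI items."""
    if len(item_scores) != 10:
        raise ValueError("LSDI has 10 items")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each LSDI item is scored 0-4")
    raw = sum(item_scores)
    # Assumption: percentage = raw / maximum raw score (40) * 100.
    return raw, 100.0 * raw / 40
```

For example, a participant answering "yes" to three LSIQ items scores 3, and a participant scoring 1 on every LSDI item has a raw score of 10 out of 40.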

Vertetrack measurements
The VT applies a pre-selected vertical load continuously over a specific spinal region (Brown et al., 2017). It consists of a solid aluminum gantry on lockable caster wheels that can be positioned over a participant lying in the prone position on a standard plinth (Fig. 3). The frame is used to provide a rigid support for the indenter apparatus, which applies a vertical load to the region of interest. The indenter apparatus consists of a loading rod suspended within a linear bearing to permit near-frictionless vertical translation as the load is moved along the spine on a pair of roller wheels (diameter 70 mm, width 30 mm). These wheels straddle the midline on either side of the test subject's spinous processes, thus providing a rolling contact point for the application of PA loads. During application, various sensors measure the tissue deformation from the applied load as well as the position of the indenter along the spine. Using this setup, it is possible to position the indenter apparatus at defined waypoints along the spine, and then have a number of stepper motors drive the indenter apparatus along this pre-defined trajectory whilst applying a fixed PA load to the subject through the two roller wheels. The result is a continuous and real-time quantification of the bulk deformation of any spinal region for a given mass over a defined trajectory. Using a series of fixed loads in 10 Newton increments, the force-deformation profile of the spinal region of interest can be produced. In this study, waypoints were identified at each lumbar spinous process using ultrasound, and marked on the surface of the skin with an ink pen. During data collection, the participants were asked to fully exhale and hold their breath. The roller was lowered onto the participant's back and set in motion to follow the pre-defined waypoints. The testing procedure lasted approximately 10 s, and when testing was complete, the stepper motor system retracted the load from the participant's back.
Prior to inclusion, potential participants underwent the usual clinical diagnostic procedures at the Spine Centre, including an extensive 'SpineData' research questionnaire. Potential participants were invited to participate only if a diagnosis of persistent nsLBP had been established. Participants included in the study completed the LSIQ and LSDI at baseline, and were scheduled for lumbar PA stiffness testing in the VT thereafter. The same procedure was repeated at a two-week follow up session.

Spinal manipulative therapy (SMT)
During the course of these two weeks, the participants were treated with spinal manipulation at the Spine Centre. The participants were placed in the side-position, and a standard manipulation lumbar-roll technique was applied (Thomas, Faculty & Clinic, (0000)). The intervention consisted of 4 treatment sessions of SM over a two-week period. The number of treatment sessions was chosen based on previous research, which reports that the majority of patients who improve with SM, will do so within four treatment sessions in the first two weeks (Stig, Nilsson & Leboeuf-Yde, 2001). The 4th session of SM was done within 5-10 min prior to the last VT follow-up session. See Fig. 4 for study flow diagram.

Ethics
The project was conducted in accordance with the Helsinki-II declaration, and the project was approved by the Regional Committees on Health Research Ethics for Southern Denmark (S-20160201) and the Danish Data Protection Agency. ClinicalTrials.gov identifier: NCT04086667. All participants provided informed, written consent.

Data analysis
Descriptive statistics were performed on clinical characteristics of the participants. Normality of relevant variables was tested using the Shapiro-Wilk test; all data were deemed normally distributed. Spinal stiffness data from the VT were categorized into segmental stiffness (SS), i.e., the individual stiffness scores of motion segments L1-L5, and mean lumbar stiffness (MLS), calculated as the mean of all five SS measures for each participant. Both questionnaire data and VT data were treated as continuous. Specifically, SS measures were calculated from the second point to the second last point of the raw force-displacement curve (see Fig. 5). Hence, the SS measures equal the force (N) of the applied mass divided by the displacement (mm).
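As an illustration of the stiffness calculation, the sketch below shows one possible reading of it: the first and last points of the force-displacement curve are discarded, and stiffness is taken as the change in force divided by the change in displacement over the remaining span. This interpretation, the function names, and the example values are assumptions for illustration and are not taken from the VT software.

```python
from statistics import mean


def segmental_stiffness(force_n, displacement_mm):
    """Stiffness (N/mm) from the second to the second-last curve point."""
    if len(force_n) != len(displacement_mm):
        raise ValueError("force and displacement must have equal length")
    # Drop the first and last points of the raw curve.
    f = force_n[1:-1]
    d = displacement_mm[1:-1]
    if len(f) < 2:
        raise ValueError("need at least four points on the curve")
    # Assumed definition: change in force over change in displacement.
    return (f[-1] - f[0]) / (d[-1] - d[0])


def mean_lumbar_stiffness(ss_l1_to_l5):
    """MLS: mean of the five segmental stiffness values (L1-L5)."""
    if len(ss_l1_to_l5) != 5:
        raise ValueError("expected one SS value per segment L1-L5")
    return mean(ss_l1_to_l5)
```

With hypothetical loads of 10 to 60 N in 10 N steps and displacements of 1, 3, 5, 7, 9 and 12 mm, the trimmed curve spans 20 to 50 N over 3 to 9 mm, giving a stiffness of 5 N/mm.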
Paired t-tests were performed to examine the difference in questionnaire and VT measures before and after intervention.
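The paired comparison can be sketched as follows. This is a minimal standard-library version that returns only the t statistic and degrees of freedom; obtaining a P value would additionally require the t distribution from a statistics package. The function name and the example scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for follow-up minus baseline."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Standard error of the mean difference.
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical questionnaire scores for four participants.
t_stat, df = paired_t([5, 6, 7, 8], [4, 4, 6, 5])
```

A negative t statistic here indicates that scores were lower at follow-up than at baseline.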
Spearman's analysis was performed to test the baseline correlation between measures of each questionnaire and the VT measures. Change scores from baseline to follow-up were calculated for SS, MLS and both questionnaires. Spearman's analysis was again performed to test the correlation between change scores in measures from each questionnaire and change scores in the VT measures. Scatterplots were created to illustrate the relation between baseline and change score means of the questionnaires and VT measures.
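For readers unfamiliar with the method, Spearman's correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a standard-library illustration with hypothetical function names; it is not the software used in the study.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

The same function applies to baseline values and to change scores, for example `spearman_rho(lsiq_change, mls_change)` with one entry per participant (variable names hypothetical).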
Participants were classified as responders or non-responders to the SM intervention based on whether or not they achieved a decrease of two or more points in the LSIQ at follow-up compared to baseline. This was a predefined and arbitrary cut point, deemed relevant to our study only. Participants not achieving a two-point reduction were classified as non-responders.
This procedure was also performed for the LSDI, but here participants were classified as responders if they achieved a reduction of 12.5% (i.e., 5 out of 40 points) in LSDI score at follow-up compared to baseline. This was based on a previous study reporting an 11% improvement (i.e., decrease) in LSDI score following arthrodesis over a single lumbar segment. A Welch two-sample t-test was performed to determine if there was a difference in the change scores of VT measures between the responders and non-responders, as determined by the LSIQ and LSDI change respectively.
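The questionnaire-based classification and the Welch comparison can be sketched as below. The cut points (2 points for the LSIQ, 5 raw points for the LSDI) come from the text; treating the LSDI cut point on the raw 0-40 scale, the function names, and the example data are assumptions. Only the t statistic and the Welch-Satterthwaite degrees of freedom are computed.

```python
from math import sqrt
from statistics import mean, variance


def is_responder(baseline, follow_up, min_reduction):
    """True if the score decreased by at least min_reduction points."""
    return (baseline - follow_up) >= min_reduction


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite df."""
    va = variance(a) / len(a)  # squared standard error, group a
    vb = variance(b) / len(b)  # squared standard error, group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical LSIQ example: cut point of 2 points.
print(is_responder(9, 7, min_reduction=2))
# Hypothetical VT change scores for responders and non-responders.
print(welch_t([1, 2, 3], [2, 4, 6]))
```

Unlike the pooled-variance t-test, the Welch version does not assume equal variances in the two groups, which suits responder groups of unequal size.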
Finally, a 5% reduction in each VT measure (both SS and MLS) classified participants as responders. This specific cut point, which was also predefined and arbitrary, was considered relevant to our study only. A chi-square test was performed to determine whether a decrease in lumbar PA stiffness was associated with improvement in LSIQ and LSDI.
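The VT-based classification and the test of association can be sketched as follows. The 5% cut point is from the text; the function names are hypothetical, and the statistic shown is the plain Pearson chi-square for a 2x2 table without continuity correction, which is an assumption about the variant used.

```python
def vt_responder(baseline, follow_up, fraction=0.05):
    """True if stiffness decreased by at least `fraction` of baseline."""
    return (baseline - follow_up) / baseline >= fraction


def cross_tab(vt_flags, questionnaire_flags):
    """2x2 table of counts: rows = VT responder yes/no, columns = questionnaire responder yes/no."""
    table = [[0, 0], [0, 0]]
    for v, q in zip(vt_flags, questionnaire_flags):
        table[0 if v else 1][0 if q else 1] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square statistic (1 df, no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; assumes no empty margin.
            expected = rows[i] * cols[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2
```

With small expected cell counts, as can occur in a sample of 29, an exact test may be a safer choice than the chi-square approximation.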

RESULTS
A total of 35 patients were recruited into the study (Table 2). Of those, 29 patients completed the full trial. All six dropouts missed two or more treatments during the two-week intervention period. Age, sex, duration of back pain and baseline back pain scores among the included 29 patients were not different from the 6 who did not complete the trial. From the resulting 29 participants, questionnaire and VT data were normally distributed at baseline and at follow-up. The change scores of the LSIQ and the SS change score for L1, L3, L4 and L5 were also normally distributed. The LSDI change score, the L2 change score and the MLS change score were not normally distributed. Figure 6 illustrates relations between measures from the LSIQ, LSDI and MLS. Table 3 presents the mean LSIQ, LSDI and VT measures at baseline, follow-up and for the change scores. From baseline to follow-up, the mean LSIQ and LSDI scores decreased by 1.3 (P = 0.003) and 4.5 (P = 0.018) points respectively. The MLS-score increased by 0.029 N/mm (P = 0.994).
Twenty-one participants improved following the intervention period according to the pre-determined LSIQ cut point, and 14 improved according to the LSDI score (Table 3). Respectively, 10, 7, 9, 8 and 10 participants "improved" according to the segmental stiffness changes of L1-L5 (i.e., a decrease in VT scores). Seven participants improved as determined by the change in MLS. No statistically significant difference was found between responders and non-responders for either LSIQ or LSDI (Table 4). A reduction in objectively measured lumbar PA stiffness following treatment was not associated with improvement in either the LSIQ or LSDI following the SM intervention (Table 4).

DISCUSSION
Improvements in both LSIQ and LSDI scores were observed in patients with persistent nsLBP following a two-week intervention of 4 sessions of SM. The mean lumbar PA stiffness measured by the VT increased slightly during the intervention period among all participants, although not significantly. Neither the cross-sectional measures nor the change scores of the LSIQ and LSDI correlated with objectively measured lumbar PA stiffness as measured by the VT. Those improving in the LSIQ or LSDI scores following intervention did not differ in the change scores of objectively measured PA stiffness compared to those not improving in the questionnaire scores. Finally, a decrease in lumbar PA stiffness was not associated with improvement in the LSIQ and LSDI outcomes following SM intervention. These findings suggest that these subjective and objective measures do not measure similar domains. Stanton et al. recently concluded that perceived and actual, measured stiffness did not correlate well and that the experience of feeling stiff did not reflect actual biomechanical back stiffness, which was also measured by the VT (Stanton et al., 2017). They also found no difference in objective spinal stiffness between those with and without reported stiffness and LBP. From this work, they concluded that bodily feelings of stiffness may reflect a multisensory perceptual inference that aids bodily protection, and that the conscious perception of stiffness may not result only from joint-relevant sensory information. Our results support the interpretation that it may not be feasible to rely on self-report when assessing stiffness of the spine, and probably also not when assessing constructs such as instability.
Table 4 Results. Correlation between LSDI and SS/MLS, both at baseline and for the change scores. Welch two-sample t-test and chi-square test for association between responders/non-responders (see "Data analysis").
It seems difficult to achieve measures of low back perceptions from a questionnaire, and even harder to gain knowledge as to what degree these perceptions reflect pain-related biomechanical changes of low back stiffness.

There are several possible reasons why the objective measure of spinal stiffness did not change over the course of this study. Most likely, the subset of participants we tested were not responders to SMT. As mentioned in the introduction, prior work has shown that some, but not all, persons who respond to SMT with a significant change in disability also show a significant change in spinal stiffness. Clearly, as is the case with all studies, not every recruited participant will be a responder.
Findings from a recent study suggest that the 15 items of the LSIQ may be indicative of clinical instability individually, but that many of the questionnaire items are characteristics of LBP in general (Saragiotto et al., 2018). Analysis of the differential item functioning (item bias) identified several items that were significantly and meaningfully biased by factors other than lumbar instability, which is the proposed construct of the questionnaire (Saragiotto et al., 2018). This creates the possibility that the LSIQ evaluates properties other than instability, such as stiffness. In our study, however, the questionnaire did not correlate with objectively measured lumbar PA stiffness.
Interestingly, the mean LSDI scores in a study that examined patients with lumbar arthrodesis were similar to the mean baseline and follow-up scores found in our study (Hart et al., 2013b). In fact, the mean baseline LSDI score (32.3) among the patients in our study, who did not undergo any spinal fusion, was almost as high as the score among 21 patients with fusion of five or more lumbar segments (35.4), and even higher than the mean score among the 24 patients with one-level fusion (24.2). Another study sample of lumbar fusion patients reported LSDI scores similar to the LSDI scores in our study (Hart et al., 2014). This raises the question of whether the stiffness that the LSDI implies is limiting ADL is really due to the segmental fusion. Furthermore, it raises the question of whether the LSDI reflects perceptions of lumbar stiffness or some other construct. The LSDI may detect some kind of disability in performing ADLs, but it can be questioned whether this is truly because of the segmental stiffness created by spinal fusion. So, as in the case of the LSIQ, this could be part of the explanation for the poor relationship between the self-reported LSDI and the objective measure of stiffness from the VT. In addition, our findings may also reflect that the VT quantifies neutral zone spine stiffness while the questions on the LSDI are directed at ranges of motion beyond the neutral zone.

Study strengths
This study examined the relation between self-reported and objectively measured spinal stiffness both cross-sectionally and longitudinally, which expanded our assessment to include the association between changes in the questionnaire scores and changes in measured stiffness. In addition, the self-report instruments were benchmarked against an objective measure of spinal stiffness.

Study limitations
The fact that translation of the questionnaires into Danish was not performed using the recommended reverse translation procedures might influence the reliability of the conclusions from this study. Another limitation is that the division into improvement/no improvement in the measures of the LSIQ, LSDI and VT following intervention was set by our own estimate of a relevant cut point. However, this was a necessary procedure because of the lack of established cut points in the literature. Further, a number of factors can potentially influence the PA spinal stiffness measured by the VT. Voluntary/involuntary paraspinal muscle activation is one example. For discussion of the limitations and challenges of the VT, we refer to previous work (Wong & Kawchuk, 2016; Wong et al., 2013). Finally, it is possible that any effects of spinal manipulation could be attenuated by not taking the participants' stiffness and/or instability status at baseline into consideration.