As a clinical neuropsychologist working in an interdisciplinary movement disorders program, I have developed tremendous respect for the role occupational therapists (OTs) play in supporting the independence and quality of life of our patients. As a scientist, my research has focused on examining whether emerging technologies in healthcare can improve quality, increase access, or reduce costs (1-3). I was therefore quite keen to review this paper, which reports on the application of an immersive virtual reality (VR) platform to train older adults on instrumental activities of daily living. Unfortunately, my enthusiasm quickly waned upon seeing the study design and interpretation. At best, results from this study show that older adults with mild cognitive impairment (MCI) may be able to tolerate a VR-based intervention. After a second round of reviews, my strong suggestion to the editor was to reject this paper. My concerns were that the flaws in this paper would: (I) encourage others to conduct similar research without regard for scientific rigor, and (II) lead the appropriately skeptical reader to dismiss with prejudice other healthcare technology with real empirical support. While I am disappointed to see this paper in print, I am most grateful to the editor for the invitation to submit this response outlining my concerns and suggestions for improved study design.
The title states that this paper presents “a feasibility study” (4); however, multiple efficacy measures are listed in the abstract itself, reflecting the preponderance of reporting in the results section on efficacy outcomes. The diminutive sample of n=7 older adults representing 3 different types of mild cognitive impairment is marginally adequate for drawing conclusions about the feasibility of administering the training program. It is wildly inadequate for exploring potential benefits using inferential statistical tests, particularly on measures such as the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD) Word Recall and Trail Making, which have large practice effects. While one could overlook the absence of the customary corrections for the increased rate of Type-I error with multiple tests, the authors have more cognitive measures than participants! The use of reliable change indices to account for practice effects, measurement error, and chance (e.g., at a 90% confidence level) could help support the claim that the nominal changes were unlikely to represent artifact (5). However, in an unblinded and uncontrolled study such as this, one cannot assume that any improvements were due to the intervention itself rather than other factors (e.g., positive interactions with staff, behavioral activation, expectancy bias, placebo effect). For instance, subject #2 presented with moderate to severe depression at baseline based on responses to the Beck Depression Inventory (BDI = 31), which resolved to minimal or mild at the end of the study (BDI = 14). Not surprisingly, there was a corresponding improvement in Instrumental Activities of Daily Living (I-ADLs), as people are more functional with less depression, and there is direct overlap in item content for the corresponding scales.
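For readers less familiar with reliable change indices, the logic can be sketched briefly. Under the classic Jacobson-Truax formulation, an observed pre-post difference is scaled by the standard error of the difference, which is derived from the measure's baseline variability and test-retest reliability; only standardized differences exceeding the chosen critical value (e.g., 1.645 for a two-tailed 90% criterion) are treated as reliable change. The following is a minimal illustrative sketch; the input values are hypothetical and not taken from the paper under discussion, and practice-adjusted variants would additionally subtract the mean practice gain from the difference.

```python
import math

def reliable_change_index(pre, post, sd_baseline, reliability):
    """Jacobson-Truax reliable change index.

    pre, post     -- a participant's scores at the two assessments
    sd_baseline   -- standard deviation of the measure at baseline
    reliability   -- test-retest reliability coefficient of the measure
    """
    # Standard error of measurement for a single assessment
    se_meas = sd_baseline * math.sqrt(1 - reliability)
    # Standard error of the difference between two assessments
    se_diff = se_meas * math.sqrt(2)
    return (post - pre) / se_diff

# Hypothetical values for illustration only (not from the paper):
# a 2-point gain on a word-list recall score
rci = reliable_change_index(pre=5, post=7, sd_baseline=2.5, reliability=0.80)
reliable = abs(rci) > 1.645  # two-tailed 90% confidence criterion
```

In this illustrative case the 2-point gain yields an RCI of about 1.26, below the 1.645 criterion, so the change would not be considered reliable; it is plausibly attributable to measurement error, practice, or chance.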
After this was noted during the review process, the authors dismissed the possibility that improvement in I-ADLs was being driven by reduction in depression, arguing, “there is no marked improvement in the CERAD items in case 2”. This is a particularly misleading and disingenuous statement, as this individual showed nominal improvement on the two measures that were listed as “improved in all participants” (Word List Delayed Recall and Trail Making Test), as well as several others. Do those improvements indicate real change or not?
Despite the claim in the abstract, it is also worth noting that not all participants actually improved on the Word List Delayed Recall or Trails B. Specifically, case 5 registered a score of 0 on delayed word recall at both examinations, and 2 of 7 subjects were unable to complete Trails B at baseline. The authors also chose to highlight 2 of 9 cognitive measures where there was no evidence of decline to suggest possible benefit. Putting aside the question of statistically or clinically meaningful change, this is a clear example of cherry-picking results to support a hypothesis. Table 5 from the paper has been recreated as Figure 1 using the data provided by the authors: cells showing nominal pre-post decline in performance are shaded in black with white text, those that remained the same are white, and those that improved are shaded in gray. A quick visual inspection shows that there are as many nominal declines (21) as there are gains (21). Using the same rubric as the authors but in reverse, one could make the statement that “All participants evinced decline on at least one measure, all participants declined on Wordlist Instructions and Constructional Praxis, and the average participant declined on 3 measures”, and extrapolate to indicate that this finding raises a significant safety issue with the intervention. This obviously wrong conclusion is the basis for the scandalmongering title of this editorial. However, there is another important point to be made: the authors report that the funding for the study was provided by bHaptics, the developer of the intervention. Had this study been funded by a competitor of bHaptics, which interpretation of the cognitive data would have been presented?
Last, I would like to highlight what I believe is a fundamental conceptual flaw. In the Introduction the authors state, “It is possible that I-ADL impairment may accelerate decline in the cognitive functioning of people with dementia.” The study cited by the authors reported that physical disability impacting ADLs and some I-ADLs was associated with cognitive decline. It did not suggest that older adults spontaneously and inexplicably stopped performing I-ADLs, and cognition subsequently declined. More importantly, the construct validity of cognitive tests is based on correspondence with functional abilities in the real world. For example, performance on a word list recall task is interpreted to reflect memory in day-to-day life; performance on Trails B (a timed test of psychomotor sequencing, alternating between numbers and letters) may relate to multitasking. Decline in cognitive test performance only matters if it impacts I-ADLs. Therefore, the appropriate outcome for an intervention to increase functional independence for I-ADLs is a real-world measure of I-ADLs (self-report or through objective testing) (6,7), not performance on a cognitive test that at best provides a proxy.
The authors correctly state in the Limitations section that, “Further studies including control groups are needed to verify effectiveness.” A non-inferiority comparison design, where individuals are randomly assigned to the VR intervention or treatment as usual (with equivalent levels of activity and interactions with OTs), and outcomes are obtained by raters blinded to participant condition, is strongly recommended. If the technology does improve quality, increase access, or reduce costs of delivering healthcare services, this level of scientific rigor during the validation process is critical to produce believable results that will lead to acceptance and adoption. When studies are funded by groups with financial interest in the outcome, as in this case, the importance of such safeguards cannot be overstated (4).
Provenance and Peer Review: This article was commissioned by the editorial office, Annals of Palliative Medicine. The article did not undergo external peer review.
Conflicts of Interest: The author has completed the ICMJE uniform disclosure form (available at https://apm.amegroups.com/article/view/10.21037/apm-23-140/coif). The author has no conflicts of interest to declare.
Ethical Statement: The author is accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
1. Gur RC, Ragland JD, Moberg PJ, et al. Computerized neurocognitive scanning: I. Methodology and validation in healthy people. Neuropsychopharmacology 2001;25:766-76.
2. Turner TH, Renfroe JB, Duppstadt-Delambo A, et al. Validation of a Behavioral Approach for Measuring Saccades in Parkinson's Disease. J Mot Behav 2017;49:657-67.
3. Turner TH, Horner MD, Vankirk KK, et al. A pilot trial of neuropsychological evaluations conducted via telemedicine in the Veterans Health Administration. Telemed J E Health 2012;18:662-7.
4. Shin HT, Kim DY, Bae CR, et al. Fully-immersive virtual reality instrumental activities of daily living training for mild dementia: a feasibility study. Ann Palliat Med 2023;12:280-90.
5. Maassen GH. Principles of defining reliable change indices. J Clin Exp Neuropsychol 2000;22:622-32.
6. Patterson TL, Goldman S, McKibbin CL, et al. UCSD Performance-Based Skills Assessment: development of a new measure of everyday functioning for severely mentally ill adults. Schizophr Bull 2001;27:235-45.
7. Patterson TL, Lacro J, McKibbin CL, et al. Medication management ability assessment: results from a performance-based measure in older outpatients with schizophrenia. J Clin Psychopharmacol 2002;22:11-9.