How different are accuracies of clinicians’ prediction of survival by assessment methods?
Highlight box
Key findings
• We showed the discrepancy in the accuracy of the clinicians’ prediction of survival (CPS) among three different analysis methods: overall accuracy, the area under the receiver operating characteristics curve, and actual survival (AS) comparison.
What is known and what is new?
• CPS is known to be inaccurate and optimistic; however, its accuracy has been evaluated using heterogeneous methods.
• This study is the first to examine the accuracy of CPS through three different statistical approaches using the same patient population.
What is the implication, and what should change now?
• Researchers should consider that the accuracy of CPS may differ according to the evaluation methods.
• Given the low accuracy of CPS when compared against AS, it is preferable to give wide ranges in prognostic communication, especially when the expected survival time is short.
Introduction
Prognostic information is critical for patients, their families, and health care professionals when making decisions regarding the continuation or discontinuation of treatment and referral to specialists (1). Prognostication in patients with advanced cancer often relies on a subjective factor, the clinicians’ prediction of survival (CPS) (2,3). However, CPS is known to be inaccurate and tends to overestimate survival (4-7).
The accuracy of CPS has been evaluated using heterogeneous methods; additionally, the approaches used to formulate CPS differ (7). There are three typical approaches for formulating CPS: a categorical estimate (e.g., 0–2 weeks; 1–2 months), a continuous estimate (e.g., number of days), and a probability estimate (e.g., the probability of surviving for 3 weeks). There are four typical approaches to define the accuracy of CPS: an estimate within ±33% of actual survival (AS); within a maximum time (e.g., within 30 days); within a range (e.g., within 7–14 days); and estimate discrimination (8,9). A systematic review has reported the accuracy of the different formulations of CPS in palliative care: categorical estimates of survival varied from 23% to 78%; continuous estimates ranged from an underestimation of 86 days to an overestimation of 93 days on average; and discriminant probability estimates ranged between 0.74 and 0.78 (7). In comparisons with other prognostic tools, the accuracy of CPS has often been evaluated using discrimination (10). However, no study has compared methods to assess the accuracy of CPS using the same dataset (11). Understanding the strengths and limitations of the CPS in prognostic communication is essential for clinicians. To date, the accuracy of CPS has been reported inconsistently (4), but it is unclear whether this was due to the heterogeneity of the study populations or care settings.
Thus, we considered it worthwhile to compare the accuracy of CPS across different evaluation methods in a large-scale multicenter study. This study aimed to examine the accuracy of temporal CPS using different statistical approaches in patients admitted to palliative care units (PCUs) with far-advanced cancer in three East Asian sectors. We present this article in accordance with the STARD reporting checklist (available at https://apm.amegroups.com/article/view/10.21037/apm-23-393/rc).
Methods
Participants
The current study was a secondary analysis of an international multicenter prospective cohort study. The parent study, East Asian collaborative cross-cultural Study to Elucidate the Dying process (EASED), examined the dying process and end-of-life care of patients with advanced cancer in PCUs in Japan, Korea, and Taiwan. Eligible patients were consecutively admitted to participating PCUs during the study period. The inclusion criteria for the study were (I) adults aged ≥18 years in Japan and Korea and ≥20 years in Taiwan; (II) patients diagnosed with locally extensive or metastatic cancer; and (III) patients newly admitted to a participating PCU. The exclusion criteria for the study were (I) scheduled discharge within 1 week (in order to reduce follow-up loss) and (II) refusal of the patient or the patient’s family (when the patient lacked the capacity to communicate) to enroll in this study.
Data collection
The assigned palliative care clinicians prospectively recorded all variables on structured data collection sheets on the first day of admission. Discharged patients were followed for 6 months after PCU admission in Japan and Taiwan, and for 6 months after PCU discharge in Korea. We defined mortality as all deaths inside and outside the PCUs. We calculated survival time as the death date minus the admission date for patients who died, or as the last follow-up date minus the admission date for patients who were alive at follow-up. We treated patients who were alive at the last follow-up as censored.
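The survival-time rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `survival_days`, its arguments, and the example dates are hypothetical and are not taken from the study's data or analysis scripts.

```python
from datetime import date
from typing import Optional, Tuple


def survival_days(admission: date,
                  death: Optional[date],
                  last_follow_up: Optional[date]) -> Tuple[int, bool]:
    """Return (survival time in days, event_observed).

    event_observed is True when a death date is known and False when the
    patient was alive at the last follow-up (a censored observation).
    """
    if death is not None:
        # Death case: death date minus admission date.
        return (death - admission).days, True
    if last_follow_up is None:
        raise ValueError("need either a death date or a last follow-up date")
    # Alive at follow-up: last follow-up date minus admission date, censored.
    return (last_follow_up - admission).days, False


# Hypothetical patients, for illustration only.
observed = survival_days(date(2017, 3, 1), date(2017, 3, 19), None)
censored = survival_days(date(2017, 3, 1), None, date(2017, 8, 28))
print(observed, censored)
```

Returning the censoring flag together with the duration keeps the two pieces of information that a time-to-event analysis (such as a median survival estimate) would need.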
Measurements
CPS is a quick prognostic indicator in palliative care. We obtained temporal CPS at enrollment using the following question: “How long do you think this patient will live (days)?”
Additionally, we gathered the following clinician characteristics: sex; specialization (palliative care, family medicine, internal medicine, surgery, anesthesiology, or others; multiple choices allowed); clinical experience (years); clinical experience in palliative care (years); and number of patients with advanced cancer treated per year.
We also obtained the patients’ background information, including age, sex, primary cancer site, living environment, marital condition, highest educational level, religion, and Eastern Cooperative Oncology Group Performance Status.
Statistical analysis
First, we performed a descriptive analysis to summarize the baseline and clinical characteristics of patients and clinicians.
Second, we classified the patients into groups of days (≤7 days) and weeks (≤30 days) based on the CPS and AS. We defined the category cutoffs based on clinical practice and the coauthors’ consensus, because patients and families frequently ask palliative care clinicians about survival in terms of intuitive calendar periods. We examined the differences in distribution between CPS and AS for the two categories, and we computed Spearman’s correlation coefficient between CPS and AS.
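As a rough illustration of the correlation step, Spearman's coefficient can be computed as the Pearson correlation of average ranks. The sketch below is not the authors' SPSS or R code; the helper names and the six CPS/AS pairs are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical predicted (CPS) and actual (AS) survival in days.
cps_days = [5, 10, 14, 30, 7, 60]
as_days = [3, 6, 20, 45, 12, 25]
print(round(spearman(cps_days, as_days), 3))
```

Because the coefficient depends only on ranks, it is insensitive to the long right tail that survival times typically show.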
Third, we assessed the discrimination ability of the CPS using the area under the receiver operating characteristic curve (AUROC) for two different timeframes: days (≤7 days) and weeks (≤30 days). The AUROC summarizes how well a predictor separates binary outcomes across all possible thresholds and typically ranges from 0.5 (no discriminatory ability) to 1.0 (perfect discriminatory ability). We treated CPS as a continuous variable to calculate the AUROCs.
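One way to read the AUROC is as the probability that a randomly chosen patient who died within the timeframe received a shorter CPS than a randomly chosen patient who did not. The following sketch computes it by pairwise comparison. It assumes that every patient's status at the cutoff is known (censoring is ignored), and the function names and data are hypothetical rather than the study's implementation.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def cps_auroc(cps_days: Sequence[float],
              as_days: Sequence[float],
              cutoff: float) -> float:
    """AUROC of continuous CPS for predicting death within `cutoff` days."""
    # A shorter predicted survival should point to death within the cutoff,
    # so the negated CPS serves as the score.
    scores = [-c for c in cps_days]
    labels = [a <= cutoff for a in as_days]
    return auroc(scores, labels)


# Hypothetical predicted (CPS) and actual (AS) survival in days.
cps_days = [5, 10, 14, 30, 7, 60]
as_days = [3, 6, 20, 45, 12, 25]
print(cps_auroc(cps_days, as_days, cutoff=7))
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based formula gives the same value more efficiently on large cohorts.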
Fourth, we categorized each prediction as “accurate”, “underestimate”, or “overestimate” by comparing the CPS with the AS. We defined an “accurate” estimate as a CPS within ±33% of the AS (12). Accordingly, we defined an “underestimate” as a CPS more than 33% shorter than the AS and an “overestimate” as a CPS more than 33% longer than the AS.
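The ±33% rule can be expressed as a small function. The sketch below assumes that the boundaries are inclusive and that an AS of zero days is permitted; the source does not state how either case was handled, and the function name `classify_cps` is hypothetical.

```python
def classify_cps(cps_days: float, as_days: float, tolerance: float = 0.33) -> str:
    """Label a prediction relative to actual survival (AS).

    "accurate" means the CPS lies within AS * (1 - tolerance) and
    AS * (1 + tolerance), boundaries included (an assumption).
    """
    if as_days < 0:
        raise ValueError("actual survival cannot be negative")
    lower = as_days * (1 - tolerance)
    upper = as_days * (1 + tolerance)
    if cps_days < lower:
        return "underestimate"
    if cps_days > upper:
        return "overestimate"
    return "accurate"


# Hypothetical examples: the same absolute error of one day is acceptable
# for a long survival time but not for a short one.
print(classify_cps(cps_days=31, as_days=30))
print(classify_cps(cps_days=4, as_days=3))
```

With whole-day predictions, an AS of 3 days leaves only a CPS of exactly 3 days inside the window (2.01 to 3.99 days), whereas an AS of 30 days accepts any CPS from 21 to 39 days.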
All analyses were performed using the IBM Statistical Package for Social Science (SPSS) Statistics for Windows, version 24.0 (IBM Corp., Armonk, NY, USA) and R version 4.2.0. Statistical significance was set as a P value of <0.05.
Ethics
All study procedures for the primary EASED study were approved by the local Institutional Review Boards (IRBs) of all participating institutions in Japan, Korea, and Taiwan. The current study was conducted in accordance with the ethical standards of the Declaration of Helsinki (as revised in 2013). This study followed the ethical guidelines for medical and health research involving human subjects presented by the Ministry of Health, Labor, and Welfare in Japan. For a noninvasive observational study, Japanese law does not require individual informed consent from participants. Instead, all Japanese patients were informed of the study through instructions posted on the ward or institutional website and were given the opportunity to decline participation. In Korea and Taiwan, we obtained informed consent from patients or their families (in cases where the patient lacked the capacity to provide consent). The IRBs of representative institutes in the three sectors approved this secondary analysis: Seirei Mitakahara General Hospital in Japan (Approved No. 16-29), Dongguk University Ilsan Hospital in Korea (DUIH 2017-01-042-009), and National Taiwan University in Taiwan (201611032RIND).
Results
Patient and clinician characteristics
A total of 2,638 patients were available across 37 PCUs (22 in Japan, 11 in South Korea, and 4 in Taiwan). The patients were enrolled between January 2017 and September 2018. Of these, 67 patients were excluded because of missing follow-up or death dates (Japan, 22; Korea, 30; Taiwan, 15). Thus, we analyzed 2,571 patients (Japan, 1,874; Korea, 305; Taiwan, 392), which included 1,332 men [Japan, 951 (50.7%); Korea, 166 (54.4%); Taiwan, 215 (54.8%)]. The median survival duration was 18.0 days [95% confidence interval (CI): 16.9–19.1] in all analyzed patients, 18.0 days (95% CI: 16.6–19.4) in Japan, 22.0 days (95% CI: 19.0–25.0) in Korea, and 14.0 days (95% CI: 12.1–15.9) in Taiwan. The basic characteristics of the patients have been described elsewhere (13).
The characteristics of participating clinicians (total, 180; Japan, 87; Korea, 29; Taiwan, 64) are shown in Table 1. This study included 58.9% (106/180) male clinicians [Japan, 66 (75.9%); Korea, 8 (27.6%); Taiwan, 32 (50.0%)]. Clinicians’ backgrounds (i.e., specialties, clinical experience in general and palliative care) varied among the three sectors. The most common specialty of Japanese clinicians was palliative care, but it was family medicine for Korean and Taiwanese clinicians.
Table 1
Characteristics | Japan (n=87) | Korea (n=29) | Taiwan (n=64) |
---|---|---|---|
Sex (male) | 66 (75.9) | 8 (27.6) | 32 (50.0) |
Specialty | | | |
Internal medicine | 13 (15.1) | 5 (17.2) | 1 (1.6) |
Palliative care | 60 (69.8) | 0 | 18 (28.1) |
Family medicine | 4 (4.7) | 22 (75.9) | 51 (79.7) |
Others | 9 (10.5) | 2 (6.9) | 11 (17.2) |
Clinical experience (years) | 11.2±6.6 | 12.7±7.8 | 5.8±3.5 |
Clinical experience in palliative care (years) | 5.5±5.1 | 6.8±5.5 | 2.8±3.1 |
Number of patients with far advanced cancer seen in a year | 101.3±104.7 | 129.3±151.9 | 111.1±141.3 |
Data are expressed as n (%) or mean ± standard deviation.
Differences in accuracy of different CPS evaluation methods
Table 2 lists the distributions of CPS and AS for the two timeframes. In the “days” category, the proportion of patients was larger for AS than for CPS. In contrast, in the “weeks” category, the proportion was larger for CPS than for AS. Spearman’s correlation coefficients between CPS and AS were 0.50, 0.36, and 0.30 for the “days”, “weeks”, and “more than 30 days” categories, respectively (all P<0.01). A scatter plot of the CPS and AS is shown in Figure S1.
Table 2
Category | CPS* | AS** |
---|---|---|
Days (≤7 days) | 438 (17.0) | 642 (25.0) |
Weeks (≤30 days) | 1,796 (69.9) | 1,708 (66.4) |
Data are presented as number (%). *, 775 (30.1%) estimates in CPS were predicted to live >30 days. **, 863 (33.6%) patients in AS lived >30 days. CPS, clinicians’ prediction of survival; AS, actual survival.
Table 3 lists the AUROCs of the CPS according to two timeframes. The AUROCs were 86.2% (95% CI: 84.5–87.8%) in “days” and 82.2% (95% CI: 80.5–83.9%) in “weeks”.
Table 3
Time frame | Prevalence, n/N (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Overall accuracy (%) | AUROCs (%) |
---|---|---|---|---|---|---|---|
Days’ (≤7 days) prediction | 642/2,571 (25.0) | 51.1 (47.2–55.0) | 94.3 (93.2–95.3) | 74.9 (70.6–78.9) | 81.2 (79.3–83.0) | 83.5 (82.0–84.9) | 86.2 (84.5–87.8) |
Weeks’ (≤30 days) prediction | 1,708/2,571 (66.4) | 85.4 (83.7–87.1) | 61.0 (57.6–64.2) | 85.3 (83.7–86.7) | 67.9 (64.5–71.2) | 77.2 (75.5–78.8) | 82.2 (80.5–83.9) |
Data are presented as values (95% confidence intervals), except for prevalence. Prevalence is defined as death events in each time frame per total study population. PPV, positive predictive value; NPV, negative predictive value; AUROC, area under receiver operating characteristic curve; CI, confidence interval.
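For readers who want to see how the threshold-based measures in Table 3 relate to one another, the sketch below derives them from paired CPS and AS values dichotomized at a cutoff. It is a hedged illustration with hypothetical data and function names; it does not use the study dataset, ignores censoring, and omits confidence intervals.

```python
from typing import Dict, Sequence


def diagnostic_metrics(cps_days: Sequence[float],
                       as_days: Sequence[float],
                       cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, and overall accuracy of CPS for
    predicting death within `cutoff` days."""
    tp = fp = fn = tn = 0
    for cps, actual in zip(cps_days, as_days):
        predicted = cps <= cutoff    # clinician predicted death within cutoff
        died = actual <= cutoff      # patient actually died within cutoff
        if predicted and died:
            tp += 1
        elif predicted and not died:
            fp += 1
        elif died:
            fn += 1
        else:
            tn += 1

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined ratios are reported as NaN instead of raising an error.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Hypothetical predicted (CPS) and actual (AS) survival in days.
cps_days = [5, 10, 14, 30, 7, 60]
as_days = [3, 6, 20, 45, 12, 25]
metrics = diagnostic_metrics(cps_days, as_days, cutoff=7)
print(metrics)
```

Unlike the AUROC, all five measures depend on the single cutoff used to dichotomize the CPS, and overall accuracy additionally depends on the prevalence of deaths within the timeframe.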
Table 4 compares the accuracy of CPS across timeframes. The clinicians formulated an “accurate” prediction of survival for 27.6% of patients in the “days” category and 29.3% in the “weeks” category.
Table 4
Categorization based on CPS | Accurate (n=746) | Underestimated (n=564) | Overestimated (n=1,261) |
---|---|---|---|
Days (≤7 days) | 27.6 | 24.2 | 48.2 |
Weeks (≤30 days) | 29.3 | 22.2 | 48.5 |
Total | 29.0 | 21.9 | 49.0 |
Data are presented as percentages. Accurate estimation was defined as clinicians’ prediction of survival being within ±33% of the actual survival. CPS, clinicians’ prediction of survival.
Discussion
To the best of our knowledge, this study is the first to examine the accuracy of CPS with different statistical approaches using the same patient population. The AUROCs showed an excellent level of discrimination above 80%; however, the accuracy evaluated by an estimate within ±33% of AS was only approximately 30%.
Our novel finding is that the accuracy of CPS differed according to the evaluation method even within the same dataset. Discrepancies were observed among the ±33% of AS comparison, overall accuracy, and AUROCs; such discrepancies make it difficult to directly compare accuracies of CPS obtained with different methodologies (14-16). In a previous systematic review (6), the accuracy of CPS ranged from 23% to 78%, and the lowest percentage was derived from the ±33% of AS comparison method. AUROCs are recommended to assess the accuracy of CPS (8,9) because the AUROC is not affected by skewed outcome distributions, which may inflate overall accuracy (17). However, AUROCs may not capture clinically significant differences in prognostic discrimination (18). From a practical point of view, it should also be noted that the percentage given by the AUROC is not the predicted probability that a patient will die within the time period.
The distributions of CPS and AS differed (Table 2). This may indicate that palliative care clinicians do not always provide accurate predictions and tend to overestimate survival. Nearly half of the predictions in our study were optimistic (Table 4). These results are consistent with those of previous studies (8). A recent guideline mentioned that a shorter clinician-patient relationship and a longer time since the last contact were associated with decreased accuracy (4). In general, palliative care clinicians first see patients only several weeks before, or at the time of, admission to the PCU. Thus, the relatively short observation period may have made it difficult for the participating clinicians to formulate CPS. Interestingly, accuracy was similar across the two timeframes. Under the “horizon effect”, clinicians predict a shorter prognosis more accurately than a longer one (19), and the degree of accuracy could therefore vary markedly between timeframes.
The AUROCs for 7 and 30 days showed an excellent level of discrimination in our results (Table 3). The patients had a median survival time in the range of weeks (18.0 days), a period in which illness trajectories are more predictable (20). Clinician characteristics, such as training in prognostication and clinical experience, are known to affect the accuracy of CPS (14). The participating clinicians had experience in palliative care; therefore, the accuracy of CPS may be better than that reported in other studies (7,11).
Our study has several clinical implications. First, the discrepancies among the accuracy measures in our results clearly illustrate the characteristics of CPS. CPS has a high discriminative ability of more than 80%, as shown in recent studies (10,21). Namely, CPS can effectively differentiate between patients with better and poorer prognoses. Additionally, prognostic accuracy increases as death approaches. However, accurately predicting the number of days remaining in a patient’s life is difficult using CPS; the fact that only approximately 30% of CPS were accurate reflects this difficulty. This tendency worsens as patient survival duration decreases because the ±33% AS comparison method demands a more precise prediction in absolute terms as survival becomes shorter: for an AS of 30 days, any CPS from about 20 to 40 days counts as accurate, whereas for an AS of 3 days, the acceptable window is only about 2 to 4 days. This is in contrast to the accuracies represented by overall accuracy and AUROCs. Clinicians need to consider this discrepancy in prognostic discussions. Given the lower accuracy found with the ±33% AS comparison method, we suggest that clinicians communicate the prognosis using a wider range (e.g., the median, typical range, and best/worst case) instead of a single temporal figure (22).
Our study had some limitations. First, this study was performed in PCUs. Therefore, our findings may not generalize to other care settings such as home palliative care or general wards. Second, the participating clinicians only formulated continuous prognostic estimates, which is a consequence of the secondary analysis design. Ideally, CPS should have been collected as a categorical estimate to be consistent with the analytical methods. The results may have differed if the participating clinicians had formulated categorical or probabilistic estimates. Third, the short median survival time of the participating patients could have lowered the accuracy of CPS in the assessment within ±33% of the AS because a shorter AS leaves a narrower absolute margin of error. However, patients with weeks of survival are commonly admitted to PCUs. Therefore, our study’s patient population can be considered representative of clinical practice in East Asian sectors. Fourth, the category cutoffs for survival of days and weeks in this study were based on clinical practice and the coauthors’ consensus. The difference in cutoff values from previous studies might have affected the results.
Conclusions
We showed a discrepancy in the accuracy of CPS among different analysis methods. Clinicians should consider that the accuracy of CPS could differ according to the evaluation method. Because the statistical techniques used to assess the accuracy of CPS have not yet been standardized, it is preferable to base prognostic communication on multiple standardized approaches. Studies on how patients and families understand prognostic accuracy are lacking. Future studies should investigate the gap between clinicians’ explanations of prediction models and the estimates of patients and families.
Acknowledgments
We are grateful to Editage (www.editage.co.kr) for proofreading the manuscript for grammar and clarity.
Funding: This work was supported in part by a grant-in-aid from the Japanese Hospice Palliative Care Foundation and KAKENHI (grant numbers 16H05212, 16KT0007, and 20K20618).
Footnote
Reporting Checklist: The authors have completed the STARD reporting checklist. Available at https://apm.amegroups.com/article/view/10.21037/apm-23-393/rc
Data Sharing Statement: Available at https://apm.amegroups.com/article/view/10.21037/apm-23-393/dss
Peer Review File: Available at https://apm.amegroups.com/article/view/10.21037/apm-23-393/prf
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://apm.amegroups.com/article/view/10.21037/apm-23-393/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). All study procedures for the primary EASED study were approved by the local Institutional Review Boards (IRBs) of all participating institutions in Japan, Korea, and Taiwan. Japanese law does not require individual informed consent from participants in a noninvasive observational trial, such as the present study. Therefore, we used an opt-out method rather than obtaining written or oral informed consent. All patients received information on the study through instructions posted on the ward or institutional website and had the opportunity to decline participation. In Korea and Taiwan, informed consent was obtained from patients or their families (in cases where the patient lacked the capacity to provide consent). The IRBs of representative institutes in the three sectors approved this secondary analysis: Seirei Mitakahara General Hospital in Japan (Approved No. 16-29), Dongguk University Ilsan Hospital in Korea (DUIH 2017-01-042-009), and National Taiwan University in Taiwan (201611032RIND).
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Hui D. Prognostication of Survival in Patients With Advanced Cancer: Predicting the Unpredictable? Cancer Control 2015;22:489-97.
2. Farinholt P, Park M, Guo Y, et al. A Comparison of the Accuracy of Clinician Prediction of Survival Versus the Palliative Prognostic Index. J Pain Symptom Manage 2018;55:792-7.
3. Hui D, Moore J, Park M, et al. Phase Angle and the Diagnosis of Impending Death in Patients with Advanced Cancer: Preliminary Findings. Oncologist 2019;24:e365-73.
4. Stone P, Buckle P, Dolan R, et al. Prognostic evaluation in patients with advanced cancer in the last months of life: ESMO Clinical Practice Guideline. ESMO Open 2023;8:101195.
5. Hui D, Kilgore K, Nguyen L, et al. The accuracy of probabilistic versus temporal clinician prediction of survival for patients with advanced cancer: a preliminary report. Oncologist 2011;16:1642-8.
6. Cheon S, Agarwal A, Popovic M, et al. The accuracy of clinicians’ predictions of survival in advanced cancer: a review. Ann Palliat Med 2016;5:22-9.
7. White N, Reid F, Harris A, et al. A systematic review of predictions of survival in palliative care: how accurate are clinicians and who are the experts. PLoS One 2016;11:e0161407.
8. Baba M, Maeda I, Morita T, et al. Survival prediction for advanced cancer patients in the real world: A comparison of the Palliative Prognostic Score, Delirium-Palliative Prognostic Score, Palliative Prognostic Index and modified Prognosis in Palliative Care Study predictor model. Eur J Cancer 2015;51:1618-29.
9. Maltoni M, Scarpi E, Pittureri C, et al. Prospective comparison of prognostic scores in palliative care cancer populations. Oncologist 2012;17:446-54.
10. Hiratsuka Y, Suh SY, Hui D, et al. Are Prognostic Scores Better Than Clinician Judgment? A Prospective Study Using Three Models. J Pain Symptom Manage 2022;64:391-9.
11. Hui D, Paiva CE, Del Fabbro EG, et al. Prognostication in advanced cancer: update and directions for future research. Support Care Cancer 2019;27:1973-84.
12. Amano K, Maeda I, Shimoyama S, et al. The accuracy of physicians’ clinical predictions of survival in patients with advanced cancer. J Pain Symptom Manage 2015;50:139-46.e1.
13. Lee ES, Hiratsuka Y, Suh SY, et al. Clinicians’ Prediction of Survival and Prognostic Confidence in Patients with Advanced Cancer in Three East Asian Countries. J Palliat Med 2023;26:790-7.
14. Tavares T, Oliveira M, Gonçalves J, et al. Predicting prognosis in patients with advanced cancer: A prospective study. Palliat Med 2018;32:413-6.
15. Hui D, Park M, Liu D, et al. Clinician prediction of survival versus the Palliative Prognostic Score: Which approach is more accurate? Eur J Cancer 2016;64:89-95.
16. Ermacora P, Mazzer M, Isola M, et al. Prognostic evaluation in palliative care: final results from a prospective cohort study. Support Care Cancer 2019;27:2095-102.
17. Wang H, Wassan J, Zheng H. Measurements of accuracy in biostatistics. J Bioinform Comput Biol 2019;1:685-90.
18. Stone P, White N, Oostendorp LJM, et al. Comparing the performance of the palliative prognostic (PaP) score with clinical predictions of survival: A systematic review. Eur J Cancer 2021; Epub ahead of print.
19. Chow E, Harth T, Hruby G, et al. How accurate are physicians’ clinical predictions of survival and the available prognostic tools in estimating survival times in terminally ill cancer patients? A systematic review. Clin Oncol (R Coll Radiol) 2001;13:209-18.
20. White N, Reid F, Vickerstaff V, et al. Specialist palliative medicine physicians and nurses accuracy at predicting imminent death (within 72 hours): a short report. BMJ Support Palliat Care 2020;10:209-12.
21. Hui D, Ross J, Park M, et al. Predicting survival in patients with advanced cancer in the last weeks of life: How accurate are prognostic models compared to clinicians’ estimates? Palliat Med 2020;34:126-33.
22. Mori M, Fujimori M, Ishiki H, et al. Adding a Wider Range and “Hope for the Best, and Prepare for the Worst” Statement: Preferences of Patients with Cancer for Prognostic Communication. Oncologist 2019;24:e943-52.