Biological evaluation of ruthenium(II) complexes appended curcumin derivatives: Synthesis, spectral characterization, anti-oxidant and anti-cancer studies

medRxiv preprint doi: https://doi.org/10.1101/2024.05.07.24306999; this version posted November 14, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Risk prediction tools for pressure injury occurrence: An umbrella review of systematic reviews reporting model development and validation methods

Bethany Hillier 1,2, Katie Scandrett 1, April Coombe 1,2, Tina Hernandez-Boussard 3, Ewout Steyerberg 4, Yemisi Takwoingi 1,2, Vladica Velickovic 5,6, Jacqueline Dinnes 1,2*

Affiliations
1 Department of Applied Health Sciences, College of Medicine and Health, University of Birmingham, Edgbaston, Birmingham, UK
2 NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK
3 Department of Medicine, Stanford University, Stanford, CA, USA
4 Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
5 Evidence Generation Department, HARTMANN GROUP, Heidenheim, Germany
6 Institute of Public Health, Medical Decision Making and Health Technology Assessment, UMIT, Hall, Tirol, Austria

Email addresses
b.hillier@bham.ac.uk (BH); k.e.scandrett@bham.ac.uk (KS); a.r.coombe@bham.ac.uk (AC); boussard@stanford.edu (THB); e.w.steyerberg@lumc.nl (ES); y.takwoingi@bham.ac.uk (YT); vladica.velickovic@hartmann.info (VV)
* Corresponding author: j.dinnes@bham.ac.uk (JD)

Keywords
Development, internal, external validation, prediction, prognostic, pressure injury, ulcer, overview
ABSTRACT

Background
Pressure injuries (PIs) place a substantial burden on healthcare systems worldwide. Risk stratification of those who are at risk of developing PIs allows preventive interventions to be focused on patients who are at the highest risk. The considerable number of risk assessment scales and prediction models available underscores the need for a thorough evaluation of their development, validation and clinical utility.

Our objectives were to identify and describe available risk prediction tools for PI occurrence, their content, and the development and validation methods used.

Methods
The umbrella review was conducted according to Cochrane guidance. MEDLINE, Embase, CINAHL, EPISTEMONIKOS, Google Scholar and reference lists were searched to identify relevant systematic reviews. Risk of bias was assessed using adapted AMSTAR-2 criteria. Results were described narratively. All included reviews contributed to building a comprehensive list of risk prediction tools.

Results
We identified 32 eligible systematic reviews, only seven of which described the development and validation of risk prediction tools for PI. Nineteen reviews assessed the prognostic accuracy of the tools and 11 assessed clinical effectiveness. Of the seven reviews reporting model development and validation, six included only machine learning models. Two reviews included external validations of models, although only one review reported any details on external validation methods or results. This was also the only review to report measures of both discrimination and calibration.
Five reviews presented measures of discrimination, such as area under the curve (AUC), sensitivities, specificities, F1 scores and G-means. For the four reviews that assessed risk of bias using the PROBAST tool, all models but one were found to be at high or unclear risk of bias.

Conclusions
Available tools do not meet current standards for the development or reporting of risk prediction models. The majority of tools have not been externally validated. Standardised and rigorous approaches to risk prediction model development and validation are needed.

Registration
The protocol was registered on the Open Science Framework (https://osf.io/tepyk).

INTRODUCTION

Pressure injuries (PIs) carry a significant healthcare burden. A recent meta-analysis estimated the global burden of PIs to be 13%, two-thirds of which are hospital-acquired PIs (HAPI).1 The average cost of a HAPI has been estimated as $11k per patient, totalling at least $27 billion a year in the United States based on 2.5 million reported cases.2 Length of hospital stay is a large contributing cost, with patients over the age of 75 who develop HAPI having on average a 10-day longer hospital stay compared to those without PI.3

PIs result from prolonged pressure, typically on bony areas like heels, ankles, and the coccyx, and are more common in those with limited mobility, including those who are bedridden or wheelchair users. PIs can develop rapidly, and pose a threat in community, hospital and long-term care settings.
Multicomponent preventive strategies are needed to reduce PI incidence,4 with timely implementation needed to reduce both harm and the burden on healthcare systems.5 Where preventive measures fail or are not introduced in adequate time, PI treatment involves cleansing, debridement, topical and biophysical agents, biofilms, growth factors and dressings6 7 8, and in severe cases, surgery may be necessary.5 9

A number of clinical assessment scales for assessing the risk of PI are available (e.g. Braden10 11, Norton12, Waterlow13) but are limited by reliance on subjective clinical judgment. Statistical risk prediction models may offer improved accuracy over clinical assessment scales; however, appropriate methods of development and validation are required.14 15 16 Although methods for developing risk prediction models have developed considerably,14 15 17 18 methodological standards of available models have been shown to remain relatively low.17 19-22 Machine learning (ML) algorithms to develop prediction models are increasingly commonplace, but these models are at similarly high risk of bias23 and do not necessarily offer any model performance benefit over the use of statistical methods such as logistic regression.24 Methods for systematic reviews of risk prediction model studies have also improved,25-27 with tools such as PROBAST (Prediction model Risk of Bias Assessment Tool)28 now available to allow critical evaluation of study methods.
Although several systematic reviews of PI risk assessment scales and risk prediction models for PI (subsequently referred to as risk prediction tools) are available29-38, these have been shown to frequently focus on single or small numbers of scales or models, use variable review methods and show a lack of consensus about the accuracy and clinical effectiveness of available tools.39 We conducted an umbrella review of systematic reviews of risk prediction tools for PI to gain further insight into the methods used for tool development and validation, and to summarise the content of available tools.

METHODS

Protocol registration and reporting of findings
We followed guidance for conducting umbrella reviews provided in the Cochrane Handbook for Intervention Reviews.40 The review was reported in accordance with guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)41 (see Appendix 1), adapted for risk prediction model reviews as required. The protocol was registered on the Open Science Framework (https://osf.io/tepyk).

Literature search
Electronic searches of MEDLINE, Embase via Ovid and CINAHL Plus via EBSCO from inception to June 2024 were developed, tested and conducted by an experienced information specialist (AC), employing well-established systematic review and prognostic search filters42-44 combined with specific keyword and controlled vocabulary terms relating to PIs.
Additional simplified searches were undertaken in EPISTEMONIKOS and Google Scholar due to the more limited search functionality of these two sources. The reference lists of all publications reporting reviews of prediction tools (systematic or non-systematic) were reviewed to identify additional eligible systematic reviews and to populate a list of PI risk prediction tools. Title and abstract screening and full text screening were conducted independently and in duplicate by two of four reviewers (BH, JD, YT, KS). Any disagreements were resolved by discussion or referral to a third reviewer.

Eligibility criteria for this umbrella review
Published English-language systematic reviews of risk prediction models developed for adult patients at risk of PI in any setting were included. Reviews of clinical risk assessment tools or models developed using statistical or ML methods were included, with or without internal or external validation. The use of any PI classification system6 45-47 as a reference standard was eligible. Reviews of the diagnosis or staging of those with suspected or existing PIs or chronic wounds, reviews of prognostic factor and predictor finding studies, and models exclusively using pressure sensor data were excluded.

Systematic reviews were required to report a comprehensive search of at least two electronic databases, and at least one other indicator of systematic methods (i.e. explicit eligibility criteria, formal quality assessment of included studies, sufficient data presented to allow results to be reproduced, or review stages (e.g. search screening) conducted independently in duplicate).
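The rule for what counts as a systematic review, as stated in the paragraph above, can be written as a simple check. The following is only an illustrative sketch: the class name, field names and function name are hypothetical and were not part of the review's actual screening tools.

```python
from dataclasses import dataclass


# Hypothetical record of what a candidate review reports (illustrative names only).
@dataclass
class ReviewMethods:
    n_databases_searched: int
    explicit_eligibility_criteria: bool = False
    formal_quality_assessment: bool = False
    reproducible_data_presented: bool = False
    duplicate_independent_stages: bool = False


def counts_as_systematic(review: ReviewMethods) -> bool:
    """At least two databases searched AND at least one other indicator of systematic methods."""
    other_indicators = [
        review.explicit_eligibility_criteria,
        review.formal_quality_assessment,
        review.reproducible_data_presented,
        review.duplicate_independent_stages,
    ]
    return review.n_databases_searched >= 2 and any(other_indicators)


# Made-up example: three databases searched and a formal quality assessment reported.
example = ReviewMethods(n_databases_searched=3, formal_quality_assessment=True)
print(counts_as_systematic(example))  # True
```

Both conditions must hold, so a review with many indicators but a single database, or two databases and no other indicator, would not qualify under this sketch.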
Data extraction and quality assessment
Data extraction forms (Appendix 3) were developed using the CHARMS checklist (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) and Cochrane Prognosis group template.48 49 One reviewer extracted data concerning: review characteristics, model details, number of studies and participants, study quality and results. Extractions were independently checked by a second reviewer. Where discrepancies in model or primary study details were noted between reviews, we accessed the primary model development publications where possible.

The methodological quality of included systematic reviews was assessed using AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews)50, adapted for systematic reviews of risk prediction models (Appendix 4). Quality assessment and data extraction were conducted by one reviewer and checked by a second (BH, JD, KS), with disagreements resolved by consensus. Our adapted AMSTAR-2 contains six critical items, and limitations in any of these items reduce the overall validity of a review.50

Synthesis methods
Reviews were considered according to whether any information concerning model development and validation was reported. This specifically refers to reporting methods of model development or validation, and/or the presentation of measures of both discrimination and calibration. This is in contrast to evaluations of prognostic accuracy, where models are applied at a binary threshold (e.g., for high or low risk) and only discrimination metrics are presented, with no further consideration of model performance. Available data were tabulated, and a narrative synthesis provided.

All risk prediction models identified are listed in Appendix 5 Table S4, including those for which no information about model development or validation was provided at systematic review level. Risk prediction models were classified as ML-based or non-ML models, based on how they were classified in included systematic reviews, including cases where models such as logistic regression were treated as ML-based models. Where possible, the predictors included in the tools were extracted at review level and categorised into relevant groups in order to describe the candidate predictors associated with risk of PI. No statistical synthesis of systematic review results was conducted.

Reviews reporting results as prognostic accuracy (i.e. risk classification according to a binary decision) or clinical effectiveness (i.e. impact on patient management and outcomes) are reported elsewhere.39 Hereafter, the term clinical utility is used to encompass both accuracy and clinical effectiveness.

RESULTS

Characteristics of included reviews
Following de-duplication of search results, 7200 unique records remained, of which 118 were selected for full text assessment. We obtained the full text of 111 publications, of which 32 met all eligibility criteria for inclusion (see Figure 1).
Seven reviews reported details about model development and internal validation36 37 51-55, two of which also considered external validation52 54; 19 reported accuracy data29 31-35 38 54 56-66; and 11 reported clinical effectiveness data.30 56 58 61 66-72 One review54 reported both model development and accuracy data, and four reviews reported both accuracy and effectiveness data.56 58 61 66

Table 1 provides a summary of systematic review methods for all 32 reviews according to whether or not they reported any tool development methods (see Appendix 5 for full details). The seven reviews reporting prediction tool development and validation were all published within the last six years (2019 to 2024) compared to reviews focused on the clinical utility of available tools (published from 2006 to 2024). Reviews focused on model development methods almost exclusively focused on ML-based models (all but one60 of the seven reviews limited inclusion to ML models), and frequently did not report study eligibility criteria related to study participants or setting (Table 1). In comparison, only two reviews (8%) concerning the clinical utility of models included ML-based models,38 54 but more often reported eligibility criteria for population or setting: hospital settings (n=3),33 38 54 surgical settings (n=8),31 34 61 63 64 70 hospital or acute settings (n=2),67 71 long-term care settings (n=2)29 35 or the elderly (n=1).60

On average, reviews about tool development included more studies than reviews of clinical utility (median 22 compared to 15), more participants (median 408,504 compared to 7,684) and covered more prediction tools (median 21 compared to 3) (Table 1). Ten reviews (38%) about clinical utility included only one risk assessment scale, whereas reviews of tool development included at least three different risk prediction models.
The PROBAST tool for quality assessment of prediction model studies was used in 57% (n=4) of tool development reviews37 52-54, whereas validated test-accuracy specific tools such as QUADAS were used less frequently (10/26, 38%) in reviews of clinical utility. Two reviews of tool development did not report any quality assessment of included studies (29%), compared to 4 (15%) of reviews of clinical utility. Meta-analysis was conducted in two of seven (29%) reviews of tool development compared to more than half of reviews of clinical utility (15, 58%).

Methodological quality of included reviews
The quality of included reviews was generally low (Table 2; Appendix 5 for full assessments). The majority of reviews (5/7 (71%) reviews on tool development and 18/23 (78%) reviews on clinical utility) partially met the AMSTAR-2 criteria for the literature search (i.e. searched two databases, reported search strategy or key words, and justified language/publication restrictions), with only three (two reviews56 72 on clinical utility, and one review54 on both tool development and clinical utility) meeting all criteria for ‘Yes’ (i.e. searching grey literature and reference lists, with the search conducted within 2 years of publication). Twenty-two reviews (69%) conducted study selection in duplicate (5/7 (71%) of reviews about tool development and 17/26 (65%) of clinical utility reviews). Conflicts of interest were reported in all seven tool development reviews and 77% of clinical utility reviews (20/26).
Reviews scored poorly on the remaining AMSTAR-2 items, with around 50% or fewer reviews meeting the stipulated AMSTAR-2 criteria. Nine reviews (28%) used an appropriate method of quality assessment of included studies and provided itemisation of judgements per study. No review scored ‘Yes’ for all AMSTAR-2 items in either category.

Figure 1. PRISMA flowchart: identification, screening and selection process41

Identification of studies via databases
Identification
  Records identified (n = 10,326): MEDLINE (n = 1,872); EMBASE (n = 2,390); CINAHL (n = 4,200); Epistemonikos (n = 1,426); Google Scholar (n = 437); Reference lists (n = 1)
  Duplicate records removed through automated deduplication (n = 3,126)
Screening
  Records screened (n = 7,200)
  Records excluded (n = 7,082)
  Articles selected for retrieval (n = 118)
  Articles not retrieved (n = 7)
  Full-text articles assessed for eligibility (n = 111)
  Full-text articles excluded (n = 79): Not a systematic review (n = 32); No risk prediction models (n = 14); Wrong research question (n = 17); No English language translation (n = 7); Duplicate (n = 3); Wrong outcome (n = 2); Updated version included (n = 2); Wrong population (n = 1); No results (n = 1)
Included
  Total reviews included (n = 32)
  Reviews reporting details about tool development or validation (n = 7)*
  Reviews reporting about accuracy or clinical effectiveness (n = 26)*

List of full-text articles excluded, with reasons, is given in Appendix 5. *Note that one review54 is included in both.

Table 2. Summary of AMSTAR-2 assessment results

[Stacked bar chart. Panels: reviews reporting model development and/or validation (n=7); reviews reporting prognostic accuracy and/or clinical effectiveness (n=26). Rows: AMSTAR-2 Items 1 to 15. Rating categories: Yes, Partial Yes, No, N/A. Horizontal axis: 0% to 100%.]

AMSTAR – A MeaSurement Tool to Assess systematic Reviews; Item 1 – Adequate research question/ inclusion criteria?; Item 2 – Protocol and justifications for deviations?; Item 3 – Reasons for study design inclusions?; Item 4 – Comprehensive search strategy?; Item 5 – Study selection in duplicate?; Item 6 – Data extraction in duplicate?; Item 7 – Excluded studies list (with justifications)?; Item 8 – Included studies description adequate?; Item 9 – Assessment of RoB/quality satisfactory?; Item 10 – Studies’ sources of funding reported?; Item 11 – Appropriate statistical synthesis method?; Item 12 – Assessment of impact of RoB on synthesised results?; Item 13 – Assessment of impact of RoB on review results?; Item 14 – Discussion/investigation of heterogeneity?; Item 15 – Conflicts of interest reported?; N/A – not applicable; RoB – risk of bias. Further details on AMSTAR items are given in Appendix 4, and results per review are given in Appendix 5. Note that where AMSTAR-2 assessment was applied to overlapping reviews (n=3) for prognostic accuracy and clinical effectiveness separately, and resulted in differing judgements for each review question, the judgements for the prognostic accuracy review question are displayed here for simplicity.

Findings
Of the 32 reviews, 26 focused on the clinical utility (accuracy or effectiveness) of prediction tools. These clinical utility reviews provided no details about the development or validation of included models (except for one review54), and gave only limited detail about setting and study design (see Appendix 5). Reviews reporting the accuracy of prediction tools largely treated the tools as diagnostic tests to be applied at a single threshold (e.g., for high or low risk) and they did not focus on the broader aspects of prognostic model performance, such as calibration and the temporal relationship between prediction and the outcome, PI occurrence. These reviews included a total of 70 different prediction tools, predominantly derived by clinical experts, as opposed to empirically derived models (that is, with statistical or ML methods). The methodology underlying their development is not always explicit, with scales in routine clinical usage apparently based on epidemiological evidence and clinical judgment about predictors that may not meet accepted principles for the development and reporting of risk prediction models. The most commonly included tools were the Braden10 11 (included in 21 reviews), Waterlow13 (n=14 reviews), Norton12 (n=11 reviews), and Cubbin and Jackson scales97 98 (n=8 reviews).

The seven systematic reviews that reported detailed information about model development and validation included 70 prediction models, 48 of which were unique to these seven reviews. Between three (ref. 51) and 35 (ref. 36) model development studies were included; one review52 also included eight external validation studies and another review54 included one external validation study. Electronic health records (EHRs) were used for model development in all studies in one review37 and for the majority of models (>66%) in the remaining reviews, where reported.51 53-55 Three reviews52 54 55 reported the use of prospectively or retrospectively collected data. No review included information about the thresholds used to define whether a patient is at risk of developing PIs. Five reviews included detail about the predictors included in each model.

The largest review36 reported that logistic regression was the most commonly reported modelling approach (20/35 models), followed by random forest (n=18), decision tree (n=12) and support vector machine (n=12) approaches. Logistic regression was also the most frequently used approach in three other reviews (18/23 (ref. 55), 16/21 (ref. 52) and 15/22 (ref. 53)). Primary studies frequently compared the use of different ML methods using the same datasets, such that ‘other’ ML methods were reported with little to no further detail (e.g. 19 studies in the review by Dweekat and colleagues36).

Approaches to internal validation were not well reported in the primary studies. One review52 found no information on internal validation for 76% (16/21) of studies, with re-sampling reported in two and tree-pruning, cross-validation and split sample reported in one study each. Another review36 reported finding no information about internal validation for 20% of studies (7/35) and the use of cross-validation (n=10), split sample (n=10) techniques, or both (n=8) for the remainder. Cross-validation was used in more than half (12/22) of studies in another review.53

Only one review reported details on methods for selection of model predictors52: 29% (6/21) selected predictors by univariate analysis prior to modelling and nine used stepwise selection for final model predictors; 11 (52%) clearly reported candidate predictors, and all 21 clearly reported final model predictors. Another review54 stated that feature selection (or predictor selection) was performed improperly and that some studies used univariate analyses to select predictors, but further details were not provided. One review52 reported 15 models (71%) with no information about missing data, and only two using imputation techniques (imputation using another data set, and multiple imputation by chained equations). Another review54 reported 7 models (39%) with no information about missing data, missing data excluded or negligible for 4 models (22%), and single or multiple imputation techniques used for 5 (28%) and 3 (17%) models, respectively.
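For readers less familiar with the internal validation approaches named above, the difference between split-sample validation and k-fold cross-validation can be sketched as follows. This is a minimal illustration in plain Python: the function names, the 30% test fraction and the choice of five folds are assumptions made for the example and are not taken from any included review or primary study.

```python
import random


def split_sample(n_records, test_fraction=0.3, seed=0):
    """Split-sample validation: one random partition into development and test indices."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_records * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(n_records, k=5, seed=0):
    """k-fold cross-validation: each record is held out exactly once across the k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


development, test = split_sample(10)
print(len(development), len(test))  # 7 3

for training, held_out in k_fold(10, k=5):
    print(len(training), len(held_out))  # 8 2 for each of the five folds
```

With a split sample, part of the data is never used to fit the model, whereas cross-validation reuses every record for both fitting and testing in different folds.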
Model performance measures were reported by three reviews37 52 53, all of which noted considerable variation in reported metrics and model performance including C-statistics (0.71 to 0.89 in 10 studies53), F1 score (0.02 to 0.99 in 9 studies53), G-means (0.628 to 0.822 in four studies37), and observed versus expected ratios (0.97 to 1 in 3 studies52). Four reviews37 53-55 reported measures of discrimination associated with included models. Across reviews, reported sensitivities ranged between 0.04 and 1, specificities ranged between 0.69 and 1, and AUC values ranged between 0.50 and 1.

Shi and colleagues52 included eight external validations using data from long-term care (n=4) or acute hospital care (n=4) settings (Appendix 5 Table S5). All were judged to be at unclear (n=4) or high (n=4) risk of bias using PROBAST. Model performance metrics for five models (TNH-PUPP89, Berlowitz 11-item model99, Berlowitz MDS adjustment model90, interRAI PURS88, Compton ICU model94) included C-statistics between 0.61 and 0.9 and observed versus expected ratios between 0.91 and 0.97. The review also reported external validation studies for the ‘SS scale’100 and the prePURSE study tool91, but no model performance metrics were given. A meta-analysis of C-statistics and O/E ratios was performed, including values from both development and external validation cohorts (Table 3). Parameters related to model development were not consistently reported: C-statistics ranged between 0.71 and 0.89 (n=10 studies); observed versus expected ratios ranged between 0.97 and 1 (n=3 studies).

Pei and colleagues54 reported that one81 (1/18, 6%) of the model development studies included in their review also conducted an external validation.
However, review authors presented accuracy metrics that originated from the internal validation, as opposed to the external validation (determined from inspection of the primary study). Additionally, no details on external validation methods and no measures of calibration were presented. Pei and colleagues54 judged this study to be of high risk of bias using PROBAST, as with the majority of studies (16/18, 89%) included in their review. More detailed information about individual models, including predictors, specific model performance metrics and sample sizes, is presented in Appendix 5.

Included tools and predictors
A total of 124 risk prediction tools were identified (Table 4); 111 tools were identified from the 32 included systematic reviews and 13 were identified from screening the reference lists of literature reviews that used non-systematic methods that were considered during full text assessment. Full details obtained at review-level are reported in Appendix 5 Table S4.

Tools were categorised as having been developed with (60/124, 48%) or without (64/124, 52%) the use of ML methods (as defined by review authors). Prospectively collected data were used for model development for 21% of tools (26/124) and retrospectively collected data for 41% (51/124); the data source was not reported for the remainder (47/124). Information about the study populations was poorly reported; however, study setting was reported for 112 prediction tools. Twenty-seven tools were reported to have been developed in hospital inpatients, and 22 were developed in long-term care settings, rehabilitation units, nursing homes or hospices. Where reported (n=100), sample sizes ranged from 15 (ref. 101) to 1,252,313 (ref. 102). The approach to internal validation used for the prediction tools (e.g. cross-validation or split sample) was not reported at review-level for over two thirds of tools (83/124, 67%).

We could extract information about the predictors for only 66 of the 124 tools (Table 5 and Appendix 5). The most frequently included predictor was age (33/66, 50%), followed by pre-disposing diseases/conditions (32/66, 48%), medical treatment/care received (28/66, 42%) and mobility (27/66, 41%). Tools often (31/66, 47%) included multiple pre-existing conditions or comorbidities and multiple types of treatment or medication as predictors. Other common predictors include laboratory values, continence, nutrition, body-related values (e.g. weight, height, body temperature), mental status, activity, gender and skin assessment (27% to 35% of tools). Ten tools incorporated scores from other established risk prediction scales as a predictor, with eight including Braden10 11 scores, one including the Norton12 score and one including the Waterlow13 score.

Only one review52 reported the presentation format of included tools, coded as ‘score system’ (n=11), ‘formula equation’ (n=3), ‘nomogram scale’ (n=2), or ‘not reported’ (n=6).

Table 4. Summary of tool characteristics, extracted at review-level

| Tool characteristics | ML-based models (N=60, 48%) | Non-ML tools (N=64, 52%) | Total (N=124) |
|---|---|---|---|
| No. of included reviews^A considered in | | | |
| 0 | 0 (0) | 13 (20) | 13 (10) |
| 1 | 31 (52) | 23 (36) | 54 (44) |
| 2 | 6 (10) | 9 (14) | 15 (12) |
| >2 | 23 (38) | 19 (30) | 42 (34) |
| Development study details | | | |
| Median (range) year of publication | 2020 (2000 – 2023) | 1998 (1962 – 2015) | 2008 (1962 – 2023) |
| Source of data | | | |
| Prospective | 8 (13) | 18 (28) | 26 (21) |
| Retrospective | 41 (68) | 10 (16) | 51 (41) |
| NS | 11 (18) | 36 (56) | 47 (38) |
| Setting | | | |
| Hospital | 16 (27) | 11 (17) | 27 (22) |
| Long-term care (incl. end-of-life and rehab) | 8 (13) | 14 (22) | 22 (18) |
| Acute care (incl. surgical and ICU) | 33 (55) | 24 (38) | 57 (46) |
| Mixed settings | 1 (2) | 1 (2) | 2 (2) |
| Other | 2 (3) | 2 (3) | 4 (3) |
| NS | 0 (0) | 12 (19) | 12 (10) |
| Study population age | | | |
| Adults | 36 (60) | 34 (53) | 70 (56) |
| Any | 4 (7) | 3 (5) | 7 (6) |
| NS | 20 (33) | 27 (42) | 47 (38) |
| Baseline condition | | | |
| PIs at baseline | 1 (2) | 0 (0) | 1 (1) |
| No PIs at baseline | 11 (18) | 19 (30) | 30 (24) |
| NS | 48 (80) | 45 (70) | 93 (75) |
| Development methods | | | |
| Development method/algorithm^B | | | |
| ML algorithms | 48 (80) | 0 (0) | 48 (39) |
| Logistic regression^C | 40 (67) | 15 (23) | 55 (44) |
| Cox regression | 0 (0) | 5 (8) | 5 (4) |
| Fine-Gray model | 2 (3) | 0 (0) | 2 (2) |
| Clinical expertise | 0 (0) | 2 (3) | 2 (2) |
| NS^D | 0 (0) | 44 (69) | 44 (35) |
| Internal validation method^B | | | |
| Cross-validation^G | 21 (35) | 3 (5) | 24 (19) |
| Data splitting | 28 (47) | 0 (0) | 28 (23) |
| Not done / NS^F | 22 (37) | 61 (95) | 83 (67) |
| Median (range) no. of final predictors^E | 7 (3 – 23) | 8 (3 – 12) | 7 (3 – 23) |
| Study cohort | | | |
| Median (range) total sample size | 2674 (27 – 1252313) | 285 (15 – 31150) | 686 (15 – 1252313) |
| Median (range) number of events | 207 (8 – 86410) | 51 (9 – 1350) | 98 (8 – 86410) |
| Median (range) proportion of events (% of sample size) | 10.43% (0.42% – 80.00%) | 14.84% (1.18% – 46.67%) | 14.69% (0.42% – 80.00%) |

Note that tools were categorised as ML or non-ML tools based on the descriptions from authors of the included systematic reviews that the tools were identified in. A: the 32 included systematic reviews; B: tools use multiple methods, therefore total number not equal to N (100%); C: one study also used discriminant analysis for model development; D: many seemed to use clinical expertise, but development methods were not clearly reported; E: counting of final predictors may vary between models: some authors may count individual factors, while others consider domains or subscales; F: one review36 implies 5 models did not implement internal validation; G: ‘resampling’ (not described further) was used for the development of 2 models; ML – machine learning; NS – not stated; ICU – intensive care unit; PI – pressure injury.

Table 5. Predictor categories and frequency (%) of inclusion in N=66 tools.

| Predictor category | No. of tools predictor appears in |
|---|---|
| Age | 33 (50) |
| Pre-disposing conditions | 32 (48) |
| Receiving medical treatment/care | 28 (42) |
| Mobility | 27 (41) |
| Laboratory values | 23 (35) |
| Continence | 22 (33) |
| Nutrition | 22 (33) |
| Body | 21 (32) |
| Mental Status | 21 (32) |
| Activity | 21 (32) |
| Gender | 21 (32) |
| Skin | 18 (27) |
| General Health | 14 (21) |
| Braden10 11 score | 8 (12) |
| Length of stay | 8 (12) |
| Pressure injury | 7 (11) |
| Surgery duration | 6 (9) |
| Ability to ambulate | 6 (9) |
| Medical unit, ward, visit | 5 (8) |
| Ethnicity or place of birth | 5 (8) |
| Friction, shear, pressure | 3 (5) |
| Body position | 3 (5) |
| Pain | 3 (5) |
| Hygiene | 2 (3) |
| Isolation | 2 (3) |
| Smoking | 2 (3) |
| Norton12 or Waterlow13 score | 2 (3) |
| 'Special' (not explained) | 2 (3) |

Figures are given as count (% out of 66 tools with information on predictors). Note that multiple predictors may fall within the same predictor category.
For instance, the category ‘skin’ may encompass both ‘skin moisture’ and ‘skin integrity’, with the frequency count reflecting the entire predictor category rather than individual predictors.

DISCUSSION

This umbrella review summarises data from 32 eligible systematic reviews of PI risk prediction tools. Quality assessment using an adaptation of AMSTAR-2 revealed that most reviews were conducted to a relatively poor standard. Critical flaws were identified, including inadequate or absent reporting of protocols (23/32, 72%), inappropriate statistical synthesis methods (13/17, 76%) and lack of consideration for risk of bias judgements when discussing review results (17/32, 53%). Despite the large number of risk prediction models identified, only seven reviews reported information about model development and validation, predominantly for ML-based prediction models. The remaining reviews reported the accuracy (sensitivity and specificity) or effectiveness of identified models. The studies included in the ‘accuracy’ reviews that we identified typically reported a binary classification of participants as high or low risk of PI based on the risk prediction tool scores, rather than constituting external validations of models.

For many (44/64, 69%) prediction tools that were developed without the use of ML, we were not able to determine whether reliable and robust statistical methods were used or whether models were essentially risk assessment tools developed based on expert knowledge. For nearly half (58/124, 47%) of the identified tools, predictors included in the final models were not reported. Details of study populations and settings were also lacking. It was not always clear from the reviews whether the poor reporting occurred at review level or in the original primary study publications.

Model development algorithms included logistic regression, decision trees and random forests, with a vast number of ML-based models having been developed in the last five years. Although logistic regression is considered a statistical approach [107], it does share some characteristics with ML methods [108]. Modern ML frameworks and libraries have streamlined the automation of logistic regression, including feature selection, hyperparameter optimisation and cross-validation, solidifying its role within the ML ecosystem; however, logistic regression may still appear in non-ML contexts, as some developers continue to apply it using more traditional methods. Most (6/7, 86%) of our set of reviews reported the use of logistic regression as part of an ML-based approach. However, this reflects the classifications used by included systematic reviews as opposed to our own assessment of the methods used in the primary studies, and may therefore overestimate the use of ML models.

In contrast to logistic regression approaches, decision trees and random forests may not produce a quantitative risk probability. Instead, they commonly categorise patients into binary ‘at risk’ or ‘not at risk’ groups. Although the risk probabilities generated in logistic regression prediction models can be useful for clinical decision making, it was not possible to derive any information about thresholds used to define ‘at risk’ or ‘not at risk’, and for most reviews it was unclear what the final model comprised. This lack of transparency poses potential hurdles in applying these models effectively in clinical settings.

A recent systematic review of risk of bias in ML-developed prediction models found that most models are of poor methodological quality and are at high risk of bias [23]. In our set of reviews, of the four reviews that conducted a risk of bias assessment using the PROBAST tool, all models but one [103] were found to be at high or unclear risk of bias [37, 52-54]. This raises significant concerns about the accuracy of clinical risk predictions. This issue is particularly critical in light of emerging evidence [104] on skin tone classification versus ethnicity/race-based methods in predicting pressure ulcer risk. These results underscore the need for developing bias-free predictive models to ensure accurate and equitable healthcare outcomes, especially in diverse patient populations.

Where the method of internal validation was reported, split-sample and cross-validation were the most commonly used techniques; however, detail was limited, and it was not possible to determine whether appropriate methods had been used. Although split-sample approaches have been favoured for model validation, more recent empirical work suggests that bootstrap-based optimism correction [105] or cross-validation [106] are preferred approaches. None of the included reviews reported the use of optimism correction approaches.
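The bootstrap-based optimism correction mentioned above can be illustrated with a minimal Python sketch. It is not taken from any of the included reviews or primary studies: the simulated data, the plain gradient-descent logistic regression and all function names (`fit_logistic`, `linear_predictor`, `c_statistic`, `optimism_corrected_c`) are assumptions made purely for illustration. The idea is to refit the model in each bootstrap sample, compare its performance in that sample with its performance in the original data, and subtract the average difference (the optimism) from the apparent c-statistic.

```python
import math
import random


def fit_logistic(X, y, steps=150, lr=0.5):
    """Fit a logistic regression by plain gradient descent (illustrative only)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def linear_predictor(w, X):
    """Linear predictor; it ranks patients identically to the predicted risk."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]


def c_statistic(scores, y):
    """Proportion of event/non-event pairs in which the event scores higher."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_c(X, y, n_boot=20, seed=0):
    """Return (apparent, optimism-corrected) c-statistic."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_statistic(linear_predictor(fit_logistic(X, y), X), y)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip samples without both outcomes
            continue
        wb = fit_logistic(Xb, yb)
        c_boot = c_statistic(linear_predictor(wb, Xb), yb)  # in bootstrap sample
        c_orig = c_statistic(linear_predictor(wb, X), y)    # in original data
        optimism.append(c_boot - c_orig)
    return apparent, apparent - sum(optimism) / n_boot


# Hypothetical data: 80 patients, 5 candidate predictors, only the first informative.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(80)]
y = [1 if data_rng.random() < 1 / (1 + math.exp(-(1.5 * x[0] - 0.5))) else 0 for x in X]

apparent, corrected = optimism_corrected_c(X, y)
print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

In practice an established statistical package would be used rather than a hand-written fit, and the number of bootstrap repetitions would typically be much larger than the 20 used here to keep the sketch fast.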
Only two reviews included external validations of previously developed models [52, 54]; however, limited details of model performance were presented. External validation is necessary to ensure that a model is both reproducible and generalisable [109, 110], which brings the usefulness of the models included in these reviews into question. The PROGRESS framework suggests that multiple external validation studies should be conducted using independent datasets from different locations [15]. In the two reviews that included model validation studies [52, 54], it is unclear whether these studies were conducted in different locations; where reported, they were all conducted in the same setting as the corresponding development study. PROGRESS also suggests that external validations are carried out in a variety of relevant settings. Shi and colleagues [52] described four of eight validations as using ‘temporal’ data, which suggests that the validation population is largely the same as the development population but with data from different timeframes. This approach has been described as lying somewhere ‘between’ internal and external validation, further emphasising the need for well-designed external validation studies [109].

Importantly, model recalibration was not reported for any external validations. Evidence suggests that greater focus should be placed on large, well-designed external validation studies to validate and improve promising models (using recalibration and updating [111]), rather than developing a multitude of new ones [15, 18]. Model validation and recalibration should be a continuous process, and this is something that future research should address.
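As a hedged illustration of the simplest form of recalibration (updating only the model intercept, sometimes called calibration-in-the-large), the sketch below computes an observed/expected (O/E) ratio in a validation sample and then finds the intercept shift that makes the expected number of events equal the observed number. The predicted risks and outcomes are invented for illustration, and the function names (`observed_expected_ratio`, `recalibrate_intercept`) are hypothetical; none of this reproduces a method from the included reviews.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def expit(z):
    return 1 / (1 + math.exp(-z))


def observed_expected_ratio(probs, outcomes):
    """O/E ratio: observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(probs)


def recalibrate_intercept(probs, outcomes, tol=1e-10):
    """Find the shift on the logit scale that equates expected and observed events.

    Solved by bisection; this matches the maximum-likelihood intercept when the
    original linear predictor is used as an offset. Requires at least one event
    and one non-event.
    """
    target = sum(outcomes)
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        expected = sum(expit(logit(p) + mid) for p in probs)
        if expected < target:
            lo = mid  # predictions still too low: shift upwards
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical validation sample: predicted PI risks from an existing model
# and observed outcomes (1 = PI occurred).
probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.10, 0.15, 0.25, 0.35, 0.10]
outcomes = [0, 0, 0, 1, 1, 0, 0, 1, 1, 0]

oe = observed_expected_ratio(probs, outcomes)
shift = recalibrate_intercept(probs, outcomes)
updated = [expit(logit(p) + shift) for p in probs]

print(f"O/E before update: {oe:.2f}")
print(f"intercept shift: {shift:.3f}")
print(f"O/E after update: {observed_expected_ratio(updated, outcomes):.2f}")
```

An O/E ratio above 1 indicates that the model under-predicts risk in the validation sample. More extensive updating (for example re-estimating the calibration slope or individual coefficients) follows the same principle but needs more data.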
Following external validation, effectiveness studies should be conducted to assess the impact of model use on decision making, patient outcomes and costs [15].

The effective use of prediction tools is also influenced by the way in which the model’s output is presented to the end-user. Only one review [52] reported the presentation format of included tools, such as formula equations and nomograms. In conjunction with this, identifying and mitigating modifiable risk factors can help prevent PIs. Additional effort is needed in the development of risk prediction tools to extract predictors that are risk modifiers and to provide end-users with this information, making the predictions more interpretable and actionable.

Risk stratification in itself is not clinically useful unless it leads to an effective change in patient management. For instance, in high-risk groups, additional types of preventive interventions can be triggered, or default preventive measures can be applied more intensively (e.g., more frequent repositioning) based on the results of the risk assessment. While sensitivity and specificity are valid performance metrics, their optimisation must consider the cost of misclassification. Net benefit calculations, which can be visualised through decision curves [112], provide a more reliable means of evaluating the clinical utility of risk assessment for PIs across a range of thresholds at which clinical action is indicated. These calculations can assist in providing a balanced use of resources while maximising positive health outcomes, such as lowering incidence of PI.

It is also important to assess whether the tool can improve outcomes with existing preventive interventions and whether it integrates well into clinical workflows (i.e., clinical effectiveness).
A well-developed tool with good calibration and discrimination properties may be of limited value if these practical concerns are not addressed. Therefore, model developers should check the expected value of prognosis and how the tool can guide prevention when employed in practice, before planning model development. If it is determined that there is no value in predicting certain outcomes, this brings into question whether the model should be developed at all [113].

Despite the advances in methods for developing risk prediction models, scales developed using clinical expertise such as the Braden Scale [10, 11], Norton Scale [12], Waterlow Score [13] and Cubbin-Jackson Scales [97, 98] are extensively discussed in numerous clinical practice guidelines for patient risk assessment, and are commonly used in clinical practice [6, 114]. Although guidelines recognise their low accuracy, they are still acknowledged, while other risk prediction models are not even considered. This may be due to the availability of at least some clinical trials evaluating the clinical utility of scales [39]. Some scales, such as the Braden scale [10, 11], are so widely used that they have become an integral component of risk assessment for PI in clinical practice, and have even been incorporated into EHRs. Their widespread use may impede progress towards the development, validation and evaluation of more accurate and innovative risk prediction models. Striking a balance between tradition and embracing advancements is crucial for effective implementation in healthcare settings and improving patient outcomes.
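The net benefit calculation referred to earlier in this Discussion, which underlies decision curves, can be sketched in a few lines of Python. At a chosen risk threshold, net benefit is the proportion of true positives minus the proportion of false positives weighted by the odds of the threshold. The predicted risks, outcomes and function names below are hypothetical and serve only to show the arithmetic; they are not drawn from any included study.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on patients whose predicted risk is >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: apply the preventive intervention to every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predicted PI risks and observed outcomes (1 = PI occurred).
probs = [0.05, 0.10, 0.20, 0.30, 0.40, 0.10, 0.15, 0.25, 0.35, 0.10]
outcomes = [0, 0, 0, 1, 1, 0, 0, 1, 1, 0]

# A decision curve plots these quantities over a range of thresholds;
# the 'treat none' strategy has a net benefit of zero throughout.
for threshold in (0.1, 0.2, 0.3):
    model_nb = net_benefit(probs, outcomes, threshold)
    all_nb = net_benefit_treat_all(outcomes, threshold)
    print(f"threshold {threshold:.1f}: model {model_nb:.3f}, treat all {all_nb:.3f}")
```

A tool adds clinical value at a given threshold only if its net benefit exceeds that of both default strategies (treating all patients and treating none).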
Strengths and limitations

Our umbrella review is the first to systematically identify and evaluate systematic reviews of risk prediction models for PI. The review was conducted to a high standard, following Cochrane guidance [40], and with a highly sensitive search strategy designed by an experienced information specialist. Although we excluded non-English publications due to time and resource constraints, where possible these publications were used to identify additional eligible risk prediction models. To some extent our review is limited by the use of AMSTAR-2 for quality assessment of included reviews. AMSTAR-2 was not designed for assessment of diagnostic or prognostic studies and, although we made some adaptations, many of the existing and amended criteria relate to the quality of reporting of the reviews as opposed to methodological quality. There is scope for further work to establish criteria for assessing systematic reviews of prediction models.

The main limitation, however, was the lack of detail about risk prediction models and risk prediction model performance that could be determined from the included systematic reviews. To be as comprehensive as possible in model identification, we were relatively generous in our definition of ‘systematic’, and this may have contributed to the often-poor level of detail provided by included reviews. It is likely, however, that reporting was poor in many of the primary studies contributing to these reviews. Excluding the ML-based models, more than half of available risk prediction scales or tools were published prior to the year 2000. The fact that the original versions of reporting guidelines for diagnostic accuracy studies [115] and risk prediction models [116] were not published until 2003 and 2015 respectively is likely to have contributed to poor reporting. In contrast, the ML-based models were published between 2000 and 2023, with a median year of 2020. Reporting guidelines for development and validation of ML-based models are more recent [117, 118], but aim to improve the reporting standards and understanding of evolving ML technologies in healthcare.

CONCLUSIONS

There is a very large body of evidence reporting various risk prediction scales, tools and models for PI, which has been summarised across multiple systematic reviews of varying methodological quality. Only seven systematic reviews reported the development and validation of models to predict risk of PIs. It seems that, for the most part, available models do not meet current standards for the development or reporting of risk prediction models. Furthermore, most available models, including ML-based models, have not been validated beyond the original population in which they were developed. Identification of the optimal risk prediction model for PI from those currently available would require a high-quality systematic review of the primary literature, ideally limited to studies conducted to a high methodological standard. It is evident from our findings that there is still a lack of consensus on the optimal risk prediction model for PI, highlighting the need for more standardised and rigorous approaches in future research.
Table 1. Summary of included systematic review characteristics

| Review characteristics | Reviews on model development and validation (N=7) | Reviews on accuracy or clinical effectiveness (N=26) | All included reviews (N=32) |
|---|---|---|---|
| Median (range) year of publication | 2022 (2019 – 2023) | 2017 (2006 – 2024) | 2019 (2006 – 2024) |
| Eligibility criteria | | | |
| Participants: adults only | 2 (29)^A | 15 (58)^B | 16 (50)^A,B |
| Participants: any age | 0 (0) | 2 (8) | 2 (6) |
| Participants: no age restriction reported | 5 (71) | 9 (35) | 14 (44) |
| Presence of PI at baseline: no PIs at baseline | 0 (0) | 6 (23) | 6 (19) |
| Presence of PI at baseline: NS | 7 (100) | 20 (77) | 26 (81) |
| Setting: any healthcare setting | 0 (0) | 2 (8) | 2 (6) |
| Setting: hospital | 3 (43) | 3 (12) | 5 (16) |
| Setting: acute care (incl. surgical and ICU) | 0 (0) | 8 (31) | 8 (25) |
| Setting: hospital or acute care | 0 (0) | 2 (8) | 2 (6) |
| Setting: long-term care | 0 (0) | 2 (8) | 2 (6) |
| Setting: long-term, acute or community settings | 0 (0) | 1 (4) | 1 (3) |
| Setting: NS | 4 (57) | 8 (31) | 12 (38) |
| Risk assessment tools: any prediction tool or scale | 0 (0) | 9 (35) | 9 (28) |
| Risk assessment tools: specified clinical scale(s) | 0 (0) | 12 (46) | 12 (38) |
| Risk assessment tools: ML-based prediction models | 6 (86) | 2 (8) | 7 (22) |
| Risk assessment tools: ML or statistical models | 1 (14) | 0 (0) | 1 (3) |
| Risk assessment tools: PI prevention strategies | 0 (0) | 1 (4) | 1 (3) |
| Risk assessment tools: NS | 0 (0) | 2 (8) | 2 (6) |
| PI classification system: any | 0 (0) | 1 (4) | 1 (3) |
| PI classification system: accepted standard classifications (NPUAP, EPUAP, AHCPR or TDCPS) | 0 (0) | 2 (8) | 2 (6) |
| PI classification system: several specified classification systems | 0 (0) | 3 (12) | 3 (9) |
| PI classification system: other | 0 (0) | 1 (4) | 1 (3) |
| PI classification system: NS | 7 (100) | 19 (73) | 25 (78) |
| Source of data: prospective only | 0 (0) | 4.5 (17)^C | 4.5 (14)^C |
| Source of data: prospective or retrospective | 1 (14) | 2.5 (10)^C | 3.5 (11)^C |
| Source of data: NS | 6 (86) | 19 (73) | 24 (75) |
| Study design restrictions: yes | 1 (14) | 14 (54) | 15 (47) |
| Study design restrictions: no | 0 (0) | 3 (12) | 3 (9) |
| Study design restrictions: NS | 6 (86) | 9 (35) | 14 (44) |
| Review methods | | | |
| Median (range) no. sources^D searched | 5 (2 – 9) | 6 (2 – 14) | 5 (2 – 14) |
| Publication restrictions, end date (year): 2000-2009 | 0 (0) | 3 (12) | 3 (9) |
| Publication restrictions, end date (year): 2010-2019 | 1 (14) | 16 (62) | 17 (53) |
| Publication restrictions, end date (year): 2020-2023 | 6 (86) | 7 (27) | 12 (38) |
| Publication restrictions, language: English only | 5 (71) | 10 (38) | 15 (47) |
| Publication restrictions, language: 2 languages | 1 (14) | 3 (12) | 3 (9) |
| Publication restrictions, language: >2 languages | 0 (0) | 3 (12) | 3 (9) |
| Publication restrictions, language: no restrictions | 0 (0) | 4 (15) | 4 (13) |
| Publication restrictions, language: NS | 1 (14) | 6 (23) | 7 (23) |
| Quality assessment tool^E: PROBAST^F | 4 (57) | 1 (4) | 4 (13) |
| Quality assessment tool^E: QUADAS | 0 (0) | 2 (8) | 2 (6) |
| Quality assessment tool^E: QUADAS-2 | 0 (0) | 8 (31) | 8 (25) |
| Quality assessment tool^E: JBI tools | 1 (14) | 3 (12) | 4 (13) |
| Quality assessment tool^E: CASP | 0 (0) | 2 (8) | 2 (6) |
| Quality assessment tool^E: Cochrane RoB tool | 0 (0) | 1 (4) | 1 (3) |
| Quality assessment tool^E: other | 0 (0) | 6 (23) | 6 (19) |
| Quality assessment tool^E: none | 2 (29) | 4 (15) | 6 (19) |
| Meta-analysis included | 2 (29) | 15 (58) | 16 (50) |
| Method of meta-analysis^G (% of reviews incl. meta-analysis): univariate RE/FE model (depending on heterogeneity assessment) | 1 (50) | 2 (13) | 3 (19) |
| Method of meta-analysis^G: univariate RE model | 1 (50) | 6 (40) | 6 (38) |
| Method of meta-analysis^G: hierarchical model (for DTA studies) | 0 (0) | 2 (13) | 2 (13) |
| Method of meta-analysis^G: unclear/NS | 0 (0) | 5 (33) | 5 (31) |
| Volume of evidence | | | |
| Median (range) no. studies | 22 (3 – 35) | 15 (1 – 70) | 17 (1 – 70) |
| Median (range) no. participants | 408,504 (6,674 – 1,278,148) | 7,684 (528 – 408,504) | 11,729 (528 – 1,278,148) |
| Median (range) no. tools | 21 (3 – 35) | 3 (1 – 28) | 4 (1 – 35) |

Figures are number (%) of reviews, unless otherwise specified.
A: one review [55] specified restricting to “adult” populations, but only restricted by aged ≥14 years; B: one review [60] restricted to aged >60 years; C: one review [56] states either prospective or retrospective data eligible for Research Question 1, but prospective only for Research Question 2, hence 0.5 added to each category; D: including databases, bibliographies or registries; E: reviews may fall into multiple categories, therefore total number within domain not necessarily equal to N (100%); F: one review [38] reported use of PROBAST in methods, but did not present any PROBAST results; G: univariate meta-analysis conducted for a single estimate, e.g. c-statistic, AUC, RR or OR [52, 57, 58, 62].
AHCPR – Agency for Health Care Policy and Research; CASP – Critical Appraisal Skills Programme; DTA – diagnostic test accuracy; EPUAP – European Pressure Ulcer Advisory Panel; FE – fixed effects; ICU – intensive care unit; JBI – Joanna Briggs Institute; ML – machine learning; NPUAP – National Pressure Ulcer Advisory Panel; NS – not stated; PI – pressure injury; PROBAST – Prediction model Risk of Bias Assessment Tool; QUADAS (2) – Quality Assessment of Diagnostic Accuracy Studies (Version 2); RE – random effects; TDCPS – Torrance Developmental Classification of Pressure Sore.

Table 3. Results of reviews reporting model development and validation

Barghouthi [55] (2023)
DEV/VAL (no. studies): DEV (23)
Setting of included studies; data sources: Setting of included studies NS, but the review’s inclusion criteria specified hospital settings. Retrospective n=15; prospective n=5; both retrospective and prospective n=1; case-control study n=1; experimental study design n=1. EHRs n=20; international or national database n=3.
Model development algorithms: LR n=18; RF n=13; DT n=5; NN n=5; SVM n=5; Fine-Gray Model n=2; KNN n=2; XGBoost n=2; Adaboost n=1; BART n=1; EBM n=1; Gaussian Naïve Bayes n=1; GB n=1; GBM n=1; LDA n=1; NB n=1
Internal validation methods: Split sample n=17; NS n=6
Brief description of study quality: RoB assessed using JBI critical appraisal checklist for cohort studies, and only summary results provided. Only one domain was low RoB across all included studies, which was whether the participants were free from the outcome (PIs) at the start of the study. Domains with mostly high-risk (<50%) or moderate-risk (51-81%) results related to statistical analysis methods, follow-up time, dealing with confounding factors, and measurement of the exposure.
Summary of model performance results: Only reported measures of discrimination. Accuracy ranged between 0.52 (ML Walther [73]) and 0.99 (ML Anderson [74]); sensitivity ranged between 0.04 (ML Walther [73]) and 1 (ML Hu [75], ML Anderson [74]); specificity ranged between 0.69 (ML Hyun [76], ML Nakagami [77]) and 1 (ML Cai [78], ML Walther [73]); PPV ranged between 0.01 (ML Nakagami [77]) and 1 (ML Cai [78]); NPV ranged between 0.08 (ML SPURS [79], ML Cramer [80]) and 1 (ML Hu [75], ML Anderson [74], ML Ladios-Martin [81]); AUC ranged between 0.50 (ML Cai [78]) and 1 (ML Hu [75], ML Cai [78]).

Dweekat [36] (2023)
DEV/VAL (no. studies): DEV (34); unclear (1)^A
Setting of included studies; data sources: HAPI/CAPI n=32; SRPI n=2; detection of PI (effect on length of stay) n=1; nursing home residents n=2. Data sources NS.
Model development algorithms: LR n=20; RF n=18; DT n=12; SVM n=12; MLP n=9; KNN n=4; LDA n=1; other n=19
Internal validation methods: CV n=10; split sample n=10; split sample and CV n=8; NS n=7
Brief description of study quality: No RoB assessment
Summary of model performance results: Results not reported; review focused on methods only

Jiang [37] (2021)
DEV/VAL (no. studies): DEV (9)
Setting of included studies; data sources: ICU n=3; operating room n=2; acute care hospital n=1; oncology department n=1; end-of-life care n=1; mobility-related disabilities n=1. EHRs used in all models.
Model development algorithms: DT n=5; LR n=3; NN n=2; SVM n=2; BN n=1; GB n=1; MTS n=1; RF n=1
Internal validation methods: Split sample n=4; NS n=9
Brief description of study quality: RoB assessed using PROBAST. Overall RoB high for all predictive models. All models at high RoB in analysis domain.
Summary of model performance results: Only reported measures of discrimination. F-score ranged between 0.377 (ML Su MTS [82]) and 0.670 (ML Su LR [82]); G-means ranged between 0.628 (ML Kaewprag BN [83]) and 0.822 (ML Su MTS [82]); sensitivity ranged between 0.478 (ML Kaewprag [83]) and 0.848 (ML Yang [84]); specificity ranged between 0.703 (ML Deng [85]) and 0.988 (ML Su LR [82]).

Pei [54] (2023)
DEV/VAL (no. studies): DEV (17); DEV+VAL (1)
Setting of included studies; data sources: DEV: ICU n=4; hospitalised patients n=8; hospitalised patients awaiting surgery n=3; cancer patients n=1; end-of-life inpatients n=1. Retrospective n=14; prospective n=3. EHRs n=12; MIMIC-IV database n=1; CONCERN database n=1. DEV+VAL: ICU n=1. Retrospective n=1. EHRs n=1.
Model development algorithms: RF n=12; LR n=11; DT n=9; SVM n=8; NN n=5; MTS n=1; NB n=3; KNN n=2; MLP n=1; XGBoost n=2; BART n=1; LASSO n=1; BN n=1; ANN n=1; EN n=1; GBM n=1; Other^B n=1
Internal validation methods: CV n=1; split sample n=5; split sample and CV n=10; NS n=2
Brief description of study quality: RoB assessed using PROBAST. Overall, 16/18 (88.9%) papers were at high RoB, 1 (5.6%) was at unclear RoB and only 1 (5.6%) was at low RoB. 14 (77.8%) studies were at high RoB in the analysis domain. The most common factors contributing to the high risk of bias in the analysis domain included an inadequate number of events per candidate predictor, poor handling of missing data and failure to deal with overfitting.
Summary of model performance results: Only reported measures of discrimination. Summary AUC 0.9449. Summary sensitivity 0.79 (95% CI: 0.78, 0.80); Ncases = 19,893. Summary specificity 0.87 (95% CI: 0.88, 0.87); Nnon-cases = 388,611. Summary likelihood ratios: PLR 10.71 (95% CI: 5.98, 19.19); NLR 0.21 (95% CI: 0.08, 0.50). Pooled odds ratio 52.39 (95% CI: 24.83, 110.55).

Ribeiro [51] (2021)
DEV/VAL (no. studies): DEV (3)
Setting of included studies; data sources: SRPI cardiovascular n=2; SRPI critical care n=1. EHRs used in n=2 models.
Model development algorithms: ANN n=1; RF n=1; XGBoost n=1
Internal validation methods: Split sample n=2; NS n=1
Brief description of study quality: No RoB assessment
Summary of model performance results: Only reported measures of discrimination. Accuracy ranged between 0.79 (ML Alderden [86]) and 0.82 (ML Chen [87]).

Shi [52] (2019)
DEV/VAL (no. studies): DEV (21); VAL (7)
Setting of included studies; data sources: DEV: General acute care hospital n=7; long-term care n=5; specific acute care (e.g. ICU) n=4; cardiovascular surgery n=2; trauma and burn centres n=1; rehabilitation units n=1; unclear n=1. Retrospective n=11; prospective n=10. VAL: Long-term care n=3; specific acute care (e.g. ICU) n=2; general (acute care) hospital n=2. Retrospective n=4; prospective n=3.
Model development algorithms: LR n=16; cox regression n=5; ANN n=1; C4.5 ML (DT induction algorithm) n=1; DA n=1; DT n=1; NS n=1
Internal validation methods: CV n=1; tree-pruning n=1; split sample n=1; resampling n=2; NS n=16
Brief description of study quality: RoB assessed using PROBAST. DEV: Overall RoB unclear for two models. Overall RoB high for the remaining 19 models. Analysis and outcome domains were mostly at high RoB. VAL: Overall RoB unclear for three validation studies. Overall RoB high for the remaining four validation studies. Analysis and outcome domains were mostly at high RoB.
Summary of model performance results: C-statistics^C ranged between 0.61 (interRAI PURS [88]) and 0.90 (TNH-PUPP [89]); O/E ratios^C ranged between 0.91 (Berlowitz MDS [90]) and 1.0 (prePURSE study tool [91]). Pooled C-statistics^C: TNH-PUPP [89]: 0.86 (95% CI 0.81–0.90), n=2; Fragmment scale [92]: 0.79 (95% CI 0.77–0.82), n=1^D; Berlowitz 11-item model [93]: 0.75 (95% CI 0.74–0.76), n=2; Berlowitz MDS model [90]: 0.73 (95% CI 0.72–0.74), n=2; interRAI PURS [88]: 0.65 (95% CI 0.60–0.69), n=3; Compton [94]: 0.81 (95% CI 0.78–0.84), n=2. Pooled O/E ratios^C: Berlowitz 11-item model [93]: 0.99 (95% CI 0.95–1.04), n=2; Berlowitz MDS [90]: 0.94 (95% CI 0.88–1.01), n=2.

Zhou [53] (2022)
DEV/VAL (no. studies): DEV (22)
Setting of included studies; data sources: SRPI n=3; ICU n=11; hospitalised n=6; rehabilitation centre n=1; hospice n=1. EHR n=18; MIMIC-III database n=4.
Model development algorithms: LR n=15; RF n=10; DT n=9; SVM n=9; ANN n=8; BN n=3; XGBoost n=3; GB n=2; AdaBoost n=1; CANTRIP n=1; LSTM n=1; EN n=1; KNN n=1; MTS n=1; NB n=1
Internal validation methods: CV n=12; NS n=10
Brief description of study quality: RoB assessed using PROBAST. Overall RoB unclear for five studies. Overall RoB high for 15 models. RoB not assessed in two studies due to use of unstructured data.
Summary of model performance results: Only reported measures of discrimination. F1 score ranged between 0.02 (ML Nakagami [77]) and 0.99 (ML Song [2] [95]); AUC ranged between 0.78 (ML Delparte [96]) and 0.99 (ML Song [2] [95]); sensitivity ranged between 0.08 (ML Cai [78]) and 0.99 (ML Song [2] [95]); specificity ranged between 0.63 (ML Delparte [96]) and 1 (ML Cai [78]).

A: Appears to be a model validation study but the review only included model development studies. B: Other includes: average perception, Bayes point machine, boosted DT, boosted decision forest, decision jungle and locally deep SVM. All reported for one study [81]. C: Values from fixed-effects meta-analyses, pooling development and external validation study estimates together. D: One data source but included two C-statistic values (one for model development and one for internal validation) that were subsequently pooled.
AUC – area under curve; ANN – artificial neural network; BART – Bayesian additive regression tree; BN – Bayesian network; CAPI – community-acquired pressure injury; CANTRIP – reCurrent Additive Network for Temporal RIsk Prediction; CONCERN – Communicating Narrative Concerns Entered; CV – cross-validation; DEV – development; DOR – diagnostic odds ratio; DT – decision tree; EBM – explainable boosting machine; EHRs – electronic health records; EN – elastic net; GB(M) – gradient boosting (machine); HAPI – hospital-acquired pressure injury; ICU – intensive care unit; JBI – Joanna Briggs Institute; KNN – k-nearest neighbours; LASSO – least absolute shrinkage and selection operator; (L)DA – (linear) discriminant analysis; LSTM – long short-term memory; LR – logistic regression; MIMIC – Medical Information Mart for Intensive Care; ML – machine learning; MLP – multilayer perceptron; MTS – Mahalanobis-Taguchi system; N/A – not applicable; NB – naïve Bayes; NN – neural network; NLR – negative likelihood ratio; NS – not stated; O/E – observed vs expected; PI – pressure injury; PLR – positive likelihood ratio; PROBAST – Prediction model Risk of Bias ASsessment Tool; RF – random forest; RoB – risk of bias; SRPI – surgery-related pressure injury; SVM – support vector machine; VAL – validation; XGBoost – extreme gradient boosting

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Availability of data and materials
All data produced in the present work are contained in the manuscript and supplementary file.

Conflicting Interests
The authors of this manuscript have the following competing interests: VV is an employee of Paul Hartmann AG; ES and THB received consultancy fees from Paul Hartmann AG. VV, ES and THB were not involved in data curation, screening, data extraction, analysis of results or writing of the original draft. These roles were conducted independently by authors at the University of Birmingham. All other authors received no personal funding or personal compensation from Paul Hartmann AG and have declared that no competing interests exist.

Funding
This work was commissioned and supported by Paul Hartmann AG (Heidenheim, Germany), part of HARTMANN GROUP. The contract with the University of Birmingham was agreed on the legal understanding that the authors had the freedom to publish results regardless of the findings.
YT, JD, BH and AC are funded by the National Institute for Health and Care Research (NIHR) Birmingham Biomedical Research Centre (BRC). This paper presents independent research supported by the NIHR Birmingham BRC at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Author Contributions
Conceptualisation: Bethany Hillier, Katie Scandrett, April Coombe, Tina Hernandez-Boussard, Ewout Steyerberg, Yemisi Takwoingi, Vladica Velickovic, Jacqueline Dinnes
Data curation: Bethany Hillier, Katie Scandrett, April Coombe, Jacqueline Dinnes
Formal analysis: Bethany Hillier, Katie Scandrett, Jacqueline Dinnes
Funding acquisition: Yemisi Takwoingi, Vladica Velickovic, Jacqueline Dinnes
Investigation: Bethany Hillier, Katie Scandrett, April Coombe, Yemisi Takwoingi, Jacqueline Dinnes
Methodology: Bethany Hillier, Katie Scandrett, April Coombe, Tina Hernandez-Boussard, Ewout Steyerberg, Yemisi Takwoingi, Vladica Velickovic, Jacqueline Dinnes
Project administration: Bethany Hillier, Yemisi Takwoingi, Jacqueline Dinnes
Resources: Bethany Hillier, Katie Scandrett
Supervision: Yemisi Takwoingi, Jacqueline Dinnes
Writing – original draft: Bethany Hillier, Katie Scandrett, April Coombe, Jacqueline Dinnes
Writing – review & editing: Bethany Hillier, Katie Scandrett, April Coombe, Tina Hernandez-Boussard, Ewout Steyerberg, Yemisi Takwoingi, Vladica Velickovic, Jacqueline Dinnes

Acknowledgements
We would like to thank Mrs. Rosie Boodell (University of Birmingham, UK) for her help in acquiring the publications necessary to complete this piece of work.

References
1. Li Z, Lin F, Thalib L, et al. Global prevalence and incidence of pressure injuries in hospitalised adult patients: A systematic review and meta-analysis. International Journal of Nursing Studies 2020;105:103546. doi: 10.1016/j.ijnurstu.2020.103546
2. Padula WV, Delarmente BA. The national cost of hospital-acquired pressure injuries in the United States. Int Wound J 2019;16(3):634-40. doi: 10.1111/iwj.13071 [published Online First: 2019/01/28]
3. Theisen S, Drabik A, Stock S. Pressure ulcers in older hospitalised patients and its impact on length of stay: a retrospective observational study. J Clin Nurs 2012;21(3-4):380-7. doi: 10.1111/j.1365-2702.2011.03915.x [published Online First: 2011/12/09]
4. Sullivan N, Schoelles K. Preventing In-Facility Pressure Ulcers as a Patient Safety Strategy. Annals of Internal Medicine 2013;158(5.2):410-16. doi: 10.7326/0003-4819-158-5-201303051-00008
5. Institute for Quality and Efficiency in Health Care (IQWiG). Preventing pressure ulcers. Cologne, Germany 2006 [updated 2018 Nov 15. Available from: https://www.ncbi.nlm.nih.gov/books/NBK326430/?report=classic accessed Feb 2023].
6. Haesler E. European Pressure Ulcer Advisory Panel, National Pressure Injury Advisory Panel and Pan Pacific Pressure Injury Alliance. Prevention and Treatment of Pressure Ulcers/Injuries: Clinical Practice Guideline. 2019 [Available from: https://internationalguideline.com/2019 accessed Feb 2023].
7. Walker RM, Gillespie BM, McInnes E, et al. Prevention and treatment of pressure injuries: A metasynthesis of Cochrane Reviews. Journal of Tissue Viability 2020;29(4):227-43. doi: 10.1016/j.jtv.2020.05.004
8. Shi C, Dumville JC, Cullum N, et al. Beds, overlays and mattresses for preventing and treating pressure ulcers: an overview of Cochrane Reviews and network meta-analysis. Cochrane Database Syst Rev 2021;8(8):CD013761. doi: 10.1002/14651858.CD013761.pub2 [published Online First: 2021/08/16]
9. Russo CA, Steiner C, Spector W. Hospitalizations Related to Pressure Ulcers, 2006. HCUP Statistical Brief: Agency for Healthcare Research and Quality, Rockville, MD. 2008.
10. Braden B, Bergstrom N. A Conceptual Schema for the Study of the Etiology of Pressure Sores. Rehabilitation Nursing 1987;12(1):8-16. doi: 10.1002/j.2048-7940.1987.tb00541.x
11. Bergstrom N, Braden BJ, Laguzza A, et al. The Braden Scale for Predicting Pressure Sore Risk. Nurs Res 1987;36(4):205-10.
12. Norton D. Geriatric nursing problems. Int Nurs Rev 1962;9:39-41.
13. Waterlow J. Pressure sores: a risk assessment card. Nursing Times 1985;81:49-55.
14. Steyerberg EW, Harrell FE, Jr. Prediction models need appropriate internal, internal-external, and external validation. J Clin Epidemiol 2016;69:245-7. doi: 10.1016/j.jclinepi.2015.04.005 [published Online First: 2015/04/18]
15. Steyerberg EW, Moons KGM, van der Windt DA, et al. Prognosis Research Strategy (PROGRESS) 3: Prognostic Model Research. PLOS Medicine 2013;10(2):e1001381. doi: 10.1371/journal.pmed.1001381
16. Siontis GCM, Tzoulaki I, Castaldi PJ, et al. External validation of new risk prediction models is infrequent and reveals worse prognostic discrimination. Journal of Clinical Epidemiology 2015;68(1):25-34. doi: 10.1016/j.jclinepi.2014.09.007
17. Bouwmeester W, Zuithoff NPA, Mallett S, et al. Reporting and Methods in Clinical Prediction Research: A Systematic Review. PLOS Medicine 2012;9(5):e1001221. doi: 10.1371/journal.pmed.1001221
18. Van Calster B, Steyerberg EW, Wynants L, et al. There is no such thing as a validated prediction model. BMC Medicine 2023;21(1):70. doi: 10.1186/s12916-023-02779-w
19. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ 2020;369:m1328. doi: 10.1136/bmj.m1328
20. Ma J, Dhiman P, Qi C, et al. Poor handling of continuous predictors in clinical prediction models using logistic regression: a systematic review. J Clin Epidemiol 2023;161:140-51. doi: 10.1016/j.jclinepi.2023.07.017 [published Online First: 2023/08/02]
21. Dhiman P, Ma J, Qi C, et al. Sample size requirements are not being considered in studies developing prediction models for binary outcomes: a systematic review. BMC Medical Research Methodology 2023;23(1):188. doi: 10.1186/s12874-023-02008-1
22. Moriarty AS, Meader N, Snell KIE, et al. Predicting relapse or recurrence of depression: systematic review of prognostic models. Br J Psychiatry 2022;221(2):448-58. doi: 10.1192/bjp.2021.218
23. Andaur Navarro CL, Damen JAA, Takada T, et al. Risk of bias in studies on prediction models developed using supervised machine learning techniques: systematic review. BMJ 2021;375:n2281. doi: 10.1136/bmj.n2281
24. Christodoulou E, Ma J, Collins GS, et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J Clin Epidemiol 2019;110:12-22. doi: 10.1016/j.jclinepi.2019.02.004 [published Online First: 2019/02/11]
25. Debray TPA, Damen JAAG, Snell KIE, et al. A guide to systematic review and meta-analysis of prediction model performance. BMJ 2017;356:i6460. doi: 10.1136/bmj.i6460
26. Riley RD, van der Windt D, Croft P, et al. Prognosis research in healthcare: concepts, methods, and impact: Oxford University Press 2019.
27. Snell KIE, Levis B, Damen JAA, et al. Transparent reporting of multivariable prediction models for individual prognosis or diagnosis: checklist for systematic reviews and meta-analyses (TRIPOD-SRMA). BMJ 2023;381:e073538. doi: 10.1136/bmj-2022-073538
28. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Annals of Internal Medicine 2019;170(1):51-58. doi: 10.7326/M18-1376
29. Chen HL, Shen WQ, Liu P. A Meta-analysis to Evaluate the Predictive Validity of the Braden Scale for Pressure Ulcer Risk Assessment in Long-term Care. Ostomy/wound management 2016;62(9):20-8.
30. Baris N, Karabacak BG, Alpar SE. The Use of the Braden Scale in Assessing Pressure Ulcers in Turkey: A Systematic Review. Advances in skin & wound care 2015;28:349-57. doi: 10.1097/01.ASW.0000465299.99194.e6
31. He W, Liu P, Chen HL. The Braden Scale cannot be used alone for assessing pressure ulcer risk in surgical patients: a meta-analysis. Ostomy/wound management 2012;58:34-40.
32. Huang C, Ma Y, Wang C, et al. Predictive validity of the braden scale for pressure injury risk assessment in adults: A systematic review and meta-analysis. Nursing open 2021;8:2194-207. doi: 10.1002/nop2.792
33. Park SH, Choi YK, Kang CB. Predictive validity of the Braden Scale for pressure ulcer risk in hospitalized patients. Journal of Tissue Viability 2015;24:102-13. doi: 10.1016/j.jtv.2015.05.001
34. Wei M, Wu L, Chen Y, et al. Predictive Validity of the Braden Scale for Pressure Ulcer Risk in Critical Care: A Meta-Analysis. Nursing in critical care 2020;25:165-70. doi: 10.1111/nicc.12500
35. Wilchesky M, Lungu O. Predictive and concurrent validity of the Braden scale in long-term care: A meta-analysis. Wound Repair and Regeneration 2015;23:44-56. doi: 10.1111/wrr.12261
36. Dweekat OY, Lam SS, McGrath L. Machine Learning Techniques, Applications, and Potential Future Opportunities in Pressure Injuries (Bedsores) Management: A Systematic Review. International journal of environmental research and public health 2023;20(1) doi: 10.3390/ijerph20010796
37. Jiang M, Ma Y, Guo S, et al. Using Machine Learning Technologies in Pressure Injury Management: Systematic Review. JMIR Medical Informatics 2021;9(3):e25704. doi: 10.2196/25704
38. Qu C, Luo W, Zeng Z, et al. The predictive effect of different machine learning algorithms for pressure injuries in hospitalized patients: A network meta-analyses. Heliyon 2022;8(11):e11361. doi: 10.1016/j.heliyon.2022.e11361
39. Hillier B, Scandrett K, Coombe A, et al. Accuracy and clinical effectiveness of risk prediction tools for pressure injury occurrence: An umbrella review (pre-print). MedRxiv 2024 doi: 10.1101/2024.05.07.24307001
40. Pollock M, Fernandes RM, Becker LA, Pieper D, Hartling L. Chapter V: Overviews of Reviews. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA, eds. Cochrane Handbook for Systematic Reviews of Interventions version 6.3 (updated February 2022). Available from www.training.cochrane.org/handbook: Cochrane 2022.
41. Moher D, Liberati A, Tetzlaff J, et al. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLOS Medicine 2009;6(7):e1000097. doi: 10.1371/journal.pmed.1000097
42. Ingui BJ, Rogers MA. Searching for clinical prediction rules in MEDLINE. J Am Med Inform Assoc 2001;8(4):391-7. doi: 10.1136/jamia.2001.0080391 [published Online First: 2001/06/22]
43. Wilczynski NL, Haynes RB. Optimal Search Strategies for Detecting Clinically Sound Prognostic Studies in EMBASE: An Analytic Survey. Journal of the American Medical Informatics Association 2005;12(4):481-85. doi: 10.1197/jamia.M1752
44. Geersing G-J, Bouwmeester W, Zuithoff P, et al. Search Filters for Finding Prognostic and Diagnostic Prediction Studies in Medline to Enhance Systematic Reviews. PLOS ONE 2012;7(2):e32844. doi: 10.1371/journal.pone.0032844
45. NHS. Pressure ulcers: revised definition and measurement. Summary and recommendations 2018 [Available from: https://www.england.nhs.uk/wp-content/uploads/2021/09/NSTPPsummary-recommendations.pdf accessed Feb 2023].
46. AHCPR. Pressure ulcer treatment. Agency for Health Care Policy and Research 1994:1-25.
47. Harker J. Pressure ulcer classification: the Torrance system. Journal of Wound Care 2000;9(6):275-77. doi: 10.12968/jowc.2000.9.6.26233
48. Moons KGM, de Groot JAH, Bouwmeester W, et al. Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies: The CHARMS Checklist. PLOS Medicine 2014;11(10):e1001744. doi: 10.1371/journal.pmed.1001744
49. Cochrane. DE form example prognostic models - scoping review: The Cochrane Collaboration: The Prognosis Methods Group; [Available from: https://methods.cochrane.org/prognosis/tools accessed Feb 2023].
50. Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 2017;358:j4008. doi: 10.1136/bmj.j4008
51. Ribeiro F, Fidalgo F, Silva A, et al. Literature review of machine-learning algorithms for pressure ulcer prevention: Challenges and opportunities: MDPI 2021.
52. Shi C, Dumville JC, Cullum N. Evaluating the development and validation of empirically-derived prognostic models for pressure ulcer risk assessment: A systematic review. International journal of nursing studies 2019;89:88-103. doi: 10.1016/j.ijnurstu.2018.08.005
53. Zhou Y, Yang X, Ma S, et al. A systematic review of predictive models for hospital-acquired pressure injury using machine learning. Nursing open 2022;30 doi: 10.1002/nop2.1429
54. Pei J, Guo X, Tao H, et al. Machine learning-based prediction models for pressure injury: A systematic review and meta-analysis. Int Wound J 2023 doi: 10.1111/iwj.14280 [published Online First: 2023/06/20]
55. Barghouthi EaD, Owda AY, Asia M, et al. Systematic Review for Risks of Pressure Injury and Prediction Models Using Machine Learning Algorithms. Diagnostics (Basel, Switzerland) 2023;13(17) doi: 10.3390/diagnostics13172739
56. Chou R, Dana T, Bougatsos C, et al. Pressure ulcer risk assessment and prevention: a systematic comparative effectiveness review. Annals of internal medicine 2013;159(1):28-38.
57. García-Fernández FP, Pancorbo-Hidalgo PL, Agreda JJS. Predictive Capacity of Risk Assessment Scales and Clinical Judgment for Pressure Ulcers: A Meta-analysis. Journal of Wound Ostomy & Continence Nursing 2014;41(1):24-34. doi: 10.1097/01.WON.0000438014.90734.a2
58. Pancorbo-Hidalgo PL, Garcia-Fernandez FP, Lopez-Medina IM, et al. Risk assessment scales for pressure ulcer prevention: a systematic review. J Adv Nurs 2006;54(1):94-110. doi: 10.1111/j.1365-2648.2006.03794.x
59. Park SH, Lee HS. Assessing Predictive Validity of Pressure Ulcer Risk Scales- A Systematic Review and Meta-Analysis. Iranian journal of public health 2016;45(2):122-33.
60. Park SH, Lee YS, Kwon YM. Predictive Validity of Pressure Ulcer Risk Assessment Tools for Elderly: A Meta-Analysis. Western journal of nursing research 2016;38:459-83. doi: 10.1177/0193945915602259
61. Tayyib NAH, Coyer F, Lewis P. Pressure ulcers in the adult intensive care unit: a literature review of patient risk factors and risk assessment scales. Journal of Nursing Education and Practice 2013;3(11):28-42.
62. Wang N, Lv L, Yan F, et al. Biomarkers for the early detection of pressure injury: A systematic review and meta-analysis. Journal of Tissue Viability 2022;31:259-67. doi: 10.1016/j.jtv.2022.02.005
63. Zhang Y, Zhuang Y, Shen J, et al. Value of pressure injury assessment scales for patients in the intensive care unit: Systematic review and diagnostic test accuracy meta-analysis. Intensive & critical care nursing 2021;64:103009. doi: 10.1016/j.iccn.2020.103009
64. Zimmermann GS, Cremasco MF, Zanei SSV, et al. Pressure injury risk prediction in critical care patients: an integrative review. Texto & Contexto-Enfermagem 2018;27(3)
65. Chen X, Diao D, Ye L. Predictive validity of the Jackson–Cubbin scale for pressure ulcers in intensive care unit patients: A meta-analysis. Nursing in Critical Care 2023;28(3):370-78. doi: 10.1111/nicc.12818
66. Mehicic A, Burston A, Fulbrook P. Psychometric properties of the Braden scale to assess pressure injury risk in intensive care: A systematic review. Intensive & critical care nursing 2024;83:103686. doi: 10.1016/j.iccn.2024.103686
67. Gaspar S, Peralta M, Marques A, et al. Effectiveness on hospital-acquired pressure ulcers prevention: a systematic review. International Wound Journal 2019;16(5):1087-102. doi: 10.1111/iwj.13147
68. Ontario HQ. Pressure ulcer prevention: an evidence-based analysis. Ontario health technology assessment series 2009;9(2):1-104.
69. Kottner J, Dassen T, Tannen A. Inter- and intrarater reliability of the Waterlow pressure sore risk scale: A systematic review. International Journal of Nursing Studies 2009;46:369-79. doi: 10.1016/j.ijnurstu.2008.09.010
70. Lovegrove J, Ven S, Miles SJ, et al. Comparison of pressure injury risk assessment outcomes using a structured assessment tool versus clinical judgement: A systematic review. Journal of Clinical Nursing 2021 doi: 10.1111/jocn.16154 [published Online First: 2021/12/01]
71. Lovegrove J, Miles S, Fulbrook P. The relationship between pressure ulcer risk assessment and preventative interventions: a systematic review. Journal of wound care 2018;27(12):862-75.
72. Moore ZEH, Patton D. Risk assessment tools for the prevention of pressure ulcers. Cochrane Database of Systematic Reviews 2019 doi: 10.1002/14651858.CD006471.pub4
73. Walther F, Heinrich L, Schmitt J, et al. Prediction of inpatient pressure ulcers based on routine healthcare data using machine learning methodology. Scientific Reports 2022;12(1):5044.
74. Anderson C, Bekele Z, Qiu Y, et al. Modeling and prediction of pressure injury in hospitalized patients using artificial intelligence. BMC Med Inform Decis Mak 2021;21(1):253. doi: 10.1186/s12911-021-01608-5 [published Online First: 2021/08/30]
75. Hu YH, Lee YL, Kang MF, et al. Constructing Inpatient Pressure Injury Prediction Models Using Machine Learning Techniques. CIN-Computers Informatics Nursing 2020;38(8):415-23. doi: 10.1097/cin.0000000000000604
76. Hyun S, Moffatt-Bruce S, Cooper C, et al. Prediction Model for Hospital-Acquired Pressure Ulcer Development: Retrospective Cohort Study. JMIR Medical Informatics 2019;7(3) doi: 10.2196/13785
77. Nakagami G, Yokota S, Kitamura A, et al. Supervised machine learning-based prediction for in-hospital pressure injury development using electronic health records: A retrospective observational cohort study in a university hospital in Japan. International Journal of Nursing Studies 2021;119 doi: 10.1016/j.ijnurstu.2021.103932
78. Cai JY, Zha ML, Song YP, et al. Predicting the Development of Surgery-Related Pressure Injury Using a Machine Learning Algorithm Model. Journal of Nursing Research 2021;29(1) doi: 10.1097/jnr.0000000000000411
79. Aloweni F, Ang SY, Fook-Chong S, et al. A prediction tool for hospital-acquired pressure ulcers among surgical patients: Surgical pressure ulcer risk score. Int Wound J 2019;16(1):164-75. doi: 10.1111/iwj.13007 [published Online First: 2018/10/05]
80. Cramer EM, Seneviratne MG, Sharifi H, et al. Predicting the Incidence of Pressure Ulcers in the Intensive Care Unit Using Machine Learning. EGEMS (Wash DC) 2019;7(1):49. doi: 10.5334/egems.307 [published Online First: 2019/09/05]
81. Ladios-Martin M, Fernández-de-Maya J, Ballesta-López FJ, et al. Predictive Modeling of Pressure Injury Risk in Patients Admitted to an Intensive Care Unit. Am J Crit Care 2020;29(4):e70-e80. doi: 10.4037/ajcc2020237
82. Su CT, Wang PC, Chen YC, et al. Data Mining Techniques for Assisting the Diagnosis of Pressure Ulcer Development in Surgical Patients. Journal of Medical Systems 2012;36(4):2387-99. doi: 10.1007/s10916-011-9706-1
83. Kaewprag P, Newton C, Vermillion B, et al. Predictive models for pressure ulcers from intensive care unit electronic health records using Bayesian networks. BMC Medical Informatics and Decision Making 2017;17 doi: 10.1186/s12911-017-0471-z
84. Yang Q, Wang G, Jiang B, et al. Study on risk prediction model of unavoidable pressure ulcers in cancer patients based on decision tree. Journal of Nursing Science 2019;34(13):4-7.
85. Deng X, Wang Q, Li M, et al. Predicting the risk of hospital-acquired pressure ulcers in intensive care unit patients based on decision tree. Chin J Prac Nurs 2016;32:485-89.
86. Alderden J, Pepper GA, Wilson A, et al. Predicting Pressure Injury in Critical Care Patients: A Machine-Learning Model. Am J Crit Care 2018;27(6):461-68. doi: 10.4037/ajcc2018525
87. Chen HL, Yu SJ, Xu Y, et al. Artificial Neural Network: A Method for Prediction of Surgery-Related Pressure Injury in Cardiovascular Surgical Patients. Journal of Wound Ostomy and Continence Nursing 2018;45(1):26-30. doi: 10.1097/won.0000000000000388
88. Poss J, Murphy KM, Woodbury MG, et al. Development of the interRAI Pressure Ulcer Risk Scale (PURS) for use in long-term care and home care settings. BMC geriatrics 2010;10:67. doi: 10.1186/1471-2318-10-67
89. Page KN, Barker AL, Kamar J. Development and validation of a pressure ulcer risk assessment tool for acute hospital patients. Wound Repair and Regeneration 2011;19(1):31-37. doi: 10.1111/j.1524-475X.2010.00647.x
90. Berlowitz DR, Brandeis GH, Morris JN, et al. Deriving a risk-adjustment model for pressure ulcer development using the Minimum Data Set. Journal of the American Geriatrics Society 2001;49(7):866-71. doi: 10.1046/j.1532-5415.2001.49175.x
91. Schoonhoven L, Grobbee DE, Donders ART, et al. Prediction of pressure ulcer development in hospitalized patients: a tool for risk assessment. Quality & Safety in Health Care 2006;15(1):65-70. doi: 10.1136/qshc.2005.015362
92. Perneger TV, Raë AC, Gaspoz JM, et al. Screening for pressure ulcer risk in an acute care hospital: development of a brief bedside scale. J Clin Epidemiol 2002;55(5):498-504. doi: 10.1016/s0895-4356(01)00514-5
93. Berlowitz DR, Ash AS, Brandeis GH, et al. Rating long-term care facilities on pressure ulcer development: importance of case-mix adjustment. Annals of Internal Medicine 1996;124(6):557-63.
94. Compton F, Hoffmann F, Hortig T, et al. Pressure ulcer predictors in ICU patients: nursing skin assessment versus objective parameters. J Wound Care 2008;17(10):417-20, 22-4. doi: 10.12968/jowc.2008.17.10.31304
95. Song WY, Kang MJ, Zhang LY, et al. Predicting pressure injury using nursing assessment phenotypes and machine learning methods. Journal of the American Medical Informatics Association 2021;28(4):759-65. doi: 10.1093/jamia/ocaa336
96. Delparte JJ, Flett HM, Scovil CY, et al. Development of the spinal cord injury pressure sore onset risk screening (SCI-PreSORS) instrument: a pressure injury risk decision tree for spinal cord injury rehabilitation. Spinal Cord 2021;59(2):123-31. doi: 10.1038/s41393-020-0510-y
97. Cubbin B, Jackson C. Trial of a pressure area risk calculator for intensive therapy patients. Intensive Care Nursing 1991;7(1):40-44.
98. Jackson C. The revised Jackson/Cubbin Pressure Area Risk Calculator. Intensive Crit Care Nurs 1999;15(3):169-75. doi: 10.1016/s0964-3397(99)80048-2
99. Berlowitz DR, Ash AS, Brandeis GH, et al. Rating long-term care facilities on pressure ulcer development: Importance of case-mix adjustment. Annals of Internal Medicine 1996;124(6):557-63. doi: 10.7326/0003-4819-124-6-199603150-00003
100. Suriadi, Sanada H, Sugama J, Thigpen B, et al. Development of a new risk assessment scale for predicting pressure ulcers in an intensive care unit. Nursing in critical care 2008;13(1):34-43.
101. Lowery MT. A pressure sore risk calculator for intensive care patients: 'the Sunderland experience'. Intensive Crit Care Nurs 1995;11(6):344-53. doi: 10.1016/s0964-3397(95)80452-8
102. Sprigle S, McNair D, Sonenblum S. Pressure Ulcer Risk Factors in Persons with Mobility-Related Disabilities. Adv Skin Wound Care 2020;33(3):146-54. doi: 10.1097/01.ASW.0000653152.36482.7d
103. Do Q, Lipatov K, Ramar K, et al. Pressure Injury Prediction Model Using Advanced Analytics for At-Risk Hospitalized Patients. Journal of patient safety 2022;18(7):e1083-e89.
104. McCreath HE, Bates-Jensen BM, Nakagami G, et al. Use of Munsell color charts to measure skin tone objectively in nursing home residents at risk for pressure ulcer development. Journal of Advanced Nursing 2016;72(9):2077-85. doi: 10.1111/jan.12974
105. Austin PC, Steyerberg EW. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat Methods Med Res 2017;26(2):796-808. doi: 10.1177/0962280214558972 [published Online First: 2014/11/19]
106. Smith GC, Seaman SR, Wood AM, et al. Correcting for optimistic prediction in small data sets. Am J Epidemiol 2014;180(3):318-24. doi: 10.1093/aje/kwu140 [published Online First: 2014/06/24]
107. Riley RD, Collins GS. Stability of clinical prediction models developed using statistical or machine learning methods. Biometrical Journal 2023;65(8):2200302. doi: 10.1002/bimj.202200302
108. Salazar D, Vélez J, Salazar Uribe J. Comparison between SVM and Logistic Regression: Which One is Better to Discriminate? Revista Colombiana de Estadística 2012;35:223-37.
109. Ramspek CL, Jager KJ, Dekker FW, et al. External validation of prognostic models: what, why, how, when and where? Clin Kidney J 2021;14(1):49-58. doi: 10.1093/ckj/sfaa188 [published Online First: 2020/11/24]
110. de Hond AAH, Shah VB, Kant IMJ, et al. Perspectives on validation of clinical predictive algorithms. npj Digital Medicine 2023;6(1):86. doi: 10.1038/s41746-023-00832-9
111. Binuya MAE, Engelhardt EG, Schats W, et al. Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review. BMC Med Res Methodol 2022;22(1):316. doi: 10.1186/s12874-022-01801-8 [published Online First: 2022/12/12]
112. Riley RD, Archer L, Snell KIE, et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ 2024;384:e074820. doi: 10.1136/bmj-2023-074820
113. Hingorani AD, Windt DAvd, Riley RD, et al. Prognosis research strategy (PROGRESS) 4: Stratified medicine research. BMJ : British Medical Journal 2013;346:e5793. doi: 10.1136/bmj.e5793
114. Qaseem A, Mir TP, Starkey M, et al. Risk Assessment and Prevention of Pressure Ulcers: A Clinical Practice Guideline From the American College of Physicians. Annals of Internal Medicine 2015;162(5):359-69. doi: 10.7326/m14-1567
115. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ 2015;351:h5527. doi: 10.1136/bmj.h5527 [published Online First: 2015/10/28]
116. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med 2015;162(1):W1-73. doi: 10.7326/m14-0698
117. Hernandez-Boussard T, Bozkurt S, Ioannidis JPA, et al. MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. Journal of the American Medical Informatics Association 2020;27(12):2011-15. doi: 10.1093/jamia/ocaa088
118. Collins GS, Moons KGM, Dhiman P, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024;385:e078378. doi: 10.1136/bmj-2023-078378