
Aims. Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for the purpose of guiding clinicians’ management of PFI. There are also concerns about the validity of the Dejour Classification (DJC), which is the most widely used classification for TD, having only a fair reliability score. The Oswestry-Bristol Classification (OBC) is a recently proposed system of classification of TD, and the authors report a fair-to-good interobserver agreement and good-to-excellent intraobserver agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications. Methods. In all, six assessors (four consultants and two registrars) independently evaluated 100 axial MRIs of the patellofemoral joint (PFJ) for TD and classified them according to OBC and DJC. These assessments were again repeated by all raters after four weeks. The inter- and intraobserver reliability scores were calculated using Cohen’s kappa and Cronbach’s α. Results. Both classifications showed good to excellent interobserver reliability with high α scores. The OBC classification showed a substantial intraobserver agreement (mean kappa 0.628; p < 0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p < 0.005). There was no significant difference in the kappa values when comparing the assessments by consultants with those by registrars, in either classification system. Conclusion. This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on axial MRIs of the PFJ, with the simple-to-use OBC having a higher intraobserver reliability score than that of the DJC. Cite this article: Bone Jt Open 2023;4(7):532–538
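As an illustration of the intraobserver statistic named in this abstract, the following is a minimal sketch of Cohen's kappa for two sets of categorical ratings (for example, one assessor's first and second reads). The function name `cohen_kappa` and the sample grades are hypothetical and are not taken from the study; the abstract does not describe the authors' actual software or workflow.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of cases given the same label both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades from two reads of four scans (not study data).
first_read = [1, 1, 2, 2]
second_read = [1, 1, 2, 1]
print(round(cohen_kappa(first_read, second_read), 3))
```

A mean kappa across assessors, as reported in the abstract, would then simply be the average of the per-assessor values.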


Bone & Joint Research
Vol. 12, Issue 5 | Pages 313 - 320
8 May 2023
Saiki Y Kabata T Ojima T Kajino Y Kubo N Tsuchiya H

Aims. We aimed to assess the reliability and validity of OpenPose, a posture estimation algorithm, for measurement of knee range of motion after total knee arthroplasty (TKA), in comparison to radiography and goniometry. Methods. In this prospective observational study, we analyzed 35 primary TKAs (24 patients) for knee osteoarthritis. We measured the knee angles in flexion and extension using OpenPose, radiography, and goniometry. We assessed the test-retest reliability of each method using intraclass correlation coefficient (1,1). We evaluated the ability to estimate other measurement values from the OpenPose value using linear regression analysis. We used intraclass correlation coefficients (2,1) and Bland–Altman analyses to evaluate the agreement and error between radiography and the other measurements. Results. OpenPose had excellent test-retest reliability (intraclass correlation coefficient (1,1) = 1.000). The R² of all regression models indicated large correlations (0.747 to 0.927). In the flexion position, the intraclass correlation coefficients (2,1) of OpenPose indicated excellent agreement (0.953) with radiography. In the extension position, the intraclass correlation coefficients (2,1) indicated good agreement of OpenPose and radiography (0.815) and moderate agreement of goniometry with radiography (0.593). OpenPose had no systematic error in the flexion position, and a 2.3° fixed error in the extension position, compared to radiography. Conclusion. OpenPose is a reliable and valid tool for measuring flexion and extension positions after TKA. It has better accuracy than goniometry, especially in the extension position. Accurate measurement values can be obtained with low error, high reproducibility, and no contact, independent of the examiner’s skills. Cite this article: Bone Joint Res 2023;12(5):313–320
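The Bland–Altman analysis mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration assuming paired angle measurements from two methods; the function name `bland_altman` and the sample angles are hypothetical, and the 1.96 multiplier is the conventional choice for 95% limits of agreement rather than a detail confirmed by the abstract. The mean difference plays the role of the "fixed error" between methods.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired measurements")

    # Per-subject difference between the two methods.
    diffs = [a - b for a, b in zip(method_a, method_b)]

    bias = statistics.mean(diffs)   # systematic (fixed) difference
    sd = statistics.stdev(diffs)    # sample SD of the differences

    # Conventional 95% limits of agreement.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical knee angles in degrees (not study data).
pose_estimate = [10.0, 12.0, 14.0]
radiograph = [9.0, 10.0, 11.0]
bias, lower, upper = bland_altman(pose_estimate, radiograph)
print(bias, lower, upper)
```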


Bone & Joint Research
Vol. 8, Issue 8 | Pages 357 - 366
1 Aug 2019
Zhang B Sun H Zhan Y He Q Zhu Y Wang Y Luo C

Objectives. CT-based three-column classification (TCC) has been widely used in the treatment of tibial plateau fractures (TPFs). In its updated version (updated three-column concept, uTCC), a fracture morphology-based injury mechanism was proposed for effective treatment guidance. In this study, the injury mechanism of TPFs is further explained, and its inter- and intraobserver reliability is evaluated to perfect the uTCC. Methods. The radiological images of 90 consecutive TPF patients were collected. A total of 47 men (52.2%) and 43 women (47.8%) with a mean age of 49.8 years (SD 12.4; 17 to 77) were enrolled in our study. Among them, 57 fractures were on the left side (63.3%) and 33 were on the right side (36.7%); no bilateral fracture existed. Four observers were chosen to independently classify or estimate these randomized cases according to the Schatzker classification, TCC, and injury mechanism. With two rounds of evaluation, the kappa values were calculated to estimate the inter- and intraobserver reliability. Results. The overall inter- and intraobserver agreements of the injury mechanism were substantial (κ_inter = 0.699, κ_intra = 0.749, respectively). The initial position and the force direction, which are two components of the injury mechanism, had substantial agreement for both inter- and intraobserver reliability. The inter- and intraobserver agreements were lower in high-energy fractures (Schatzker types IV to VI; κ_inter = 0.605, κ_intra = 0.721) compared with low-energy fractures (Schatzker types I to III; κ_inter = 0.81, κ_intra = 0.832). The inter- and intraobserver agreements were relatively higher in one-column fractures (κ_inter = 0.759, κ_intra = 0.801) compared with two-column and three-column fractures. Conclusion. The complete theory of injury mechanism of TPFs was first put forward to make the TCC consummate. It demonstrates substantial inter- and intraobserver agreement generally. Furthermore, the injury mechanism can be promoted clinically. Cite this article: B-B. Zhang, H. Sun, Y. Zhan, Q-F. He, Y. Zhu, Y-K. Wang, C-F. Luo. Reliability and repeatability of tibial plateau fracture assessment with an injury mechanism-based concept. Bone Joint Res 2019;8:357–366. DOI: 10.1302/2046-3758.88.BJR-2018-0331.R1


Bone & Joint Research
Vol. 5, Issue 8 | Pages 347 - 352
1 Aug 2016
Nuttall J Evaniew N Thornley P Griffin A Deheshi B O’Shea T Wunder J Ferguson P Randall RL Turcotte R Schneider P McKay P Bhandari M Ghert M

Objectives. The diagnosis of surgical site infection following endoprosthetic reconstruction for bone tumours is frequently a subjective diagnosis. Large clinical trials use blinded Central Adjudication Committees (CACs) to minimise the variability and bias associated with assessing a clinical outcome. The aim of this study was to determine the level of inter-rater and intra-rater agreement in the diagnosis of surgical site infection in the context of a clinical trial. Materials and Methods. The Prophylactic Antibiotic Regimens in Tumour Surgery (PARITY) trial CAC adjudicated 29 non-PARITY cases of lower extremity endoprosthetic reconstruction. The CAC members classified each case according to the Centers for Disease Control (CDC) criteria for surgical site infection (superficial, deep, or organ space). Combinatorial analysis was used to calculate the smallest CAC panel size required to maximise agreement. A final meeting was held to establish a consensus. Results. Full or near consensus was reached in 20 of the 29 cases. The Fleiss kappa value was calculated as 0.44 (95% confidence interval (CI) 0.35 to 0.53), or moderate agreement. The greatest statistical agreement was observed in the outcome of no infection, 0.61 (95% CI 0.49 to 0.72, substantial agreement). Panelists reached a full consensus in 12 of 29 cases and near consensus in five of 29 cases when CDC criteria were used (superficial, deep or organ space). A stable maximum Fleiss kappa of 0.46 (95% CI 0.35 to 0.50) at CAC sizes greater than three members was obtained. Conclusions. There is substantial agreement among the members of the PARITY CAC regarding the presence or absence of surgical site infection. Agreement on the level of infection, however, is more challenging. Additional clinical information routinely collected by the prospective PARITY trial may improve the discriminatory capacity of the CAC in the parent study for the diagnosis of infection. Cite this article: J. Nuttall, N. Evaniew, P. Thornley, A. Griffin, B. Deheshi, T. O’Shea, J. Wunder, P. Ferguson, R. L. Randall, R. Turcotte, P. Schneider, P. McKay, M. Bhandari, M. Ghert. The inter-rater reliability of the diagnosis of surgical site infection in the context of a clinical trial. Bone Joint Res 2016;5:347–352. DOI: 10.1302/2046-3758.58.BJR-2016-0036.R1
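For readers unfamiliar with the Fleiss kappa reported above, a minimal sketch of the standard calculation is given here. It assumes every case is rated by the same number of panel members and that the input is a table of counts (one row per case, one column per category). The function name `fleiss_kappa` and the toy counts are hypothetical; the abstract does not state how the authors computed the statistic or its confidence intervals.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a cases x categories table of rating counts."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]

    # Agreement within each case: proportion of rater pairs that agree.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_observed = sum(p_case) / n_cases
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts: 2 cases, 3 raters, 2 categories (not study data).
print(fleiss_kappa([[2, 1], [1, 2]]))
```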


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_6 | Pages 38 - 38
2 May 2024
Buadooh KJ Holmes B Ng A
Full Access

The Revision Hip Complexity Classification (RHCC) was developed by modified Delphi system in 2022 to provide a comprehensive, reproducible framework for the multidisciplinary discussion of complex revision hip surgery. The aim of this study was to assess the validity, intra-rater and inter-rater reliability of the RHCC. Radiographs and clinical vignettes of 20 consecutive patients who had undergone revision of Total Hip Arthroplasty (THA) at our unit during the previous 12-month period were provided to observers. Five observers, comprising 3 revision hip consultants, 1 hip fellow and 1 ST3-8 registrar, were familiarised with the RHCC. Each revision THA case was classified on two separate occasions by each observer, with a mean time between assessments of 42.6 days (24–57). Inter-observer reliability was assessed using the Fleiss’ Kappa statistic and percentage agreement. Intra-observer reliability was assessed using the Cohen Kappa statistic. Validity was assessed using percentage agreement and Cohen Kappa comparing observers to the RHCC web-based application result. All observers were blinded to patient notes, operation notes and post-operative radiographs throughout the process. Inter-observer reliability showed fair agreement in both rounds 1 and 2 of the survey (0.296 and 0.353 respectively), with a percentage agreement of 69% and 75%. Inter-observer reliability was highest in H3-type revisions with kappa values of 0.577 and 0.441. Mean intra-observer reliability showed moderate agreement with a kappa value of 0.446 (0.369 to 0.773). Validity percentage agreement was 44% and 39% respectively, with mean kappa values of 0.125 and 0.046 representing only slight agreement. This study demonstrates that classification using the RHCC without utilisation of the web-based application is unsatisfactory, showing low validity and reliability. Reliability was higher for more complex H3-type cases. The use of the RHCC web app is recommended to ensure the accurate and reliable classification of revision THA cases.


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_14 | Pages 12 - 12
1 Dec 2022
Maggini E Bertoni G Guizzi A Vittone G Manni F Saccomanno M Milano G
Full Access

Glenoid and humeral head bone defects have long been recognized as major determinants in recurrent shoulder instability as well as main predictors of outcomes after surgical stabilization. However, a universally accepted method to quantify them is not available yet. The purpose of the present study is to describe a new CT method to quantify bipolar bone defects volume on a virtually generated 3D model and to evaluate its reproducibility. A cross-sectional observational study has been conducted. Forty CT scans of both shoulders were randomly selected from a series of exams previously acquired on patients affected by anterior shoulder instability. Inclusion criterion was unilateral anterior shoulder instability with at least one episode of dislocation. Exclusion criteria were: bilateral shoulder instability; posterior or multidirectional instability; previous fractures and/or surgery to both shoulders; congenital or acquired inflammatory, neurological, or degenerative diseases. For all patients, CT exams of both shoulders were acquired at the same time following a standardized imaging protocol. The CT data sets were analysed on a standard desktop PC using the software 3D Slicer. Computer-based reconstruction of the Hill-Sachs and glenoid bone defects was performed through Boolean subtraction of the affected side from the contralateral one, resulting in a virtually generated bone fragment accurately fitting the defect. The volume of the bone fragments was then calculated. All measurements were conducted by two fellowship-trained orthopaedic shoulder surgeons. Each measurement was performed twice by one observer to assess intra-observer reliability. Inter- and intra-observer reliability were calculated. Intraclass Correlation Coefficients (ICC) were calculated using a two-way random effect model and evaluation of absolute agreement. Confidence intervals (CI) were calculated at 95% confidence level for reliability coefficients. Reliability values range from 0 (no agreement) to 1 (maximum agreement). The study included 34 males and 6 females. Mean age (± SD) of patients was 36.7 ± 10.10 years (range: 25 – 73 years). A bipolar bone defect was observed in all cases. Reliability of humeral head bone fragment measurements showed excellent intra-observer agreement (ICC: 0.92, CI 95%: 0.85 – 0.96) and very good inter-observer agreement (ICC: 0.89, CI 95%: 0.80 – 0.94). Similarly, glenoid bone loss measurement resulted in excellent intra-observer reliability (ICC: 0.92, CI 95%: 0.85 – 0.96) and very good inter-observer agreement (ICC: 0.84, CI 95%: 0.72 – 0.91). In conclusion, matching affected and intact contralateral humeral head and glenoid by reconstruction on a computer-based virtual model allows identification of bipolar bone defects and enables quantitative determination of bone loss.
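The two-way random-effects, absolute-agreement ICC described in this abstract can be sketched as below. This is an illustration only: it assumes the single-measurement form, often written ICC(2,1), which the abstract does not explicitly confirm, and the function name `icc_2_1` and the sample volumes are hypothetical. The input is a complete table with one row per shoulder and one column per observer (or per repeated read).

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n = len(data)       # number of subjects (rows)
    k = len(data[0])    # number of raters or repeated reads (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with >= 2 rows and columns")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denominator


# Hypothetical defect volumes; the second observer reads 1 unit higher.
print(round(icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]), 3))
```

Because absolute agreement is evaluated, a constant offset between observers lowers the coefficient even when their rankings are identical, which is why the toy example does not return 1.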


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 62 - 62
4 Apr 2023
Rashid M Islam R Marsden S Trompeter A Teoh K
Full Access

A number of classification systems exist for posterior malleolus fractures of the ankle. The reliability of these classification systems remains unclear. The primary aim of this study was to evaluate the reliability of three commonly utilised fracture classification systems of the posterior malleolus. 60 patients across 2 hospitals sustaining an unstable ankle fracture with a posterior malleolus fragment were identified. All patients underwent radiographs and computed tomography of their injured ankle. 9 surgeons including pre-ST3 level, ST3-8 level, and consultant level applied the Haraguchi, Rammelt, and Mason & Molloy classifications to these patients, at two timepoints, at least 4 weeks apart. The order was randomised between assessments. Inter-rater reliability was assessed using Fleiss’ kappa and 95% confidence intervals (CI). Intra-rater reliability was assessed using Cohen's Kappa and standard error (SE). Inter-rater reliability (Fleiss’ Kappa) was calculated for the Haraguchi classification as 0.522 (95% CI 0.490 – 0.553), for the Rammelt classification as 0.626 (95% CI 0.600 – 0.652), and the Mason & Molloy classification as 0.541 (95% CI 0.514 – 0.569). Intra-rater reliability (Cohen's Kappa) was 0.764 (SE 0.034) for the Haraguchi, 0.763 (SE 0.031) for the Rammelt, 0.688 (SE 0.035) for the Mason & Molloy classification. This study reports the inter-rater and intra-rater reliability for three classification systems for posterior malleolus fractures. Based on definitions by Landis & Koch (1977), inter-rater reliability was rated as ‘moderate’ for the Haraguchi and Mason & Molloy classifications; and ‘substantial’ for the Rammelt classification. Similarly, the intra-rater reliability was rated as ‘substantial’ for all three classifications
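The Landis & Koch (1977) descriptors used in this abstract follow fixed kappa bands, which can be sketched as a small lookup. The function name `landis_koch_label` is hypothetical; the band boundaries are the commonly cited ones (below 0 poor, up to 0.20 slight, up to 0.40 fair, up to 0.60 moderate, up to 0.80 substantial, above that almost perfect).

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis & Koch (1977) descriptor."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Inter-rater values quoted in the abstract above.
for name, value in [("Haraguchi", 0.522), ("Rammelt", 0.626),
                    ("Mason & Molloy", 0.541)]:
    print(name, landis_koch_label(value))
```

Applied to the quoted values, this reproduces the labels given in the abstract: moderate for Haraguchi and Mason & Molloy, substantial for Rammelt.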


Orthopaedic Proceedings
Vol. 104-B, Issue SUPP_13 | Pages 36 - 36
1 Dec 2022
Benavides B Cornell D Schneider P Hildebrand K
Full Access

Heterotopic ossification (HO) is a well-known complication of traumatic elbow injuries. The reported rates of post-traumatic HO formation vary from less than 5% with simple elbow dislocations, to greater than 50% in complex fracture-dislocations. Previous studies have identified fracture-dislocations, delayed surgical intervention, and terrible triad injuries as risk factors for HO formation. There is, however, a paucity of literature regarding the accuracy of diagnosing post-traumatic elbow HO. Therefore, the purpose of our study was to determine the inter-rater reliability of HO diagnosis using standard radiographs of the elbow at 52 weeks post-injury, as well as to report on the rate of mature compared with immature HO. We hypothesized inter-rater reliability would be poor among raters for HO formation. Prospectively collected data from a large clinical trial was reviewed by three independent reviewers (one senior orthopedic resident, one senior radiology resident, and one expert upper extremity orthopedic surgeon). Each reviewer examined anonymized 52-week post-injury radiographs of the elbow and recorded: 1. the presence or absence of HO, 2. the location of HO, 3. the size of the HO (in cm, if present), and 4. the maturity of the HO formation. Maturity was defined by consensus prior to image review and defined as an area of well-defined cortical and medullary bone outside the cortical borders of the humerus, ulna, or radius. Immature lesions were defined as an area of punctate calcification with an ill-defined cloud-like density outside the cortical borders of the humerus, ulna or radius. Data were collected using a standardized online data collection form (CognizantMD, Toronto, ON, CA). Inter-rater reliability was calculated using Fleiss’ Kappa statistic and a multivariate logistic regression analysis was performed to identify risk factors for HO formation in general, as well as mature HO at 52 weeks post injury. Statistical analysis was performed using RStudio (version 1.4, RStudio, Boston, MA, USA). A total of 79 radiographs at the 52-week follow-up were reviewed (54% male, mean age 50, age SD 14, 52% operatively treated). Inter-rater reliability using Fleiss’ Kappa was κ = 0.571 (p = 0.0004) indicating moderate inter-rater reliability among the three reviewers. The rate of immature HO at 52 weeks was 56%. The multivariate logistic regression analysis identified male sex as a significant risk factor for HO development (OR 5.29, 1.55-20.59 CI, p = 0.011), but not for HO maturity at 52 weeks. Age, time to surgery, and operative intervention were not found to be significant predictors for either HO formation or maturity of the lesion in this cohort. Our study demonstrates moderate inter-rater reliability in determining the presence of HO at 52 weeks post-elbow injury. There was a high rate (56%) of immature HO at 52-week follow-up. We also report the finding of male sex as a significant risk factor for post traumatic HO development. Future research directions could include investigation into possible male predominance for traumatic HO formation, as well as improving inter-rater reliability through developing a standardized and validated classification system for reporting the radiographic features of HO formation around the elbow.


Bone & Joint Open
Vol. 3, Issue 11 | Pages 913 - 920
18 Nov 2022
Dean BJF Berridge A Berkowitz Y Little C Sheehan W Riley N Costa M Sellon E

Aims. The evidence demonstrating the superiority of early MRI has led to increased use of MRI in clinical pathways for acute wrist trauma. The aim of this study was to describe the radiological characteristics and the inter-observer reliability of a new MRI based classification system for scaphoid injuries in a consecutive series of patients. Methods. We identified 80 consecutive patients with acute scaphoid injuries at one centre who had presented within four weeks of injury. The radiographs and MRI scans were assessed by four observers, two radiologists, and two hand surgeons, using both pre-existing classifications and a new MRI based classification tool, the Oxford Scaphoid MRI Assessment Rating Tool (OxSMART). The OxSMART was used to categorize scaphoid injuries into three grades: contusion (grade 1); unicortical fracture (grade 2); and complete bicortical fracture (grade 3). Results. In total there were 13 grade 1 injuries, 11 grade 2 injuries, and 56 grade 3 injuries in the 80 consecutive patients. The inter-observer reliability of the OxSMART was substantial (Kappa = 0.711). The inter-observer reliability of detecting an obvious fracture was moderate for radiographs (Kappa = 0.436) and MRI (Kappa = 0.543). Only 52% (29 of 56) of the grade 3 injuries were detected on plain radiographs. There were two complications of delayed union, both of which occurred in patients with grade 3 injuries, who were promptly treated with cast immobilization. There were no complications in the patients with grade 1 and 2 injuries and the majority of these patients were treated with early mobilization as pain allowed. Conclusion. This MRI based classification tool, the OxSMART, is reliable and clinically useful in managing patients with acute scaphoid injuries. Cite this article: Bone Jt Open 2022;3(11):913–920


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_12 | Pages 27 - 27
23 Jun 2023
Chen K Wu J Xu L Han X Chen X
Full Access

To propose a modified approach to measuring femoro-epiphyseal acetabular roof (FEAR) index while still abiding by its definition and biomechanical basis, and to compare the reliabilities of the two methods. To propose a classification for medial sourcil edges. We retrospectively reviewed a consecutive series of patients treated with periacetabular osteotomy and/or hip arthroscopy. A modified FEAR index was defined. Lateral center-edge angle (LCEA), Sharp's angle, and Tonnis angle were measured on all hips, as well as FEAR index with original and modified approaches. Intra- and inter-observer reliability were calculated as intraclass correlation coefficients (ICC) for FEAR index with both approaches and other alignments. A classification was proposed to categorize medial sourcil edges. ICC for the two approaches across different sourcil groups were also calculated. After reviewing 411 patients, 49 were finally included. Thirty-two patients (40 hips) were identified as having borderline dysplasia defined by an LCEA of 18 to 25 degrees. Intra-observer ICC for the modified method were good to excellent for borderline hips; poor to excellent for DDH; moderate to excellent for normal hips. As for inter-observer reliability, the modified approach outperformed the original approach with moderate to good inter-observer reliability (DDH group, ICC=0.636; borderline dysplasia group, ICC=0.813; normal hip group, ICC=0.704). The medial sourcils were classified into 3 groups based on their morphology. Type II (39.0%) and III (43.9%) sourcils were the dominant patterns. The sourcil classification had substantial intra-observer agreement (observer 4, kappa=0.68; observer 1, kappa=0.799) and moderate inter-observer agreement (kappa=0.465). The modified approach to FEAR index possessed greater inter-observer reliability in all medial sourcil patterns. The modified FEAR index has better intra- and inter-observer reliability compared with the original approach. Type II and III sourcils account for the majority, to which only the modified approach is applicable.


Orthopaedic Proceedings
Vol. 106-B, Issue SUPP_11 | Pages 12 - 12
4 Jun 2024
Chapman J Choudhary Z Gupta S Airey G Mason L
Full Access

Introduction. Treatment pathways of 5th metatarsal fractures are commonly directed based on fracture classification, with Jones types, for example, requiring closer observation and possibly more aggressive management. Primary objective. To investigate the reliability of assessment of subtypes of 5th metatarsal fractures by different observers. Methods. Patients were identified from our prospectively collected database. We included all patients referred to our virtual fracture clinic with a suspected or confirmed 5th metatarsal fracture. Plain AP radiographs were reviewed by two observers, who were initially trained on the 5th metatarsal classification identification. Zones were defined as Zone 1.1, 1.2, 1.3, 2, 3, diaphyseal shaft (DS), distal metaphysis (DM) and head. An inter-observer reliability analysis using Cohen's Kappa coefficient was carried out, and degree of observer agreement described using Landis & Koch's description. All data was analysed using IBM SPSS v.27. Results. 878 patients were identified. The two observers had moderate agreement when identifying fractures in all zones, apart from metatarsal head fractures, which scored substantial agreement (K=.614). Zones 1.1 (K=.582), 2 (K=.536), 3 (K=.601) and DS (K=.544) all tended towards but did not achieve substantial agreement. Whilst DS fractures achieved moderate agreement, there was an apparent difficulty with distal DS, resulting in a lot of cross over with DM (DS 210 vs 109; DM 76 vs 161). Slight agreement with the next highest adjacent zone was found when injuries were thought to be in zones 1.2, 1.3 and 2 (K=0.17, 0.115 and 0.152 respectively). Conclusions. Reliability of sub-categorising 5th metatarsal fractures using standardised instructions conveys moderate to substantial agreement in most cases. If the region of the fracture is going to be used in an algorithm to guide a management plan and clinical follow up during a virtual clinic review, defining fractures of zones 1–3 needs careful consideration.


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_14 | Pages 3 - 3
10 Oct 2023
Verma S Malaviya S Barker S
Full Access

Technological advancements in orthopaedic surgery have mainly focused on increasing precision during the operation; however, there have been few developments in post-operative physiotherapy. We have developed a computer vision program using machine learning that can virtually measure the range of movement of a joint to track progress after surgery. This data can be used by physiotherapists to change patients’ exercise regimes more objectively and to help patients visualise the progress that they have made. In this study, we tested our program's reliability and validity to find a benchmark for future use on patients. We compared 150 shoulder joint angles, measured using a goniometer, with those calculated by our program, called ArmTracking, in a group of 10 participants (5 males and 5 females). Reliability was tested using adjusted R squared and validity was tested using 95% limits of agreement. Our clinically acceptable limit of agreement was ± 10° for ArmTracking to be used interchangeably with goniometry. ArmTracking showed excellent overall reliability of 97.1% when all shoulder movements were combined, but there were lower scores for some movements, such as shoulder extension at 75.8%. There was moderate validity shown when all shoulder movements were combined, at 9.6° overestimation and 18.3° underestimation. Computer vision programs have great potential to be used in telerehabilitation to collect useful information as patients carry out prescribed exercises at home. However, they need to be trained well for precise joint detections to reduce the range of errors in readings.


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_7 | Pages 8 - 8
4 Apr 2023
Fridberg M Ghaffari A Husum H Rahbek O Kold S
Full Access

There is no consensus on how to evaluate and grade pin site infection. A precise, objective and reliable pin site infection score is warranted. The literature was reviewed for pin site infection classification systems, and the Modified Gordon Score (MGS), grade 0-6, was used. The aim was to test the reliability of the Modified Gordon Infection Score. The observed agreement and inter-rater reliability were investigated between nurse and doctors. MGS was performed in the outpatient clinic at Aalborg University Hospital, Denmark on 1472 pin sites in 119 patients by one nurse and one of three orthopaedic surgeons blinded to each other's judgement. The data was stored in a REDCap database for further statistical analysis. The observed agreement between the nurse and the 3 orthopaedic surgeons was evaluated with a one-way random-effect model with intraclass correlation with absolute agreement. Furthermore, the observed agreement for each of the 3 surgeons with the nurse was calculated. The distribution of MGS infection grade in the 1472 pin sites was: Grade 0; n=1372, Grade 1; n=32, Grade 2; n=39, Grade 3; n=24, Grade 4; n=5, Grade 5; n=0, Grade 6; n=0. The observed agreement between the nurse and the surgeons was calculated as 98%. The ICC estimated between nurse and the surgeons was 0.8943 (ICC > 0.85 = reliable). The grading was done by three different doctors with an agreement with the nurse as follows: Rater1 (n=416) = 99.5%, Rater2 (n=1440) = 97.4%, Rater3 (n=1440) = 96.6%. A limitation to this study is that the dataset represents mostly clean pin sites with MGS 0. Only 100 pin sites had signs of superficial infection (MGS 1-4), with none above 4. We found that the MGS infection score is highly reliable for low grade infections but we cannot conclude on reliability in severe infections.


Bone & Joint Open
Vol. 4, Issue 5 | Pages 363 - 369
22 May 2023
Amen J Perkins O Cadwgan J Cooke SJ Kafchitsas K Kokkinakis M

Aims. Reimers migration percentage (MP) is a key measure to inform decision-making around the management of hip displacement in cerebral palsy (CP). The aim of this study is to assess validity and inter- and intra-rater reliability of a novel method of measuring MP using a smart phone app (HipScreen (HS) app). Methods. A total of 20 pelvis radiographs (40 hips) were used to measure MP by using the HS app. Measurements were performed by five different members of the multidisciplinary team, with varying levels of expertise in MP measurement. The same measurements were repeated two weeks later. A senior orthopaedic surgeon measured the MP on picture archiving and communication system (PACS) as the gold standard and repeated the measurements using HS app. Pearson’s correlation coefficient (r) was used to compare PACS measurements and all HS app measurements and assess validity. Intraclass correlation coefficient (ICC) was used to assess intra- and inter-rater reliability. Results. All HS app measurements (from 5 raters at week 0 and week 2 and PACS rater) showed highly significant correlation with the PACS measurements (p < 0.001). Pearson’s correlation coefficient (r) was consistently over 0.9, suggesting high validity. Correlation of all HS app measures from different raters to each other was significant with r > 0.874 and p < 0.001, which also confirms high validity. Both inter- and intra-rater reliability were excellent with ICC > 0.9. In a 95% confidence interval for repeated measurements, the deviation of each specific measurement was less than 4% MP for single measurer and 5% for different measurers. Conclusion. The HS app provides a valid method to measure hip MP in CP, with excellent inter- and intra-rater reliability across different medical and allied health specialties. This can be used in hip surveillance programmes by interdisciplinary measurers. Cite this article: Bone Jt Open 2023;4(5):363–369


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_4 | Pages 3 - 3
3 Mar 2023
Roy K Joshi P Ali I Shenoy P Syed A Barlow D Malek I Joshi Y
Full Access

Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for guiding clinicians in the treatment of PFI. There are also concerns about the validity of the Dejour classification (DJC), which is the most widely used classification for TD, having only a fair reliability score. The Oswestry-Bristol classification (OBC) is a recently proposed system of classification of TD and the authors report a fair-to-good interobserver agreement and good-to-excellent intra-observer agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications. 6 assessors (4 consultants and 2 registrars) independently evaluated 100 magnetic resonance axial images of the patella-femoral joint for TD and classified them according to OBC and DJC. These assessments were again repeated by all raters after 4 weeks. The inter- and intra-observer reliability scores were calculated using Cohen's kappa and Cronbach's alpha. Both classifications showed good to excellent interobserver reliability with high alpha scores. The OBC classification showed a substantial intra-observer agreement (mean kappa 0.628; p < 0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p < 0.005). There was no significant difference in the kappa values when comparing the assessments by consultants to those by registrars, in either classification system. This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on magnetic resonance axial images of the patella-femoral joint, with the simple-to-use OBC having a higher intra-observer reliability score compared to the DJC.


Bone & Joint Open
Vol. 5, Issue 6 | Pages 524 - 531
24 Jun 2024
Woldeyesus TA Gjertsen J Dalen I Meling T Behzadi M Harboe K Djuv A

Aims. To investigate if preoperative CT improves detection of unstable trochanteric hip fractures. Methods. A single-centre prospective study was conducted. Patients aged 65 years or older with trochanteric hip fractures admitted to Stavanger University Hospital (Stavanger, Norway) were consecutively included from September 2020 to January 2022. Radiographs and CT images of the fractures were obtained, and surgeons made individual assessments of the fractures based on these. The assessment was conducted according to a systematic protocol including three classification systems (AO/Orthopaedic Trauma Association (OTA), Evans Jensen (EVJ), and Nakano) and questions addressing specific fracture patterns. An expert group provided a gold-standard assessment based on the CT images. Sensitivities and specificities of surgeons’ assessments were estimated and compared in regression models with correlations for the same patients. Intra- and inter-rater reliability were presented as Cohen’s kappa and Gwet’s agreement coefficient (AC1). Results. We included 120 fractures in 119 patients. Compared to radiographs, CT increased the sensitivity of detecting unstable trochanteric fractures from 63% to 70% (p = 0.028) and from 70% to 76% (p = 0.004) using AO/OTA and EVJ, respectively. Compared to radiographs alone, CT increased the sensitivity of detecting a large posterolateral trochanter major fragment or a comminuted trochanter major fragment from 63% to 76% (p = 0.002) and from 38% to 55% (p < 0.001), respectively. CT improved intra-rater reliability for stability assessment using EVJ (AC1 0.68 to 0.78; p = 0.049) and for detecting a large posterolateral trochanter major fragment (AC1 0.42 to 0.57; p = 0.031). Conclusion. A preoperative CT of trochanteric fractures increased detection of unstable fractures using the AO/OTA and EVJ classification systems. 
Compared to radiographs, CT improved intra-rater reliability when assessing fracture stability and detecting large posterolateral trochanter major fragments. Cite this article: Bone Jt Open 2024;5(6):524–531
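
Gwet's AC1, reported above alongside Cohen's kappa, uses a different estimate of chance agreement that is less sensitive to very uneven category prevalence. The sketch below shows the standard two-rater form of the coefficient; it is not the study's analysis code, and the function name `gwet_ac1` and the ten stability ratings are hypothetical.

```python
def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's first-order agreement coefficient (AC1) for two raters."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    q = len(categories)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    n = len(ratings_a)
    # Observed agreement between the two raters.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: average each category's share across both raters,
    # then sum pi * (1 - pi) and scale by 1 / (q - 1).
    p_chance = 0.0
    for category in categories:
        share = (ratings_a.count(category) + ratings_b.count(category)) / (2 * n)
        p_chance += share * (1 - share)
    p_chance /= (q - 1)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical stability assessments of ten fractures by two surgeons.
surgeon_1 = ["unstable"] * 7 + ["stable"] * 3
surgeon_2 = ["unstable"] * 8 + ["stable"] * 2
ac1 = gwet_ac1(surgeon_1, surgeon_2, ["stable", "unstable"])
print(f"AC1 = {ac1:.2f}")
```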


Bone & Joint Research
Vol. 9, Issue 5 | Pages 242 - 249
1 May 2020
Bali K Smit K Ibrahim M Poitras S Wilkin G Galmiche R Belzile E Beaulé PE

Aims. The aim of the current study was to assess the reliability of the Ottawa classification for symptomatic acetabular dysplasia. Methods. In all, 134 consecutive hips that underwent periacetabular osteotomy were categorized using validated software (Hip2Norm) into four categories of normal, lateral/global, anterior, or posterior. A total of 74 cases were selected for reliability analysis, and these included 44 dysplastic and 30 normal hips. A group of six blinded fellowship-trained raters, provided with the classification system, looked at these radiographs at two separate timepoints to classify the hips using standard radiological measurements. Thereafter, a consensus meeting was held where a modified flow diagram was devised, before a third reading by four raters using a separate set of 74 radiographs took place. Results. Intrarater results per surgeon between Time 1 and Time 2 showed substantial to almost perfect agreement among the raters (kappa = 0.416 to 0.873). With respect to inter-rater reliability, at Time 1 and Time 2 there was substantial agreement overall between all surgeons (Time 1 kappa = 0.619; Time 2 kappa = 0.623). Posterior and anterior rating categories had moderate and fair agreement at Time 1 (posterior kappa = 0.557; anterior kappa = 0.438) and Time 2 (posterior kappa = 0.506; anterior kappa = 0.250), respectively. At Time 3, overall reliability (kappa = 0.687) and posterior and anterior reliability (posterior kappa = 0.579; anterior kappa = 0.521) improved from Time 1 and Time 2. Conclusion. The Ottawa classification system provides a reliable way to identify three categories of acetabular dysplasia that are well-aligned with surgical management. The term ‘borderline dysplasia’ should no longer be used. Cite this article: Bone Joint Res. 2020;9(5):242–249


Bone & Joint Open
Vol. 1, Issue 7 | Pages 355 - 358
7 Jul 2020
Konrads C Gonser C Ahmad SS

Aims. The Oswestry-Bristol Classification (OBC) was recently described as an MRI-based classification tool for the femoral trochlea. The authors demonstrated better inter- and intraobserver agreement compared to the Dejour classification. As the OBC could potentially provide a very useful MRI-based grading system for trochlear dysplasia, the aim was to determine the inter- and intraobserver reliability of the classification system from the perspective of a non-founder. Methods. Two orthopaedic surgeons independently assessed 50 MRI scans for trochlear dysplasia and classified each according to the OBC. Both observers repeated the assessments after six weeks. The inter- and intraobserver agreement was determined using Cohen’s kappa statistic and S-statistic nominal and linear weights. Results. The OBC with grading into four different trochlear forms showed excellent inter- and intraobserver agreement with a mean kappa of 0.78. Conclusion. The OBC is a simple MRI-based classification system with high inter- and intraobserver reliability. It could present a useful tool for grading the severity of trochlear dysplasia in daily practice. Cite this article: Bone Joint Open 2020;1-7:355–358


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_4 | Pages 77 - 77
1 Mar 2021
Ataei A Eggermont F Baars M Linden Y Rooy J Verdonschot N Tanck E
Full Access

Patients with advanced cancer can develop bone metastases in the femur which are often painful and increase the risk of pathological fracture. Accurate segmentation of bone metastases is important, among other things, to improve patient-specific computer models which calculate fracture risk, and for radiotherapy planning to determine exact radiation fields. Deep learning algorithms have been shown to be promising for improving segmentation accuracy for metastatic lesions, but require reliable segmentations as training input. The aim of this study was to investigate the inter- and intra-operator reliability of manual segmentation of femoral metastatic lesions and to define a set of lesions which can serve as a training dataset for deep learning algorithms. CT scans of 60 advanced cancer patients with a femur affected by bone metastases (20 osteolytic, 20 osteoblastic and 20 mixed) were used in this study. Two operators were trained by an experienced radiologist and then segmented the metastatic lesions in all femurs twice with a four-week time interval. 3D and 2D Dice coefficients (DCs) were calculated to quantify the inter- and intra-operator reliability of the segmentations. We defined a DC>0.7 as good reliability, in line with a statistical image segmentation study. Mean first and second inter-operator 3D-DCs were 0.54 (±0.28) and 0.50 (±0.32), respectively. Mean intra-operator I and II 3D-DCs were 0.56 (±0.28) and 0.71 (±0.23), respectively. Larger lesions (>60 cm³) scored higher DCs in comparison with smaller lesions. This study reveals that manual segmentation of metastatic lesions is challenging and that the current manual segmentation approach resulted in unsatisfactory outcomes, particularly for lesions with small volumes. However, segmentation of larger lesions resulted in good inter- and intra-operator reliability. 
In addition, we were able to select 521 slices with good segmentation reliability that can be used to create a training dataset for deep learning algorithms. By using deep learning algorithms, we aim for more accurate automated lesion segmentations which might be used in computer modelling and radiotherapy planning.
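
The Dice coefficient used above measures the overlap of two segmentations as twice the shared volume divided by the sum of the two volumes. The following is a minimal sketch of that formula, not the study's pipeline: masks are represented as Python sets of voxel coordinates, and the function name `dice_coefficient` and the two toy masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice coefficient between two binary masks given as sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    overlap = len(mask_a & mask_b)
    return 2 * overlap / (len(mask_a) + len(mask_b))


# Hypothetical single-slice segmentations by two operators (invented data):
# two 4 x 4 voxel squares shifted by one voxel along x.
operator_1 = {(x, y, 0) for x in range(0, 4) for y in range(4)}
operator_2 = {(x, y, 0) for x in range(1, 5) for y in range(4)}

dc = dice_coefficient(operator_1, operator_2)
is_good = dc > 0.7  # threshold for good reliability stated in the abstract
print(f"DC = {dc:.2f}, good reliability: {is_good}")
```

Because the denominator is the sum of both volumes, a one-voxel boundary disagreement costs a small lesion far more than a large one, which is consistent with the higher DCs reported for larger lesions.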


Orthopaedic Proceedings
Vol. 103-B, Issue SUPP_13 | Pages 98 - 98
1 Nov 2021
Fridberg M Rahbek O Husum H Ghaffari A Kold S
Full Access

Introduction and Objective. Digital infrared thermography may have the capability of identifying local inflammation. Nevertheless, the role of thermography in diagnosing pin site infection has not yet been explored, and the reliability and validity of this method for pin site surveillance are in question. The purpose of this study was to explore the capability and intra-rater reliability of thermography in detecting pin site infection. Materials and Methods. This explorative proof-of-concept study follows the GRRAS guidelines for reporting reliability and agreement studies. After clinical assessment of pin sites by one examiner using the Modified Gordon Pin Infection Classification (Grade 0 – 6), thermographic images of the pin sites were captured with a FLIR C3 camera and analyzed with the FLIR Tools software package. The maximum skin temperature around the pin site and the maximum temperature for the whole thermographic picture were measured. Intra-rater agreement was established and test-retests were performed with different camera angles. Results. Thirteen patients (4 females; age 9–72 years) were included. Indications for frames: 4 fracture, 2 deformity correction, 1 lengthening, 6 bone transport. Time from surgery to thermography ranged from 27 to 385 days. Overall, 231 pin sites were included. Eleven pin sites were diagnosed with early signs of infection: five grade 1, five grade 2, one grade 3. Mean pin site temperature was 33.9 °C (29.0–35.4). With 34 °C as cut-off value for infection, sensitivity was 73%, specificity 67%, positive predictive value 10% and negative predictive value 98%. Intra-rater reliability for thermography was ICC 0.85 (0.77–0.92). The temperature measured was influenced by the camera positioning in relation to the pin site, with a variance of 0.2. Conclusions. Measurement of pin sites using the handheld FLIR C3 infrared camera was a reliable method and the temperature was related to infection grading. 
This study demonstrates that digital thermography with a handheld camera might be used for monitoring the pin sites after operations to detect early infection; however, larger prospective studies are necessary.
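
The low positive predictive value reported above follows from the low prevalence of infection (11 of 231 pin sites). As a rough check rather than a reanalysis, the sketch below applies Bayes' rule to the rounded sensitivity, specificity and prevalence given in the abstract. The function name `predictive_values` is hypothetical, and small differences from the published figures are possible because the inputs are rounded.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures taken from the abstract; prevalence is infected sites / all sites.
prevalence = 11 / 231
ppv, npv = predictive_values(sensitivity=0.73, specificity=0.67, prevalence=prevalence)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

With fewer than 5% of pin sites infected, false positives from the uninfected majority outnumber true positives, so even a test with 73% sensitivity yields a PPV near 10% while the NPV stays near 98%.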