Aims. Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for the purpose of guiding clinicians’ management of PFI. There are also concerns about the validity of the Dejour Classification (DJC), which is the most widely used classification for TD, having only a fair reliability score. The Oswestry-Bristol Classification (OBC) is a recently proposed system of classification of TD, and the authors report a fair-to-good interobserver agreement and good-to-excellent intraobserver agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications. Methods. In all, six assessors (four consultants and two registrars) independently evaluated 100 axial MRIs of the patellofemoral joint (PFJ) for TD and classified them according to OBC and DJC. These assessments were again repeated by all raters after four weeks. The inter- and intraobserver reliability scores were calculated using Cohen’s kappa and Cronbach’s α. Results. Both classifications showed good to excellent interobserver reliability with high α scores. The OBC classification showed a substantial intraobserver agreement (mean kappa 0.628; p < 0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p < 0.005). There was no significant difference in the kappa values when comparing the assessments by consultants with those by registrars, in either classification system. Conclusion. This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on axial MRIs of the PFJ, with the simple-to-use OBC having a higher intraobserver reliability score than that of the DJC. Cite this article: Bone Jt Open 2023;4(7):532–538


The Bone & Joint Journal
Vol. 100-B, Issue 5 | Pages 596 - 602
1 May 2018
Bock P Pittermann M Chraim M Rois S

Aims. Various radiological parameters are used to evaluate a flatfoot deformity and their measurements may differ. The aims of this study were to answer the following questions: 1) Which of the 11 parameters have the best inter- and intraobserver reliability in a standardized radiological setting? 2) Are pre- and postoperative assessments equally reliable? 3) What are the identifiable sources of variation? Patients and Methods. Measurements of the 11 parameters were recorded on anteroposterior and lateral weight-bearing radiographs of 38 feet before and after surgery for flatfoot, by three observers with different experience in foot surgery (A, ten years; B, three years; C, third-year orthopaedic resident). The inter- and intraobserver reliability was calculated. Results. Preoperative interobserver reliability was high for four, moderate for five, and low for two parameters. Postoperative interobserver reliability was high for four, moderate for five, and low for two parameters. Intraobserver reliability was excellent for all parameters preoperatively as recorded by observer A (PB) and B (MP), and for eight parameters as recorded by observer C (SR). Intraobserver reliability was excellent for ten parameters postoperatively as recorded by observer A and B, and for eight parameters as recorded by observer C. Conclusion. The following parameters can be recommended. For preoperative and postoperative evaluation of flatfoot: anteroposterior, talonavicular coverage angle; lateral, talometatarsal I angle, calcaneal pitch angle, and cuneiform-medial height (high interobserver reliability); and anteroposterior, talometatarsal II angle; lateral, talocalcaneal angle, tibiocalcaneal angle (moderate interobserver reliability). For more experienced observers, we also recommend the anteroposterior talometatarsal I angle (moderate reliability). The inter- and intraobserver reliability for most parameters were similar pre- and postoperatively. 
The experience of the observer and the definition and ability to measure the parameters themselves were sources of variation. Cite this article: Bone Joint J 2018;100-B:596–602


The Journal of Bone & Joint Surgery British Volume
Vol. 91-B, Issue 6 | Pages 766 - 771
1 Jun 2009
Brunner A Honigmann P Treumann T Babst R

We evaluated the impact of stereo-visualisation of three-dimensional volume-rendering CT datasets on the inter- and intraobserver reliability assessed by kappa values on the AO/OTA and Neer classifications in the assessment of proximal humeral fractures. Four independent observers classified 40 fractures according to the AO/OTA and Neer classifications using plain radiographs, two-dimensional CT scans and with stereo-visualised three-dimensional volume-rendering reconstructions. Both classification systems showed moderate interobserver reliability with plain radiographs and two-dimensional CT scans. Three-dimensional volume-rendered CT scans improved the interobserver reliability of both systems to good. Intraobserver reliability was moderate for both classifications when assessed by plain radiographs. Stereo visualisation of three-dimensional volume rendering improved intraobserver reliability to good for the AO/OTA method and to excellent for the Neer classification. These data support our opinion that stereo visualisation of three-dimensional volume-rendering datasets is of value when analysing and classifying complex fractures of the proximal humerus


Orthopaedic Proceedings
Vol. 87-B, Issue SUPP_I | Pages 69 - 69
1 Mar 2005
Viehweger E Hélix M Jacquemier M Scavarda D Rohon MA Scorsone-Pagny S

Introduction: With the evolution and the complexity of the treatments in cerebral palsy (CP) patients it is essential to assess their outcome using validated tools. Technical analysis offers objective data which may be associated to more subjective functional evaluation and health related quality of life tests. Simplified visual tests were proposed as an alternative to the complex and expensive instrumented three-dimensional gait analysis. The Edinburgh Visual Gait Score (EVGS) was proposed for routine clinical use when complete technical analysis is not available or may represent a part of a global patient evaluation. The purposes of our study were: 1) to apply a French translation of the EVGS to standard video recordings of a group of independent walking spastic diplegic CP patients 2) to evaluate the intraobserver and interobserver reliability and 3) to compare the results of gait analysis with experienced and inexperienced observers. Material & methods: A series of ten standard video recordings of spastic diplegic CP patients, acquired during routine clinical gait analysis were examined by eight observers, two times, with two weeks in between the assessments. Observers were selected from following specialties: three paediatric orthopaedic surgeons, one resident in orthopaedic surgery, one neurosurgeon, one physiatrist and two physiotherapists. Observers were separated into two groups according to their experience with gait analysis interpretations. Kappa statistics and intraclass correlation coefficient were calculated. Results: Better intraobserver and interobserver reliability was observed for foot and knee scores with significant difference between stance and swing phase results. Pelvis, hip and trunk score results were significantly lower. The interobserver reliability for segment scores and the global EVGS showed better results than the intraobserver reliability. 
The gait analysis experienced observer group showed significantly higher intraobserver and interobserver reliability. Discussion & conclusion: Our reliability results for the EVGS are close to those of Read et al. Interestingly, we showed a significant difference between the two observer groups: observers familiar with gait analysis obtained better reliability results. This shows the importance of either being accustomed to interpreting clinical gait analysis, including learning to visualise the different gait phases, or receiving video analysis training before using the visual score as a standard clinical evaluation tool. For this study we did not use the patient preparation recommendations that the original authors proposed to improve scoring accuracy, because we wanted to test whether historic standard videos could be used. The poor score reliability for the pelvis and hip may be improved. Further studies of multilevel surgery outcome evaluation by observers trained in visual analysis are needed to explore clinical changes in CP patients over time


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_II | Pages 314 - 314
1 May 2006
Elkinson I Crawford H Barnes M Boxch P Ferguson J

The aim was to evaluate the Intraobserver and Interobserver reliability of Pelvic Incidence as a fundamental parameter of sagittal spino-pelvic balance in patients with spondylolisthesis compared to controls with Idiopathic Adolescent Scoliosis. A blinded test retest study including multi-surgeon assessment of Pelvic Incidence in patients with spondylolisthesis and Idiopathic Adolescent Scoliosis was carried out. We assessed the agreement between the pelvic incidence measurements using the Bland and Altman method and mean differences (95% confidence interval) are reported. Forty patients seen at Starship Children’s Hospital between 1992 – 2003 by two spinal surgeons were retrospectively identified. The main group had 20 patients with spondylolisthesis (Isthmic and/or Dysplastic types) and the control group consisted of 20 patients with Idiopathic Adolescent Scoliosis. Five observers with different levels of experience included the two orthopaedic surgeons, one fellow, one senior trainee and one non-trainee registrar. Prior to the initial test phase, a consensus-building session was carried out. All five observers arrived at a standardised method for measuring the Pelvic Incidence. In the test phase randomly ordered lateral lumbosacral radiographs were independently evaluated by the five observers and pelvic incidence was measured. Assessment of the Pelvic Incidence was repeated one week later in the re-test phase. The radiographs were presented in a randomly pre-assigned order. Bland and Altman plots were constructed and mean differences (95% confidence interval) reported to evaluate the agreement between the Pelvic Incidence measurements among the five independent observers. All analysis was performed on the statistical software package SAS. P-value of 0.05 was considered statistically significant. The spondylolisthesis group had 11 (55%) males and 9 (45%) females with an average age of 14 ± 4.2. 
2 patients had high-grade (Meyerding Class III, IV, V) and 16 had low-grade (Meyerding Class I, II) spondylolisthesis. 2 patients were post-reduction of spondylolisthesis. In the Scoliosis group there were 2 (10%) males and 18 (90%) females with an average age of 15 ± 2.9. There was no significant difference between male and female pelvic incidence measurements (60° ± 18.7° vs. 57° ± 14.6°, p=0.540) or age (15 ± 2.9 vs. 14 ± 3.8, p=0.181). There was no difference in pelvic incidence across the Meyerding groups, p=0.257. There was a significant difference between spondylolisthesis and scoliosis pelvic incidence measurements, 65° ± 15.6° vs. 51° ± 12.8°, p=0.003. In the Spondylolisthesis Group, the interobserver reliability between five clinicians, expressed as the mean difference in pelvic incidence measurement, was 0.6° (95% CI −0.81, 1.91) and was not significantly different from zero, p=0.423. The agreement limits were from −12.8° to 13.9°. The intraobserver reliability of pelvic incidence showed the mean difference ranging from −2.1° to 1.4° (p=0.129 and 0.333 with 95% CI). One had marginal evidence of a significant difference of 3.3° (95% CI 0.05° to 6.55°, p=0.047). In the Scoliosis Group, the interobserver reliability was 0.3° (95% CI −0.81, 1.49) and was not significantly different from zero, p=0.726. The agreement limits were from −11.0° to 11.6°. The intraobserver reliability among four observers ranged from −1.7° to 0.5° (p=0.178 and 0.661). One had a significant difference in readings of 4.1° (95% CI of 0.70° to 7.40°, p=0.020). Scoliosis patients had a significantly smaller pelvic incidence than spondylolisthesis patients. The interobserver reliability of the pelvic incidence measurement was excellent across both groups. The intraobserver reliability was good with only one observer in each group demonstrating a marginally significant difference. 
Pelvic incidence is therefore a reliable measurement which can be used as a predictor in progression of spondylolisthesis
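
The Bland and Altman agreement limits quoted in this abstract can be illustrated with a minimal sketch. It assumes the conventional definition (mean difference plus or minus 1.96 sample standard deviations of the paired differences); the abstract does not state the exact formula the authors used, and the function name and the readings below are hypothetical, not study data.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired readings.

    Assumes the conventional 95% limits of agreement:
    mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)                # mean difference between readings
    spread = 1.96 * stdev(diffs)      # half-width of the agreement limits
    return bias, bias - spread, bias + spread


# Hypothetical pelvic incidence readings in degrees (illustrative only)
test_phase = [62.0, 55.0, 70.0, 48.0, 66.0]
retest_phase = [60.0, 57.0, 69.0, 50.0, 64.0]
bias, lower, upper = bland_altman(test_phase, retest_phase)
print(f"mean difference {bias:.1f}, limits {lower:.1f} to {upper:.1f}")
```

A mean difference close to zero indicates little systematic bias between readings, while the width of the limits shows how far two individual readings may plausibly differ.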


Orthopaedic Proceedings
Vol. 92-B, Issue SUPP_I | Pages 27 - 27
1 Mar 2010
Cunningham MR Quirno M Bendo J Steiber J

Purpose: Facet joint arthrosis is an entity that can have a key role in the etiology of low back pain, especially with hyperextension, and is a key component of surgical planning, especially when considering disc arthroplasty. Plain films and MRI are most commonly utilized as the initial imaging of choice for low back pain, but these methods may not truly allow an accurate assessment of facet arthrosis. Our purpose was to observe the inter- and intraobserver reliability of utilizing CT and MRI to evaluate facet arthrosis, the inter- and intraobserver reliability of the facet grading system, and the agreement of surgeons as to when to perform disc arthroplasty after the lumbar facets are evaluated. Method: A power analysis was performed which showed we would need 6 reviewers and 43 images to have 80% power to show excellent reliability. 102 CT and the corresponding MRI images of lumbar facets were obtained from patients who were to undergo lumbar spine surgery of any type. 10 spine surgeons and 3 spine fellows reviewed the randomized images at 2 time points, 3 months apart, graded the facet arthrosis, and indicated whether they would choose to perform a disc arthroplasty based on the amount of facet arthrosis. Both interobserver and intraobserver kappa values were calculated by result comparison between observers at the two time points and between CT and MRI images from the same patient. Results: Interobserver reliability for MRI was 0.21 and 0.07 (fair to slight agreement), and for CT was 0.33 and 0.27 (fair agreement), for the spine surgeons and spine fellows respectively. The mean intraobserver reliability for MRI was 0.36 and 0.26 (fair agreement) and for CT was 0.52 and 0.51 (moderate agreement). The kappa value for agreement on whether to perform a disc arthroplasty after grading the facet arthrosis utilizing MRI was 0.22 (fair agreement) and utilizing CT was 0.33 (fair agreement) among the senior spine surgeons. 
Conclusion: The existing grading system for facet arthrosis and of whether to perform a disc arthroplasty utilizing the grading system has at best only fair agreement. CT is more reliable for grading facet arthrosis


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_I | Pages 171 - 171
1 Mar 2006
Sanchez R Salcedo C Martinez M Molina J Vera F Villarreal J

Introduction and objectives: The purpose of the research is to show the agreement and reproducibility among 5 observers when they are questioned about 51 open fractures using two open fracture classifications for long bones (Gustilo and Aybar), interpreting the results obtained between both classifications. Material and Method: A classification protocol is established for open fractures. The fractures are graded independently using each of the systems being evaluated (Gustilo and Aybar), by visualising slides with clinical and radiologic images in addition to a report of the data in the clinical history. The survey is conducted twice with a time difference of one to eight weeks. 5 members of the Orthopedic and Traumatologic Surgery Department (OTSD) were questioned (1 Professor, 2 Specialists and 2 Residents). The statistical method used to analyse the results was the interobserver agreement percentage and the inter- and intraobserver kappa index. Results: The interobserver agreement percentage for the Gustilo classification was 58.82% and 39.21% for the Aybar classification. The kappa index for the interobserver agreement for the Gustilo classification was 0.51 and for the Aybar classification was 0.54. The kappa index for the intraobserver reproducibility was 0.69 for the Gustilo classification and 0.58 for the Aybar one. Conclusions: The interobserver agreement was considered moderate-poor for the Gustilo and Aybar classifications. The intraobserver reproducibility was considered substantial for the Gustilo classification and moderate for the Aybar one. We conclude that this agreement shows too much variability to accept just one classification as the only valid method for making therapeutic decisions or for comparing results. Therefore, it is necessary to create a more detailed and careful classification, which is quick to use, reliable, reproducible and which contains more objective criteria


Orthopaedic Proceedings
Vol. 94-B, Issue SUPP_XXIV | Pages 16 - 16
1 May 2012
Rajan R Chandrasenan J Metcalfe J Konstantoulakis C

The purpose of our study was to independently assess the modified Herring lateral pillar classification. Methods and results. 35 standardised true antero-posterior radiographs of children in various stages of fragmentation were independently assessed by 6 senior observers on 2 separate occasions (6 weeks apart). Kappa analysis was used to assess the inter- and intraobserver agreement between observations made. Intraobserver analysis revealed at best only moderate agreement for two observers. 3 observers showed fair consistency, whilst 1 remaining observer showed poor consistency between repeated observations (p<0.01). The highest scores for interobserver agreement, varying between moderate and good, could only be established between 2 observers. For the remaining observers results were just fair (p<0.01). Conclusion. This study highlights the lack of agreement between senior clinicians when applying the modified LPC. This clearly has clinical implications. To our knowledge this is the first time the modified lateral pillar classification has been independently tested for its reproducibility by a specialist orthopaedic unit


The Journal of Bone & Joint Surgery British Volume
Vol. 82-B, Issue 5 | Pages 636 - 642
1 Jul 2000
Wainwright AM Williams JR Carr AJ

We assessed the inter- and intraobserver variation in classification systems for fractures of the distal humerus. Three orthopaedic trauma consultants, three trauma registrars and three consultant musculoskeletal radiologists independently classified 33 sets of radiographs of such fractures on two occasions, each using three separate systems. For interobserver variation, the Riseborough and Radin system produced ‘moderate’ agreement (kappa = 0.513), but half of the fractures were not classifiable by this system. For the complete AO system, agreement was ‘fair’ (kappa = 0.343), but if only AO type and group or AO type alone was used, agreement improved to ‘moderate’ and ‘substantial’, respectively (kappa = 0.52 and 0.66). Agreement for the system of Jupiter and Mehne was ‘fair’ (kappa = 0.295). Similar levels of intraobserver variation were found. Systems of classification are useful in decision-making and evaluation of outcome only if there is agreement and consistency among observers. Our study casts doubt on these aspects of the systems currently available for fractures of the distal humerus
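
The kappa values and their verbal labels in this abstract can be illustrated with a minimal sketch of unweighted Cohen's kappa for two raters. The labelling function assumes the commonly cited Landis and Koch bands, which the quoted labels ('fair', 'moderate', 'substantial') appear to follow; the abstract does not name its source for the bands, and the ratings below are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters classifying the same cases."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("need two equal-length, non-empty rating lists")
    # Proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for a kappa value (assumed Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical fracture types assigned by two observers (illustrative only)
observer_1 = ["A", "A", "B", "B", "C", "C", "A", "B", "C", "A"]
observer_2 = ["A", "B", "B", "B", "C", "A", "A", "B", "C", "C"]
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.3f} ({landis_koch(kappa)})")
```

Studies with more than two observers, as here, typically report a mean of pairwise kappas or a multi-rater statistic; the abstract does not say which was used.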


Orthopaedic Proceedings
Vol. 94-B, Issue SUPP_XXXVII | Pages 207 - 207
1 Sep 2012
Chandrasenan J Rajan R Price K

The lateral pillar classification (LPC) is a widely used tool in determining prognosis and planning treatment in patients who are in the fragmentation stage of Perthes disease. The original classification has been modified to help increase the accuracy of the classification system by the Herring group. The purpose of our study was to independently assess this modified Herring classification. 35 standardized true antero-posterior radiographs of children in various stages of fragmentation were independently assessed by 6 senior observers on 2 separate occasions (6 weeks apart). Kappa analysis was used to assess the inter and intraobserver agreement between observations made. The degrees of agreement were as follows: poor, fair, moderate, good and very good. Intraobserver analysis revealed at best only moderate agreement for two observers. 3 observers showed fair consistency, whilst 1 remaining observer showed poor consistency between repeated observations (p<0.01). The highest scores for interobserver agreement varying between moderate to good could only be established between 2 observers. For the remaining observers results were just fair (p<0.01). This study highlights the lack of agreement between senior clinicians when applying the modified LPC. This has clinical implications when applying the classification to the decision making process in treating patients at risk of developing adverse outcomes from the disease. To our knowledge, this is the first time the modified LPC has been independently tested for its reproducibility by another specialist paediatric orthopaedic unit


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_I | Pages 187 - 187
1 Mar 2006
Maguire M Mohil R Ng A Hodgson S

The AO, Frykman, Mayo and Fernandez classification systems for distal radius fractures were evaluated for interobserver reliability and intraobserver reproducibility using plain radiographs. Five orthopaedic consultants, five orthopaedic registrars and five orthopaedic senior house officers classified 20 sets of distal radius fractures on two separate occasions. There were 2400 individual observations. Kappa statistics were used to establish a relative level of agreement between observers for the two readings and between separate readings by the same observer. Our results for intraobserver reproducibility showed a Kappa value of 0.49 for Fernandez, 0.47 for Frykman, 0.45 for Mayo and 0.33 for AO. A value of 0.4 shows good consistency according to well-recognised statistical boundaries and is significant. That is, reproducibility occurred at a level greater than chance. Interobserver Kappa values were poor in all classification systems. We also sought to look at variables within grade of surgeon and developed Kappa values for these also


Introduction: The purpose of this study was to evaluate the impact of volume rendering 3D computed tomography reconstructions on the inter- and intraobserver reliability of the OTA/AO and Neer classifications in the assessment of proximal humerus fractures. Material and Methods: Four observers with different levels of clinical training classified forty proximal humerus fractures according to the OTA/AO and Neer classifications. Three rounds of evaluation were performed and compared. First, fractures were classified on the basis of plain radiographs alone. Then, four weeks later, the combination of plain radiographs and computed tomography scans with conventional 3D SSD reconstructions was evaluated. Finally, four weeks later, the combination of plain radiographs, computed tomography scans, and 3D volume rendering reconstructions was assessed. These readings were repeated in a newly randomized order after an interval of twelve weeks to evaluate intraobserver reliability. Results: The AO/ASIF classification showed good interobserver reliability with plain radiographs (k=0.65) and with two-dimensional CT scans with conventional three-dimensional (SSD) reconstructions (k=0.71). Interobserver reliability improved to excellent when the fractures were classified on the basis of 3D volume rendering reconstructions (k=0.84). Intraobserver reliability of the OTA/AO classification was good with plain radiographs (k=0.70) and improved to excellent after adding three-dimensional SSD reconstructions (k=0.80) and three-dimensional VR reconstructions (k=0.88). Interobserver reliability of the Neer classification was poor with plain radiographs (k=0.39), moderate with two-dimensional CT scans and conventional three-dimensional (SSD) reconstructions (k=0.56), and improved to good with the addition of 3D VR scans (k=0.74). 
Intraobserver reliability for the Neer classification was poor with plain radiographs (k=0.34), good with three-dimensional SSD reconstructions (k=0.61), and excellent with three-dimensional VR reconstructions (k=0.80). Conclusion: In this study, three-dimensional volume rendering computed tomography improved the inter- and intraobserver reliability of the AO/OTA and the Neer classifications in the assessment of proximal humerus fractures. In the opinion of the authors, 3D volume rendering CT scans are a helpful tool for preoperative planning and classification of fractures of the proximal humerus


The Journal of Bone & Joint Surgery British Volume
Vol. 84-B, Issue 1 | Pages 15 - 18
1 Jan 2002
Whelan DB Bhandari M McKee MD Guyatt GH Kreder HJ Stephen D Schemitsch EH

The reliability of the radiological assessment of the healing of tibial fractures remains undetermined. We examined the inter- and intraobserver agreement of the healing of such fractures among four orthopaedic trauma surgeons who, on two separate occasions eight weeks apart, independently assessed the radiographs of 30 patients with fractures of the tibial shaft which had been treated by intramedullary fixation. The radiographs were selected from a database to represent fractures at various stages of healing. For each radiograph, the surgeon scored the degree of union, quantified the number of cortices bridged by callus or with a visible fracture line, described the extent and quality of the callus, and provided an overall rating of healing. The interobserver chance-corrected agreement using a quadratically weighted kappa (κ) statistic in which values of 0.61 to 0.80 represented substantial agreement were as follows: radiological union scale (κ = 0.60); number of cortices bridged by callus (κ = 0.75); number of cortices with a visible fracture line (κ = 0.70); the extent of the callus (κ = 0.57); and general impression of fracture healing (κ = 0.67). The intraobserver agreement of the overall impression of healing (κ = 0.89) and the number of cortices bridged by callus (κ = 0.82) or with a visible fracture line (κ = 0.83) was almost perfect. There are no validated scales which allow surgeons to grade fracture healing radiologically. Among those examined, the number of cortices bridged by bone appears to be a reliable, and easily measured radiological variable to assess the healing of fractures after intramedullary fixation
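
The quadratically weighted kappa mentioned in this abstract can be illustrated with a minimal sketch for two raters scoring on an ordinal scale. It assumes the standard disagreement weights (i − j)² / (k − 1)² for k categories coded 0 to k − 1; the function name and the scores below are hypothetical, not study data.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_categories):
    """Quadratically weighted kappa for ordinal scores coded 0..n_categories-1."""
    n, k = len(rater_a), n_categories
    if n == 0 or n != len(rater_b) or k < 2:
        raise ValueError("need equal-length, non-empty lists and k >= 2")
    # Joint proportion table: observed[i][j] = share of cases scored i and j
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_obs = weighted_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2   # larger gaps cost more
            weighted_obs += weight * observed[i][j]
            weighted_exp += weight * row[i] * col[j]
    if weighted_exp == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - weighted_obs / weighted_exp


# Hypothetical counts of cortices bridged by callus, 0 to 3 (illustrative only)
surgeon_1 = [0, 1, 2, 3, 0, 3]
near_miss = [1, 1, 2, 3, 0, 3]   # one score off by a single category
far_miss = [3, 1, 2, 3, 0, 3]    # one score off by three categories
print(quadratic_weighted_kappa(surgeon_1, near_miss, 4))
print(quadratic_weighted_kappa(surgeon_1, far_miss, 4))
```

The weighting penalises a disagreement of three categories far more than a disagreement of one, which suits ordinal measures such as the number of bridged cortices.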


Bone & Joint Research
Vol. 9, Issue 5 | Pages 242 - 249
1 May 2020
Bali K Smit K Ibrahim M Poitras S Wilkin G Galmiche R Belzile E Beaulé PE

Aims

The aim of the current study was to assess the reliability of the Ottawa classification for symptomatic acetabular dysplasia.

Methods

In all, 134 consecutive hips that underwent periacetabular osteotomy were categorized using a validated software (Hip2Norm) into four categories of normal, lateral/global, anterior, or posterior. A total of 74 cases were selected for reliability analysis, and these included 44 dysplastic and 30 normal hips. A group of six blinded fellowship-trained raters, provided with the classification system, looked at these radiographs at two separate timepoints to classify the hips using standard radiological measurements. Thereafter, a consensus meeting was held where a modified flow diagram was devised, before a third reading by four raters using a separate set of 74 radiographs took place.


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_12 | Pages 27 - 27
23 Jun 2023
Chen K Wu J Xu L Han X Chen X

To propose a modified approach to measuring femoro-epiphyseal acetabular roof (FEAR) index while still abiding by its definition and biomechanical basis, and to compare the reliabilities of the two methods. To propose a classification for medial sourcil edges.

We retrospectively reviewed a consecutive series of patients treated with periacetabular osteotomy and/or hip arthroscopy. A modified FEAR index was defined. Lateral center-edge angle, Sharp's angle, Tonnis angle on all hips, as well as FEAR index with original and modified approaches were measured. Intra- and inter-observer reliability were calculated as intraclass correlation coefficients (ICC) for FEAR index with both approaches and other alignments. A classification was proposed to categorize medial sourcil edges. ICC for the two approaches across different sourcil groups were also calculated.

After reviewing 411 patients, 49 were finally included. Thirty-two patients (40 hips) were identified as having borderline dysplasia defined by an LCEA of 18 to 25 degrees. Intra-observer ICC for the modified method were good to excellent for borderline hips; poor to excellent for DDH; moderate to excellent for normal hips. As for inter-observer reliability, the modified approach outperformed the original approach with moderate to good inter-observer reliability (DDH group, ICC=0.636; borderline dysplasia group, ICC=0.813; normal hip group, ICC=0.704). The medial sourcils were classified into 3 groups according to their morphology. Type II (39.0%) and III (43.9%) sourcils were the dominant patterns. The sourcil classification had substantial intra-observer agreement (observer 4, kappa=0.68; observer 1, kappa=0.799) and moderate inter-observer agreement (kappa=0.465). The modified approach to the FEAR index possessed greater inter-observer reliability in all medial sourcil patterns.

The modified FEAR index has better intra- and inter-observer reliability compared with the original approach. Type II and III sourcils account for the majority, and only the modified approach is applicable to them.


Orthopaedic Proceedings
Vol. 105-B, Issue SUPP_4 | Pages 3 - 3
3 Mar 2023
Roy K Joshi P Ali I Shenoy P Syed A Barlow D Malek I Joshi Y

Classifying trochlear dysplasia (TD) is useful to determine the treatment options for patients suffering from patellofemoral instability (PFI). There is no consensus on which classification system is more reliable and reproducible for this purpose to guide clinicians in order to treat PFI. There are also concerns about validity of the Dejour classification (DJC), which is the most widely used classification for TD, having only a fair reliability score.

The Oswestry-Bristol classification (OBC) is a recently proposed system of classification of TD and the authors report a fair-to-good interobserver agreement and good-to-excellent intra-observer agreement in the assessment of TD. The aim of this study was to compare the reliability and reproducibility of these two classifications.

6 assessors (4 consultants and 2 registrars) independently evaluated 100 magnetic resonance axial images of the patella-femoral joint for TD and classified them according to OBC and DJC. These assessments were again repeated by all raters after 4 weeks. The inter and intra-observer reliability scores were calculated using Cohen's kappa and Cronbach's alpha.

Both classifications showed good to excellent interobserver reliability with high alpha scores. The OBC classification showed a substantial intra-observer agreement (mean kappa 0.628; p<0.005) whereas the DJC showed a moderate agreement (mean kappa 0.572; p<0.005). There was no significant difference in the kappa values when comparing the assessments by consultants to those by registrars, in either classification system.

This large study from a non-founding institute shows both classification systems to be reliable for classifying TD based on magnetic resonance axial images of the patella-femoral joint, with the simple to use OBC having a higher intra-observer reliability score compared to the DJC.


Orthopaedic Proceedings
Vol. 85-B, Issue SUPP_III | Pages 257 - 257
1 Mar 2003
Hell Anna K Ruehmann O Peters G Lazovic D
Full Access

Introduction. In Central Europe, developmental dysplasia of the hip (DDH) is diagnosed using the sonographic hip screening described by Graf. To learn the necessary standards, three courses are mandatory. However, little is known about the learning curves and measurement errors of doctors at different levels of training and experience.

Material and Methods. Between 1997 and 2002, participants of the basic, advanced and final hip ultrasonography courses were evaluated using a questionnaire and 34 normal and pathological sonograms. They were asked to measure the alpha and beta angles. “Normal” angles for each hip were defined as the mean values measured by two experienced course organizers.

Results. A total of 186 doctors (40% orthopedic surgeons, 60% pediatricians) were evaluated. The group included 20% interns, 60% residents and 20% consultants. An average of 6.3 months lay between the basic and the advanced course, and 16.7 months between the advanced and the final course. The evaluation of the sonograms according to Graf showed major inter-observer differences of up to 30°. Participants had more difficulty measuring the beta angle correctly than the alpha angle. Sonographic pictures of poor quality and of pathological hips produced more difficulties than pictures of Graf type I and II hips. Measurements showed an average difference of 3.6° in the basic course, 3.1° in the advanced course and 4.2° in the final course. The number of examinations between courses did not correlate with good measurements.

Conclusion. Even participants who have completed all three courses seem to develop major systematic errors if ultrasonography is regularly applied without supervision. Therefore, regular training and supervision should be mandatory in order to guarantee good quality.


Orthopaedic Proceedings
Vol. 88-B, Issue SUPP_III | Pages 436 - 436
1 Oct 2006
Rajan RA Metcalfe J Konstantoulakis C Jones S Sprigg A
Full Access

Introduction: The assessment of bone age using the standard Greulich and Pyle chart based on hand and wrist radiographs is usually carried out by Senior Radiologists. We performed a study to look at both intra- and inter-observer variability among different grades of clinicians.

Materials and Methods: Thirty sets of wrist radiographs were selected at random. The investigators included a Senior Radiographer, a Consultant Radiologist, a Registrar Radiologist, an Orthopaedic Consultant and a Senior Orthopaedic Fellow.

Discussion: The Radiology team appear to be more consistent in their readings for the assessment of skeletal bone age than the Orthopaedic team. However, it is interesting to note that, although the Orthopaedic team are less consistent, the inter-observer variability suggests that both teams are equally well equipped to perform the task.

Conclusion: Our study suggests that we should not cross professional boundaries. Render unto Caesar what is Caesar’s!


Bone & Joint Research
Vol. 13, Issue 1 | Pages 19 - 27
5 Jan 2024
Baertl S Rupp M Kerschbaum M Morgenstern M Baumann F Pfeifer C Worlicek M Popp D Amanatullah DF Alt V

Aims. This study aimed to evaluate the clinical application of the PJI-TNM classification for periprosthetic joint infection (PJI) by determining intraobserver and interobserver reliability. To facilitate its use in clinical practice, an educational app was subsequently developed and evaluated. Methods. A total of ten orthopaedic surgeons classified 20 cases of PJI based on the PJI-TNM classification. Subsequently, the classification was re-evaluated using the PJI-TNM app. Classification accuracy was calculated separately for each subcategory (reinfection, tissue and implant condition, non-human cells, and morbidity of the patient). Fleiss’ kappa and Cohen’s kappa were calculated for interobserver and intraobserver reliability, respectively. Results. Overall, interobserver and intraobserver agreements were substantial across the 20 classified cases. Analyses for the variable ‘reinfection’ revealed an almost perfect interobserver and intraobserver agreement with a classification accuracy of 94.8%. The category 'tissue and implant conditions' showed moderate interobserver and substantial intraobserver reliability, while the classification accuracy was 70.8%. For 'non-human cells,' accuracy was 81.0% and interobserver agreement was moderate with an almost perfect intraobserver reliability. The classification accuracy of the variable 'morbidity of the patient' reached 73.5% with a moderate interobserver agreement, whereas the intraobserver agreement was substantial. The application of the app yielded comparable results across all subgroups. Conclusion. The PJI-TNM classification system captures the heterogeneity of PJI and can be applied with substantial inter- and intraobserver reliability. The PJI-TNM educational app aims to facilitate application in clinical practice. A major limitation was the correct assessment of the implant situation. To eliminate this, a re-evaluation according to intraoperative findings is strongly recommended. 
Cite this article: Bone Joint Res 2024;13(1):19–27
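Fleiss' kappa, which this abstract uses for interobserver reliability, extends chance-corrected agreement to more than two raters. The following is a minimal sketch of the standard formula, assuming every case is rated by the same number of raters. The function name and the small example table are hypothetical and are not taken from the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa.

    counts[i][j] is the number of raters who assigned case i to category j.
    Every row must sum to the same number of raters (at least two).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Overall proportion of all ratings that fall into each category.
    p_category = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    # Per-case agreement: share of rater pairs that agree on that case.
    p_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    mean_agreement = sum(p_case) / n_cases
    chance_agreement = sum(p * p for p in p_category)
    if chance_agreement == 1.0:
        return 1.0  # all ratings fell into a single category
    return (mean_agreement - chance_agreement) / (1 - chance_agreement)


# Hypothetical table: 2 cases, 3 raters, 2 categories.
example = [[2, 1], [1, 2]]
print(round(fleiss_kappa(example), 3))
```

In the setting described above, the table would have one row per classified case and one column per category of the subclassification being assessed.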


The Bone & Joint Journal
Vol. 105-B, Issue 10 | Pages 1123 - 1130
1 Oct 2023
Donnan M Anderson N Hoq M Donnan L

Aims. The aim of this study was to investigate the agreement in interpretation of the quality of the paediatric hip ultrasound examination, the reliability of geometric and morphological assessment, and the relationship between these measurements. Methods. Four investigators evaluated 60 hip ultrasounds and assessed their quality based on the standard plane of Graf et al. They measured geometric parameters, described the morphology of the hip, and assigned the Graf grade of dysplasia. They analyzed one self-selected image and one randomly selected image from the ultrasound series, and repeated the process four weeks later. The intra- and interobserver agreement, and correlations between various parameters were analyzed. Results. In the assessment of quality, there was a moderate to substantial intraobserver agreement for each element investigated, but interobserver agreement was poor. Morphological features showed weak to moderate agreement across all parameters but improved to significant when responses were reduced. The geometric measurements showed nearly perfect agreement, and the relationship between them and the morphological features showed a dose response across all parameters with moderate to substantial correlations. There were strong correlations between geometric measurements. The Graf classification showed a fair to moderate interobserver agreement, and moderate to substantial intraobserver agreement. Conclusion. This investigation into the reliability of the interpretation of hip ultrasound scans identified the difficulties in defining what is a high-quality ultrasound. We confirmed that geometric measurements are reliably interpreted and may be useful as a further measurement of quality. Morphological features are generally poorly interpreted, but a simpler binary classification considerably improves agreement. As there is a clear dose-response relationship between geometric and morphological measurements, the importance of morphology in the diagnosis of hip dysplasia should be questioned. Cite this article: Bone Joint J 2023;105-B(10):1123–1130