ASA physical status assignment by non-anesthesia providers: Do surgeons consistently downgrade the ASA score preoperatively?

Affiliations.

  • 1 Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, United States.
  • 2 Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, United States. Electronic address: [email protected].
  • 3 New York Methodist Hospital, 506 6th St Brooklyn, NY 11215, United States.
  • PMID: 28372650
  • DOI: 10.1016/j.jclinane.2017.02.002

Objective: The American Society of Anesthesiologists physical status (ASA-PS) is associated with increased morbidity and mortality in the perioperative period. When surgeries are scheduled by surgeons and their staff at our large institution, a presumed ASA-PS is chosen. This is because our institution (and, anecdotally, others in our hospital system and elsewhere), recognizing the relationship between higher ASA-PS and poorer postoperative outcomes, requires all patients with higher ASA-PS levels (≥3) to undergo enhanced preoperative workup. The patients may not, however, necessarily be seen in the anesthesia clinic prior to surgery. As a result, patients are assigned a presumed ASA-PS by a non-anesthesia provider (e.g., surgeons and physician extenders) that may not reflect the ASA-PS chosen by the anesthesiologist on the day of surgery. Inaccurate preoperative ASA-PS assignment leads to unnecessary and costly preoperative testing, delays in operative procedures, and potential case cancellations. Our study aimed to determine whether there are significant differences in the assignment of ASA-PS by non-anesthesia providers when compared to anesthesia providers.

Design: We administered an IRB-approved survey asking the ASA-PS of 20 hypothetical case vignettes to 229 clinicians in various departments. Responses by non-anesthesia providers were compared to the consensus of the department of anesthesiology.

Setting: Faculty office spaces and conferences.

Patients: No patients, physicians only.

Interventions: Survey administration.

Measurements: ASA-PS scores acquired from surveys.

Main results: Residents and faculty in the department of anesthesiology demonstrated no statistical difference in the median ASA score in 19/20 case scenarios. All other departments were statistically different when compared to the department of anesthesiology (p<0.05). The probability of a department either over- or under-rating the ASA-PS was calculated, and is summarized in Fig. 3. All departments, except anesthesiology, had a 30-40% chance of under-rating the ASA-PS of the patients in the clinical vignettes.

Conclusions: Non-anesthesia providers assign ASA-PS with significantly less accuracy than do anesthesia providers, even when adjusted for multiple comparisons. Surgical and procedural departments were found to consistently under-rate the ASA-PS of patients in our clinical vignettes.

Keywords: ASA classification; ASA score; ASA-PS.

Copyright © 2017 Elsevier Inc. All rights reserved.

Publication types

  • Comparative Study

MeSH terms

  • Anesthesiologists*
  • Health Status Indicators
  • Perioperative Period
  • Practice Patterns, Physicians'*
  • Preoperative Care / methods*
  • Risk Assessment / methods
  • Surveys and Questionnaires

  • Research article
  • Open access
  • Published: 09 July 2020

Assignment of pre-event ASA physical status classification by pre-hospital physicians: a prospective inter-rater reliability study

  • Kristin Tønsager (ORCID: orcid.org/0000-0002-5289-0442) 1,2,3,
  • Marius Rehn 1,2,4,
  • Andreas J. Krüger 1,5,
  • Jo Røislien 3,1 &
  • Kjetil G. Ringdal 6,7,8

BMC Anesthesiology volume 20, Article number: 167 (2020)

Individualized treatment is a common principle in hospitals. Treatment decisions are made based on the patient's condition, including comorbidities. This principle is equally relevant out-of-hospital. Furthermore, comorbidity is an important risk-adjustment factor when evaluating pre-hospital interventions and may aid therapeutic decisions and triage. The American Society of Anesthesiologists Physical Status (ASA-PS) classification system is included in templates for reporting data in physician-staffed pre-hospital emergency medical services (p-EMS), but whether an adequate full pre-event ASA-PS can be assessed by pre-hospital physicians remains unknown. We aimed to explore whether pre-hospital physicians can score an adequate pre-event ASA-PS with the information available on-scene.

The study was an inter-rater reliability study consisting of two steps. Pre-event ASA-PS scores made by pre- and in-hospital physicians were compared. Pre-hospital physicians did not have access to patient records and scores were based on information obtainable on-scene. In-hospital physicians used the complete patient record (Step 1). To assess inter-rater reliability between pre- and in-hospital physicians when given equal amounts of information, pre-hospital physicians also assigned pre-event ASA-PS for 20 of the included patients by using the complete patient records (Step 2). Inter-rater reliability was analyzed using quadratic weighted Cohen’s kappa (κ w ).

For most scores (82%), inter-rater reliability between pre- and in-hospital physicians was moderate to substantial (κ w 0.47–0.89). Inter-rater reliability was higher among the in-hospital physicians (κ w 0.77 to 0.85). When all physicians had access to the same information, κ w increased (κ w 0.65 to 0.93).

Conclusions

Pre-hospital physicians can score an adequate pre-event ASA-PS on-scene for most patients. To further increase inter-rater reliability, we recommend access to the full patient record on-scene. We recommend application of the full ASA-PS classification system for reporting of comorbidity in p-EMS.

Tailored treatment, through choice of therapy, medication and monitoring adapted to each patient, is a common principle in hospitals [1, 2, 3]. In all parts of critical care, decisions are made based on the patient's condition, including the patient's comorbidities [1, 2, 4]. Decisions on dose-adjusted medication and volume loading before anesthesia are common examples of individualized adaptations in the operating room [4]. Pre-hospital critical care is a continuum, and pre-hospital management is often part of the patient's course [5, 6]. As such, stratification by comorbidity, and individualized treatment, is equally relevant and valid for pre-hospital patients. In line with this principle, the patient's health status before the acute event should be accounted for in on-scene triage and when determining thresholds for, and timing of, interventions and physiological targets [7, 8].

Risk adjustment allows for better judgement about the effectiveness and quality of alternative therapies [1]. Comorbidity is an important risk adjustment factor when evaluating pre-hospital interventions [9, 10]. In general, there is agreement that outcome after trauma is influenced by the patient's physical state before the trauma occurs [11]. Thus, including a comorbidity measure is a prerequisite for comparisons and improves the precision of outcome prediction for trauma patients [8, 9, 12]. However, obtaining information on comorbidity from in-hospital records may be challenging for pre-hospital services due to logistical and legal issues of access, and other strategies for obtaining this information should be explored.

Several methods for reporting comorbidities in pre-hospital emergency medical services (p-EMS) exist [8, 9, 13]. The American Society of Anesthesiologists Physical Status (ASA-PS) classification system is used globally by anesthesiologists and classifies the preoperative physical health of patients before anesthesia and surgery. ASA-PS was originally designed to allow for statistical analyses of outcomes and to standardize terminology [14, 15], not to predict perioperative risk [15], but research has shown that the ASA-PS correlates well with overall surgical mortality [14]. Although the reliability of ASA-PS may be debated, the scale is widely accepted as a tool to determine pre-operative health status [16]. The use of ASA-PS has expanded to the pre- and in-hospital critical care environment, and pre-event ASA-PS, which is the ASA-PS before the present injury or illness [17], describes the inherent physiological state of a patient before an event. Pre-event ASA-PS has been shown to be an independent predictor of mortality after trauma [8] and is included in templates for reporting of comorbidity in p-EMS and trauma [18, 19]. We therefore used pre-event ASA-PS as a comorbidity measure for the present study.

Ideally, pre-hospital services should have access to the full patient record on-scene. In reality, however, access to the full patient record is restricted for most pre-hospital services on-scene. P-EMS must thus commonly base decisions on a more limited set of data and observations obtainable on-scene than in-hospital physicians have available. Obtaining the complete medical history from seriously ill or injured patients on-scene is considered unfeasible, and reporting a dichotomized pre-event ASA-PS (pre-event ASA-PS 1 or pre-event ASA-PS > 1) is thus often recommended [20]. This simplification of the scale provides a very rough measure of comorbidity with low clinical discriminatory capability. Whether an adequate full pre-event ASA-PS can be assessed by pre-hospital physicians based only on the limited information generally available on-scene has not been explored and remains unknown. If scores between pre- and in-hospital physicians do not differ more than scores between in-hospital physicians themselves, then the pre-hospital scores are just as "correct" as the in-hospital scores and can be used accordingly.

The aim of the present study was to explore whether it is possible for pre-hospital physicians to assign an adequate pre-event ASA-PS while still on-scene.

This was a prospective observational inter-rater reliability study. We assessed the degree of agreement between raters using the ASA-PS scale under different circumstances to determine whether different access to information influenced the scores. All patients admitted by p-EMS to two Norwegian hospitals during a three-month period (Stavanger University Hospital, 19 Aug – 18 Nov 2016, and St. Olav University Hospital, 1 Feb – 30 Apr 2017) were included. Following the inclusion periods, in-hospital physicians scored all included patients (Step 1). Data collection for the second part of the study (Step 2) was finished 21 Mar 2018. All Norwegian p-EMS services are staffed with anesthesiologists and respond to all types of emergency conditions, search and rescue missions and inter-hospital transfers.

We used the pre-event ASA-PS to assess comorbidity. The pre-event ASA-PS does not take the present event into account and describes the physiological state of the patient before an event [ 8 , 11 , 21 ]. The ASA-PS provides a global, subjective index of a patient’s overall health status, and pre-existing medical conditions are categorized on a scale of increasing medical severity (ASA-PS 1–5) [ 17 ].

Step 1. Inter-rater reliability study of pre- versus in-hospital scores

Pre-hospital physicians assigned a pre-event ASA-PS score on-scene based on information available out-of-hospital only. The pre-hospital physicians did not have access to the full patient records. If the physician was unable to decide on a pre-event ASA-PS score on-scene, the score was left unassigned and the main reason declared. After the three-month inclusion period, three in-hospital anesthesiologists at each of the two sites were given access to full patient records for all included patients at each site. Blinded to the pre-event ASA-PS scores allocated by p-EMS, each in-hospital physician used this information to assign pre-event ASA-PS scores for the included patients. No specific training in ASA-PS scoring was provided.

Step 2. Inter-rater reliability with equal access to data

Because p-EMS generally do not have access to the full patient record, comparing pre-hospital on-scene scores with in-hospital scores is an asymmetric comparison (in-hospital physicians have access to more information). We thus did not expect perfect agreement between pre- and in-hospital raters. To assess agreement of pre-event ASA-PS scores when pre- and in-hospital physicians had access to equal data, 20 patients were selected by an on-line randomizer and re-scored by the pre-hospital physicians when given access to the complete patient records. The rationale was to assess whether an observed difference in scoring was due to different physicians (pre- versus in-hospital) or to different data availability.

We were unable to identify any studies in which pre-event ASA-PS was scored in a real-time pre-hospital setting. Without prior empirical information on the variation of the phenomenon under study, we were consequently unable to perform sample size calculations [22, 23]. Statistical rules of thumb for sample size vary in the literature, with sample sizes from 10 to 50 reported [24]. Combining existing advice, we chose to include 20 patients per physician to evaluate inter-rater reliability [24]. If no agreement between pre- and in-hospital physicians could be established for 20 patients, we considered the pre-hospital scores to be irrelevant.

Patients and physicians were anonymized prior to further statistical analyses.

The Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were followed [25].

Statistical analyses

ASA-PS is an ordinal scale, and agreement between two ASA-PS measurements on the same individual was thus assessed using quadratic weighted Cohen's kappa (κ w ), a modification of Cohen's kappa that also accounts for the degree of disagreement between raters [26]. κ w is a number between 0 and 1: κ w < 0.10 indicates no inter-rater reliability, while 0.11–0.40 indicates slight, 0.41–0.60 fair, 0.61–0.80 moderate and 0.81–1.0 substantial inter-rater reliability [27].
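The analyses in the paper were run in SPSS and R; purely to illustrate the statistic itself, the following Python sketch computes a quadratic weighted Cohen's kappa for two raters and maps it onto the interpretation bands above. The rater scores are invented example data, and scikit-learn is assumed to be available.

```python
# Illustrative sketch only (not the authors' code): quadratic weighted
# Cohen's kappa for two raters scoring the same patients on the ordinal
# ASA-PS scale. The score vectors below are invented example data.
from sklearn.metrics import cohen_kappa_score

pre_hospital = [2, 3, 1, 4, 2, 3, 3, 2, 1, 5]   # hypothetical PDoc scores
in_hospital  = [2, 3, 2, 4, 2, 3, 4, 2, 1, 5]   # hypothetical IDoc scores

# weights="quadratic" penalises a disagreement by the squared distance
# between categories, so a 1-vs-3 disagreement costs more than a 1-vs-2.
kappa_w = cohen_kappa_score(pre_hospital, in_hospital, weights="quadratic")

# Interpretation bands used in the paper [27]
if kappa_w < 0.10:
    band = "no"
elif kappa_w <= 0.40:
    band = "slight"
elif kappa_w <= 0.60:
    band = "fair"
elif kappa_w <= 0.80:
    band = "moderate"
else:
    band = "substantial"
print(f"quadratic weighted kappa = {kappa_w:.2f} ({band} inter-rater reliability)")
```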

If two measurement methods are to be considered similar, their results should be indistinguishable from one another [28]. Using κ w values between pre- and in-hospital physicians as a measure of agreement, we performed minimax hierarchical agglomerative clustering, a method for exploring the inner agreement structure of a dataset [29]. The result of this clustering process is presented visually as dendrograms. Such dendrograms look like upside-down trees, grouping the elements that agree the most near the bottom of the graph, with decreasing agreement (i.e. inter-rater reliability) higher up the graph. This approach allowed us to visually explore whether the scores from pre- and in-hospital physicians were indeed indistinguishable from one another. The overall mean agreement [30] for all pre- versus in-hospital physicians was also calculated. Data were analyzed using IBM SPSS Statistics version 22 and R 3.1.0.
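The minimax-linkage clustering itself was done in R (the protoclust package of Bien & Tibshirani [29] implements minimax linkage); SciPy has no minimax linkage, so the sketch below uses complete linkage as a stand-in purely to illustrate the dendrogram idea of turning pairwise κ w agreement into distances (1 − κ w) and clustering the raters. The kappa matrix and rater labels are invented.

```python
# Illustrative sketch: build a dendrogram of raters from pairwise kappa
# agreement. Distance = 1 - kappa, so raters that agree most join lowest.
# Complete linkage is used here as a stand-in for the minimax linkage
# used in the paper. The kappa matrix below is invented example data.
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

raters = ["PDoc1", "PDoc2", "PDoc3", "IDoc1", "IDoc2", "IDoc3"]
kappa = np.array([
    [1.00, 0.70, 0.68, 0.65, 0.62, 0.66],
    [0.70, 1.00, 0.72, 0.69, 0.64, 0.67],
    [0.68, 0.72, 1.00, 0.66, 0.63, 0.65],
    [0.65, 0.69, 0.66, 1.00, 0.83, 0.80],
    [0.62, 0.64, 0.63, 0.83, 1.00, 0.85],
    [0.66, 0.67, 0.65, 0.80, 0.85, 1.00],
])

dist = 1.0 - kappa                          # higher agreement -> smaller distance
condensed = squareform(dist, checks=False)  # condensed form required by linkage

Z = linkage(condensed, method="complete")
dendrogram(Z, labels=raters)                # raters that agree most join near the bottom
plt.ylabel("1 - kappa")
plt.tight_layout()
plt.show()
```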

Pre-event ASA-PS was registered for a total of 312 patients. We excluded four patients admitted to non-participating hospitals and three patients without identifiable patient records. One physician scored only four patients, three with pre-event ASA-PS 3 and one that could not be scored. This did not allow for κ w calculations, as scores were identical, and this physician and corresponding patients were thus excluded. In total 301 patients were available for further statistical analysis.

Pre-hospital physicians scored a median (range) of 21 (5–40) patients. Five patients (2%) could not be scored on-scene (four were unconscious and one was not able to communicate).

The distribution of ASA-PS scores between pre- and in-hospital physicians is presented in Table 1.

κ w values for pre-event ASA-PS scores assigned by pre-hospital physicians on-scene, and subsequent scores based on complete patient records by in-hospital physicians are presented in Fig.  1 .

Figure 1. κ w values for pre-event ASA-PS scores: estimated inter-rater reliability between each pre-hospital (PDoc) and in-hospital (IDoc) physician, using quadratic weighted Cohen's kappa with 95% CIs.

κ w values ranged from 0.77 to 0.85 among the three in-hospital physicians, and from 0.47 to 0.89 when comparing the pre- to the in-hospital physicians. The mean kappa values were 0.67 (PDocs Stavanger), 0.78 (IDocs Stavanger), 0.75 (PDocs Trondheim) and 0.84 (IDocs Trondheim). For most scores (82%), inter-rater reliability between pre- and in-hospital physicians was moderate to substantial (κ w > 0.61).

The mean agreement between all pre-hospital physicians and each of the three in-hospital physicians was generally high. However, the three in-hospital physicians tended to agree more with one another than with the pre-hospital physicians. This is demonstrated in Fig. 2.

Figure 2. Pre- versus in-hospital agreement: mean agreement between all pre-hospital physicians (PDocs) and the three in-hospital physicians (IDocs) at the two sites, using on-scene pre-hospital scores and in-hospital scores respectively.

When pre- and in-hospital physicians scored the same 20 patients with equal access to information, agreement was strengthened. The difference in inter-rater reliability between the pre- and in-hospital physicians was much smaller, with κ w values ranging from 0.65 to 0.93, indicating moderate to substantial agreement. Corresponding dendrograms for the two sites demonstrate that scores from pre- and in-hospital physicians do not cluster separately but remain largely indistinguishable from one another (Fig. 3).

Figure 3. Agreement when given equal access to information: dendrograms depicting inter-rater reliability between pre-hospital (PDoc) and in-hospital (IDoc) physicians scoring the same 20 patients with pre-event ASA-PS given equal access to information. PDocs are indistinguishable from IDocs.

The present study examined ASA-PS scoring in real-life situations. As pre-hospital physicians did not have access to the full patient record (Step 1), perfect agreement in ASA-PS scoring between pre- and in-hospital physicians was not to be expected. When comparing pre- and in-hospital pre-event ASA-PS scores, agreement was generally high, ranging from fair to substantial. Most scores (82%) demonstrated moderate (64%) to substantial (18%) agreement, indicating that pre-hospital physicians can obtain sufficient data on-scene to score an adequate pre-event ASA-PS for most patients. Because the total number of pre-hospital scores is high, the impact of uncertainty in the scores, represented by broad 95% confidence intervals in Fig. 1, is reduced.

When pre- and in-hospital physicians scored pre-event ASA-PS on the same patients with access to complete patient records, agreement improved and ranged from moderate (52%) to substantial (48%). This indicates that ASA-PS scores from pre- and in-hospital physicians are indistinguishable from one another when they have equal data access (Fig. 3). Accordingly, the differences in pre-event ASA-PS scores observed in the first part of the study may be attributed to differences in data availability and time pressure on-scene rather than to factors related to the individual physicians.

Comorbidity is an important risk-adjustment factor when evaluating pre-hospital interventions and the effect of p-EMS [9, 10]. Additionally, adjustment for comorbidity significantly increases the predictive accuracy of trauma outcome prediction models [9, 12, 31, 32]. The inherent nature of p-EMS favors a method for reporting comorbidities that is both readily available and time effective. ASA-PS is a well-known physical health status scale, applied globally by anesthesiologists and surgeons, supporting the notion that pre-event ASA-PS may be advantageous for reporting comorbidity in p-EMS. However, studies have found substantial inter-observer variation [21, 33]. Most of these studies used hypothetical case scenarios designed by researchers [8, 16, 21]. In the present study we found that the agreement between pre- and in-hospital scores is acceptable for most patients, and we argue that pre-event pre-hospital ASA-PS should be applied for documentation of comorbidity in p-EMS.

Obtaining a complete medical history from seriously ill patients on-scene is considered unfeasible. Accordingly, a dichotomized pre-event ASA-PS is often reported [20]. This is a very rough measure of comorbidity with low clinical discriminatory ability that does not distinguish between mild and severe systemic disease. Our results indicate that p-EMS can assign an adequate full-scale pre-event ASA-PS score already on-scene.

Significantly lower accuracy in assigning ASA-PS has been reported for non-anesthesiologists compared to anesthesiologists, possibly limiting the validity of pre-hospital pre-event ASA-PS scores to anesthesiologist-staffed services [34]. Standardized education and encouraged use may decrease variability for less proficient users [35]. Knowledge of comorbidity is relevant for all emergency medical services to aid decision-making and to target treatment. The reliability of pre-event ASA-PS scored by paramedics is unknown and should be a subject for further research. Precise definitions of each ASA-PS class, along with training in their use, may improve reliability and usability for all users.

Although the pre-hospital physicians in the present study did not have access to patient records, only 2% of the patients could not be scored on-scene, all of whom had impaired consciousness. These patients remain a challenge for p-EMS regarding comorbidity assessment. Access to patient records in p-EMS may increase the feasibility and precision of pre-event ASA-PS scores, and systems for field data access should be available. Summary care records (SCRs) are electronic records of important patient information available to authorized health care staff involved in patient care, and their prevalence is increasing [36]. SCRs may provide timely and relevant patient information regardless of regional affiliation. Whether access to SCRs will increase the reliability of pre-event ASA-PS scores on-scene remains unknown.

Limitations

The study was performed in a highly specialized anesthesiologist-staffed system and the results may not be transferable to other p-EMS. When the number of assigned scores is low, conclusions may be inaccurate. Patients who died prior to hospital arrival were excluded. These patients are among the most severely sick or injured and may have a substantial comorbidity burden; omitting them may overestimate the rate of agreement in this study.

For an anesthesiologist-staffed EMS covering a mixed patient population, an adequate pre-event ASA-PS can be assigned on-scene. When data access was equal, pre-event ASA-PS scores by pre- and in-hospital physicians were indistinguishable from each other. When pre-event ASA-PS was scored on-scene with restricted data access, inter-rater reliability was lower, but acceptable. We recommend application of the full pre-event ASA-PS classification system for documentation of comorbidity in p-EMS.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ASA-PS: American Society of Anesthesiologists Physical Status

p-EMS: Physician-staffed pre-hospital emergency medical services

GRRAS: Guidelines for Reporting Reliability and Agreement Studies

κ w: Quadratic weighted Cohen's kappa

PDoc: Pre-hospital physician

IDoc: In-hospital physician

SCRs: Summary care records

Garrison HG, Maio RF, Spaite DW, Desmond JS, Gregor MA, O'Malley PJ, Stiell IG, Cayten CG, Chew JL Jr, Mackenzie EJ, Miller DR. Emergency medical services outcomes project III (EMSOP III): the role of risk adjustment in out-of-hospital outcomes research. Ann Emerg Med. 2002;40:79–88.

Keim SM, Spaite DW, Maio RF, Garrison HG, Desmond JS, Gregor MA, O'Malley PJ, Stiell IG, Cayten CG, Chew JL Jr, et al. Risk adjustment and outcome measures for out-of-hospital respiratory distress. Acad Emerg Med. 2004;11:1074–81.

Van Gelder IC, Hobbelt AH, Marcos EG, Schotten U, Cappato R, Lewalter T, Schwieler J, Rienstra M, Boriani G. Tailored treatment strategies: a new approach for modern management of atrial fibrillation. J Intern Med. 2016;279:457–66.

Miller RD. Miller’s Anestehesia 6th edition. In: Miller RD, editor. Miller’s Anestehsia, vol. 1. Philadelphia: Elsevier Churcuill Livingstone; 2005. p. 1018.

Vincent JL. The continuum of critical care. Crit Care. 2019;23:122.

Ghosh R, Pepe P. The critical care cascade: a systems approach. Curr Opin Crit Care. 2009;15:279–83.

Scalea TM, Simon HM, Duncan AO, Atweh NA, Sclafani SJ, Phillips TF, Shaftan GW. Geriatric blunt multiple trauma: improved survival with early invasive monitoring. J Trauma. 1990;30:129–34 discussion 134-126.

Skaga NO, Eken T, Sovik S, Jones JM, Steen PA. Pre-injury ASA physical status classification is an independent predictor of mortality after trauma. J Trauma. 2007;63:972–8.

Bouamra O, Jacques R, Edwards A, Yates DW, Lawrence T, Jenks T, Woodford M, Lecky F. Prediction modelling for trauma using comorbidity and ‘true’ 30-day outcome. Emerg Med J. 2015;32:933–8.

Ghorbani P, Ringdal KG, Hestnes M, Skaga NO, Eken T, Ekbom A, Strommer L. Comparison of risk-adjusted survival in two Scandinavian level-I trauma centres. Scand J Trauma Resusc Emerg Med. 2016;24:66.

Jones JM, Skaga NO, Sovik S, Lossius HM, Eken T. Norwegian survival prediction model in trauma: modelling effects of anatomic injury, acute physiology, age, and co-morbidity. Acta Anaesthesiol Scand. 2014;58:303–15.

de Munter L, Polinder S, Lansink KW, Cnossen MC, Steyerberg EW, de Jongh MA. Mortality prediction models in the general trauma population: a systematic review. Injury. 2017;48:221–9.

Austin SR, Wong YN, Uzzo RG, Beck JR, Egleston BL. Why summary comorbidity measures such as the Charlson comorbidity index and Elixhauser score work. Med Care. 2015;53:e65–72.

Keats AS. The ASA classification of physical status--a recapitulation. Anesthesiology. 1978;49:233–6.

Saklad M. Grading of patients for surgical procedures. Anesthesiology. 1941;2:281–4.

Sankar A, Johnson SR, Beattie WS, Tait G, Wijeysundera DN. Reliability of the American Society of Anesthesiologists physical status scale in clinical practice. Br J Anaesth. 2014;113:424–32.

ASA Physical Status Classification System. https://www.asahq.org/standards-and-guidelines/asa-physical-status-classification-system . Accessed 1 Jun 2020.

Ringdal KG, Coats TJ, Lefering R, Di Bartolomeo S, Steen PA, Roise O, Handolin L, Lossius HM. The Utstein template for uniform reporting of data following major trauma: a joint revision by SCANTEM, TARN, DGU-TR and RITG. Scand J Trauma Resusc Emerg Med. 2008;16:7.

Tønsager K, Krüger AJ, Ringdal KG, Rehn M. Template for documenting and reporting data in physician-staffed pre-hospital services: a consensus-based update. Scand J Trauma Resusc Emerg Med. 2020;28:25.

Kruger AJ, Lockey D, Kurola J, Di Bartolomeo S, Castren M, Mikkelsen S, Lossius HM. A consensus-based template for documenting and reporting in physician-staffed pre-hospital services. Scand J Trauma Resusc Emerg Med. 2011;19:71.

Ringdal KG, Skaga NO, Steen PA, Hestnes M, Laake P, Jones JM, Lossius HM. Classification of comorbidity in trauma: the reliability of pre-injury ASA physical status classification. Injury. 2013;44:29–35.

Kirby A, Gebski V, Keech AC. Determining the sample size in a clinical trial. Med J Aust. 2002;177:256–7.

Kadam P, Bhalerao S. Sample size calculation. Int J Ayurveda Res. 2010;1:55–7.

Corder GW, Foreman DI. Nonparametric statistics for non-statisticians. Hoboken: Wiley; 2009.

Kottner J, Audige L, Brorson S, Donner A, Gajewski BJ, Hrobjartsson A, Roberts C, Shoukri M, Streiner DL. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64:96–106.

Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8:23–34.

Shrout PE. Measurement reliability and agreement in psychiatry. Stat Methods Med Res. 1998;7:301–17.

Zou KH, Warfield SK, Bharatha A, Tempany CM, Kaus MR, Haker SJ, Wells WM 3rd, Jolesz FA, Kikinis R. Statistical validation of image segmentation quality based on a spatial overlap index. Acad Radiol. 2004;11:178–89.

Bien J, Tibshirani R. Hierarchical clustering with prototypes via Minimax linkage. J Am Stat Assoc. 2011;106:1075–84.

Fiori S, Tanaka T. An algorithm to compute averages on matrix lie groups. Trans Sig Proc. 2009;57:4734–43.

Bergeron E, Rossignol M, Osler T, Clas D, Lavoie A. Improving the TRISS methodology by restructuring age categories and adding comorbidities. J Trauma. 2004;56:760–7.

Skaga NO, Eken T, Sovik S. Validating performance of TRISS, TARN and NORMIT survival prediction models in a Norwegian trauma population. Acta Anaesthesiol Scand. 2018;62:253–66.

Riley R, Holman C, Fletcher D. Inter-rater reliability of the ASA physical status classification in a sample of anaesthetists in Western Australia. Anaesth Intensive Care. 2014;42:614–8.

Curatolo C, Goldberg A, Maerz D, Lin HM, Shah H, Trinh M. ASA physical status assignment by non-anesthesia providers: do surgeons consistently downgrade the ASA score preoperatively? J Clin Anesth. 2017;38:123–8.

Ihejirika RC, Thakore RV, Sathiyakumar V, Ehrenfeld JM, Obremskey WT, Sethi MK. An assessment of the inter-rater reliability of the ASA physical status score in the orthopaedic trauma population. Injury. 2015;46:542–6.

Jones EW. How summary care records can improve patient safety. Emerg Nurse. 2015;23:20–2.

Acknowledgements

The authors are grateful to the donors of the Norwegian Air Ambulance Foundation. The authors thank all pre-hospital physicians in Stavanger and Trondheim who collected pre-hospital data and Guro Mæhlum Krüger, Trond Nordseth, Helge Haugland, Katrine Finsnes, Unni Bergland and Linda Rørtveit who collected in-hospital data.

Funding

The Norwegian Air Ambulance Foundation funded this project but played no part in study design, data collection, analysis, writing, or submission for publication.

Author information

Authors and Affiliations

Department of Research, The Norwegian Air Ambulance Foundation, Oslo, Norway

Kristin Tønsager, Marius Rehn, Andreas J. Krüger & Jo Røislien

Department of Anesthesiology and Intensive Care, Stavanger University Hospital, Stavanger, Norway

Kristin Tønsager & Marius Rehn

Faculty of Health Sciences, University of Stavanger, Stavanger, Norway

Kristin Tønsager & Jo Røislien

Pre-hospital Division, Air Ambulance Department, Oslo University Hospital, Oslo, Norway

Marius Rehn

Department of Emergency Medicine and Pre-Hospital Services, St. Olav’s Hospital, Trondheim, Norway

Andreas J. Krüger

Department of Anesthesiology, Vestfold Hospital Trust, Tønsberg, Norway

Kjetil G. Ringdal

Prehospital Division, Vestfold Hospital Trust, Tønsberg, Norway

Norwegian Trauma Registry, Oslo University Hospital, Oslo, Norway

Contributions

KT, KGR and AJK conceived the idea. KT and AJK were involved in acquisition of data. KT analyzed the data, KGR, AJK, MR and JR supervised the analysis. All authors were involved in the interpretation of the data. KT drafted the manuscript and KGR, AJK, MR and JR revised it critically. All authors have read and approved the final version of the manuscript. All authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Corresponding author

Correspondence to Kristin Tønsager .

Ethics declarations

Ethics approval and consent to participate

The Regional Committee for Medical and Health Research Ethics in Western Norway (ID 2016/556) approved the study and ruled that formal consent was not necessary, thus granting an exemption of consent for all patients.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Tønsager, K., Rehn, M., Krüger, A.J. et al. Assignment of pre-event ASA physical status classification by pre-hospital physicians: a prospective inter-rater reliability study. BMC Anesthesiol 20 , 167 (2020). https://doi.org/10.1186/s12871-020-01083-x

Received : 12 February 2020

Accepted : 01 July 2020

Published : 09 July 2020

DOI : https://doi.org/10.1186/s12871-020-01083-x

Keywords: Critical care; Comorbidity; Emergency medical services; Pre-hospital emergency care.

Anesthesia Experts

Using Examples Best When Classifying ASA Physical Status

March 1, 2016 by Dr. Clemens Leave a Comment

Despite being an important part of clinical practice for more than five decades, the American Society of Anesthesiologists (ASA) physical status classification system occasionally is criticized for its subjective nature, a trait that can lead to inconsistent assignments among health care professionals.

This problem can be ameliorated, a recent study has found, with the use of ASA-approved class-specific examples, which help anesthesia and nonanesthesia providers alike substantially increase their ability to determine the correct ASA class (Table 1).

“Part of the reason behind the 2014 development of the examples that accompany the ASA physical status classification system stems from the fact that the system is not used exclusively by professionals trained in anesthesia care,” stated Erin Hurwitz, MD, who was assistant professor of anesthesiology at The University of Texas Medical Branch at Galveston when the study was performed.

“So, if you are assigning a patient an ASA I or II physical status and undervaluing comorbidities that actually make them an ASA III, you may be putting patient safety at risk in certain situations.”

To help determine the utility of the examples in improving class assignment, Dr. Hurwitz and her colleagues recruited 779 anesthesia providers (from 41 states) and 110 nonanesthesia providers (from 18 states) into this Web-based study. As part of the questionnaire, participants were first asked to assign an ASA physical status level to 10 hypothetical cases using only the ASA definitions for reference.

“In the second part of the survey,” Dr. Hurwitz told  Anesthesiology News , “they were given the same 10 cases in a different order. Except this time, they were also given a table that included the published examples and again asked to assign an ASA physical status classification.”

Anesthesia providers included physician anesthesiologists, anesthesiology residents and fellows, nurse anesthetists and anesthesiology assistants. “For nonanesthesia providers, we targeted areas in medicine that utilized the ASA physical status,” she said, “including gastroenterologists, pulmonary critical care physicians, interventional radiologists, oral-maxillofacial surgeons and nurses.”

ASA Class Examples Not Widely Known

As Dr. Hurwitz reported at the 2015 annual meeting of the American Society of Anesthesiologists (abstract A1278), clinicians of all stripes saw significant improvement in their ability to correctly identify a patient’s ASA physical status class when they used the examples (Table 2).

“When only the definitions were used, in only about half the cases did clinicians give the correct ASA physical status assignment,” Dr. Hurwitz explained. “But when the examples were added, the mean correct score went up to almost eight out of 10 cases.” Of note, only three of the 10 cases were correctly assigned at least 65% of the time when definitions alone were used. This climbed to nine of 10 when the examples were added (Figure).

Figure. Correct ASA physical status assignments with and without use of examples.

This kind of improvement should help level the playing field between health care providers who often interpret the same clinical situations very differently. “One of the criticisms of the ASA physical status system is that it’s somewhat subjective in nature,” Dr. Hurwitz told  Anesthesiology News . “What constitutes a severe systemic disease to one provider may be something very different to another. So, this is a way to see if adding some objectivity can help improve consistency—and I think it does.”

Given these results, Dr. Hurwitz thought it important that clinicians familiarize themselves with the examples. "In talking to anesthesiologists, I have found that there are many people who are unaware that the published examples currently exist," she explained. The examples are available on the ASA's website at www.asahq.org/resources/clinical-information/asa-physical-status-classification-system.

Robert E. Johnstone, MD, professor of anesthesiology at West Virginia University in Morgantown, noted that the investigation confirms that adding examples improves the consistency of judging patient physical status. “A weakness of the study is that no final authority exists to determine physical status [PS], so comparisons with a ‘correct’ value are suspect, especially in gray areas,” said Dr. Johnstone. “Is a healthy person who trips while jogging and breaks some ribs a PS I [healthy], II [minor injury] or III [functionally impaired – cannot take a deep breath]? Any could be correct.” Some anesthesiologists, he added, think the magnitude or risk of the surgery plus the patient’s age also should factor into the physical status.

“The more consistency in physical status assignments the better, because work assignments, quality assessments and finances are affected,” Dr. Johnstone added. “Some clinicians work in ambulatory surgical units where only patients with PS I, II or III are allowed. Quality assessments are adjusted for physical status. Finally, some insurers pay extra for anesthetics involving higher physical status patients.”

  • Open access
  • Published: 19 June 2018

Clinical agreement in the American Society of Anesthesiologists physical status classification

  • Kayla M. Knuf (ORCID: orcid.org/0000-0002-8505-3552) 1,
  • Christopher V. Maani 1 &
  • Adrienne K. Cummings 1

Perioperative Medicine volume 7, Article number: 14 (2018)

The American Society of Anesthesiologists physical status (ASA-PS) classification is not intended to predict risk, but increasing ASA-PS class has been associated with increased perioperative mortality. The ASA-PS class is being used by many institutions to identify patients who may require further workup or exams preoperatively. Studies regarding the ASA-PS classification system show significant variability in class assignment by anesthesiologists as well as providers of different specialties when provided with short clinical scenarios. Discrepancies in ASA-PS accuracy have the potential to lead to unnecessary testing and cancelation of surgical procedures. Our study aimed to determine whether these differences in ASA-PS classification were present when actual patients were evaluated, rather than the hypothetical vignettes of previously published scenario-based studies.

A retrospective chart review was completed for patients ≥ 65 years of age undergoing elective total hip or total knee replacements. One hundred seventy-seven records were reviewed, of which 101 had the necessary data. The outcome measures noted were the ASA-PS classification assigned by the internal medicine clinic provider, the ASA-PS classification assigned by the Pre-Anesthesia Unit (PAU) clinic provider, and the ASA-PS classification assigned on the day of surgery (DOS) by the anesthesia provider conducting the anesthetic care.

A statistically significant difference was shown between the internal medicine and PAU preoperative ASA-PS designations as well as between the internal medicine and DOS designations (McNemar p = 0.034 and p = 0.025, respectively). Low kappa values confirmed the inter-observer variation in the application of the ASA-PS classification by providers of different specialties [kappa of 0.170 (− 0.001, 0.340) and 0.156 (− 0.015, 0.327)].

Conclusions

There was disagreement in the ASA-PS class designation between two providers of different specialties when evaluating the same patients with access to full medical records. When the anesthesia-run PAU and the anesthesia assigned DOS ASA-PS class designations were evaluated, there was agreement. This agreement was seen between anesthesia providers regardless of education or training level. The difference in the application of the ASA-PS classification in our study appeared to be reflective of department membership and not reflective of the individual provider’s level of training.

As the concept of a single surgical procedure has transitioned to a comprehensive perioperative process, the outcomes of many major elective operations have improved. Care now focuses on a preoperative evaluation, early planning for discharge, and post-procedure rehabilitation (Donabedian 1966 ; Bader 2012 ). This integrated perioperative system promotes the combination of the three care phases: preoperative, intraoperative, and postoperative. As this transition of perioperative ideology continues, patients will benefit from multidisciplinary management for effective and efficient patient care (Adamina et al. 2011 ; Perioperative Surgical Home n.d. ).

The preoperative component requires comprehensive preoperative evaluations. This has resulted in a change from a simple day of surgery evaluation to the establishment of standardized preoperative clinics. The purpose of these more thorough preoperative clinics is to allow for deliberate and careful clinical evaluation with additional investigation and optimization of medical conditions as indicated to promote better patient outcomes and reduce unnecessary medical expenses. Studies have linked the implementation of preoperative clinics with improved patient outcomes such as decreased in-hospital mortality and cost-reduction due to a decrease in day of surgery cancelations (Hoyt n.d. ; Blitz et al. 2016 ; Whitlock et al. 2015 ). There are many types of preoperative clinics with multiple staffing models including providers from a variety of specialties and training levels (Johnson et al. 2014 ).

There are several components to a preoperative evaluation, including the American Society of Anesthesiologists Physical Status (ASA-PS) classification which was established in the 1940s and has since undergone multiple revisions. While not intended to predict risk, increasing ASA-PS class has been associated with increased perioperative mortality (Lemmens et al. 2008 ; Hopkins et al. 2016 ). The incidence of perioperative morbidity also rises with increasing ASA-PS class from 3.9% in an ASA 1 to 33.7% in an ASA 4 (Menke et al. 1993 ). As the perioperative system of care evolves, many institutions are attempting to maximize value via patient stratification, i.e. requiring only patients with higher ASA-PS classification scores to undergo formal preoperative evaluation and allowing those with lower ASA-PS classification scores to bypass preoperative clinics in an effort to streamline care. This has important implications as the provider who assigns the initial ASA-PS class stratifies the patient to either further preoperative evaluation or preoperative bypass. While the ASA-PS classification is one component of the preoperative evaluation, it has important ramifications in perioperative medicine as well as the practice of anesthesia. The classification affects surgical decision making, the anesthetic plan, and billing/reimbursement practices. Due to these consequences, it is important to have a consistent application of the ASA-PS classification system across providers, clinics, and specialties.

Studies regarding the ASA-PS classification system show significant variability in class assignment by anesthesiologists when provided with short clinical scenarios or hypothetical vignettes (Owens et al. 1978 ; Cuvillon et al. 2011 ; Mak et al. 2002 ; Riley et al. 2014 ). Variability is also seen in retrospective chart review comparing the ASA-PS class assigned at a preoperative clinic versus the ASA-PS class assigned in the operating room (Sankar et al. 2014 ). Inter-rater reliability is not the only issue with the ASA-PS class system, but intra-rater reliability which one would expect to show near perfect agreement has shown only moderate agreement in the pediatric cancer setting (Tollinche et al. 2018 ). Not only is there disagreement between anesthesia providers, but providers of different specialties also lack consistency. A recent study administered a survey of clinical scenarios to anesthesia providers, surgeons, and internists. In this study, providers of different specialties not only assigned an ASA-PS classification score less consistently, but they also had a tendency to underrate the class of the patients when compared to anesthesia providers given the same scenario (Curatolo et al. 2017 ; Eakin and Bader 2017 ).

When clinical scenarios are used to study the assignment of the ASA-PS classes, there are many limitations. Study participants are unable to ask for additional information or to extract and analyze applicable data from the medical record. Our study seeks to retrospectively assess the consistency of the ASA-PS class assignment between anesthesia providers and internists when evaluating patients undergoing total hip and total knee replacements at our institution during a 2-year period (Table  1 ). Due to variability in training and exposure to the ASA-PS classification system, our hypothesis predicted disagreement between the ASA-PS classes assigned by internal medicine and anesthesia providers on the same patient when both providers complete a history and physical exam with access to the entire medical record.

After obtaining IRB approval, this single-center study was completed. Surgical scheduling software was queried for all patients ≥ 65 years of age undergoing elective total hip or total knee replacements with surgical dates between 01 Jan 2015 and 31 Dec 2016 at a contemporary military treatment facility (MTF). A total of 303 patients were screened in the specified time period. These records were reviewed to eliminate emergent cases and to ensure that the patients had visited both the internal medicine preoperative clinic and the Pre-Anesthesia Unit (PAU). The resulting 177 records were reviewed, of which 101 had been assigned an ASA-PS classification by both the medicine preoperative clinic and the PAU clinic (Table 2). These were included in the data analysis (Fig. 1).

Figure 1. CONSORT diagram.

At our institution, surgeons and anesthesia providers can make referrals to the internal medicine preoperative clinic based on clinical judgment. There is no algorithm that establishes which patients would benefit from additional resources in the form of an internal medicine preoperative visit. There is a stratification process in which the surgeons determine who completes a PAU clinic visit and who can bypass the PAU. Bypass is reserved for ASA-PS 1 and 2 patients. These patients are contacted telephonically by the PAU to determine whether there are any outstanding issues that may need to be addressed at a PAU visit. The surgeons can also refer ASA-PS 1 and 2 patients to the PAU based on their preference or if the surgeon believes they would benefit from seeing an anesthesia provider prior to the day of surgery. The order in which these visits occur is variable, as the appointments are booked by the patient. The ASA-PS classification used in this study was the one assigned following the initial encounter by both the PAU and the internal medicine clinic (Table 3).

For these records, the ASA classification from each visit as well as the day of surgery (DOS) ASA-PS class recorded by the anesthesia provider completing the case were collected. Supplemental data including age, BMI, gender, tobacco use, alcohol use, drug use, cardiac risk score, exercise tolerance (measured in metabolic equivalents), identified medical comorbidities, current medications, preoperative EKGs, additional preoperative cardiac study results, and preoperative pulmonary function test results were also collected (Table  4 ).

The outcome measures noted were the ASA-PS classification assigned by the internal medicine clinic provider, the ASA-PS classification assigned by the PAU clinic provider, and the ASA-PS classification assigned on the DOS by the anesthesia provider. There is no formal training in assigning an ASA-PS classification in our internal medicine department. Training is provided to PAU providers who are not anesthesia trained, specifically the Nurse Practitioners and Physician Assistants who see patients in the clinic.

Data analysis software was used to perform the following analyses [SPSS v22.0 (IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp)]. To assess the overall disagreement between the data sets, a McNemar test was completed with the following pairings: medicine and PAU, medicine and DOS, and PAU and DOS. To assess the overall agreement between the data sets, kappa statistics along with 95% confidence intervals were calculated for the aforementioned pairings (Table  5 ).
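The tests themselves were run in SPSS; as a sketch of the two statistics on paired ASA-PS classes, the Python snippet below applies a McNemar test (on a dichotomized 2×2 table, the simplest form of the test; the study's SPSS analysis may have handled the multi-category case differently) and computes Cohen's kappa with a percentile-bootstrap 95% CI rather than an analytic one. The paired class vectors are invented example data, and NumPy, statsmodels and scikit-learn are assumed to be available.

```python
# Illustrative sketch only (the study used SPSS v22): McNemar test and
# Cohen's kappa with a bootstrap 95% CI for one pairing of raters
# (e.g. medicine vs PAU). The paired ASA-PS vectors are invented data.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar
from sklearn.metrics import cohen_kappa_score

medicine = np.array([2, 3, 2, 3, 2, 4, 3, 2, 3, 2, 3, 3, 2, 4, 3])  # hypothetical
pau      = np.array([3, 3, 2, 3, 3, 4, 3, 3, 3, 2, 3, 4, 2, 4, 3])  # hypothetical

# McNemar on a 2x2 table: here the ordinal classes are collapsed to
# "ASA-PS >= 3" vs "< 3" to keep the sketch to the test's simplest form.
med_hi, pau_hi = medicine >= 3, pau >= 3
table = np.array([
    [np.sum(~med_hi & ~pau_hi), np.sum(~med_hi & pau_hi)],
    [np.sum(med_hi & ~pau_hi),  np.sum(med_hi & pau_hi)],
])
print("McNemar p-value:", mcnemar(table, exact=True).pvalue)

# Unweighted Cohen's kappa with a percentile-bootstrap 95% CI
kappa = cohen_kappa_score(medicine, pau)
rng = np.random.default_rng(0)
boots = []
for _ in range(2000):
    idx = rng.integers(0, len(medicine), len(medicine))
    b = cohen_kappa_score(medicine[idx], pau[idx])
    if not np.isnan(b):          # guard against degenerate resamples
        boots.append(b)
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"kappa = {kappa:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```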

Three ASA-PS classifications documented by separate medical providers in reference to the same patient were obtained via retrospective chart review. The source of these ASA-PS classification sets were from the internal medicine preoperative appointment, the anesthesia PAU appointment, and the DOS anesthesia record. Medicine preoperative ASA-PS classifications were performed by resident physicians from the Department of Medicine with staff physician supervision. ASA-PS classifications from the PAU were performed by anesthesia providers and non-anesthesia providers with varying levels of experience, while those from the DOS were performed solely by anesthesia providers. The levels of experience included Physician Assistants (PAs) working in the PAU, Nurse Practitioners (NPs) working in the PAU, Student Registered Nurse Anesthetists (SRNAs), Certified Registered Nurse Anesthetists (CRNAs), Anesthesiology Residents, and Staff Anesthesiologists.

One record was excluded from the analysis: it was designated an ASA-PS of 1 by the DOS anesthesia provider but an ASA-PS of 2 by both the medicine and the PAU providers. The McNemar test can be used only on paired nominal data arranged in a square table, and because this was the only ASA-PS 1 designation in the data set, that requirement could not be met with the record included, so the test could not be performed on it.

When the ASA-PS class designation was compared between the internal medicine and the PAU preoperative assessment, as well as between the internal medicine preoperative assessment and the DOS designation, there was a statistically significant difference (McNemar p = 0.034 and p = 0.025, respectively). On further analysis of these groups, low kappa values were obtained, further confirming the inter-observer variation in the application of the ASA-PS classification of patients by providers of different specialties [kappa of 0.170 (− 0.001, 0.340) and 0.156 (− 0.015, 0.327), respectively].

Among the sets of ASA-PS classifications from the PAU and the DOS, the McNemar test did not reject the null hypothesis of marginal homogeneity, indicating that these two sets of data are not in disagreement. Furthermore, the kappa value for these two sets of classifications was 0.863 (0.696, 1.030), indicating near-perfect agreement between the two groups regarding the ASA-PS class assigned.

The goal of this study was to determine inter-rater reliability of the ASA-PS assignment between anesthesia and internal medicine providers in two preoperative clinics. We found disagreement in the designated ASA-PS classification between these two providers when evaluating the same patient with access to his or her full medical record. When the anesthesia-run PAU and the anesthesia assigned DOS ASA-PS class designations were evaluated, there was agreement. Interestingly, over half of the PAU evaluations in this study were completed by PAs or NPs from the department of anesthesia. These were non-anesthesia providers who were oriented and trained by licensed anesthesia providers. Approximately half of the DOS evaluations were completed by staff physicians and staff CRNAs while the other half were completed by trainees (with either direct or indirect supervision by a privileged anesthesia provider). There was agreement seen between anesthesia department staff regardless of education or training level. The difference in the application of the ASA-PS classification in our study appeared to be reflective of department membership and not reflective of the individual provider’s level of training.

The agreement in ASA-PS assignment seen in the anesthesia department at our institution, regardless of training level, suggests that standard application of the classification system can be taught and learned. It also implies that non-anesthesia providers could rate the ASA-PS more predictably after education and brief training sponsored by the anesthesia department, and that this competency could be achieved independent of education or training level. Improving inter-rater reliability between providers of different specialties will improve communication, preoperative risk stratification, patient optimization, and perioperative care. To our knowledge, no prior study has examined ASA-PS classification between providers of different specialties using a retrospective review of existing patient data. Prior studies used surveys of hypothetical clinical scenarios focusing on straightforward medical problems without clinical evaluation or correlation. These studies had a "correct" or designated ASA-PS class that was used to evaluate the accuracy of responders. While correctness can be determined in hypothetical, static clinical scenarios, it cannot always be determined in clinical situations with an actual patient. In real-life clinical situations, which are often evolving and dynamic, it is inter-rater reliability that is most useful in the preoperative management of patients.

While the ASA-PS class designation by the anesthesia provider on the day of surgery is the only ASA-PS class that matters for billing and charting, there are potential clinical implications when non-anesthesia providers assign an ASA-PS class early in the perioperative process. According to the American College of Cardiology/American Heart Association (ACC/AHA) Guidelines on Perioperative Cardiovascular Evaluation and Management of Patients Undergoing Noncardiac Surgery, the risk of a major adverse cardiac event (MACE) is assessed, and that assessment determines whether the patient undergoes further workup or proceeds directly to surgery (Fleisher et al. 2014). While this was traditionally done with the Revised Cardiac Risk Index (RCRI), two newer tools, the Gupta Myocardial Infarction or Cardiac Arrest (MICA) calculator and the National Surgical Quality Improvement Program (NSQIP) Surgical Risk Calculator, are mentioned in the guidelines. Both of these tools require the assignment of an ASA-PS class to produce the estimated perioperative risk of MACE. These risk tools are used by non-anesthesia providers as part of the perioperative cardiac assessment that determines which patients require further testing before noncardiac surgery.
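
As a rough illustration of why the ASA-PS entered into these calculators matters, the sketch below shows a generic logistic risk model of the kind the Gupta MICA calculator uses, with ASA class as one predictor. The intercept and per-class coefficients are hypothetical placeholders, not the published values; the point is only that under-rating a patient by one ASA-PS class lowers the estimated MACE risk that drives the decision about further workup.

import math

# HYPOTHETICAL per-class log-odds increments and intercept (illustrative only,
# not the published Gupta MICA coefficients)
ASA_COEF = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.8, 5: 2.5}
INTERCEPT = -5.0

def mace_risk(asa_class: int, other_logodds: float = 0.0) -> float:
    """Estimated probability of a major adverse cardiac event (illustrative)."""
    x = INTERCEPT + ASA_COEF[asa_class] + other_logodds
    return 1.0 / (1.0 + math.exp(-x))

# Under-rating a true ASA-PS 3 patient as ASA-PS 2 lowers the estimated risk
print(f"ASA 3: {mace_risk(3):.3%}  vs  ASA 2: {mace_risk(2):.3%}")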

Accuracy of the ASA-PS class is necessary not only to ensure the appropriate preoperative workup; consistent ASA-PS classification also underpins the accuracy of survival prediction models and quality comparisons among institutions (Skaga et al. 2007; Kuza et al. 2017). The ASA-PS class is also used by NSQIP to compare quality of care among hospitals. A recent study showed that misclassification of the ASA-PS class significantly impacted the observed-to-expected mortality ratio, skewing quality comparisons between institutions (Helkin et al. 2017).

Our study had several limitations. First, while the retrospective design eliminated some of the shortcomings of prior studies, it introduced limitations inherent to retrospective research. Specifically, we were unable to collect full data sets because the ASA-PS class was not recorded for a large subset of the patient population. In addition, all data were collected retrospectively from the medical record; if either provider did not take a full medical history and account for all comorbidities, that in and of itself could explain differences in the ASA-PS classification. Second, owing to the inclusion criteria used, ASA-PS classes 1 and 5 were not represented. While this is likely not clinically relevant, without full representation of all classes we were unable to determine the applicability of the results to ASA-PS classes 1 and 5. Third, a large number of medical records were excluded because of insufficient data; the most common reason for an incomplete data set was a missing ASA-PS classification from the medicine preoperative appointment. If these 76 records had been included, the results and significance of the study may have been different. Lastly, this study was completed at a military treatment facility (MTF), which has several implications. The patient population consisted solely of active duty military members, retirees, and their dependents, who have greater access to care and lower cost of care than a civilian population. As a result, this population may have had a better baseline health status than a civilian population, which may have resulted in less patient variability.

While this study was retrospective in nature and conducted at an MTF, we believe the results are applicable to civilian facilities. The disagreement between providers' use of the ASA-PS classification system, as well as the lack of uniformity in preoperative evaluations, offers an opportunity to improve perioperative outcomes and patient safety. As comprehensive perioperative care continues to expand in a multidisciplinary fashion, preoperative evaluations form the cornerstone of patient stratification and resource allocation. If evaluations cannot be completed in an appropriate and consistent manner across perioperative providers, there is the potential for increased cost and decreased quality of care.

While research shows the inconsistencies that exist in the application of the ASA-PS classification system, further study is needed to determine how to solve this issue. It is difficult to ascertain the etiology of the inconsistency. Is it secondary to a lack of knowledge, or does it point to a deeper issue with the classification system we use? The next step would be to design an educational intervention that focuses on application of a consistent approach to the ASA-PS classification system. If this intervention results in improvement of inter-rater reliability between specialties, the likely explanation is a lack of knowledge/familiarity.

In summary, there was a statistically significant difference in the application of the ASA-PS classification system between providers of the internal medicine department and the anesthesia department. In a clinical setting, the “right” ASA-PS classification is not nearly as important as reliable ASA-PS class designations between providers. The agreement between anesthesia providers of varying levels of training shows that consistent application is possible.

Declarations

The view(s) expressed herein are those of the author(s) and do not reflect the official policy or position of Brooke Army Medical Center, the U.S. Army Medical Department, the U.S. Army Office of the Surgeon General, the Department of the Air Force, the Department of the Army, the Department of Defense, or the U.S. Government.

Abbreviations

ACC/AHA: American College of Cardiology/American Heart Association Guidelines

ASA-PS: American Society of Anesthesiologists physical status

DOS: Day of Surgery

MACE: Major Adverse Cardiac Event

MICA: Myocardial Infarction or Cardiac Arrest

NSQIP: National Surgical Quality Improvement Program

PAU: Pre-Anesthesia Unit

RCRI: Revised Cardiac Risk Index

Adamina M, Kehlet H, Tomlinson GA, Senagore AJ, Delaney CP. Enhanced recovery pathways optimize health outcomes and resource utilization: a meta-analysis of randomized controlled trials in colorectal surgery. Surgery. 2011;149(6):830–40.


Bader AM. Advances in preoperative risk assessment and management. Curr Probl Surg. 2012;49(1):11–40.

Blitz JD, Kendale SM, Jain SK, Cuff GE, Kim JT, Rosenberg AD. Preoperative evaluation clinic visit is associated with decreased risk of in-hospital postoperative mortality. Anesthesiology. 2016;125(2):280–94.

Curatolo C, Goldberg A, Maerz D, Lin HM, Shah H, Trinh M. ASA physical status assignment by non-anesthesia providers: do surgeons consistently downgrade the ASA score preoperatively? J Clin Anesth. 2017;38:123–8.


Cuvillon P, Nouvellon E, Marret E, Albaladejo P, Fortier LP, Fabbro-Perray P, et al. American Society of Anesthesiologists’ physical status system: a multicentre Francophone study to analyse reasons for classification disagreement. Eur J Anaesthesiol. 2011;28:742–7.

Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44(3, pt 2):166–203.

Eakin JL, Bader AM. ASA physical status classification system: is it consistent amongst providers and useful in determining need for pre-operative evaluation resources? J Clin Anesth. 2017;39:73–4.

Fleisher LA, Fleischmann KE, Auerbach AD, Barnason SA, Beckman JA, et al. 2014 ACC/AHA guideline on perioperative cardiovascular evaluation and management of patients undergoing noncardiac surgery. J Am Coll Cardiol. 2014;64(22):e77-e137.

Helkin A, Jain SV, Grussner A, Fleming M, Kohman L, Costanza M, et al. Impact of ASA score misclassification on NSQIP predicted mortality: a retrospective analysis. Perioper Med (Lond). 2017;6:23.

Hopkins TJ, Raghunathan K, Barbeito A, Cooter M, Stafford-Smith M, Schroeder R, et al. Associations between ASA physical status and postoperative mortality at 48 h: a contemporary dataset analysis compared to a historical cohort. Perioper Med (London). 2016;5:29.

Hoyt DB. Looking forward—November 2015. Bulletin of the American College of Surgeons. http://bulletin.facs.org/2015/11/looking-forward-november-2015/. Published November 1, 2015. Accessed July 8, 2016.

Johnson BK, James CW 3rd, Ritchie G, Morgan RR Jr, McMillian HR. Evaluation of cost reduction measures at a state university medical center. J S C Med Assoc. 2014;110(1):8–11.


Kuza CM, Hatzakis G, Nahmias JT. The assignment of American Society of Anesthesiologists physical status classification for adult polytrauma patients: results from a survey and future considerations. Anesth Analg. 2017 Dec;125(6):1960–6.

Lemmens LC, Kerkkamp HE, Van Klei WA, Klazinga NS, Rutten CL, Van Linge RH, et al. Implementation of outpatient preoperative evaluation clinics: facilitating and limiting factors. Br J Anaesth. 2008;100(5):645–51.


Mak PH, Campbell RC, Irwin MG. The ASA physical status classification: inter-observer consistency. Anaesth Intensive Care. 2002;30:633.


Menke H, Klein A, John KD, Junginger T. Predictive value of ASA classification for the assessment of the perioperative risk. Int Surg. 1993;78(3):266–70.

Owens WD, Felts JA, Spitznagel EL Jr. ASA physical status classifications: a study of consistency of ratings. Anesthesiology. 1978;49:239–43.

Perioperative Surgical Home. (n.d.) American Society of Anesthesiologists website. https://www.asahq.org/psh . Accessed July 8, 2016.

Riley R, Holman C, Fletcher D. Inter-rater reliability of the ASA physical status classification in a sample of anaesthetists in Western Australia. Anaesth Intensive Care. 2014 Sep;42(5):614–8.

Sankar A, Johnson SR, Beattie WS, Tait G, Wijeysundera DN. Reliability of the American Society of Anesthesiologists physical status scale in clinical practice. Br J Anaesth. 2014 Sep;113(3):424–32.


Skaga NO, Eken T, Sovik S, Jones JM, Steen PA. Pre-injury ASA physical status classification is an independent predictor of mortality after trauma. J Trauma. 2007;63:972–8.

Tollinche LE, Yang G, Tan KS, Borchardt R. Interrater variability in ASA physical status assignment: an analysis in the pediatric cancer setting. J Anesth. 2018 Apr;32(2):211–8.

Whitlock EL, Feiner JR, Chen L. Perioperative mortality, 2010 to 2014: a retrospective cohort study using the National Anesthesia Clinical Outcomes Registry. Anesthesiology. 2015;123:1312–21.

Download references

Availability of data and materials

The datasets generated during the current study are available from the corresponding author on request.

Author information

Authors and Affiliations

Department of Anesthesia, San Antonio Military Medical Center, 3551 Rodger Brooke Dr, Fort Sam Houston, San Antonio, TX, 78234, USA

Kayla M. Knuf, Christopher V. Maani & Adrienne K. Cummings


Contributions

KK was responsible for the concept of the study. CM and AC were the major contributors to the design of the study. KK performed the dataset generation. KK, CM, and AC performed the data analysis as well as were all major contributors in writing the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kayla M. Knuf .

Ethics declarations

Ethics approval and consent to participate.

Approved by Institutional IRB, ref. c.2017.078e.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article.

Knuf, K.M., Maani, C.V. & Cummings, A.K. Clinical agreement in the American Society of Anesthesiologists physical status classification. Perioper Med 7 , 14 (2018). https://doi.org/10.1186/s13741-018-0094-7


Received : 20 March 2018

Accepted : 04 June 2018

Published : 19 June 2018

DOI : https://doi.org/10.1186/s13741-018-0094-7


Keywords

  • Variation between specialties
  • Anesthesiology/standards
  • Preoperative care
  • Risk assessment/classification

Perioperative Medicine

ISSN: 2047-0525




Anesthesia Payment Basics Series: #4 Physical Status

September 2019

To properly and accurately report anesthesia services, one must know and adhere to rules and guidelines that are specific to anesthesia care. Additionally, the formula used to determine payment for anesthesia services is unique to anesthesia. These rules and this formula may be misunderstood or improperly applied. This ASA Timely Topic is the fourth in a series that breaks anesthesia billing and payment down into individual components and explains what each component represents.

Physical Status 

The first article in this series covered CPT®, HCPCS, and ICD-10-CM: important tools applicable to coding and billing across all specialties and types of care. The second piece provided information on the coding resources that are specific to anesthesia. Anesthesia modifiers and payment determination were the subject of the third article. This fourth installment offers information about Physical Status.

Medicare does not recognize or pay additional units for Physical Status, but many private payers do. As such, it is important that this is addressed within your contracts with private payers to avoid any ambiguity on the issue. The ASA's Annual Commercial Conversion Factor Survey asks whether payers cover physical status; with some regional variation, the results show that the percentage of payers covering physical status remained relatively stable from 2013 to 2018, with over 80% of the contracts included in the results covering physical status.

The status of patients undergoing surgery under anesthesia can range from healthy to critically ill or injured. A patient with a past or current disease or condition may require different care than a healthier patient undergoing the same surgical procedure. This differentiation is expressed by the physical status classification that is assigned to the patient by the anesthesiologist and is communicated on a claim by appending the appropriate modifier to the anesthesia code. The physical status modifiers are found in both the CPT code set and the Healthcare Common Procedure Coding System (HCPCS). For a refresher on CPT and HCPCS, see the June 2019 Timely Topic, Anesthesia Payment Basics Series: #1 Codes and Modifiers. Information about reporting physical status is included in the ASA Relative Value Guide® (RVG™) and in CPT:

  • “All anesthesia services are reported by use of the anesthesia five-digit procedure code plus the addition of a physical status modifier. These modifying units may be added to the base values. The use of other optional modifiers may be appropriate.”
  • “Physical status is included in CPT to distinguish between various levels of complexity of the anesthesia service provided.”

In October 2014, the ASA Expert Consensus Document, ASA Physical Status Classification, was updated to include examples of each level of the classification to help anesthesiologists make the classification assignment. More information and background are available in the June 2015 edition of the ASA Monitor.

It is important to note that the assignment of a physical status classification is a clinical determination made by the anesthesiologist after evaluating the patient about to undergo anesthesia care. 

Putting It Together

A patient covered by a private plan that includes coverage for physical status is to have a total knee replacement as described by CPT code 27447 - Arthroplasty, knee, condyle and plateau; medial AND lateral compartments with or without patella resurfacing (total knee arthroplasty). Per the ASA CROSSWALK®, the anesthesia care may be best described with anesthesia CPT code 01402 - Anesthesia for open or surgical arthroscopic procedures on knee joint; total knee arthroplasty. Code 01402 has 7 base units. Let's assume a total anesthesia time of 112 minutes. The payer uses a 15-minute time unit and rounds down to the nearest whole unit. The conversion factor in our example will be $70.00 per unit.

Payment will be calculated using the equation:

(Base Units + Time Units + Modifying Units) * Conversion Factor

If the patient is an ASA I:

(7 Base Units + 7 Time Units + 0 Physical Status Modifying Units) * $70.00 = $980.00

If the patient is an ASA III:

(7 Base Units + 7 Time Units + 1 Physical Status Modifying Units) * $70.00 = $1050.00
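
The worked example above maps directly onto a small calculation. The sketch below is illustrative only, since payer rules for time units and rounding vary; it simply reproduces the ASA I and ASA III figures from the example.

def anesthesia_payment(base_units, minutes, modifying_units, conversion_factor, minutes_per_unit=15):
    # This example payer counts whole 15-minute time units and rounds down
    time_units = minutes // minutes_per_unit
    return (base_units + time_units + modifying_units) * conversion_factor

print(anesthesia_payment(7, 112, 0, 70.00))  # ASA I:   (7 + 7 + 0) * $70.00 = 980.0
print(anesthesia_payment(7, 112, 1, 70.00))  # ASA III: (7 + 7 + 1) * $70.00 = 1050.0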

Physical Status is one modifying factor that may be included in anesthesia coding and payment. Look for our next article in this series which will cover Qualifying Circumstances. 




Keith Baker; Determining Resident Clinical Performance : Getting Beyond the Noise . Anesthesiology 2011; 115:862–878 doi: https://doi.org/10.1097/ALN.0b013e318229a27d


Valid and reliable (dependable) assessment of resident clinical skills is essential for learning, promotion, and remediation. Competency is defined as what a physician can do, whereas performance is what a physician does in everyday practice. There is an ongoing need for valid and reliable measures of resident clinical performance.

Anesthesia residents were evaluated confidentially on a weekly basis by faculty members who supervised them. The electronic evaluation form had five sections, including a rating section for absolute and relative-to-peers performance under each of the six Accreditation Council for Graduate Medical Education core competencies, clinical competency committee questions, rater confidence in having the resident perform cases of increasing difficulty, and comment sections. Residents and their faculty mentors were provided with the resident's formative comments on a biweekly basis.

From July 2008 to June 2010, 140 faculty members returned 14,469 evaluations on 108 residents. Faculty scores were pervasively positively biased and affected by idiosyncratic score range usage. These effects were eliminated by normalizing each performance score to the unique scoring characteristics of each faculty member (Z-scores). Individual Z-scores had low amounts of performance information, but signal averaging allowed determination of reliable performance scores. Average Z-scores were stable over time, related to external measures of medical knowledge, identified residents referred to the clinical competency committee, and increased when performance improved because of an intervention.

This study demonstrates a reliable and valid clinical performance assessment system for residents at all levels of training.

What We Already Know about This Topic

Evaluating clinical performance of resident trainees is essential to education, but the validity of evaluation methods has been questioned.

What This Article Tells Us That Is New

In a 2-yr period, more than 14,000 electronic evaluations were submitted by faculty. Significant grade inflation could be removed by normalizing scores to each faculty member, yielding a more reliable and valid assessment of resident clinical skills.

RELIABLE measures of clinical performance are needed to enhance and direct learning, determine which trainees are ready for advanced training, and identify which are in need of remediation.1,2 Unfortunately, evaluations of resident clinical performance suffer from a number of limitations,3-5 such as trainees not being directly observed,3 faculty leniency and grade range restriction,6-8 concerns about validity of what is being assessed,9-11 and the finding that even highly valid tests of medical knowledge may not12,13 or may only modestly14-17 predict competence in patient care. There are also issues of generalizability because Objective Structured Clinical Examinations (OSCEs)18 and simulation-based examinations19,20 sample only a subset of the domain of interest, and performance may not generalize to different circumstances.10,21,22 Furthermore, even when faculty members observe the same clinical performance, they may disagree about their observations23 or about what constitutes an acceptable performance24 or response to a situation.25 Lastly, and of considerable importance, physicians' scores on high-stakes OSCEs may not predict what they do in actual practice.26 Thus, measures of competence (what a physician can do) may not relate to performance (what a physician actually does in everyday practice).27,28

This article describes an approach to assessing anesthesia resident clinical performance that uses the Accreditation Council for Graduate Medical Education (ACGME) core competency framework, is based on what residents do in everyday practice, depends on direct observation, uses many different evaluators representing a wide range of situations, is linked to written formative feedback, and yields a large number of evaluations. It was hypothesized that clinical performance scores could be corrected for faculty member leniency (positive bias) and idiosyncratic grade range usage and then averaged to yield a normalized resident performance metric that was valid and that distinguished clinical performance levels with known degrees of statistical confidence. The clinical performance metric is stable over time, reliably identifies low performers, detects improvement in performance when an educational intervention is successful, is related to an external measure of medical knowledge, and identifies poor performance due to a wide variety of causes.

The Massachusetts General Hospital Institutional Review Board waived the need for informed consent and classified this study as exempt.

Evaluation Instrument and Evaluation Process

The department's Education Committee created an initial evaluation instrument that was sent to the full faculty for comment. Faculty input was incorporated, and an updated version was sent to all residents for additional comment. Resident feedback was incorporated, and the Education Committee created a final version of the instrument. The resident evaluation form has five distinct sections (appendix) and is confidential for the evaluator.

Absolute/Anchored ACGME Core Competencies Section.

The six ACGME core competencies are used, but patient care is divided into cognitive and technical sections yielding seven competency scores. The absolute/anchored scale uses a Likert scale (1–7) with descriptors of how much help the resident needed relating to each competency. A score of 5 was defined as performing independently and without the need for help.

Relative ACGME Core Competencies Section.

The relative scale asks how the resident performed compared with other residents in residency in the same training year. The relative scale uses a Likert scale (1–5) with descriptors of how the resident performed compared with peers. A score of 3 is defined as performing at peer level (average) compared with other Massachusetts General Hospital anesthesia residents in the same clinical anesthesia year (CA-year).

Comment boxes occur frequently within the form (after each core competency, specific strengths, specific areas for improvement, and the clinical competency committee [CCC] section).

CCC Section.

Five statements relating to essential competency attributes are listed. Each has a yes or no answer and any “yes” is considered concerning.

Faculty Member Confidence Section.

Faculty members indicate their willingness to let the resident provide independent and unsupervised care for each of eight cases of increasing difficulty.

The faculty was formally educated on this instrument during conferences and faculty meetings, but not all faculty members attended the sessions.

Who Evaluates Whom.

The electronic anesthesia system automatically determines which residents are supervised by which faculty members during the previous week (Sunday–Saturday). Duplicate interactions are collapsed into a single request for evaluation. Faculty members are permitted to submit additional evaluations at any time. For rotations that do not use the electronic anesthesia system (intensive care unit, preoperative clinic), matches are created by hand. The list of resident-faculty pairs is automatically sent to the New Innovations (New Innovations, Inc, Uniontown, OH) web site, which generates an electronic evaluation for each unique interaction. Faculty members are sent a link via   electronic mail containing the evaluations and are automatically sent reminder e-mails if they do not complete the evaluations within a week. Overall compliance is tracked, with a target of 60% for each faculty member. Noncompliant faculty members are contacted and encouraged to complete outstanding evaluations. Completed evaluations are downloaded from the New Innovations web site as Excel spreadsheets (Version 2003, Microsoft, Redmond, WA). Raw data are imported into Access (Version 2003, Microsoft) for analysis.

Z-scores normalize a single resident evaluation to the unique scoring attributes of the faculty member providing the evaluation. Evaluations submitted within a specified date window are used to determine the characteristics of each faculty member's scoring attributes. Z-scores were determined using absolute/anchored core competency scores (Z abs ), relative-to-peers core competency scores (Z rel ), or case confidence scores (Z conf ). Each faculty member's Likert scores were used to determine his or her personal mean and SD for each CA-year. Individual resident Z-scores were calculated as:

Z-score = [Resident Score (CA-year) − Faculty Member Mean (CA-year)] / Faculty Member SD (CA-year)

Resident Score (CA-year) is the Likert Score assigned to a particular resident by a faculty member. When more than one core competency section is included, the average of the Likert scores from the selected core competencies is used.

Faculty Member Mean (CA-year) is the mean Likert score given to residents of a similar CA-year by this faculty member.

Faculty Member SD (CA-year) is the SD of Likert scores given by this faculty member to residents of this CA-year.

Z-scores provide a measure of distance from the grader's mean score in terms of SD units. For example, a Z-score of −0.5 means that the faculty member scored the resident one half SD less than he or she normally scores residents of this same CA-year. Z-scores are essentially effect sizes because they are differences normalized by the SD. Any combination of core competencies can be used in the calculation of a Z-score. When core competencies are not mentioned, a Z-score refers to an average based on all of the core competencies. Faculty member confidence data were converted to Z-scores by first determining the breakpoint at which the faculty member converted from “yes” to “no” along the sequence of eight graded cases. For example, if a faculty member said yes to the first three cases and no for the remaining five cases, the breakpoint would be 3. This allows the determination of the mean and SD of the breakpoints for each faculty member for each CA-year.
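
A minimal sketch of this normalization, using hypothetical Likert scores and faculty statistics (not the study's data or code), makes the effect concrete: the same raw score maps to different Z-scores depending on the grading habits of the faculty member who gave it.

def z_score(resident_score, faculty_mean, faculty_sd):
    # Z = (score - this faculty member's mean for this CA-year) / that member's SD
    return (resident_score - faculty_mean) / faculty_sd

# Hypothetical raters: A is lenient with a narrow range, B is stricter with a wider range
print(z_score(3.5, faculty_mean=3.7, faculty_sd=0.3))  # about -0.67: below rater A's usual score
print(z_score(3.5, faculty_mean=3.1, faculty_sd=0.6))  # about +0.67: above rater B's usual score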

In-training Examination Z-scores

Z-scores for the American Society of Anesthesiologists/American Board of Anesthesiology In-Training Examination (ITE) (Z ITE ) were computed for each resident by subtracting the Massachusetts General Hospital residency class mean (CA-year–matched classmates) from the resident's individual ITE score and then dividing by the class SD.

Statistical Analysis

Statistical results were determined using StatsDirect Version 2.6.6 (StatsDirect Ltd., Cheshire, United Kingdom), Excel (Version 2003), SAS Version 9.2 (SAS Institute, Cary, NC), or Origin Version 7.5 SR4 (OriginLab, Northampton, MA). Effect sizes were determined by Cohen d   and provide a measure of the size of a difference compared with the variation in the data. Effect sizes are classified as small (Cohen d  = 0.2), medium (Cohen d  = 0.5), or large (Cohen d  = 0.8). 29 Regression analyses are characterized by r   and r   2 (explained variance) values along with the number of data points used in the regression. Slopes were determined using linear regression. Slopes were compared using a Z-test statistic. 30 Repeat tests on the same sample are compared with paired t   tests. Independent samples are compared with unpaired t   tests assuming unequal group variance. Single-sample t   tests compared a specified reference value to a sample of values. Sample variances were compared using an F test. Chi-square analysis was used for categorical data and Yates' correction was applied if expected frequencies were less than 10. Scores for relative ACGME core competencies were compared in a linear mixed model (LMM) with fixed effects for resident year (CA-1, -2, or -3); length of training within year at the time of the evaluation to accommodate improvement in scores over the course of training; and the interaction between resident year and length of training, random participant- and faculty-specific intercepts, and variance heterogeneity by faculty member. Nonlinearity in the trends over length of training was assessed using a cubic spline, but the fit was not improved based on Akaike information criterion. Point and interval estimates from this analysis were compared with results obtained from analyses of Z-scores. LMM estimates of participant-specific CIs were roughly 20% wider and more variable than matched Z-score estimates, but inference for comparisons among resident years was unchanged. P   values were two-sided. The term “bias” is used throughout the study to denote the systematic tendency to assign performance scores that are higher than is normatively possible. With this particular usage, bias implies leniency. The terms “reliable” or “reliably” refer to dependable findings. With this usage, a score with a narrow 95% CI would be called reliable.
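
As a point of reference for the effect sizes quoted in the Results, a two-sample Cohen's d is simply the difference in group means divided by the pooled SD. The short sketch below uses synthetic data, not the study's, and is only meant to show the calculation.

import numpy as np

def cohens_d(x, y):
    # Difference in means divided by the pooled standard deviation
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)
group_a = rng.normal(3.5, 0.5, 40)  # hypothetical scores from one group
group_b = rng.normal(3.1, 0.5, 40)  # hypothetical scores from another group
print(round(cohens_d(group_a, group_b), 2))  # expected near 0.8 (a "large" effect), subject to sampling noise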

Completed Evaluations

Between July 1, 2008, and June 30, 2010, 14,469 evaluations were submitted. This represents an overall (all requested, all returned) compliance rate of 49%. Evaluations were submitted by 140 different faculty members, who entered at least 5 evaluations on a total of 108 different residents, who each had at least 10 evaluations. There were 5,404 CA-1, 4,319 CA-2, and 4,746 CA-3 resident evaluations. On average, each CA-1, CA-2, and CA-3 resident received 101, 70, and 73 evaluations, respectively. On average, each CA-1, CA-2, and CA-3 resident was evaluated by 49, 40, and 41, respectively, different faculty members. Comments were entered on 59.1% of all returned evaluations. Comments averaged 225 ± 209 characters.

Faculty Members Characterize Resident Performance with a Positive Bias

The relative performance Likert scale defined 3 as “peer average” for each CA-year. This is explicitly stated on each evaluation form. The average relative score assigned for all core competencies by each faculty member contributing at least 10 evaluations was determined using all data. The average faculty member assigned a relative score of 3.36, 3.51, and 3.68 to CA-1, CA-2, and CA-3 residents, respectively. Histograms of the average relative score assigned by each faculty member by CA-year are shown in fig. 1 . Using the expected value of 3.00 and the known SD of the faculty score distributions yields effect sizes for the bias of 0.91, 1.15, and 1.41 ( P  < 0.001 by single-sample t   test, all cohorts) for scores assigned to the CA-1, CA-2, and CA-3 residents, respectively. These are large effects because average scores are approximately 1 SD above the expected value of 3.00.

Fig. 1. Faculty members assign positively biased relative-to-peers scores. Histograms show counts, by clinical anesthesia (CA) year, of faculty members who assigned similar average relative-to-peers scores to residents. Average scores can range from 1 to 5, and 3 is defined as peer average. Bin widths are 0.1 score unit, and all faculty with an average score in that bin are counted. Counts were made for faculty members who submitted 10 or more evaluations per CA-year (CA-1: 88 faculty members, 4,630 evaluations; CA-2: 105 faculty members, 3,663 evaluations; CA-3: 110 faculty members, 4,034 evaluations).


Faculty members also increase their bias as they score more senior residents. For faculty members who provided both CA-1 and CA-2 evaluations, average CA-2 relative scores were higher (CA-1 = 3.36 vs.   CA-2 = 3.48, N = 78 faculty, P  < 0.001 by paired t   test). For faculty members who provided both CA-2 and CA-3 evaluations, average CA-3 relative scores were higher (CA-2 = 3.52 vs.   CA-3 = 3.72, N = 97 faculty members, P  < 0.001 by paired t   test). For faculty members who provided both CA-1 and CA-3 evaluations, average CA-3 relative scores were higher (CA-1 = 3.37 vs.   CA-3 = 3.70, N = 79 faculty members, P  < 0.001 by paired t   test).

Bias Varies by Faculty Member

All faculty members have their own amount of bias. Their average relative-to-peers scores are widely distributed (SD = 0.46, fig. 1 ). Scores from a relatively unbiased faculty member are compared with scores from a more biased faculty member in figure 2 A. In addition to the variation in bias, faculty members also use different amounts of the score range. The used score range can be quantified by the SD of the scores given by each faculty member. Figure 2 B shows a histogram of SD for all faculty members having 10 or more evaluations for each of the CA-years (SD = 0.22). Faculty members use different amounts of the score range, as demonstrated by the lower average SD in scores given by one faculty member (SD = 0.26, N = 117 evaluations, CA-1 year, black arrow   fig. 2 B) compared with the higher average SD in scores given by another faculty member (SD = 0.68, N = 104 evaluations, CA-1 year, gray arrow   fig. 2 B).

Fig. 2. Faculty members differ in how much they inflate scores. A histogram of all relative-to-peers scores from a relatively unbiased faculty member (red bins,  N = 53 evaluations of residents in their third year of clinical anesthesia [CA-3]) is compared with a more biased faculty member (blue bins,  N = 42 CA-3 evaluations) (A ). Faculty members differ in how they use the available score range. The SD in score assignment was determined for each faculty member having 10 or more evaluations per CA-year. Bins are 0.1 SD units wide, and all faculty members with an average SD in that bin were counted. Data are from 88 faculty members with CA-1 data, 105 faculty members with CA-2 data, and 110 faculty members with CA-3 data. The black arrow  denotes a faculty member with a SD of 0.26 for CA-1 scores, and the gray arrow  denotes a faculty member with a SD of 0.68 for CA-1 scores (B ).


Z rel Scores Correct for Individual Faculty Member Bias and Unique Score Range Use

Because faculty members are biased to various degrees ( fig. 1 ) and they each use different amounts of the score range ( fig. 2 B), a Z-score transformation was applied to the relative-to-peers scores (see Methods). Each faculty member's Z rel scores thus have an overall mean of 0.0 and SD of 1 for each CA-year. All Z rel scores for all residents were averaged (N = 13,639 evaluations), and the grand mean was 0.00000 with SD of 0.98623.

When a Faculty Member Evaluates the Same Resident on Two Occasions, the First Z rel Score Predicts Only a Small Amount of the Variance in the Second Z rel Score

Z rel scores were determined for the first and second occasions when a faculty member evaluated the same resident more than once. This resulted in 3,509 unique Z rel score pairings. A regression analysis demonstrated that the first Z rel score explained 23.1% of the variance of the second Z rel score (N = 3,509 pairs, r  = 0.48, r   2 = 0.231, P  < 0.001). A plot of unique Z rel score pairs demonstrates significant scatter in the data ( fig. 3 ).


Fig. 3. Each Z rel score has only a modest amount of clinical performance information. The first Z rel score assigned to a resident by a faculty member is plotted against the second Z rel score assigned to the same resident by the same faculty member. Each of the 3,509 points is a unique resident–faculty member pairing. The first Z rel score predicts 23.1% of the variance in the second Z rel score; 1.6% of the Z rel scores lie outside the plot limits and are not shown.

Signal Averaging Reveals Reliable Performance Scores

Because there is significant “noise” in each Z rel score, any single Z rel score will not provide a dependable assessment of resident clinical performance. However, averaging noisy signals will cause accumulation of the real signal while averaging out the noise component. Figure 4 A demonstrates how sequential Z rel scores yield a running average with a tighter and tighter nominal 95% CI as more signals (Z rel scores) are averaged. A histogram of Z-scores for this individual shows how Z-scores are distributed about the mean ( fig. 4 B).
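
The signal-averaging argument can be illustrated with synthetic data (a sketch, not the study's analysis): individual Z-scores drawn around a fixed underlying performance level are noisy, but the running mean and its 95% CI tighten as evaluations accumulate.

import numpy as np

rng = np.random.default_rng(0)
true_performance = -0.4                                # hypothetical resident-level signal
z = true_performance + rng.normal(0.0, 1.0, size=100)  # noisy individual Z-scores

for n in (5, 20, 100):
    sample = z[:n]
    mean = sample.mean()
    sem = sample.std(ddof=1) / np.sqrt(n)              # standard error of the running mean
    print(f"n={n:3d}  mean={mean:+.2f}  95% CI width={2 * 1.96 * sem:.2f}")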


Fig. 4. Signal averaging reveals a reliable clinical performance score from a noisy background. The first 100 sequential Z rel scores are shown for a single resident ( blue circles  ). The running average and the upper and lower 95% CIs on the running average are shown by the red filled circles   and red lines  , respectively ( A  ). Z-scores are distributed broadly about the mean. Z-scores (N = 100) from the same resident are displayed as a histogram with a bin width of 0.25 ( B  ).

Z rel Scores Reliably Differentiate Relative Performance

All Z rel scores were used to determine each resident's mean Z rel and 95% CI. Of the 107 residents with 20 or more Z rel scores, 32 (30%) were reliably above average, 46 (43%) were not reliably different from average, and 29 (27%) were reliably below average ( fig. 5 ). When overall resident performance was determined using absolute data (Z abs ) or case confidence data (Z conf ), the different metrics yielded performance measures that were similar to Z rel . A resident's mean Z abs was related to his or her mean Z rel ( r  = 0.91, r   2 = 0.83, N = 105 residents, P  < 0.001). A resident's mean Z conf was related to his or her mean Z rel ( r  = 0.57, r   2 = 0.33, N = 105 residents, P  < 0.001). The number of evaluations with usable Z conf data were only 32.2% of the number with usable Z rel data (13,639). The lower correlation of Z conf with Z rel was not attributable to a sampling bias because the correlation was unchanged when the correlation was determined using only forms containing both Z conf and Z rel data ( r  = 0.58, r   2 = 0.34, N = 105 residents, P  < 0.001).


Fig. 5. Residents differ in their relative clinical performance. All data were used to determine mean Z rel scores for each resident having 20 or more evaluations. Error bars are the 95% CI on the mean. Residents with a mean that is reliably above or below 0 are shown in blue  . Residents with a mean that is not reliably different from 0 are shown in red  .

Average Z rel Scores Determine Resident Performance as Well as a Sophisticated LMM

Average Z rel scores and associated CIs do not take into account the repeated measures inherent in scoring the same resident on two or more occasions or in scoring multiple residents by the same rater. To determine whether repeated measures were altering the estimates of resident clinical performance, average Z rel scores (based on 20 or more samples) were compared with performance estimates determined using relative-to-peers data in an LMM. Z rel scores provided a performance metric that was nearly identical to one determined using an LMM (r = 0.96, r2 = 0.92, N = 107 residents, P < 0.001). The ratio of the resident variance component to residual variation was 27%. Thus, the repeated scores for a given resident are not fully independent, and the CIs determined by simple averaging of Z rel scores will be narrower than those that account for the repeated measures. The magnitude of this effect was determined by comparing CIs determined using Z rel scores with those determined by the LMM. On average, the 95% CIs were 17.7% wider when determined using the LMM than when determined using Z rel scores (N = 107 residents' 95% CIs, P < 0.001 by paired t test). The variance in the 95% CI was also higher when determined using the LMM (variance in Z rel score 95% CI = 0.0016, variance in LMM 95% CI = 0.0031, P < 0.001 by F test).

There Is More Certainty in Determining Below-average Performances

When the SD of Z rel is small, it indicates lower variation in the underlying Z rel scores used to determine the mean. This leads to more certainty in the average score. When the SD of each resident's mean Z rel score was regressed against the mean Z rel score for the 107 residents with 20 or more Z rel scores, the regression showed that the lower the Z rel , the lower the SD ( r  = 0.60, r   2 = 0.37, N = 107 residents, P  < 0.001). Thus, there is less variation in individual Z rel scores for the lowest-performing residents than for the highest-performing residents. The number of evaluations submitted each month per resident did not differ between residents whose mean Z rel was above 0 (9.88 evaluations per month, N = 543 resident-months) and those whose mean Z rel was below 0 (9.98 evaluations per month, N = 696 resident-months) (unpaired t   test, P  = 0.67).

Z rel Scores Are Stable When No Performance Interventions Occur

The temporal stability of each resident's Z rel score was assessed by comparing his or her average Z rel score during one 6-month period with the average Z rel score 1 yr later during another 6-month period. All residents having 15 or more evaluations during both 6-month periods (Period 1: October 1, 2008–March 31, 2009; Period 2: October 1, 2009–March 31, 2010) and who did not receive a performance intervention from the CCC were included. Forty-seven residents met these inclusion criteria. There was a strong relationship between the Z rel scores from Period 1 and the subsequent Z rel scores from Period 2 (r = 0.75, r2 = 0.56, N = 47 residents, P < 0.001, fig. 6). When the single outlier resident was removed, the relationship was strengthened (r = 0.81, r2 = 0.71, N = 46 residents, P < 0.001).

Fig. 6. Z rel scores are stable over time when no interventions occur. Z rel scores are shown for the 47 residents having no performance interventions and who had 15 or more evaluations in both Period 1 (October 1, 2008–March 31, 2009) and 1 yr later in Period 2 (October 1, 2009–March 31, 2010). The fitted line includes all data points ( r  = 0.75, r   2 = 0.56, N = 48, P  < 0.001).

Z rel Scores for Medical Knowledge Are Related to an Independent Metric of Medical Knowledge: The American Society of Anesthesiologists/American Board of Anesthesiology ITE

Z rel scores based solely on the core competency of Medical Knowledge (Z rel,MK ) were compared with the American Society of Anesthesiologists/American Board of Anesthesiology ITE examination. There were three cohorts of residents having both Z rel,MK scores and same-year ITE Z-scores (Z ITE ) (see Methods). The 2008 ITE was held in July. The 2009 and 2010 ITEs were held in March. The average Z rel,MK score for each resident was determined using evaluations submitted in the months after the exam (March through June). For each cohort, faculty member reference data were determined using their scores from the corresponding academic year (July–June). The 2008, 2009, and 2010 Z rel,MK scores were significantly related to the independently determined Z ITE scores for each year examined (2008: r  = 0.38, r   2 = 0.14, N = 71 residents, P  = 0.001; 2009: r  = 0.33, r   2 = 0.12, N = 76 residents, P  = 0.002; 2010: r  = 0.30, r   2 = 0.09, N = 69 residents, P  = 0.01).

Z rel Scores Independently Predict Referral to the CCC

Before the implementation of the new evaluation system, a number of residents had been referred to the CCC. The process leading to referral was multifactorial and included verbal communication, concerning written rotation evaluations, and electronic mail messages describing concerning performance. Once the Z rel score system was functional, it was used to determine whether it would identify residents who had been independently referred to the CCC. Residents with a Z rel score greater than 0 were infrequently referred to the CCC (1 referred and 36 not). Residents with a Z rel score of 0 or less were more often referred to the CCC (19 referred and 25 not). A Z rel score of 0 or less was associated with an odds ratio of 27 in favor of being referred to the CCC (P < 0.001, two-tailed, chi-square with Yates' correction).
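The association reported above can be reproduced from the 2 × 2 counts alone. The sketch below (illustrative only, not the original analysis code) computes the odds ratio and the Yates-corrected chi-square test in Python with SciPy.

```python
# A minimal sketch using the counts reported above: association between a
# below-average mean Z_rel score and independent referral to the CCC.
from scipy.stats import chi2_contingency

#          referred  not referred
table = [[19, 25],   # mean Z_rel <= 0
         [1, 36]]    # mean Z_rel > 0

chi2, p, dof, expected = chi2_contingency(table, correction=True)  # Yates' correction
odds_ratio = (19 * 36) / (25 * 1)  # ~27, matching the reported odds ratio
print(f"odds ratio = {odds_ratio:.1f}, chi2 = {chi2:.1f}, p = {p:.2g}")
```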

Z rel Scores Predict CCC Flag Density of Below-average Performers

The evaluation form has five questions from the CCC that raise concern if answered “yes.” CCC flag density is the fraction of evaluations having any of the CCC questions answered yes. For residents whose mean Z rel score was less than 0, there was a strong inverse relationship between Z rel score and CCC flag density ( r  = 0.90, r   2 = 0.82, N = 57 residents, P  < 0.001). For residents whose mean Z rel score was 0 or greater, there was no relationship between Z rel score and CCC flag density ( r  = 0.24, r   2 = 0.06, N = 51 residents, P  = 0.10).

Faculty Confidence in Having Residents Provide Unsupervised Care Increases as Residency Progresses

Faculty members provide a measure of their confidence in having the resident independently perform a series of eight cases of increasing difficulty. Of the evaluations completed, 5,006 had scores allowing a meaningful measure of when confidence was lost (see Methods). Confidence increased as months in residency increased ( fig. 7 ). Confidence increased most rapidly during the first year of residency (slope = 0.25 cases/month, r  = 0.39, r   2 = 0.15, N = 1,941 evaluations, P  < 0.001) and slowed during the second year (slope = 0.09 cases/month, r  = 0.16, r   2 = 0.03, N = 1,421 evaluations, P  < 0.001) and third year (slope = 0.12 cases/month, r  = 0.27, r   2 = 0.07, N = 1,644 evaluations, P  < 0.001) of residency. The rate of increase in confidence was significantly higher during the first year of residency compared with either the second ( P  < 0.001, Z-test statistic) or third year ( P  < 0.001, Z-test statistic) of residency. The rate of increase was not different between the second and third years of residency ( P  = 0.088, Z-test statistic).
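A difference between two regression slopes of this kind is commonly tested by dividing the slope difference by the standard error of the difference (the square root of the sum of the squared slope standard errors). The sketch below shows one plausible way to do this in Python; it is an assumed approach, not the author's code.

```python
# A minimal sketch (assumed approach): compare the rate of increase in faculty
# confidence between two residency years by fitting an OLS slope for each year
# and testing the slope difference with a Z statistic.
import numpy as np
from scipy import stats

def slope_with_se(months, confidence):
    """OLS slope (cases/month) and its standard error."""
    fit = stats.linregress(months, confidence)
    return fit.slope, fit.stderr

def compare_slopes(slope1, se1, slope2, se2):
    """Two-tailed Z test for the difference between two independent slopes."""
    z = (slope1 - slope2) / np.sqrt(se1 ** 2 + se2 ** 2)
    p = 2 * stats.norm.sf(abs(z))
    return z, p
```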

Fig. 7. Confidence increases as residency progresses. The mean maximum confidence (defined as the most advanced case the faculty has confidence in having the resident perform in an unsupervised fashion) is shown for all residents for all 36 months of residency (N = 5,006 evaluations). Confidence rises throughout residency but rises fastest during the first 12 months. The Y axis spans the case complexity used in the evaluation form: case 1 is relatively easy and case 8 is extremely challenging. Error bars are the 95% CI on the mean.

Confidence Scores Increase More than Relative Scores as Residents Become More Senior

Faculty members score residents increasingly above average as residents become more senior, although this is normatively impossible. If confidence scores rise disproportionately more than relative scores, this implies a real increase in actual performance and not just an increase in bias. Scores from evaluations containing both confidence and relative-to-peers data were normalized by their respective scale ranges such that 0.0 and 1.0 were the lowest and highest scores attainable. As residents progressed through residency, their normalized relative-to-peers scores increased (slope = 0.0044 normalized units/month, N = 4,982 evaluations, P  < 0.001), as did their normalized confidence scores (slope = 0.018 normalized units/month, N = 4,982 evaluations, P  < 0.001). The overall rate of increase was 4.0 times faster for the confidence data than for the relative-to-peers data ( P  < 0.001, Z-test statistic). Figure 8 shows the differential growth in normalized confidence scores compared with normalized relative-to-peers scores as residency proceeds.

Fig. 8. Confidence scores increase faster than relative-to-peers scores as residency progresses. The average normalized confidence scores ( red  ) and average normalized relative-to-peers scores ( blue  ) are shown for each month of residency. Likert scores were normalized to a 0–1 scale, where 0 is the minimum and 1 is the maximum attainable score. Only evaluations having both a usable confidence score and a relative-to-peers score were included (N = 4,892). The overall slopes of the two data sets are different ( P  < 0.001, Z-test statistic).

A Performance Intervention Can Significantly Improve Z rel Scores

Before this new system was used, a resident was referred to the program director using customary mechanisms. This resulted in an intervention in which performance issues were defined, written expectations were set forth, and consequences were defined. The program director, chair of the department, chair of the CCC, resident, and resident's mentor knew of the intervention. The faculty was otherwise unaware of the intervention. When the Z rel score system became functional, previously collected data revealed that the faculty had independently assigned below-average Z rel scores to this resident in the time leading up to the intervention (Z rel = −0.47, upper bound on 95% CI did not include 0). The resident's Z rel score increased significantly after the intervention (Z rel = 0.12, 95% CI included 0, P  = 0.003, unpaired t   test). Figure 9 shows the Z rel scores by month before and after the intervention. A second situation occurred after the Z rel score system was in use. The CCC detected a resident with very low Z rel scores, and a confidential educational intervention occurred. This included a written statement of specific concerns and expectations for improvement. The resident's Z rel score for the 6 months leading up to the intervention was well below average (Z rel = −0.66, upper bound on 95% CI did not include 0). The average Z rel score increased significantly for the 5 months after the educational intervention ( P  < 0.001, unpaired t   test) and was no longer below average (Z rel = −0.02, 95% CI included 0). Details and time courses of these two interventions are purposely left out to maintain anonymity of the residents.
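The before-and-after comparison described here amounts to an unpaired t test on the Z rel scores from the two periods. The sketch below uses made-up numbers purely for illustration; the values are not study data.

```python
# A minimal sketch with illustrative (made-up) numbers, not study data:
# comparing a resident's Z_rel scores before and after an educational
# intervention with an unpaired t test, as described above.
from scipy import stats

zrel_before = [-0.55, -0.40, -0.52, -0.41]  # hypothetical pre-intervention scores
zrel_after = [0.05, 0.18, 0.10, 0.15]       # hypothetical post-intervention scores

t_stat, p_value = stats.ttest_ind(zrel_before, zrel_after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```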

Fig. 9. Z rel scores can increase after an education intervention. The mean monthly Z rel scores for a single resident are shown for 4 months before and 4 months after an education intervention. The intervention occurred at the arrow  . The resident's mean Z rel scores for the 4 months before and after the intervention differ ( P  = 0.003) and are shown by the gray lines  . Error bars are the 95% CI on the mean.

The Overall System

The reported resident evaluation system follows many of the recommendations found in the review of Williams et al. 3 and is consistent with the view that faculty members can, in aggregate, reliably and validly assess resident clinical performance. 31 The system is based on direct observation of clinical performance, has broad systematic sampling, uses multiple raters, uses an ACGME Core Competency construct, currently separates formative feedback and evaluative numbers, encourages weekly evaluation, occurs in a naturalistic setting with relatively unobtrusive observation, corrects for grade inflation (bias) and differential grade-range use, is related to important metrics of performance such as high-stakes medical knowledge tests (ITE) and referral to a CCC, uses only five or seven rating choices per item, and specifies the meaning of ratings (table 1). A key finding in this study is that each Z rel score contains only limited clinical performance information. These noisy data are handled effectively by signal averaging many scores to create an overall clinical performance metric for each resident. The analysis includes CIs, which are helpful when using data for decision-making. CIs help distinguish meaningful differences in performance from differences that are uncertain. Uncertainty can be caused by too few evaluations or large variations in the scores themselves. The author used his department's previous competency system to identify residents in need of remediation while gaining comfort with the Z rel score system. The Z rel score system has essentially supplanted the department's previous system because it reliably detects all residents who have significant performance issues. Despite Z rel scores being normalized values that do not contain absolute clinical competency information, the experience at the institution has shown repeatedly that a mean Z rel score of approximately −0.5 (or less) signals the need for intervention (fig. 5). Residents with a mean Z rel score of less than approximately −0.6 present a challenge, and those with scores less than approximately −0.8 may face serious performance issues necessitating significant intervention. Residents whose Z rel score is so low that their upper 95% CI does not reach −0.5 are most concerning. Unless otherwise noted, individual Z rel scores are based on the average of the ACGME core competency subscores, because a recent review found that raters typically are unable to assess the six core competencies independently. 32 This process appears to be one of the most robust and extensive evaluation systems found in the medical education literature.

Table 1. Features of the Clinical Performance Evaluation System

Z-scores Correct for Biases

The relative-to-peers component of the evaluation system asks faculty members to score a resident's performance relative to his or her peer group (same CA-year within the same residency) for each competency. Nearly every faculty member provided scores that were well above average (fig. 1). This bias was exaggerated when faculty members evaluated more senior residents. The finding that normative performance scores are inflated into the "above average" range is an example of the "Lake Wobegon" effect, which is not unique to physicians. 33 Because of the unique use patterns of each faculty member, it became apparent that a normalization process was needed to recenter the scores and adjust for differing score-range use. Z-scores accomplish both of these requirements. In addition, because bias increased with CA-year, faculty scores were normalized for each CA-year. The Z-score transformation reduces the amount of construct-irrelevant variance 11,34–36 in the data. Z-scores can be averaged and compared in units of SD. The Z-score transformed data behave as expected, with a grand mean of 0 and an SD of nearly 1.
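A minimal sketch of this normalization follows, assuming a long-format table with hypothetical columns "rater", "ca_year", and "score"; it illustrates the general idea described above rather than reproducing the author's code.

```python
# A minimal sketch of per-rater, per-CA-year Z-score normalization: each raw
# relative-to-peers score is re-expressed using that faculty member's own mean
# and SD within the resident's CA-year, removing rater bias and differences in
# score-range use. Column names are assumptions.
import pandas as pd

def zrel_transform(evals: pd.DataFrame) -> pd.Series:
    by_rater_year = evals.groupby(["rater", "ca_year"])["score"]
    # Z = (score - rater/CA-year mean) / rater/CA-year SD
    return (evals["score"] - by_rater_year.transform("mean")) / by_rater_year.transform("std")

# Usage: evals["z_rel"] = zrel_transform(evals)
# A resident's overall metric is then evals.groupby("resident")["z_rel"].mean().
```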

A Single Z rel Score Has Only a Small Amount of Clinical Performance ‘Truth’ Associated with It

A key finding of this study was the low correlation between first and second Z rel scores when a faculty member evaluated the same resident on two occasions ( fig. 3 ). This indicates at most a modest halo effect 37 because faculty member scores differ significantly between subsequent evaluations of the same resident. Overall, approximately 23% of the second performance score can be explained by the first performance score. This small component likely contains the actual performance measure. This leaves 77% of the score as noise or unexplained variance. The low correlation between first and second Z rel scores may be partly attributable to the differences in the situations leading to each Z rel score. Clinical performance is highly affected by the circumstances of the event. This concept is known as “context specificity” 38 , 39 and explains why performance on one OSCE station predicts only a modest amount of the performance on the exact same OSCE station when using a different standardized patient. 21 Essentially, people fail to adequately consider the role of the situation in determining behavior and performance. 39 , 40  

Signal Averaging Is the Key to Determining Clinical Performance

Noisy signals such as Z rel scores are well handled by signal averaging, which reduces the noise and reveals the signal. Figures 4 A and B display significant variation in Z rel scores but a running average that converges on a “true” Z rel score with a small error signal. This allows an estimate of overall relative performance to emerge from the noise. Because of repeated measures, the Z rel score CIs of below-average performers typically reach statistical significance with a smaller number of evaluations than if an LMM had been used. Thus, the Z-score system will detect low performers sooner and enable educators to get them the help they need.
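The running average and its shrinking confidence interval can be sketched as follows; this is an illustrative implementation of the signal-averaging idea, not the study's code.

```python
# A minimal sketch: the running mean of one resident's Z_rel scores with a
# 95% CI that narrows as evaluations accumulate, letting the performance
# signal emerge from noisy individual scores.
import numpy as np
from scipy import stats

def running_mean_with_ci(z_scores, conf=0.95):
    z = np.asarray(z_scores, dtype=float)
    n = np.arange(1, len(z) + 1)
    means = np.cumsum(z) / n
    # Running sample SD; undefined (NaN) for the first score.
    sds = np.array([np.std(z[: i + 1], ddof=1) if i > 0 else np.nan
                    for i in range(len(z))])
    half_width = stats.t.ppf(0.5 + conf / 2, df=np.maximum(n - 1, 1)) * sds / np.sqrt(n)
    return means, means - half_width, means + half_width
```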

Do Z-scores Really Provide a Measure of Clinical Performance?

There are four lines of evidence supporting Z rel scores as a measure of actual clinical performance. First, Z rel scores determined using just the scores for medical knowledge (Z rel,MK ) were related to an independent determination of medical knowledge. The strength of the relationship indicates that Z rel,MK scores explain approximately 10–15% of the variance in ITE scores. Second, the likelihood of being referred to the CCC was independently related to mean Z rel scores. Residents with a Z rel score of 0 or less were referred to the CCC with an odds ratio of 27. The author's CCC now uses Z rel scores to detect low performers. Third, as residents progress through residency, the normalized confidence scores increased 4.0 times faster than the normalized relative scores ( fig. 8 ). If scores were simply related to progressive bias or construct-irrelevant variance, 35 , 36 the ratio of normalized confidence to normalized relative scores would remain constant. Fourth, CCC flag density, an independent measure of concern with clinical performance, is strongly related to lower Z rel scores.

The finding that residents with higher average Z rel scores have more variance in their Z rel scores is intriguing. One explanation may be that it is difficult to consistently deliver an above-average performance, and this may add variance to their scores. It is also possible that the faculty have more agreement on what constitutes poor performance than what constitutes excellent performance. 31  

Why Are Z rel,MK Scores Only Slightly Related to ITE Scores?

A modest but real relationship was found between the Z rel score assessment of medical knowledge and the ITE-based assessment of medical knowledge. Faculty members are unaware of residents' ITE scores except for the few residents that they mentor, so the correlation is not caused by the faculty's knowledge of residents' ITE scores. Although United States Medical Licensing Examination scores predict future standardized test results, such as ITEs, 12,16 they are poorly 16 or not at all 12 related to clinical performance. Even when the medical knowledge being tested is related to the actual clinical scenario of an OSCE, it hardly predicts performance on that OSCE. 21 Thus, weak correlations between Z rel,MK and ITE scores are expected and may be attributable to a number of factors. Faculty members may not actively probe residents to determine the true extent of their medical knowledge. Furthermore, when residents and faculty members interact, they are using practical or applied medical knowledge, as opposed to the theoretical medical knowledge tested by standardized examinations. Most medical decisions in natural settings involve significant amounts of uncertainty, are prone to bias and cognitive errors, 41,42 and require significant amounts of judgment. 43 This is in sharp contrast to ITE questions, which have only one correct answer. There is a significant amount of research showing that cognitive ability (intelligence) is poorly or not at all related to the ability to avoid biased thinking. 44–46 Thus, the Z rel assessment of medical knowledge may be an excellent proxy for day-to-day clinical decision-making and serve as a metric for what residents do in practice, an important measure.

Z-scores Are Stable Unless the Resident Is Coached onto a New Plane of Performance

The stability of Z rel scores over the course of 1 yr is significant ( fig. 6 ). The mean Z rel score from the first time period explained 56% of the variance in the mean Z rel score 1 yr later, indicating that scores generally are stable. Recent studies indicate that certain personality traits are related to better and worse clinical performance. 47 , 48 If this is true, stability in relative clinical performance can be explained partially by the general stability of personality traits. 49  

Z-scores Change When a Resident's Performance Changes

If clinical performance is not malleable, there is little reason to provide feedback. This article provides two clear examples of clinical performance improvement associated with a feedback intervention. There are three important features found in these examples (see fig. 9 for one example). First, the Z rel score system independently identified the resident. Second, the resident's Z rel scores increased after the intervention without the faculty being aware of the intervention. This indicates that the faculty view performance for what it is and do not allow previous reputation to taint significantly the evaluation process. Third, it associates feedback and an educational intervention with improved clinical performance, a key role of residency. 50 It is likely that the evaluation system served to identify a performance problem and track its improvement. The educational interventions, in conjunction with developmental feedback, are what likely caused the performance improvement.

Is There a Particular Score Defining Adequate Performance?

When residents have average Z rel scores of less than approximately −0.3 and the 95% CI does not include 0 (i.e., their performance is reliably below average), the author's CCC carefully examines the corresponding comments to determine the nature of the low performance. It has been found that there are many routes to low performance, including poor medical knowledge, low effort, unprofessional behavior, interpersonal and communication difficulties, poor motivation to improve, confidence in excess of competence, defensiveness, anxiety, low confidence, poor decision-making, and so forth. The comments are used to help develop educational interventions that target the area in need of improvement. Residents exhibiting noncognitive and nontechnical causes of low performance (such as low motivation for learning, defensiveness, anxiety, and so forth) are readily identified using this system. However, the underlying causes sometimes can be difficult to identify. The comments section usually provides strong hints to the cause, but not always. In situations in which the precise noncognitive cause for low performance cannot be identified, outside learning specialists, psychiatrists, cognitive behavioral therapists, and personal coaches have been used. The results usually have been quite rewarding. Additional information is limited to protect the privacy of individual residents.

The ACGME has reframed residency training to focus on outcomes instead of process. 32 Despite this call, there are few outcomes that independently measure competency and fewer still that measure performance. Unfortunately, even when OSCEs or other highly reliable metrics are used to determine clinical competency, there is only a weak relationship with actual clinical performance. 21,26,51 This indicates the need for more naturalistic measures of performance, 2,5,28,52–55 such as the one described in this article. Once clinical performance becomes measurable, there remains the task of standard setting. Standard setting is largely context sensitive; for example, a physician deemed acceptable by today's standards may not be considered acceptable by future standards. Thus, normative standards still have an important role in determining adequacy of performance. 31,56

Limitations of the Study

This study is limited by its inability to establish absolute performance levels. However, the relationship between relative and absolute performance appears to be real based on the ability of Z rel scores to predict ITE scores and CCC referrals. Z rel scores assume normally distributed data, and faculty member scores may not always be normally distributed. Individual Z rel scores contain only a modest signal, so large sample sizes are required to attain reliable measures of clinical performance. The Z rel system does not take into account repeated measures; however, using an LMM to correct for repeated measures did not significantly affect the estimates of clinical performance. Importantly, averaging Z rel scores typically results in narrower CIs than those determined using an LMM. This may result in earlier detection of poor performance. The LMM is an excellent tool but does not easily lend itself to practical use. The evaluation system currently receives approximately one half of the evaluations requested, which creates a risk of sampling error. Many different faculty members contribute to each resident's Z rel score, so it is unlikely that the error is large. Another limitation is the delay in requesting an evaluation. The delay is, on average, half a week but can be as short as 1 day or as long as 1 week, depending on when during the previous week the interaction occurred. A more concerning delay occurs when faculty members delay completing the evaluation; this can amount to many weeks or even months. Currently, outstanding evaluations are deleted after 3 months.

This study demonstrates that when faculty members evaluate resident clinical performance in a naturalistic setting that encompasses a variety of clinical situations, they assign scores that suffer from significant grade inflation and varying degrees of grade-range usage. The unique grading characteristics of each faculty member were used to normalize the scores that each faculty member assigned. Resulting single Z rel scores were shown to contain a modest amount of true clinical performance information. The low information content of single scores was largely circumvented by averaging many independent scores to arrive at a metric that was related to clinical performance measures, including referral to the CCC, medical knowledge scores (ITE scores), and growth in faculty confidence in allowing residents to undertake independent and unsupervised care of increasingly complex patients. The strength of the system is its ability to average out irrelevant variance, which leaves a useful metric of clinical performance. The metric was stable over time. Although the metric is normalized and thus does not measure absolute clinical performance, it is able to detect poor clinical performance, which faculty members, in aggregate, appear to agree upon. When mean Z rel scores are less than approximately −0.5, it signals the need to look into the cause(s) of the poor performance, and the comments section can help identify what can be done to improve performance. Two exemplar residents with low clinical performance scores each received an educational intervention based on the information contained in the comments sections, and both experienced significant improvement in performance after the intervention.

The author thanks the faculty members who spent time and effort evaluating residents and extends a special thanks to those who wrote comments aimed at improving resident performance. The author also thanks Eric A. Macklin, Ph.D. (Instructor, Harvard Medical School; Assistant in Biostatistics, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts), for statistical advice.


Anesthesia quality indicators to measure and improve your practice: a modified Delphi study

May-Sann Yee

1 Southlake Regional Health Centre, Newmarket, ON L3Y 2P9 Canada

Jordan Tarshis

2 Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON Canada

Associated Data

The datasets generated and/or analyzed during the current study are not publicly available because the institutional rules strictly prohibit releasing the native data on the web but are available from the corresponding author on reasonable request.

Implementation of the new competency-based post-graduate medical education curriculum has renewed the push by medical regulatory bodies in Canada to strongly advocate and/or mandate continuous quality improvement (cQI) for all physicians. Electronic anesthesia information management systems contain vast amounts of information, yet it is unclear how this information could be used to promote cQI for practicing anesthesiologists. The aim of this study was to create a refined list of meaningful anesthesia quality indicators to assist anesthesiologists in the process of continuous self-assessment and feedback on their practice.

An initial list of quality indicators was created through a literature search. A modified Delphi (mDelphi) method was used to rank these indicators and achieve consensus on those considered most relevant. Fourteen anesthesiologists representing different regions across Canada participated in the panel.

The initial list contained 132 items, and through 3 rounds of mDelphi the panelists selected 56 items from the list that they believed to be top priority. In the fourth round, a subset of 20 of these indicators was identified as highest priority. The list included items related to process, structure, and outcome.

This ranked list of anesthesia quality indicators from this modified Delphi study could aid clinicians in their individual practice assessments for continuous quality improvement mandated by Canadian medical regulatory bodies. Feasibility and usability of these quality indicators, and the significance of process versus outcome measures in assessment, are areas of future research.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12871-023-02195-w.

Continuing professional development (CPD) refers to the ongoing process of developing new knowledge, skills, and competencies necessary to maintain and improve professional practice. Continuing quality improvement (cQI) is a systematic approach to assessing and improving quality of care by professionals which involves collecting data on indicators of quality and using this data to identify areas for improvement and develop strategies to address them [ 1 ]. Anesthesia quality indicators are specific measures used to assess clinical care in anesthesia. Ongoing learning and professional development with change implementation informed by regular feedback using quality indicators that are transparent, reliable, evidence-based, measurable, and improvable is critical to ensuring anesthesiologists continue to provide high-quality and relevant care that meets the needs of their patients.

Recent changes in post-graduate medical education (PGME) training in Canada have necessitated changes in continuing professional development requirements for practicing clinicians. While the adoption of competency-based education has fully penetrated anesthesia PGME training programs in Canada, it is in much earlier stages of implementation in the continuing education realm beyond PGME. The current Royal College of Physicians and Surgeons of Canada (RCPSC) Maintenance of Certification (MOC) program, the national CPD program for specialists, states that, “All licensed physicians in Canada must participate in a recognized revalidation process in which they demonstrate their commitment to continued competent performance in a framework that is fair, relevant, inclusive, transferable, and formative” [ 1 ]. This mandate for continuing quality improvement applies not only to physicians but also to the national specialty societies providing continuing professional development resources to their physician members.

The Federation of Medical Regulatory Authorities of Canada (FMRAC) published a document titled, “Physician Practice Improvement” in 2016, with the goal of supporting physicians in their continuous commitment to improve their practice [ 2 ]. Their suggested five-step iterative process involves (1) understanding your practice, (2) assessing your practice, (3) creating a learning plan, (4) implementing the learning plan, and (5) evaluating the outcomes.

In 2018, the CPD report from The Future of Medical Education in Canada (FMEC) project was published [ 3 ]. In this report, principle #2 states, “The new continuing professional development (CPD) system must be informed by scientific evidence and practice-based data” and should, “…encourage practitioners to look outward, harness the value of external data, and focus on how these data should be received and used”, stressing the importance of the data being from physicians’ own practices.

Although these reports make clear the link between competency-based continuing professional development for physicians in practice and the importance of gathering and analyzing physician-specific data, they provide no guidance on what data are relevant for anesthesia or how to gather them. While many national organizations, including the Canadian Anesthesiologists’ Society, publish Guidelines for the Practice of Anesthesia [ 4 ], these guidelines are distinct from practice quality indicators. Internationally, national anesthesia specialty societies and safety groups have published lists of anesthesia quality indicators, but the evidence for many of these indicators is weak and not broad-based. Haller et al. published a systematic review of quality indicators in anesthesia in 2009; however, the focus was neither on physician CPD nor cQI [ 5 ]. An important distinction exists between the goal of this study and that of the recent Standardized Endpoints in Perioperative Medicine and the Core Outcome Measures in Perioperative and Anesthetic Care (StEP-COMPAC) initiative, which focused on establishing clear definitions for outcomes for clinical trials [ 6 – 10 ], not on physician performance improvement.

Therefore, a need currently exists for a list of quality indicators that are relevant to physicians’ goals of continuing quality improvement and ongoing professional development. Furthermore, as electronic anesthesia information management systems (AIMS) become ubiquitous, it is essential that a list of indicators relevant to individuals and the anesthesia community be developed to forward the goal of competency-based CPD. Ideally these indicators would be readily extractable from an AIMS. The purpose of this study was to create a list of anesthesia quality indicators for anesthesiologists to help guide self-assessment and continuing quality improvement.

This study received Johns Hopkins Institutional Review Board application acknowledgement (HIRB00008519) on May 27, 2019.

The original Delphi method, first described by Dalkey and Helmer in 1962 [ 11 ], was used as a method to generate specific information for United States national defense using a panel of selected experts, starting with an open questionnaire. The modified Delphi technique was used to streamline the time and effort of the participants; the modification involved starting with a pre-selected set of items identified by a literature search rather than with an open questionnaire.

The literature search was performed with the help of a medical health informationist by a review of the literature published between 2009 and 2019 in Pubmed, including Ovid Medline and Cochrane content, using the search protocol outlined in Supplementary Table S1 .

Retrieved articles were reviewed by the principal author to determine the relevance to the topic. Inclusion criteria included items deemed to be anesthesia quality indicators in systematic reviews completed within 10 years of the study start date, anesthesia quality indicators currently in use in Canadian academic institutions, anesthesia quality and safety indicators in published articles in peer-reviewed journals, anesthesia quality indicators identified in the Anesthesia Quality Institute National Anesthesia Clinical Outcomes Registry, as well as any additional items generated by the panel. The list of anesthesia quality indicators was reviewed by the second author prior to distribution. The focus on the last ten years of published data helped ensure that the indicators were the most up-to-date available.

Selection of the Delphi panel was based on a stratified random sampling technique [ 11 ]. Anesthesiologists representing the different regions across Canada were identified and approached based on their active involvement in the Canadian Anesthesiologists’ Society Continuing Education & Professional Development Committee, Quality & Patient Safety Committee, Standards Committee, Association of Canadian University Departments of Anesthesia Education Committee, the Royal College of Physicians and Surgeons of Canada Specialty Committee in Anesthesiology, or academic involvement. A minimum of 12 participants was sought to ensure validity of the responses [ 12 ]. Written informed consent was obtained from all participants.

The survey was created using a matrix table question type with a 2-point binary scale (agree/disagree) with a single answer option. The survey was optimized for mobile devices and each item had an adjacent textbox for comments. Additional items and general comments were solicited at the end of each survey round. A reiteration of the study purpose, research questions, and instructions were emailed to participants along with an anonymous link to the survey. The surveys required 10 to 15 min to complete. Panelists were given a window of 2 weeks to complete each survey round, with 4 weeks between each round.

All responses were gathered anonymously and tallied using Qualtrics survey collection and analysis software (Johns Hopkins University access). Consensus was defined a priori as agreement by greater than 70% of the group; with 14 panelists, consensus was equivalent to 10 or more concordant votes on any given item. Subsequent Delphi rounds were planned to continue until stability, defined as less than 15% change in responses from the previous round, was achieved. Items that reached consensus were removed and not recirculated. New items generated by the panelists, items that did not reach consensus, and panelist comments were shared anonymously in subsequent rounds.
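The consensus rule described above can be expressed compactly; the sketch below is illustrative only (hypothetical item names and votes) and is not the study's Qualtrics analysis.

```python
# A minimal sketch of the consensus rule: an item reaches consensus when more
# than 70% of responding panelists give the same answer. `responses` maps
# hypothetical item names to lists of "agree"/"disagree" votes.
def classify_items(responses, threshold=0.70):
    outcome = {}
    for item, votes in responses.items():
        agree_frac = votes.count("agree") / len(votes)
        if agree_frac > threshold:
            outcome[item] = "accept"        # removed from subsequent rounds
        elif (1 - agree_frac) > threshold:
            outcome[item] = "reject"        # also removed
        else:
            outcome[item] = "recirculate"   # carried to the next round
    return outcome

# With 14 panelists responding, >70% corresponds to 10 or more concordant votes.
```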

A total of 28 articles published on anesthesia quality indicators from 2009 to 2019 were identified. A subset of these articles was useful for item generation [ 5 – 10 , 13 – 30 ], including several systematic reviews [ 5 , 6 , 8 – 10 , 21 , 25 , 30 ]. Review of the American Society of Anesthesiologists Anesthesia Quality Institute and Wake Up Safe websites, as well as communication with anesthesia quality experts (separate from the study panel) from two academic centers, provided additional information. A total of 132 anesthesia quality indicators were identified for the initial round of the study. These indicators are presented in Supplementary Table S2 .

Twenty-one Canadian anesthesiologists were approached and fourteen consented to participate. A consent form was emailed to those who expressed interest in participating and those who returned a signed consent form were included in the study.

An expert is a person who has a high degree of skill and knowledge in a particular field or subject, acquired through training, education, and experience, and who is considered authoritative and capable of providing valuable advice and guidance in their area of expertise. The members of the expert panel for this study were identified based on their ongoing involvement with the Canadian Anesthesiologists’ Society (CAS) Continuing Education and Professional Development Committee, the CAS Quality & Patient Safety Committee, the CAS Standards Committee, the Canadian Journal of Anesthesia editorial board, the Royal College of Physicians & Surgeons of Canada Specialty Committee in Anesthesiology, the Association of Canadian University Departments of Anesthesia Education Committee, and the University of Toronto Department of Anesthesiology & Pain Medicine faculty.

This expert panel was representative of the different regions across Canada (British Columbia 1; Alberta 2; Manitoba 3; Ontario 4; Quebec 1; Nova Scotia 1; Newfoundland 2). The group spanned all levels of practice, with 2 members in practice for < 5 years, 2 members in practice for 5–10 years, and the remaining 10 members in practice for > 10 years. There were 7 self-identified females and 7 self-identified males on the panel (Table 1. Panelist Demographics).

Table 1. Panelist Demographics. Abbreviations: CAS – Canadian Anesthesiologists’ Society; CEPD – Continuing Education & Professional Development; UBC – University of British Columbia; ACUDA – Association of Canadian University Departments of Anesthesia; QPS – Quality and Patient Safety; CJA – Canadian Journal of Anesthesia; CPD – Continuing Professional Development

For Rounds 1 through 3, expert panelists were given the following instructions: “The following items are elements of quality in anesthesia care. Please evaluate each item or event to determine if you think it is reasonable and appropriate for use as a measure of an individual anesthesiologist’s practice by ticking ‘agree’ or ‘disagree’.”

One hundred thirty-two indicators were circulated to the panel in the initial round. Thirteen out of 14 participants (93%) responded to the survey. Item response rates varied: 121 items had 13 respondents, 10 items had 12 respondents, and 1 item had 11 respondents. Consensus (> 70%) was achieved for 85 items (83 accepted; 2 rejected). The item with only 11 responses reached consensus to reject. The 85 items that reached consensus were removed from the list, and 47 items remained. By combining the 47 remaining items with 9 new items generated by the panel, a total of 56 items were prepared for circulation in the next round.

Fifty-six items were circulated in the second round. Twelve out of 14 participants (86%) responded. Item response rates varied: 50 items had 12 respondents and 6 items had 11 respondents. Consensus (> 70%) was achieved for 13 items (11 accepted; 2 rejected). The 13 items that reached consensus were removed from the list and 43 items remained. Because the process produced items that duplicated concepts with differing wording, both authors reviewed and curated the list to combine duplicate items without adding or removing any concepts, condensing the list to 37 items. For example, failed spinal block, incomplete spinal block, and postdural puncture headache were 3 separate items that were combined into a single item, “complications of neuraxial block”.

Of the 37 items circulated in the third round, consensus was achieved for 7 items (5 accepted; 2 rejected). Thirteen out of 14 participants (93%) responded. Item response rates varied: 35 items had 13 respondents and 2 items had 12 respondents. The 30 items that did not reach consensus were eliminated.

After 3 rounds, a total of 132 items were evaluated. Ninety-nine items were accepted with greater than 70% consensus. Six items out of 132 were rejected with greater than 70% consensus. Nine new items were generated from the panel. Items that reached consensus were not recirculated to panelists. Significant redundancy in the 99 items that reached consensus was eliminated by combining items, reducing the list to 56 items (Fig.  1 ).

Fig. 1. Modified Delphi results summary

There was a 10-month pause between Rounds 3 and 4 due to disruptions related to the COVID-19 pandemic. In the fourth round, the 56-item list was sent to the study panel with specific instructions to “select 20 anesthesia quality indicators from the list below that you believe to be of top priority in the continuous self-assessment and feedback of an anesthesiologist’s practice”. The electronic survey tool required exactly 20 responses; ranking of these 20 responses was not required. All 14 study panelists responded in Round 4. Table 2 ranks the 56 indicators according to the number of votes received from the panel in Round 4.

Table 2. List of anesthesia quality indicators ranked by response in the fourth round of the mDelphi. The % column is the percent of respondents who ranked the indicator in the top 20. Similar items are grouped together but were deemed sufficiently different to list separately. An asterisk marks the items deemed most easily extractable from an EMR/AIMS

Abbreviations: ToF train-of-four; PACU post-anesthesia care unit; SBP systolic blood pressure; PONV postoperative nausea & vomiting; GA general anesthesia; ICU intensive care unit; ERAS enhanced recovery after surgery; TRALI transfusion-related acute lung injury; OR operating room; N/A not applicable

The overall goal of this initiative was to determine whether, in the current era of competency-based medical education and the increasing use of electronic medical records and AIMS, a consensus list of indicators can be identified to aid clinicians and departments in promoting practice and performance improvement by measuring, analyzing, and using data to improve the quality of anesthetic care. This process requires establishing a list of anesthesia quality indicators as an essential first step. Our study determined that airway complications, incidence and duration of perioperative adverse events, number of medical errors, patient satisfaction, perioperative residual neuromuscular blockade requiring intervention by an anesthesiologist, patient temperature less than 35.5 °C on arrival to the PACU, complications of or failed neuraxial block, and incidence of severe PONV were the most important anesthesia-specific quality indicators for continuous self-assessment and feedback of an anesthesiologist’s practice.

It is useful to determine the categories under which these quality indicators can be grouped. In a seminal manuscript, Donabedian categorized quality indicators into 3 groups: structure (supportive and administrative), process (provision of care), and outcomes (measurable and patient related) [ 31 ]. In their 2009 systematic review, Haller et al. identified 108 quality indicators (only 40% of which were validated beyond face validity) and found that 57% were outcome metrics, 42% measured process of care, and 1% were structure-related [ 5 ]. Hamilton et al. (2021) reviewed regional anesthesia quality indicators and found that 76% of 68 identified items were outcome measures, 18% process-of-care measures, and 6% structure-related [ 25 ]. Our study identified 56 consensus quality indicators, of which 52% were outcome-related, 35% were process-related, and 12% were structure-related. This is in agreement with other studies, with the top results being primarily outcome indicators, followed by process and then structure indicators. Process indicators in anesthesia can be difficult to measure because variability in practice between providers and healthcare settings makes it difficult to develop standardized processes. Anesthesia care is a complex process with multiple steps, making measuring and tracking time-consuming and resource-intensive. Smaller healthcare settings and outpatient procedures may have limited opportunity to collect data on anesthesia processes. There is also a lack of consensus among healthcare providers regarding the most important processes to measure and track in anesthesia care. For all these reasons, there are relatively fewer process indicators than outcome quality indicators in anesthesia.

Demographic indicators were included at the outset of this study because items such as surgical service, surgical priority, ASA status, caseload, number of GA cases, and number of spinal anesthetics provide a snapshot of an individual anesthesiologist’s practice and serve to help clinicians understand and assess their practice by following the first 2 steps of the FMRAC’s 5-step iterative process to practice improvement: (1) understanding your practice and (2) assessing your practice.

Perioperative mortality is notably absent from the quality indicators identified in this study. Benn et al. noted that because the anesthesia specialty has been at the forefront of improving safety in healthcare, morbidity and mortality attributable to anesthesia have decreased significantly over the last half century. Mortality is a poor anesthesia quality indicator because it is rare and usually related to factors outside the anesthesiologist’s control. Data from the UK reveal that less than 1% of all patients undergoing surgery die during the same hospital admission, and perioperative mortality of a healthy elective patient undergoing surgery is a mere 0.2% [ 32 ].

The modified Delphi technique, which relies on expert opinion and consensus, was intentionally chosen for this study because a strong level of evidence for most anesthesia quality indicators is lacking; expert opinion therefore provides a level of face validity. Advantages of the modification include improved initial-round response rates, solid grounding in previously developed work, reduced effect of bias due to group interaction, and assured anonymity while providing controlled feedback to participants [ 33 ]. Variable item response rates across Delphi rounds are a common challenge of this method despite measures to prevent panel attrition, including (1) ensuring each round required less than 15 min to complete, (2) not recirculating items that reached consensus, and (3) using two options (agree/disagree) rather than a rank scale (e.g., Likert). A large dataset containing many items is a recognized challenge. However, previous attempts to reduce fatigue by creating competency subsets, sub-panels, or rotational modifications were largely unsuccessful, resulting in an increased number of rounds and the introduction of bias, while being subject to the same factors that threaten the validity of any Delphi study (lack of experts on the panel, lack of clear content definition, poorly developed initial dataset) [ 34 ]. The item response rates of the 14-member panel ranged between 86 and 100%, indicating a consistent level of interest among the group members in participating in this study.

The modified Delphi study begins with a list of pre-selected items but also gives panel members the opportunity to generate new items. Elements of quality can be used to define what is considered good quality, and specific quality indicators can then be selected to measure and track each element. The term ‘element of quality’ was used in the instructions to panel members to keep the process open, inclusive, and as broad-based as possible, because there may be newly emerging elements of anesthesia care, or elements that have yet to be fully defined or properly studied, that could be added to the list for consideration.

Using quality indicators with the intent of providing effective feedback to improve quality requires that indicators be transparent, reliable, evidence-based, measurable, and improvable. Feedback processes should be regular, continuously updated, comparative to peers, non-judgmental, confidential, and from a credible source [ 32 ]. An EMR/AIMS is an excellent source of data with these qualities, yet it requires time, technological skills, and institutional financial investment to initiate and maintain. The intent of this study was to focus on quality indicators extractable from an EMR/AIMS, and the participants were informed of this goal in the introduction to the study. Nonetheless, many of the indicators proposed by the participants are broad and may not be easily extractable from an electronic system. The use of EMR/AIMS in Canada at the time of the study was highly variable, in both availability and the specific software being used, and this may have contributed to the generated item list not consisting exclusively of extractable items. This discrepancy between the intent and the outcomes of the study is indicative of the challenges of identifying, gathering, and distilling the massive quantity of extractable data from an EMR/AIMS.

There were several limitations to this study. The final list of generated items in Table 2 has been reviewed, and items marked with an asterisk have been deemed most likely to be extractable from an EMR, based on the quantitative nature of the item, recognizing that there is heterogeneity in the data-mining capabilities of various electronic records and that the quality of data extraction is directly related to the quality and detail of the input data. For example, aspects of care that are multi-dimensional, such as patient satisfaction, would be more difficult to extract from most EMRs than a concise, focused element, such as a measured temperature of less than 35.5 °C on arrival in the post-anesthetic care unit. Some institutions might consider an automated dashboard using these indicators, which would require effort to set up but, once in place, could provide ongoing, on-demand clinician feedback [ 35 ]. Quality indicators can be used in a balanced scorecard or a quality clinical dashboard for the purposes of continuing quality improvement. The balanced scorecard approach functions by linking clinical indicators to an organization’s mission and strategy in a multi-dimensional framework. A quality clinical dashboard is used to provide clinicians with relevant and timely information that informs decisions and helps monitor and improve patient care [ 26 ]. Regardless of the feedback method, effecting lasting change in clinician practice and patient outcomes can be challenging.

Although efforts were made to obtain broad national geographic representation and individuals were chosen based on their background in education and quality improvement, it is recognized that some valuable data may have been overlooked by not including allied health workers and patients in this study. However, continuous performance improvement, a form of Lean improvement [ 36 ], emphasizes the tenets that ideas for improvement originate from the people who do the work and that it is essential to understand the work process before trying to fix it. Additionally, a recent study by Bamber et al. [ 27 ] that included allied health members and patients demonstrated significant attrition of both these groups through the Delphi process. A consensus face-to-face meeting was not included in our study, both to avoid the loss of nuance inherent in virtual meetings during the pandemic and, because the panel included anesthesiologists at different career stages, to mitigate the potential influence of senior panelists on junior panelists voicing differing opinions and to reduce the risk of ‘group think’.

Our study was paused after the third round to avoid attrition, as the participants were dealing with the clinical challenges at the onset of the COVID-19 pandemic. The fourth and last round of this study was a slight deviation from the original study methods and was decided on after the authors recognized the need to prioritize the list of indicators. Twenty items were chosen to aid the reader in prioritizing these indicators.

While there remain questions regarding how these indicators can be best used, as well as hurdles related to cost of implementation and end-user buy-in, it is recognized that comprehensive practice assessment must be based on more than data collected from an electronic record. The next steps in this project would be to further refine those indicators that are both feasible to collect and most desirable to end users.

This study has identified and prioritized a list of 56 anesthesia quality indicators deemed to be both relevant to an anesthesiologist's practice and obtainable from an electronic record. This is an essential step toward aiding clinicians and departments in meeting the ongoing cQI requirements recommended by professional societies and medical regulatory bodies.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Acknowledgements

Special thanks to Dr. G. Bryson, Dr. L. Filteau, Dr. H. Pellerin, Dr. M. Sullivan, Dr. H. Grocott, Dr. J. Loiselle, Dr. P. Collins, Dr. K. Sparrow, Dr. S. Rashiq, Dr. M. Cohen, Dr. M.J. Nadeau, Dr. F. Manji, Dr. R. Merchant, Dr. S. Microys, Dr. M. Thorleifson, and Dr. V. Sweet for their time and input on this study.

Abbreviations

Authors’ contributions.

MSY and JT collected and analyzed the data, wrote the main manuscript, prepared the figures and tables, reanalyzed the data, and revised the manuscript. Both authors reviewed and approved the final manuscript for submission.

Funding

The authors declare they have no funding.

Data Availability

Declarations.

This study received Johns Hopkins Institutional Review Board application acknowledgement (HIRB00008519) on May 27, 2019. Written informed consent was obtained from all participants. All methods were performed in accordance with the relevant guidelines and regulations.

Not applicable.

The authors declare they have no competing interests.

Consent for publication: Not applicable.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
