Examiner Variability in Clinical Assessments: Do Examiner Pairings Influence Candidate Ratings?

Abstract

Background: The reliability of clinical assessments is known to vary considerably, and inter-examiner variability is a key contributor. This may result in significant differences in scores between comparable candidates, a serious challenge in medical education. An approach frequently adopted to avoid this and improve reliability is to pair examiners and ask them to come to an agreed score. Little is known, however, about what occurs when these paired examiners interact to generate a score.

Methods: A fully-crossed design was employed, with each participant examiner observing and scoring the candidates. A quasi-experimental research design used candidates' observed scores in a mock clinical assessment as the dependent variable. The independent variables were examiner numbers, demographics and personality. Demographic and personality data were collected by questionnaire. A purposeful sample of medical doctors who examine in the Final Medical examination at our institution was recruited.

Results: Variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12). Of the examiners, 75% (N=9) scored below average for neuroticism and 75% also scored high or very high for extroversion. Two thirds scored high or very high for conscientiousness. The higher an examiner's personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score.

Conclusions: While the variability between scores given by examiner pairs (N=6) was less than the variability with individual examiners (N=12), the reliability statistics for both assessments were comparable. Using paired examiners resulted in a more accurate and robust score than simply averaging two independent examiners' scores. The higher an examiner's personality score for extroversion, the lower the amount of change in his/her score when paired up with a co-examiner, possibly reflecting a more dominant role in the process of reaching a consensus score. These findings could have implications for the organisation and administration of clinical assessments. Further studies with larger numbers of participants might establish whether personality testing before choosing examiner pairs could be used to help pair examiners and reduce examiner variability.

Springer Science and Business Media LLC

Aileen Faherty, Yvonne Finn, Tim Counihan
