Year: 2021 | Volume: 70 | Issue: 4 | Page: 239-243
Relationship between difficulty and discrimination indices of essay questions in formative assessment
Pushpalatha Kunjappagounder1, Sunil Kumar Doddaiah2, Pushpa Nagavalli Basavanna3, Deepa Bhat1
1 Department of Anatomy, FAIMER Fellow, JSS Medical College, JSS Academy of Higher Education and Research, Mysore, Karnataka, India
2 Department of Community Medicine, JSS Medical College, JSS Academy of Higher Education and Research, Mysore, Karnataka, India
3 Department of Anatomy, JSS Medical College, JSS Academy of Higher Education and Research, Mysore, Karnataka, India
Date of Submission: 03-Sep-2020
Date of Acceptance: 30-Sep-2021
Date of Web Publication: 21-Dec-2021
Dr. Pushpa Nagavalli Basavanna
Department of Anatomy, JSS Medical College, JSS Academy of Higher Education and Research, Mysore, Karnataka
Source of Support: None, Conflict of Interest: None
Introduction: Assessment drives learning and is a key component of medical education. Written examinations play a major role in assessing the cognitive domain, and well-constructed essay questions help assess higher orders of knowledge. Item analysis helps assess the quality of the items written, which in turn helps faculty decide whether to retain, modify, or omit items. Material and Methods: Item analysis was done on the anatomy essay questions of two internal assessments of 200 first-year MBBS students. Difficulty and discrimination indices were calculated, and the relationship between the two indices was analyzed. The analysis indicated a wide spectrum of difficulty levels among the essay items. Results: 83.33% of items were within the acceptable range of the difficulty index (DIF) and 16.67% were not. 91.67% of items discriminated well between students who had studied and those who had not; only 8.33% (one question) needed to be discarded and replaced. The relationship between the item discrimination index (DI) and DIF was determined using regression analysis and was statistically significant: as the difficulty level of an item increased, the DI also increased. Discussion and Conclusion: This kind of analysis yields a question bank with validated grades of acceptability. Before deciding on a test, it helps to consider whether the item difficulty level is appropriate for the objective being tested and whether the item discriminates adequately, and then to decide which items to include, revise, or omit.
Keywords: Difficulty index, discrimination index, essay questions
How to cite this article:
Kunjappagounder P, Doddaiah SK, Basavanna PN, Bhat D. Relationship between difficulty and discrimination indices of essay questions in formative assessment. J Anat Soc India 2021;70:239-43
Introduction
Anatomy is traditionally considered the foundation of the medical sciences, and students need to acquire core anatomical knowledge as a strong foundation for clinical encounters and professional practice. Assessment is an essential component of the teaching and learning process. Assessing students in anatomy does not differ from assessing them in other disciplines; it must obey the same general parameters, namely objectivity, validity, and reliability. Assessing educational objectives in the cognitive domain requires different assessment tools than assessing objectives in the psychomotor or affective domains. The introduction of new educational methods, tools, and innovative curricula has necessitated a change in assessment and evaluation as well.
Assessing teaching-learning outcomes in anatomical knowledge is a complex task that necessitates the evaluation of theoretical, practical, and clinical knowledge. Assessment of anatomical understanding in a problem-based or competency-based curriculum requires multiple assessment tools, as no single method of assessment can effectively test knowledge, skills, and attitudes.
Written examinations are the most commonly employed method for assessing cognitive skills in medical education. A written examination may contain various types of questions, such as essay, modified essay, short answer, and multiple-choice questions. The most common type used in written examinations is the essay question. Essay questions allow students to express their ideas and assess the higher order of the cognitive domain, and because there are no options to select, guessing bias is eliminated. Their disadvantages include a smaller number of questions, limited sampling, unfair distribution of questions over topics, and vague questions. Blueprinting overcomes these issues and increases the validity of examinations.
Evaluation depends on the assessment tool, and item analysis consists of the analysis of individual questions and of the whole test after the test is conducted. This postvalidation process is essentially a statistical method called item analysis. Item analysis is a valuable, relatively simple, and effective process to check the reliability and validity of questions. The difficulty index (DIF) and discrimination index (DI) are usually calculated for objective-type questions, and essay questions are difficult to analyze; in the literature, only one study has examined item analysis of essay questions. Analyzing essay questions helps teachers decide whether a particular item should be retained, modified, or discarded. The present study was undertaken to evaluate the quality of essay questions by analyzing DIF and DI. It will help in identifying specific technical flaws in the questions and in improving the item-writing skills of examiners.
Material and Methods
This cross-sectional study was conducted on a cohort of 200 first-year undergraduate medical students in the Department of Anatomy at our institution during the academic year 2018–2019. Three internal assessments were conducted during this period. The question papers were prepared based on a blueprint and validated by subject experts of the medical education unit. Each paper comprised one long essay question for 10 marks, five short essay questions each carrying 5 marks, and five short answer questions each carrying 3 marks. Each question was assigned to a particular faculty member for evaluation, and key answers were provided to avoid interobserver variability. The answer scripts were evaluated and marks were allotted for the items. In the current study, the long and short essay questions of the first and second internal assessments were analyzed. The study protocol was approved by the Institutional Ethics Committee.
For item analysis, the results of all students were ranked in descending order, from highest to lowest marks. The papers were then divided into quartiles based on scores: the top 27% of scores formed the upper quartile or high-scoring group (n = 54) and the bottom 27% the lower quartile or low-scoring group (n = 54). Only these two groups were considered for the analysis; papers with average scores (the middle quartiles) were excluded from the study. The mark ranges for short essay questions were decided with the help of the ANGEL group, who formulated the score range and mean of each question. For each short essay question (5 marks):
- 3.5–5 marks = correct answer (A)
- 2–3 marks = near-correct answer (B)
- 0.5–1.5 marks = near-incorrect answer (C)
- 0 marks or not answered = incorrect answer (D).
The mark ranges for the long essay question (10 marks) were decided by experts with the help of the ANGEL group:
- 6.5–10 marks = correct answer (A)
- 4.5–6 marks = near-correct answer (B)
- 0.5–4 marks = near-incorrect answer (C)
- 0 marks or not answered = incorrect answer (D).
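The mark-to-grade bands above can be expressed as a minimal sketch; the function names are illustrative, not part of the study's protocol:

```python
def grade_short_essay(mark):
    """Map a short-essay mark (out of 5) to the grade bands A-D described above."""
    if mark >= 3.5:
        return "A"  # correct answer (3.5-5 marks)
    if mark >= 2:
        return "B"  # near-correct answer (2-3 marks)
    if mark >= 0.5:
        return "C"  # near-incorrect answer (0.5-1.5 marks)
    return "D"      # incorrect or not answered (0 marks)

def grade_long_essay(mark):
    """Map a long-essay mark (out of 10) to the grade bands A-D."""
    if mark >= 6.5:
        return "A"  # 6.5-10 marks
    if mark >= 4.5:
        return "B"  # 4.5-6 marks
    if mark >= 0.5:
        return "C"  # 0.5-4 marks
    return "D"      # 0 marks or not answered
```

For example, a short essay scored 4 falls in band A, while a long essay scored 5 falls in band B.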
DIF and DI were calculated to evaluate the essay questions.
H = Number of students who gave the correct answer in the high-scoring group
L = Number of students who gave the correct answer in the low-scoring group
N = Total number of students in both groups
DIF was calculated using the formula
DIF = ([H + L]/N) × 100
The DIF value is expressed as a percentage and ranges from 0 to 100. Its recommended value is 45–60 and its acceptable range is 25–75.
Interpretation of DIF:
- >70% = Too easy
- 30%–70% = Average
- 50%–60% = Good
- <30% = Too difficult.
DI was calculated using the formula DI = 2 × ([H − L]/N)
The DI value is expressed as a fraction and typically ranges from 0 to 1; a negative value occurs when more students in the low-scoring group answer correctly than in the high-scoring group.
Interpretation of DI is:
- ≤0.2 = Poor
- 0.21–0.24 = Acceptable
- 0.25–0.35 = Good
- ≥0.36 = Excellent.
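Putting the two formulas together, a minimal sketch of the index calculation might look as follows; the counts in the example are illustrative, not the study's data:

```python
def item_indices(high_correct, low_correct, n_total):
    """Compute the difficulty (DIF, %) and discrimination (DI) indices.

    high_correct: students answering correctly (grade A) in the top 27% group
    low_correct:  students answering correctly in the bottom 27% group
    n_total:      total students across both groups combined
    """
    dif = (high_correct + low_correct) / n_total * 100  # DIF = ([H + L]/N) x 100
    di = 2 * (high_correct - low_correct) / n_total     # DI  = 2 x ([H - L]/N)
    return dif, di

# Hypothetical example: 54 students per group (N = 108),
# 40 high scorers and 15 low scorers answered correctly.
dif, di = item_indices(40, 15, 108)
# dif is about 50.9 (average difficulty); di is about 0.46 (excellent discrimination)
```

By the interpretation scales above, this hypothetical item would be of average difficulty and excellent discrimination.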
All data are reported as mean ± standard deviation of n items (number of questions). The relationship between the item DI and DIF values for each test paper was determined using regression analysis with SPSS Statistics for Windows, Version 26.0 (IBM Corp., Armonk, NY, USA), and the coefficient of determination is given by R². P < 0.05 was considered statistically significant.
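The SPSS regression step can be sketched as an ordinary least-squares fit of DI on DIF; this is a stand-in illustration using only the standard library, and the index lists passed to it would be the per-item values from a test paper (the assumed argument names are illustrative):

```python
import statistics

def regress(dif_values, di_values):
    """Ordinary least-squares fit of DI on DIF.

    Returns the slope, intercept, and coefficient of determination (R^2),
    mirroring the quantities reported from the SPSS regression.
    """
    mean_x = statistics.fmean(dif_values)
    mean_y = statistics.fmean(di_values)
    sxx = sum((x - mean_x) ** 2 for x in dif_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dif_values, di_values))
    syy = sum((y - mean_y) ** 2 for y in di_values)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # coefficient of determination
    return slope, intercept, r2
```

A positive slope would indicate that DI rises with DIF, which is the relationship reported in the Results section.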
Results
Item analysis was done for the essay questions of the first and second internal assessment papers (six items each). The difficulty and discrimination indices of items according to cognitive levels were obtained and are shown in [Table 1]. Of the 12 items, the difficulty value of 25% was within the recommended range, 58.33% were within the acceptable range, and 16.67% were not within the acceptable range [Table 1]. The DI of 91.67% of items indicated that these questions can be recommended for assessment, and the DI of 8.33% indicated that the question should be discarded.
Table 1: Difficulty index and discrimination index of different questions
Analysis of the data indicated a wide spectrum of difficulty levels among the essay items in both papers. The DIF of paper 1 ranged from 27.14% to 54.28% and that of paper 2 from 12.85% to 67.14%. The DI of paper 1 ranged from 0.54 to 0.77 and that of paper 2 from 0.14 to 0.74 [Table 2].
Table 2: Mean difficulty index and discrimination index for the questions in internal assessment
The relationship between the item DI and DIF was determined using regression analysis and was statistically significant. In both the first and second internal assessment essay questions, as the difficulty index increased, the discrimination index also increased [Figure 1] and [Figure 2].
Figure 1: The relationship between item difficulty index and discrimination index values of the 1st internal assessment
Figure 2: The relationship between item difficulty index and discrimination index values of the 2nd internal assessment
Discussion
The effective assessment of acquired knowledge is an essential element of medical education. Developing an appropriate assessment tool plays a major role in curriculum development, and the tool should be regularly evaluated. Having prepared and administered a test, faculty need to know how good the test questions are and whether the items were able to reflect students' performance in relation to their learning.
The reliability of an assessment and its ability to effectively discriminate between good and poor candidates are important considerations in evaluating an assessment tool.
An assessment instrument should not only assess the appropriate cognitive domain but also withstand scrutiny of content and construct validity, reliability, and fidelity, while at the same time discriminating between the performance levels of the students being tested.
Essay questions are commonly used to test cognitive skills, and the effectiveness of the assessment tool depends on how the questions are framed. The first mandatory step for quality assessment is the standardization of essay questions; for an assessment to be reliable and valid, a systematic selection of items with appropriate degrees of difficulty and discrimination is necessary.
Faculty usually believe that the questions they frame are satisfactory and able to assess the real ability of students. Item analysis is a valuable procedure performed after the examination that provides information regarding the reliability and validity of each test item. It identifies items that are too difficult or too easy, as well as items that fail to differentiate between students who have learned and those who have not, and thus serves as effective feedback to teachers about the quality of each item. Based on the item analysis, items can be removed, changed, or modified for future use.
Item analysis results are influenced by factors such as the number and quality of students and the purpose of the test. Hence, before discarding an item for poor discrimination, consider the factor(s) that may contribute to it. Frequent evaluation of questions through item analysis helps build a valid pool of essay questions and saves faculty time and energy. Developing a good test and good items is a complex and time-consuming process involving numerous steps, from creating items to pretesting, revising, and editing them.
DIF is a measure of how easy or difficult a question is for students: the higher the index, the lower the difficulty of the question, and vice versa. DI indicates the ability of a question to discriminate between higher- and lower-ability students and ranges between 0 and 1.0. A DI value of 1.0 indicates an ideal question with perfect discrimination, whereas a negative value means more students in the lower group answered the item correctly than students in the higher group. The DIF and DI are reciprocally related.
Before deciding on a test, consider whether the item difficulty level is appropriate for the objective being tested and whether the item discriminates adequately, and then decide which items to include, revise, or omit. Qualitative techniques can also be used for generating and analyzing data grounded in the voice of students rather than in psychometric-statistical inferences; the units of analysis are then the words of students rather than their numerical scores.
Item analysis is frequently done for multiple-choice questions, while only one study on essay questions exists in the literature. In the present study, 83.33% of items were within the acceptable range of DIF and 16.67% were not. 91.67% of items discriminated well between students who had studied and those who had not, and only 8.33% (one question) needed to be discarded and replaced. An item analysis of physiology items showed that the DIF value of 62.5% of questions was within the recommended and acceptable ranges (50% within the acceptable range), while 37.5% of questions were within the nonacceptable range. The discrimination value of 100% of questions was within the recommended and acceptable ranges, and none was nonacceptable.
The mean DIF calculated for MCQ items was 57.92% ± 26.88% (P < 0.05, confidence interval >95%). Of the four test papers conducted in anatomy, the mean DIF scores of the individual tests ranged from 40% to 70%, except for the abdomen paper, in which the upper value of the mean DIF was 83%. The abdomen test paper contained a higher number of very easy questions (7 out of 15), making the paper easy for students. The DIF of 76.90% of items was between 30% and 70%; thus, the majority of items were of ideal to acceptable difficulty.
Conclusion
In the present study, among all the items analyzed, only one item was not within the acceptable range and needed to be discarded, while the remaining items were within the acceptable range. A well-framed essay question is an efficient tool to evaluate different levels of the cognitive domain among students. Item analysis helps to observe item characteristics and to improve the quality of the test through item revision. Faculty should be trained to frame essay questions that can assess the higher order of the cognitive domain and clearly discriminate between students who have studied and those who have not.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
Singh K, Bharatha A, Sa B, Adams OP, Majumder MA. Teaching anatomy using an active and engaging learning strategy. BMC Med Educ 2019;19:149.
Brenner E, Chirculescu AR, Reblet C, Smith C. Assessment in anatomy. Eur J Anat 2015;19:105-24.
Yaqinuddin A, Zafar M, Ikram MF, Ganguly P. What is an objective structured practical examination in anatomy? Anat Sci Educ 2013;6:125-33.
Chakravarty M, Latif NA, Abu-Hijleh MF, Osman M, Dharap AS, Ganguly PK. Assessment of anatomy in a problem-based medical curriculum. Clin Anat 2005;18:131-6.
Khan GN, Ishrat N, Khan AQ. Using item analysis on essay types questions given in summative examination of medical college students: Facility value, discrimination index. Int J Res Med Sci 2015;3:178-82.
Rao C, Kishan Prasad HL, Sajitha K, Permi H, Shetty J. Item analysis of multiple-choice questions: Assessing an assessment tool in medical students. Int J Educ Psychol Res 2016;2:201-4.
Chauhan PR, Ratrhod SP, Chauhan BR, Chauhan GR, Adhvaryu A, Chauhan AP. Study of difficulty level and discriminating index of stem type multiple choice questions of anatomy in Rajkot. Biomirror 2013;4:1-4.
Palmer EJ, Devitt PG. Assessment of higher order cognitive skills in undergraduate education: Modified essay or multiple choice questions? Research paper. BMC Med Educ 2007;7:49.
Mahjabeen W, Alam S, Hassan U, Zafar T, Butt R, Konain S, et al. Difficulty index, discrimination index and distractor efficiency in multiple choice questions. Ann Pak Inst Med Sci 2017;13:310-5.
Suruchi, Rana SS. Test item analysis and relationship between difficulty level and discrimination index of test items in an achievement test in biology. Paripex Indian J Res 2014;3:56-8.
Siri A, Freddano M. The use of item analysis for the improvement of objective Examinations. Procedia Soc Behav Sci 2011;29:188-97.
Pande SS, Pande SR, Parate VR, Nikam AP, Agrekar SH. Correlation between difficulty and discrimination indices of MCQs in formative exam in physiology. South East Asian J Med Educ 2013;7:45-50.
Sim SM, Rasiah RI. Relationship between item difficulty and discrimination indices in true/false-type multiple choice questions of a para-clinical multidisciplinary paper. Ann Acad Med Singap 2006;35:67-71.
Gnaldi M, Matteucci M, Mignani S, Falocci N. Methods of item analysis in standardized student assessment: An application to an Italian case study. The International Journal of Educational and Psychological Assessment, 2013;12:78-92.
Tavakol M, Dennick R. Post-examination analysis of objective tests. Med Teach 2011;33:447-58.