Creating quality multiple-choice questions (MCQs) seems like a relatively simple task, right? Ask a question, and write four answers; three are incorrect, and one is correct. Strap in, folks… this is going to get cynical. In writing this blog post, I want to do three things:
- Write an MCQ that reflects the vast majority of mandatory compliance training questions I have encountered.
- Test out whether AI replicates these poor-quality questions (spoiler alert: of course it does), and
- Show how you can create better quality MCQs (by yourself).
The mandatory compliance training MCQ standard 😬
One key feature of mandatory compliance training quizzes is that many of the answers are painfully obvious, even to someone who has never read the material. This nudges participants into ‘just get the assessment done’ vibes. Here’s my attempt:
Jenny’s team is moving office. There is an 80kg, 180cm tall bookshelf that Jenny wants to move to her new office. Should she:
1. Try to move it on her own.
2. Order her subordinates to move the bookshelf.
3. Buy a new bookshelf with jewel-encrusted shelving.
4. Arrange for the bookshelf to be moved to the office via the organisation’s relocation service.
5. Ask the building management of the new office whether there is adequate shelving.
6. 4 or 5, as appropriate.
7. All of the above.
8. None of the above.
9. 1 through 3, in order.
Ok, this was fun, but there are features of this question that appear in many MCQs:
Choice of names: A Caucasian protagonist with a fairly common name.
Obvious distractors: Several obviously incorrect responses (why is there always the ‘do it yourself’ OH&S response?).
A longer, more nuanced correct answer: A nuanced response that is generally longer than the others, indicating that it is the likely correct answer (in this case, 4 and 5, so therefore the even more nuanced 6: ‘4 or 5, as appropriate’).
All/none of the above: All of the above (90% of the time, this is the correct answer), and none of the above (thrown in because we couldn’t think of another distractor, and we used all of the above, so that’s where our brains went).
Complicated question formation: So I have to match the numbers to the responses, then read all three in order, then decide whether that’s correct?
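These pitfalls are mechanical enough that you could screen for them before a question ever reaches a participant. Here’s a minimal sketch in Python of an MCQ ‘lint’; the heuristics and the 1.5× length threshold are my own illustrative assumptions, not an established standard:

```python
# A rough lint for draft MCQs, covering the pitfalls listed above.
# Heuristics and thresholds are illustrative assumptions only.

BAD_FILLERS = {"all of the above", "none of the above"}

def lint_mcq(options: list[str], correct_index: int) -> list[str]:
    """Return a list of warnings for a draft multiple-choice question."""
    warnings = []

    # Pitfall: 'all/none of the above' filler answers.
    for i, option in enumerate(options):
        if option.strip().rstrip(".").lower() in BAD_FILLERS:
            warnings.append(f"Option {i + 1} is an all/none-of-the-above filler.")

    # Pitfall: the correct answer is conspicuously longer than the distractors.
    lengths = [len(option) for option in options]
    distractor_lengths = [n for i, n in enumerate(lengths) if i != correct_index]
    if distractor_lengths:
        average = sum(distractor_lengths) / len(distractor_lengths)
        if lengths[correct_index] > 1.5 * average:
            warnings.append("The correct answer is much longer than the distractors.")

    # Pitfall: padding out the option count instead of keeping only
    # plausible responses.
    if len(options) > 5:
        warnings.append(f"{len(options)} options: are they all plausible?")

    return warnings
```

Run over my nine-option monstrosity above, this would flag options 7 and 8, plus the option count, straight away.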
Right, it’s clear that I’m no good at this, so let’s give AI a try. Surely this will provide me with a high-quality MCQ with absolutely none of the common pitfalls, right?
Does AI replicate this level of quality? 🫣
Here’s the prompt:
Write me a multiple-choice quiz question with a scenario about a staff member who wants to move a bookshelf to her new office. The answers should state four choices in the scenario; one should be correct. There should be a policy/compliance feel to the answers.
Here’s what Perplexity gave me:
Jane, a staff member, wants to move a heavy bookshelf from her current office to her new office across the building. According to workplace safety and compliance policies, what should Jane do before attempting to move the bookshelf?
- Move the bookshelf herself quickly to avoid disrupting other staff.
- Ask a colleague to help her and ensure they both lift the bookshelf properly using safe lifting techniques.
- Use the elevator to move the bookshelf without notifying building management or using any moving equipment.
- Request assistance from the facilities or maintenance team trained in moving heavy office furniture to prevent injury or damage.
The similarities with my question are remarkable. Of note:
Choice of names: A Caucasian protagonist with a fairly common name. I made a deliberate choice to make a point, but Perplexity’s choice was based on data. The lack of representation in an LLM’s training data means its responses will inherit that same lack of representation. Gross.
Obvious distractors: Several obviously incorrect responses. Ok, so mine were a tad facetious, and Perplexity’s have a little more nuance (if it’s a safe technique, it’s fine, right?).
A longer, more nuanced correct answer: A nuanced response that is generally longer than the others, indicating that it is the likely correct answer. Ok, so while it’s not significantly longer than the others, it’s still true that in poorly written MCQs, the longest answer is usually the correct answer.
All/none of the above: Responses such as all/none of the above “do not operate reliably or validly” (Towns, 2014), and ‘none of the above’ effectively turns the question from multiple choice into true/false. If the purpose of the quiz is to test what participants understand, then neither of these answers achieves this. ‘None of the above’ doesn’t confirm the participant knows the correct answer, just that it isn’t one of the listed answers. Similarly, ‘all of the above’ can be selected as long as the participant knows at least two answers are correct, meaning there could be gaps in the participant’s understanding.
Can you create quality MCQs by yourself? 😎
A better question builds the nuance into each answer and avoids the common pitfalls of MCQs. Let’s try again:
Jing’s team is moving office. There is an 80kg, 180cm tall bookshelf that Jing wants to move to her new office. Should she:
1. Arrange for the bookshelf to be moved to the office via the organisation’s relocation service.
2. Ask the building management of the new office whether there is adequate shelving.
3. Check the organisation’s policies on office relocation.
Let’s look into these choices a bit further:
Choice of names: A quick Workplace search shows around the same number of Jings at my current institution as there are Jennys and Janes.
Less obvious distractors: In this version, 1 and 2 are reasonable responses, but there is likely information about the situation in the organisation’s policies, which would help determine whether 1 or 2 is correct. So, if you are unsure, the first step is to check the policies.
Nuance through all answers: The correct response is now shorter than the others, but more nuanced. The other two responses play on logic and misconceptions. Most organisations use relocation services, so 1 could be correct. Most organisations would encourage you to use what is already available rather than spend the money and time to relocate the furniture, so 2 could be correct. Checking the policy could reveal a third option: The organisation does not allow the moving of large furniture. However, a purchase order can be made if like-for-like furniture is not available in the new space. The bookshelf Jing wants to relocate could hold many heavy, tall textbooks, and if the shelving at the new office is mainly for decoration, then Jing can put in a purchase order for something more suitable. Since Jing is an academic and needs access to these textbooks, attempting to store them on a flimsy bookshelf is an OH&S issue.
Total number of answers: A common misconception about MCQs is that they must have four responses. A better rule is that you should only include plausible responses. This is often why we see ‘all/none of the above’: the MCQ writer can’t come up with another defensible answer. “Empirical research has established that it is very unlikely that item writers can write three distractors that have item response patterns consistent with the idea of plausibility” (Haladyna et al., 2002).
Given that we have now sent our participant on a treasure hunt through the policy bank (hooray!), we can build on this. Through immediate feedback, we can tell the participant that 3 is the correct response, link to the policy, and explain the process. The next question can then let the participant explore the policy and the relocation process.
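If your quiz platform is scriptable, the feedback-per-option idea is easy to model. Here’s a minimal sketch in Python; the data structure and the policy URL are placeholders I’ve invented for illustration, not a real LMS API:

```python
from dataclasses import dataclass

@dataclass
class Option:
    text: str
    feedback: str  # Shown immediately after the participant answers.

@dataclass
class Question:
    stem: str
    options: list[Option]
    correct_index: int

# Placeholder link; substitute your organisation's actual policy bank.
POLICY_URL = "https://example.org/policies/office-relocation"

question = Question(
    stem=(
        "Jing's team is moving office. There is an 80kg, 180cm tall "
        "bookshelf that Jing wants to move to her new office. Should she:"
    ),
    options=[
        Option(
            "Arrange for the bookshelf to be moved to the office via the "
            "organisation's relocation service.",
            f"Close, but check the relocation policy first: {POLICY_URL}",
        ),
        Option(
            "Ask the building management of the new office whether there "
            "is adequate shelving.",
            f"Reasonable, but the policy determines this: {POLICY_URL}",
        ),
        Option(
            "Check the organisation's policies on office relocation.",
            f"Correct! The policy covers both of the other options: {POLICY_URL}",
        ),
    ],
    correct_index=2,  # Zero-based: option 3 in the question above.
)
```

Because every option carries feedback that links back to the policy, even a wrong answer starts the treasure hunt.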
Focus on writing plausible, nuanced distractors and on building questions that reflect real-life situations.
So, writing multiple-choice questions isn’t just a ‘pick a question, add four answers, hope nobody notices’ kind of vibe. The truth is, most compliance MCQs fail because they’re full of obvious distractors, low-effort ‘all/none of the above’ answers, and distinct ‘pick me because I’m the longest and most nuanced’ correct answers. All of which AI can replicate at lightning speed and at the expense of the environment (hooray!). Ok, I realise the irony of using AI to generate the question in this post, but hopefully this saves some future carbon credits by highlighting its lack of quality. Ultimately, you don’t need AI or jewel-encrusted bookshelves to do better, and to be honest, the jewels may be distracting. Need to know how to create quality distractors? Check out my post How to write quality quiz questions: Scenarios, nuanced distractors, and a feedback dialogue.
References
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment. Applied Measurement in Education, 15(3), 309–333. https://doi.org/10.1207/S15324818AME1503_5
Towns, M. H. (2014). Guide To Developing High-Quality, Reliable, and Valid Multiple-Choice Assessments. Journal of Chemical Education, 91(9), 1426–1431. https://doi.org/10.1021/ed500076x
