A randomised test can be perfectly fair, but it can just as easily be biased.
A test is biased when the results have consequences that unfairly advantage or disadvantage test takers.
Is it possible to determine whether a test is fair, that is, whether it is equally difficult for all candidates?
Yes, it is. But only in hindsight.
An analysis of the average p-values of the test (the proportion of candidates who answered an item correctly) is of great help in establishing the fairness of the test. When the average p-values are spread across a broad range, it is highly likely that the different test forms varied in difficulty.
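Such a hindsight check can be sketched in a few lines. The data below is illustrative: each form lists the p-values of its items, and a wide spread between the form-level means is the warning sign described above.

```python
# Sketch: compare mean p-values (fraction of candidates who answered an item
# correctly) across randomly generated test forms, after delivery.
# The data here is purely illustrative.

forms = {
    "form_A": [0.82, 0.75, 0.79, 0.71],
    "form_B": [0.55, 0.48, 0.60, 0.52],   # a noticeably harder form
    "form_C": [0.80, 0.77, 0.74, 0.78],
}

# Mean p-value per form, and the spread between the easiest and hardest form.
form_means = {name: sum(ps) / len(ps) for name, ps in forms.items()}
spread = max(form_means.values()) - min(form_means.values())

for name, mean_p in form_means.items():
    print(f"{name}: mean p = {mean_p:.2f}")
print(f"spread = {spread:.2f}")

# A spread this large (here well over 0.2) suggests candidates
# did not receive tests of comparable difficulty.
```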
But…in hindsight is too late!
You want certainty about the fairness of your test before it is delivered to your candidates.
It will save you from considerable headaches afterwards.
So, how is this done?
How does one create a test that is fair to each and every candidate?
The key condition is that you have a clear understanding of the level of knowledge and/or skills of your candidates.
Your candidates really do not have to know the answers to all of the questions. In fact, some things are learnt by doing and through gaining experience.
So what does this have to do with randomised tests?
Because when designing a randomised test, you want to ensure that candidates who come well prepared are presented with questions they can answer.
You want to be able to distinguish the competent candidates from those who require some further education. You then work your way back from that norm.
So what is the goal? What do you want to measure?
You want to be able to assess a candidate’s knowledge and insight at a predetermined level of the subject matter. You want to set a standard. And for this to work correctly you need to be a subject matter expert. You apply your knowledge of the subject matter during an item review and qualify all of the items into buckets of easy, moderate and hard questions. This method of standard setting is the foundation for a good randomised test. This approach is known as the Angoff method.
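The review step above can be sketched as follows. The items, expert ratings, and bucket cut-offs are all hypothetical; in an Angoff review, each expert estimates the proportion of minimally competent candidates who would answer an item correctly, and the averaged estimates are then bucketed.

```python
# Sketch of an Angoff-style item review with hypothetical items and ratings.
# Per item: each expert's estimate of the proportion of minimally competent
# candidates who would answer correctly.
expert_ratings = {
    "item_01": [0.90, 0.85, 0.88],
    "item_02": [0.60, 0.55, 0.65],
    "item_03": [0.35, 0.30, 0.40],
}

def bucket(mean_estimate: float) -> str:
    # Thresholds are illustrative; choose cut-offs that suit your standard.
    if mean_estimate >= 0.75:
        return "easy"
    if mean_estimate >= 0.45:
        return "moderate"
    return "hard"

# Average the experts' estimates per item, then assign a difficulty bucket.
buckets = {item: bucket(sum(r) / len(r)) for item, r in expert_ratings.items()}
print(buckets)
```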
Depending on the testing solution that you use, you ensure that all levels of difficulty are reflected in your test specifications matrix – or blueprint. Questionmark Perception has this to say on the subject.
Andriessen’s Sisto offers the possibility of seeding pretest items. These items do not count towards a candidate’s final score, but you can use them for determining their p- and rit-values. Is it a hard, moderate or easy question?
When you have collected sufficient information about a pretest item, you then decide whether it can be included in the test. This allows you to remove an item that is not performing well, or to amend it and include it as a new pretest item in the next release of your test.
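A minimal sketch of that decision, assuming illustrative delivery data and thresholds: the p-value is the proportion correct, and the rit-value is the correlation between the item score and the candidate's total score on the scored part of the test.

```python
# Sketch: evaluate a seeded pretest item and decide whether to promote,
# revise, or drop it. Data and cut-off values are illustrative.
from statistics import mean, pstdev

# Per candidate: 1/0 on the pretest item, and their total score on the
# scored (counting) part of the test.
item_scores  = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
total_scores = [34, 30, 18, 28, 20, 31, 29, 17, 27, 33]

p_value = mean(item_scores)  # proportion correct: the item's difficulty

def pearson_r(xs, ys):
    # Plain Pearson correlation; with a 1/0 item this is the rit-value.
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

rit = pearson_r(item_scores, total_scores)

if rit < 0.15:
    decision = "drop or rewrite: item does not discriminate"
elif p_value < 0.20 or p_value > 0.95:
    decision = "revise: item is too hard or too easy"
else:
    decision = "promote to scored item bank"

print(f"p = {p_value:.2f}, rit = {rit:.2f} -> {decision}")
```

Here candidates who score well overall also tend to answer the item correctly, so it discriminates well and can be promoted.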
Taking this approach to the design and further development of your test allows you to improve its quality.
Your item bank will increasingly reflect items of a similar difficulty level.
Should you wish to use items with significantly different levels of difficulty then you will want to label your items or use a test matrix that is designed to fairly distribute these items.
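The labelling approach above can be sketched as a stratified draw: each generated form takes a fixed number of items per difficulty level, as prescribed by the blueprint. The item bank and blueprint here are hypothetical.

```python
# Sketch: draw a randomised test form from a labelled item bank so that each
# difficulty level appears in the proportions a blueprint prescribes.
import random

# Hypothetical item bank, labelled by difficulty.
item_bank = {
    "easy":     [f"E{i}" for i in range(1, 21)],
    "moderate": [f"M{i}" for i in range(1, 21)],
    "hard":     [f"H{i}" for i in range(1, 21)],
}

# Blueprint: 10 items per form, fixed counts per difficulty level.
blueprint = {"easy": 4, "moderate": 4, "hard": 2}

def draw_form(bank, spec, rng):
    form = []
    for level, count in spec.items():
        form.extend(rng.sample(bank[level], count))  # no repeats within a level
    rng.shuffle(form)  # so difficulty order is not predictable
    return form

form = draw_form(item_bank, blueprint, random.Random(42))
print(form)
```

Every candidate gets a different form, but all forms share the same difficulty mix, which is exactly what keeps the randomisation fair.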
The advantages of randomised tests:
- They decrease the value of exam or item theft. Every test is different!
- It is easy to swap pretest items in and out.
- It allows you to gradually grow your item bank increasing the randomisation of items.
- Every candidate is presented with a test that is unique.
So, is a randomised test fair? It can be. But it requires work and maintenance, particularly in the area of item difficulty.
Source: Andriessen International Blog