How do I use recommendations to improve my test?

Once a certain number of candidates have taken your test, you'll receive recommendations on how to improve it. In this article, we explain the basis of these recommendations and how to use them.

TestGorilla uses best practices from psychometrics and data science to generate recommendations that help you improve the reliability of your test. Your TestGorilla Score will improve when you act on these recommendations.

Note that these recommendations are not available right away. A certain number of candidates must take your test before our algorithms can produce reliable recommendations.

The first set of recommendations

The first recommendations appear when 80% of the questions in your test have been answered 50 times.

How many candidates does that equate to, exactly? This depends on how many questions you've written and how many of them each candidate receives (12, 15, or 20). In general, it corresponds to a few hundred candidates.

Let's say you created a set of 100 questions for a test and candidates receive a subset of 20 of these per test. In this case, your first set of recommendations appears when about 700 candidates have taken a paid test.
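To get a feel for the arithmetic, here is a rough sketch of the idealized floor on candidate numbers. It assumes every question in the pool is drawn equally often, which is an assumption made for illustration and not a description of how TestGorilla selects questions. The function name and parameters are hypothetical. The figure of about 700 quoted above is higher than this floor, so treat the result as a minimum rather than a prediction.

```python
def min_candidates(pool_size: int, questions_per_test: int, answers_needed: int) -> int:
    """Idealized floor on candidates for the average question to reach
    `answers_needed` answers, assuming perfectly even exposure (an assumption)."""
    # Each candidate answers `questions_per_test` out of `pool_size` questions,
    # so the pool collects `questions_per_test` answers per candidate in total.
    total_answers_needed = answers_needed * pool_size
    # Integer ceiling division: round up to a whole candidate.
    return (total_answers_needed + questions_per_test - 1) // questions_per_test


# 100 questions written, 20 shown per candidate, 50 answers needed per question.
print(min_candidates(100, 20, 50))  # 250
```

Because real exposure is never perfectly even and the threshold applies to 80% of the questions rather than to the average question, the actual number will differ from this estimate.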

These particular recommendations pertain to individual questions, not the test as a whole. Three types of recommendations can appear initially:

| Recommendation | When is it shown? | Why is it shown? |
| --- | --- | --- |
| The item is very likely going to be too difficult. Make it easier. | Very few candidates can answer the question correctly. | The question does not help to discriminate between skilled and unskilled candidates. |
| There is most likely a mistake in the scoring key. Please check. | The large majority of candidates that score very well on the rest of the questions have gotten this question wrong (and/or vice versa). | In a reliable test, we can expect strong candidates to do better than poor candidates on all questions. If the reverse is true for one question, most likely the question is to blame, not the candidates. |
| Reduce the required number of thinking steps or simplify the steps. | It takes most candidates too long to answer this question. | Candidates have a time limit of 10 minutes for most tests. If one question takes too long, they won't have enough time for the other questions. |

Once you've updated the questions based on these recommendations, some of the same recommendations may appear again after another few hundred candidates have taken the test.
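The three checks behind the first set of recommendations can be sketched roughly as follows. This is an illustration only: the function names, the input layout, and both cutoff values are hypothetical, and a plain item-rest correlation stands in for whatever statistic TestGorilla actually uses to spot a suspect scoring key.

```python
from math import sqrt
from statistics import median


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def first_set_flags(correct, rest_scores, seconds,
                    min_share_correct=0.10, max_median_seconds=60):
    """Return which of the three first-set recommendations a question would trigger.

    correct:     1 or 0 per candidate (did they answer this question correctly?)
    rest_scores: each candidate's score on the other questions
    seconds:     time each candidate spent on this question
    Both cutoffs are hypothetical defaults, not TestGorilla's values.
    """
    share_correct = sum(correct) / len(correct)
    return {
        "too_difficult": share_correct < min_share_correct,
        # Negative correlation: strong candidates miss it, weak candidates get it.
        "check_scoring_key": pearson(correct, rest_scores) < 0,
        "too_slow": median(seconds) > max_median_seconds,
    }


# Candidates with high scores elsewhere get this question wrong, and vice versa.
flags = first_set_flags(
    correct=[1, 1, 1, 0, 0, 0],
    rest_scores=[2, 3, 4, 15, 16, 17],
    seconds=[20, 25, 30, 22, 28, 24],
)
print(flags)  # {'too_difficult': False, 'check_scoring_key': True, 'too_slow': False}
```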

The full set of recommendations appears when about a thousand candidates have taken your test

A more refined set of recommendations appears when 70% of the questions have been answered 200 times. In the example above, that happens after approximately 2,000 candidates have taken the test. Properly addressing these recommendations will improve the reliability of the test even further.
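As a minimal sketch, the two thresholds described in this article (80% of questions answered 50 times, and 70% of questions answered 200 times) could be checked like this. The function names and the list-of-counts input are hypothetical; only the percentages and answer counts come from the article.

```python
def enough_questions_answered(answer_counts, min_answers, min_percent):
    """True when at least `min_percent` percent of the questions have been
    answered at least `min_answers` times."""
    if not answer_counts:
        return False
    reached = sum(1 for count in answer_counts if count >= min_answers)
    # Compare as integers to avoid floating-point edge cases at the boundary.
    return reached * 100 >= min_percent * len(answer_counts)


def available_recommendations(answer_counts):
    """Report which sets of recommendations the answer counts qualify for."""
    return {
        "first_set": enough_questions_answered(answer_counts, 50, 80),
        "full_set": enough_questions_answered(answer_counts, 200, 70),
    }


# 100 questions: 85 answered 60 times, 15 answered only 10 times.
early = available_recommendations([60] * 85 + [10] * 15)
print(early)  # {'first_set': True, 'full_set': False}

# Later: 70 questions at 250 answers, 15 at 60, 15 at 5.
later = available_recommendations([250] * 70 + [60] * 15 + [5] * 15)
print(later)  # {'first_set': True, 'full_set': True}
```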

Again, the following recommendations pertain to individual questions:

| Recommendation | When is it shown? | Why is it shown? |
| --- | --- | --- |
| Replace this question with a similar question. | Too many candidates have seen this specific question. | To avoid questions from leaking and appearing on public websites, we use a limit on the exposure of questions. |
| Make this question more difficult, either by changing the question or by making the distractors more attractive. | Very few candidates answer this question incorrectly. It is, therefore, too easy. | The question does not help discriminate between skilled and unskilled candidates. |
| The question is too difficult. Make it easier. | Very few candidates answer this question correctly. Question type is multiple-response or short text. | The question does not help discriminate between skilled and unskilled candidates. |
| The question is too difficult. The most chosen wrong answer is [most chosen wrong answer]. Make it easier. | Very few candidates answer this question correctly. Question type is multiple-choice or true/false. | The question does not help discriminate between skilled and unskilled candidates. |
| The following distractor is almost never chosen: [distractor]. Make it more attractive to choose (i.e., more plausible). | The distractor has a very low incidence rate. Question type is multiple-choice or multiple-response. | If almost no candidate chooses a distractor, it is not appealing enough for candidates to consider. |
| The following distractor is chosen more often than the correct answer: [distractor]. Replace it with a less plausible distractor, or avoid confusion in the question. | A distractor is chosen more often than the right answer. Question type is multiple-choice. | There can be multiple reasons why this happens. The distractor can be too tricky, or the question is ambiguous or unclear, or the correct answer is unclear. |
| Check if you have made a mistake in the scoring key. | Many candidates that score very well on the rest of the questions choose a particular distractor. | In a reliable test, we can expect skilled candidates to do better than unskilled candidates on all questions. If the reverse is true for a particular question, most likely the question is to blame, not the candidates. |
| Check if the question and its answer options are clear; remove potential confusion or ambiguity. Additionally, make sure the question is relevant with regard to the skills/knowledge you intend to test. | The score on this question does not correlate well with the score on the rest of the test. | If the question is well written and contains no confusion, it likely measures something different from the other questions in the test. For the reliability of the test, it's important that all questions cover relevant topics, i.e., topics within the required skillset. |
| It takes most candidates too long to answer this question. Reduce the required number of thinking steps or simplify the steps. | It takes candidates, on average, much longer to answer this question relative to the other questions in the test. | Candidates have a time limit of 10 minutes for most tests. If one question takes too long, they won't have enough time for the other questions. |
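The two distractor recommendations above boil down to counting how often each answer option is chosen. The sketch below shows one way to do that for a single multiple-choice question. The function name, the input layout, and the 5% cutoff for "almost never chosen" are hypothetical and are not TestGorilla's actual values.

```python
from collections import Counter


def distractor_report(choices, correct, options, rare_share=0.05):
    """Summarize distractor behavior for one multiple-choice question.

    choices:    the option each candidate selected
    correct:    the option marked correct in the scoring key
    options:    all answer options for the question
    rare_share: hypothetical cutoff below which a distractor counts as
                "almost never chosen"
    """
    counts = Counter(choices)  # options nobody picked count as 0
    total = len(choices)
    distractors = [option for option in options if option != correct]
    return {
        "rarely_chosen": [d for d in distractors if counts[d] < rare_share * total],
        "beats_correct": [d for d in distractors if counts[d] > counts[correct]],
    }


# 100 candidates: A is the keyed answer, B is picked more often, D almost never.
choices = ["A"] * 30 + ["B"] * 45 + ["C"] * 24 + ["D"] * 1
report = distractor_report(choices, correct="A", options=["A", "B", "C", "D"])
print(report)  # {'rarely_chosen': ['D'], 'beats_correct': ['B']}
```

A distractor that beats the correct answer is worth a second look at the scoring key as well as at the wording, since either could explain the pattern.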

 

We also give recommendations that apply to the overall test:

| Recommendation | When is it shown? | Why is it shown? |
| --- | --- | --- |
| Add more difficult questions. | There are insufficient difficult questions. | The test should also be able to differentiate between strong candidates. |
| Add more easy questions. | There are insufficient easy questions. | The test should also be able to differentiate between weaker candidates. |
| Add variety to questions. | The questions are too similar. | If a candidate can do one question, they can do most; as such, the tested skillset is too narrow. |
| Reduce the number of questions candidates have to answer. | A large proportion of candidates cannot finish the test in time. | This does not give a pleasant candidate experience. |
| Consider increasing the number of questions candidates have to answer. | A large proportion of candidates finish the test well before the time limit. | The reliability of the test would increase if there were more questions answered. Candidates apparently have enough time to answer more. |

It's important to take sufficient time to improve your questions based on these recommendations.

If you address these issues thoroughly, you'll likely see your list of recommendations become much smaller as time goes on. In the meantime, your test's Quality Score will meaningfully improve.