What does the TestGorilla Score mean?

As a testpreneur, you'll want to keep a close eye on your TestGorilla Score. It affects whether your tests are available to customers, how much revenue share you earn, and how often your tests are recommended.

What is the TestGorilla Score?

The TestGorilla Score is a measure of the overall quality of a test and ranges from 0% to 100%. It's compiled from three underlying scores:

  1. The reliability of the test: We calculate this score using statistical methods, described below.
  2. The validity of the test: We determine this score using input from both candidates and customers, also described below.
  3. Customer feedback: After a customer has made their hiring decision, we ask them to review the tests they used in their assessment. Specifically, we ask, "Would you use this test again?" The answer to this question determines the third score.


The overall TestGorilla Score plays three important roles:

  • It represents our quality standard. We maintain a high minimum TestGorilla Score. Tests below the minimum score are not available to customers. Testpreneurs need to make improvements to these tests before we make them available again.
  • It determines testpreneurs' variable pay (the revenue share). See our article on testpreneur rewards for details on how this works.
  • It guides test recommendations for our customers. We recommend tests with high TestGorilla Scores to our customers if the tests fit the roles they’re hiring for. In other words, the higher the TestGorilla Score, the more a test is promoted.

Breaking down the TestGorilla Score: Reliability and Validity

At TestGorilla, we aim to provide tests with high reliability and validity. Therefore, we incentivize and enable our testpreneurs to improve their tests in a few ways.

First, let's define what we mean by the two terms.


Reliability is the extent to which test results are consistent, stable, and free of random fluctuation. If a scientist measures the speed of something many times and the results are consistently close to each other, then the measurement is reliable.

The same goes for a screening test. If the same candidate took a test many times and got the same score every time, the test would be reliable.

Another way to determine the reliability of a test is to look at the internal consistency of the questions. If question scores correlate strongly with each other, the test is internally consistent.
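One widely used statistic for internal consistency is Cronbach's alpha. This article doesn't say which statistic feeds the score, so treat the following as an illustrative sketch only. The function name and the sample data are made up for the example.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Estimate internal consistency (Cronbach's alpha).

    scores: one row per candidate, one column per question.
    Illustrative only; not necessarily the statistic TestGorilla uses.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least 2 questions and 2 candidates")

    # Sample variance of each question across candidates.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each candidate's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 4 candidates, 3 questions, each scored 0 to 5.
sample_scores = [
    [5, 4, 5],
    [3, 3, 4],
    [2, 1, 2],
    [4, 4, 3],
]
alpha = cronbach_alpha(sample_scores)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate that question scores move together. Values near 0 suggest the questions are not measuring the same thing.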


Validity is the extent to which a test truly measures what it's supposed to. If a test is reliable but not valid, it's not a useful test.

For example, if a kitchen scale's calibration is off by half an ounce, it will consistently show the wrong weight for your ingredients. The scale's measurement is reliable, but it is also invalid, since it doesn't show the true weight.

Similarly, if a screening test includes questions that don't accurately measure the subject at hand, or that measure something different from what the test description claims, the validity of the test is compromised.
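The kitchen-scale example can be put into numbers. In this small sketch, the readings and the true weight are invented for illustration. A small spread stands for reliability, and an offset from the true value stands for a lack of validity.

```python
from statistics import mean, stdev

# Hypothetical numbers: an 8 oz ingredient weighed five times on a
# scale whose calibration is off by about half an ounce.
TRUE_WEIGHT_OZ = 8.0
readings_oz = [8.5, 8.5, 8.6, 8.4, 8.5]

spread = stdev(readings_oz)                 # small spread: consistent readings
bias = mean(readings_oz) - TRUE_WEIGHT_OZ   # nonzero bias: wrong on average

print(f"spread between readings: {spread:.2f} oz")
print(f"average error vs. true weight: {bias:.2f} oz")
```

The readings agree closely with each other, yet every one of them is about half an ounce too high.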

Measuring reliability and validity

Now that we've defined these two terms, let's break down how we measure them.


Reliability can be measured in multiple ways, most of which involve statistics. Test results from many candidates are necessary to perform these measurements with reasonable accuracy.

Within TestGorilla, the reliability of a test is expressed as the "Quality Score," which is based on these statistics. The Quality Score for a test becomes visible once a sufficient number of candidates have taken the test.


Validity can also be measured in multiple ways, but unlike reliability, most of these measurements involve human judgment.

Here are a few ways we measure validity:

  • Face validity is the extent to which a test seems relevant to stakeholders such as test-takers. It's a subjective measurement.
  • Content validity is the degree to which a screening test matches objective standards and accurately measures formal job requirements. This involves the judgment of qualified experts on the subject (in other words, peer review).
  • Predictive validity is the extent to which the test score correlates with job performance. This form of validity is the most important for pre-employment testing, because our customers ultimately use TestGorilla to predict how their candidates will perform on the job.

TestGorilla measures the predictive validity of tests by asking customers how well their hired candidates perform on the job. The more closely a test's results correspond to actual job performance, the higher the predictive validity of that test.
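A common way to quantify how closely two sets of numbers correspond is the Pearson correlation coefficient. This article doesn't state the exact calculation used, so the following is only a sketch under that assumption. The function name, the 1 to 5 performance rating scale, and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with 2+ values")

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one list is constant")

    return covariance / (spread_x * spread_y)


# Hypothetical data: test scores (%) of five hired candidates and the
# job performance ratings (1 to 5) their employers later reported.
test_scores = [62, 70, 78, 85, 93]
performance_ratings = [2, 3, 3, 4, 5]

r = pearson_r(test_scores, performance_ratings)
print(f"correlation between test score and job performance: {r:.2f}")
```

A value near 1 means higher test scores go with better job performance. A value near 0 means the test score says little about performance.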

If there aren't enough data points yet to measure the predictive validity of a test, TestGorilla uses a measure of face validity instead. For instance, imagine a candidate has just finished a DevOps test. The candidate is then asked for feedback: "In your opinion, did the test accurately measure your skills in DevOps?" The subjective scores given by candidates make up this temporary validity score.