How many users are necessary to evaluate usability?

The question of how many users are needed to get enough feedback on your interface comes up again and again in any testing process. Do I need 5, 15, 30, 80 or millions of users to improve the interface and reach the goal I have set? This post attempts to answer that question.


Did you know that this question is still the subject of much debate across the UX disciplines? It is rooted in a story that began in the 80s and is still ongoing today. It would be convenient to set a single number as a global rule. But we show in this post that:

  • It is a false debate: 5 users, 15 users, 30 users, 80 users, millions … this is a bidding war, and things are ultimately much simpler,
  • Evaluation contexts must be distinguished from one another so that each evaluation is performed in the appropriate way.

Each of these numbers makes sense… but not under the same conditions or for the same test objectives. To choose the right number of users, the context must be properly defined, and you need to ask yourself 3 questions:

  • What is the goal of my tests?
  • What am I testing?
  • Is it for qualitative or for statistical purposes?


These questions are part of an iterative design process in a user-centered approach, as described in the first part of this post.

For each design stage, specific tests are designed, and so the question of the size of your user sample arises each time. This question is addressed in the second part.

Finally, we test, on a specific scenario, the well-known precept: “you need 3-5 users to find 85% of the critical problems of your interface”.

The user-centered design approach: an iterative process

When developing a new interface, whether for a new feature, a redesign or a new product, a 4-stage iterative process should be set up:

Figure 1: The main steps of the interface design cycle.

At each of these stages, tests are carried out to assess the quality of the interface or product, its usability or the volume of traffic.

Two kinds of evaluation must be distinguished:

  • Qualitative evaluation: during stages 1, 2 and 3. Iterations should be run to progressively but quickly eliminate problems before sending the design to development.
  • Quantitative evaluation: at stage 4 and beyond (product release), once the solution is deployed and we want to run analytics or statistics on it.

Number of users = f(test nature, design stage)

The answer is twofold:

Before deployment = qualitative evaluation

At the early design stages and before deployment, where qualitative evaluation is required, 3 to 5 users are needed.

As Jakob Nielsen puts it: “This answer has been the same since I started promoting ‘discount usability engineering’ in 1989.” The pioneering work of Virzi (1992), modeled by Nielsen (1993) and popularized by the Nielsen Norman Group in its well-known blog posts (“Why you only need to test with 5 users” and “How many test users in a usability study?”), agrees:

To assess the ergonomic quality of an interface at a given time, 3 to 5 users of the same type (demographic criteria, profession, etc.) are sufficient.

Why?

  1. The probability of discovering new problems on a given use case decreases as the number of test users increases. In short, successive test users mostly highlight the same errors. As a result, 80% to 85% of critical errors are found with 3 to 5 users.
  2. The design improvement process is iterative. Once we have this initial feedback, we apply the changes and test again. From the second round of tests onward, problems become rarer but deeper (problems with task flow structure, etc.), because a defective visual prevented users from going further in the first round.
  3. Testing is expensive. Testing with fewer users but more often gives a better cost-benefit ratio. It is better to test 3 successively improved versions with 5 users each than a single version with 15 users. The insights will be more constructive in the first case and very redundant in the second.
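
The diminishing returns described in point 1 are commonly modeled with the formula popularized by Nielsen and Landauer: share found = 1 - (1 - L)^n, where L is the probability that a single tester reveals a given problem and n is the number of testers. The sketch below assumes L = 0.31, the average Nielsen reports across his projects; the function name and the default value are illustrative only, and the real rate varies with the product and the task.

```python
def share_of_problems_found(n_users: int, detect_rate: float = 0.31) -> float:
    """Expected share of problems found by n_users independent testers.

    detect_rate is the assumed probability that one tester reveals a
    given problem (0.31 is an often-quoted average, not a constant).
    """
    return 1.0 - (1.0 - detect_rate) ** n_users


# Each extra tester adds less than the previous one.
for n in range(1, 7):
    print(f"{n} user(s): {share_of_problems_found(n):.0%} of problems found")
```

Under this assumption, 5 testers find roughly 84% of the problems, which is where the "85% with 5 users" rule of thumb comes from. With a lower detection rate the curve rises more slowly, which is one reason the rule is debated.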

After deployment or for specific tests

At the release and post-release stages, quantitative evaluation is necessary and must be done on a sufficiently large sample of users.

Two categories must be considered:

  1. Evaluation of user journeys against quantitative metrics such as journey time, journey errors, etc. To run analytics and start producing statistics, you need at least 30 users.
  2. Specific quantitative analyses: eye tracking (around 40 people), A/B testing (at least around sixty).
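
The post does not derive the thresholds above. One common way to see why quantitative metrics call for larger samples is the margin of error on an average, which shrinks only with the square root of the number of users. The sketch below uses the normal approximation for a 95% interval; the function name and the 20-second standard deviation are hypothetical, and the approximation is rough for very small samples.

```python
import math


def margin_of_error(std_dev: float, n_users: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error on a sample mean.

    Uses the normal approximation (z = 1.96); for small samples a
    Student t quantile would give a wider, more honest interval.
    """
    return z * std_dev / math.sqrt(n_users)


# Hypothetical spread of journey times: standard deviation of 20 seconds.
for n in (5, 30, 40, 60):
    print(f"{n:>2} users: +/- {margin_of_error(20.0, n):.1f} s on the mean journey time")
```

Going from 5 to 30 users narrows the interval considerably, while going from 30 to 60 brings a smaller improvement. This is a generic statistical argument, not the source of the specific numbers quoted in the list.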

Use case: the value of a tester

The goal of this part is to evaluate what each tester contributes when user testing the usability of an interface.

Let’s buy a bicycle on Leboncoin!

The use case we chose was the following: go to the Leboncoin website to find a bicycle for the scientific director in Rennes (UXvizer’s headquarters).

The 6 testers were between 21 and 36 years old and quite familiar with the website and how it works. Their user journeys were recorded so that they could be analyzed according to:

  • usability criteria (waiting time, transition types, scroll speed, wipes and patterns),
  • an accessibility criterion (contrast ratio of the text) and,
  • visual criteria (colors, empty zones, loading time).

Typical user journey

The most representative user journey among the 6 testers is based on the following 12 screens:

Figure 2: Main screens of the user journeys of the 6 testers.

Main attention points

Across the tests of the 6 users, 13 attention points were found in Leboncoin’s interface:

  • Contrast ratio: some words, especially the location, are in light gray, which gives them a low accessibility level.
  • Colorimetry: the color palette on screen can be erratic, although this is to be expected when there are photos.
  • Empty areas: because of the way images load, holes appear in the middle of screens.
  • Patterns: linked to the search most of the time.
  • Inconsistent page transitions: wipes, fade-ins and fade-outs are mixed together, which leads to a lack of clarity. Note that this problem is solved in the app, where the logo page is shown with a dissolve at each context change.
  • Transition time: one user spent 16% of their time in transitions (weak connection signal).
  • Bug patterns in the loading process (repetitive pattern alternating: loading ‣ page loaded ‣ loading ‣ page loaded).
  • Waiting time (transitions + loading time) is significant: on average, users wait for 10% of their journey.
  • 30 to 40% of the time is spent specifying the location and finding answers to the request.
  • 10 steps are needed on average before reaching the answer to the first request (bicycle for women in Rennes).
  • Many small contexts occur, mainly associated with the loading process.
  • One user scrolls a lot and quite fast.

Testing approach: a little is better than nothing

Now, here is the protocol we followed to check the rule: “3-5 users are enough to find 85% of critical issues.”

Testers are picked one at a time, at random, from those remaining, and the new errors each one finds are added to those already found. All possible orderings of the 6 testers were run (6! = 720 orderings in total) and an average curve was computed. The bar plot in Figure 3 below shows the number of errors found with each additional tester.

Figure 3: Evolution of the number of found errors at each additional tester.
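
The averaging protocol can be sketched as follows. This is a minimal illustration, not the code used for the study: the function name is made up, and the issue sets are hypothetical placeholder data rather than the errors actually recorded for the 6 testers.

```python
from itertools import permutations


def average_discovery_curve(issues_by_tester):
    """Average, over every tester ordering, of the cumulative number of
    distinct issues found after 1, 2, ... testers."""
    n_testers = len(issues_by_tester)
    totals = [0] * n_testers
    orderings = 0
    for order in permutations(issues_by_tester):
        found = set()
        for position, issues in enumerate(order):
            found |= issues  # add the issues this tester reveals
            totals[position] += len(found)
        orderings += 1
    return [total / orderings for total in totals]


# Hypothetical data: issue ids spotted by each of 6 testers.
testers = [
    {1, 2, 3, 4},
    {1, 2, 5},
    {2, 3, 6},
    {1, 4, 7},
    {2, 5, 8},
    {1, 3, 4, 9},
]

curve = average_discovery_curve(testers)  # averaged over 6! = 720 orderings
for k, value in enumerate(curve, start=1):
    print(f"{k} tester(s): {value:.2f} distinct issues found on average")
```

The gain brought by the k-th tester is the difference between two consecutive values of the curve. Averaging over all orderings removes the effect of which tester happens to go first.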

What we can observe is that:

  1. A single tester brings a lot of value to usability validation. The gain between having no user test the interface and having at least one is huge: most of the critical errors (60%) are found with one user.
  2. Additional testers are beneficial, since more errors are found, but
  3. The gain from adding a new tester decreases with the number of users who have already tested the interface.

What to keep in mind?

For qualitative evaluation of your interface during the mockup, prototyping or development phase:

  1. One tester is better than none.
  2. You don’t need a lot of users: 2-5 users are enough. But test your interface often and iteratively. This saves time and money.
  3. Testing is an iterative process, so even if not all issues are found, at least the most critical ones will be. Any remaining issues will be found in a later round of user testing.
