Efficiency in Accessibility Testing or, Why Usability Testing Should be Last

Accessibility testing is the testing of a system to determine how it will perform when accessed by persons with disabilities. Testing all ICT products and services is important for determining what issues exist and what risk those issues create for the organization. In my experience, the majority of testing efforts are undertaken after the fact. That is, the system being tested is already in production. As a consequence, it is important to ensure the test effort finds the most accurate data as quickly as possible; in other words, it should be cheap and fast. Doing so hopefully means the development team will be fast at fixing the errors as well.

Many accessibility advocates argue that since accessibility is really all about the user, the best way to know whether the system is accessible is to test it with real people. This is obvious but does not, in my opinion, mean usability testing is the only (or even preferred) way of doing accessibility testing. In fact, I am of the opinion that usability testing should be the last type of testing performed on a system. This is due to the time and expense of usability testing and the fact that cheaper and faster methods can be used prior to usability testing which are likely to make the eventual usability test even more valuable. Let’s look at the variety of testing approaches and discuss each.

Automatic Accessibility Testing

Automatic accessibility testing is the use of a desktop or web-based application, web service, browser plug-in or IDE plug-in which can access the source of the web page(s) to be tested. The source is then assessed against the tool manufacturer’s built-in heuristic checks to determine if any errors exist.

What can I find? Automated testing affords the ability to definitively test for roughly 25% of accessibility best practices. Another 35% or so can be checked by machine testing but the results require human verification. Many of the automatically discoverable issues will be high impact.
What will I miss? Approximately 40% of accessibility best practices cannot be tested for at all using an automatic testing tool because they’re either too subjective or complex to test for automatically.
How much does it cost? It costs as much as the tool and the staff’s time to use it and to triage issues discovered.

Although a large number of accessibility best practices cannot be tested for automatically, the fact remains that, by volume, a large number of accessibility errors can be found automatically. My data shows that, on average, when an organization is new to accessibility, its sites will have about 40 errors per page that are discoverable via automatic testing. If the tool you use can spider a site, automated testing becomes by far the most efficient means of gathering accessibility data.
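To make the idea of a machine-testable check concrete, here is a minimal sketch of one such check: flagging images that have no alt attribute at all. It is not how any particular commercial tool works. The class name, function name, and sample markup are all made up for illustration, and it uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects <img> tags that have no alt attribute at all (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            line, _ = self.getpos()
            src = attributes.get("src", "(no src)")
            self.errors.append(f"line {line}: <img src={src!r}> has no alt attribute")


def check_page(html):
    """Return a list of missing-alt errors found in one page's source."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.errors


# Made-up sample page: one image with alt text, one without, one decorative.
SAMPLE = """<html><body>
<img src="logo.png" alt="Company logo">
<img src="chart.png">
<img src="spacer.gif" alt="">
</body></html>"""

for error in check_page(SAMPLE):
    print(error)
```

Note that the sketch deliberately accepts an empty alt attribute, since that is a valid way to mark a decorative image. Whether the alt text that is present actually describes the image is exactly the kind of result that still needs a human to verify.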

Manual Accessibility Testing

Manual accessibility testing techniques can vary pretty widely. Manual accessibility testing can involve code inspection, hardware manipulation, changes to software settings, or the use of assistive technologies to test for specific accessibility best practices.

What can I find? Well, for the most part, everything. That is, if you’re sufficiently skilled and have enough time.
What will I miss? That which you don’t have the skills or time to find.
How much does it cost? The fully loaded salary & benefits of the staff person doing the testing or the billing rate of your consultant.

Manual testing has the benefit of being highly accurate, provided the person doing the manual testing is sufficiently qualified. Both the effectiveness and the reliability of expert judges are significantly higher than those of non-expert judges, which can mean significant differences in the quality of results from such testing. Furthermore, manual testing can be very time consuming. Depending on the features and complexity of the page and the thoroughness of the testing, manual testing can take between 1 and 3 hours per page. You can see then that it won’t be possible to test every page of a large site using manual testing only. Doing so would be prohibitively costly and time consuming. It also makes very little sense to use humans to test for things that an automated tool can catch. This is why I advocate doing automatic testing first.
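To put rough numbers on that, here is a small sketch of the arithmetic. The 1 to 3 hours per page comes from the estimate above; the 500-page site and the 40-hour work week are assumptions chosen only for illustration, and the function name is made up.

```python
def manual_testing_hours(page_count, hours_low=1.0, hours_high=3.0):
    """Return (low, high) total hours, using the 1 to 3 hours per page range from the text."""
    return page_count * hours_low, page_count * hours_high


# Hypothetical site size, chosen only to illustrate the arithmetic.
PAGE_COUNT = 500

low, high = manual_testing_hours(PAGE_COUNT)
# Convert to 40-hour work weeks for a single tester (an assumed schedule).
print(f"{PAGE_COUNT} pages: {low:.0f} to {high:.0f} hours "
      f"({low / 40:.1f} to {high / 40:.1f} work weeks)")
```

Under those assumptions, a single tester would need somewhere between 12.5 and 37.5 work weeks to cover the site by hand, before anyone has fixed anything.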

Use Case Testing for Accessibility

In software and systems engineering, a use case is a list of steps, typically defining interactions between a role and a system, to achieve a goal. Use case testing, therefore, is testing how the system responds to those interactions. In the case of use case testing for accessibility, the practice is to perform the test case using assistive technologies to determine how the system performs when used by a person using that assistive technology.

What can I find? Typically what is found is a close representation of the system’s real level of accessibility and, most importantly, how severe any accessibility errors are.
What will I miss? Often what will be missed are the specific in-the-code flaws that cause the test case to fail.
How much does it cost? The fully loaded salary & benefits of your staff person or the billing rate of your consultant doing the testing. You will also need to pay for the necessary licenses for the assistive technologies being tested with.

I love use case testing. The primary problem with use case testing is that the quality of results can vary pretty significantly. As with manual testing, the real value of this type of testing depends largely on the skill of the people doing the tests. Also, it is important to note that use case testing is not usability testing. Because test cases are scripted and the testers may be intimately familiar with the system (and possibly the assistive technologies they’re testing with), good results in use case testing will not correlate to good results in usability testing. Finally, success with one specific type, brand, and version of an assistive technology will not necessarily correlate to success with all assistive technologies. Still, this is a highly valuable type of testing, especially when done by a skilled tester.

Usability Testing

Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on users. Traditionally, the tests performed are a cognitive walkthrough using a think-aloud protocol, though other methods exist as well. In the case of usability testing for accessibility, the tests involve the use of persons with disabilities as the test participants.

What can I find? You can find out exactly how easy (or difficult) it is for real users with disabilities to use the system. As with use case testing, you can find specific interactions which are problematic or even impossible to complete.
What will I miss? As with use case testing, you will often miss the specific bugs in the code that are causing problems.
How much does it cost? The fully loaded salary & benefits of your staff person or the billing rate of your consultant doing the testing. You will also need to pay for any necessary lab or equipment rental or procurement, plus any transportation and stipend costs for participants.

Usability testing delivers one key data point that no other type of testing can provide: It tells us how bad our problems are. While it is true that a highly skilled reviewer or use case tester can give us their experienced insight into how bad a problem is (and they are very often right), there’s nothing as accurate as real users having real problems. Unfortunately, doing usability testing (for accessibility) too early may be a poor use of time and money if the usability tests are hampered by obvious accessibility errors.

How & when to do each type of testing

I am of the opinion that any testing for accessibility, regardless of methodology, must be aimed at one primary goal: Gather the data necessary to understand what needs to be fixed to make the system better. The end result of any testing effort must be more than just a vague report saying that stuff is broken. In order to truly deliver value, you must supply the executive and development staff with a clearly delineated list of necessary improvements so the developers of the system can take the information in the report and fix the system. The further your test results deviate from that goal of a clearly delineated issues list, the less value your test efforts deliver. This is a basic tenet of SQA. The report for each issue must be accurate, descriptive, and helpful. Items that are vague and not (obviously) actionable are likely to be ignored or, at the very least, take longer to verify and fix.

Not only do we need to be able to generate a list of issues that is highly detailed, but I am of the opinion that we should be able to do so as quickly as we can. Our bosses (or clients) care most about getting the job done right, getting it done fast, and getting it done cheaply. You must gather the data necessary to determine very specifically what needs to get fixed, and you must do so within what are quite possibly significant limitations of time and money. Each method has its strengths and weaknesses. Each of them has a time when it is appropriate & a time when it is not.

Comparison of the four different testing types

Note: The ratings for quality, time, and cost in this comparison are based on the per-issue quality, time, and cost.

Type of Testing	Quality of Issues Report	Time	Cost	Represents Users
Automatic Testing	Highest volume, High Detail	Fastest	Cheapest	Lowest
Manual Testing	High volume, Highest Detail	Slow	Moderate	Moderate
Use Case Testing	Low volume, Low Detail	Moderate	Moderate	High
Usability Testing	Lowest volume, Lowest Detail	Slowest	Highest	Highest

Above, I’ve rated each type of testing based on four factors I think should be considered when choosing which type of testing to use (and when):

Quality of issues report: Will the report contain a full list of the specific issues that exist (volume) and will the individual issues reported contain enough information (detail) to verify and fix the issue?
Time: How long will it take to find the issue(s) and complete the testing?
Cost: How much will it cost to find the issue(s) and complete the testing?
Represents users: Does this type of testing represent the experience of real users of the system?

The above information isn’t without its caveats and I do make some assumptions:

The quality and characteristics of automated testing tools can vary quite significantly.
The usefulness of an automated tool depends highly on the tool’s proper configuration.
The skill level of the person(s) doing manual and use-case testing is vital for ensuring quality results.
There is an array of usability evaluation techniques. I’m mostly referring to in-lab testing with participants doing a cognitive walkthrough using a think-aloud protocol. More on that below.

Due to everything I’ve said above, I tend to rely on each method in the order I’ve been listing them: Automated, Manual, Use Case, Usability. I believe strongly that we should do automatic testing first. In fact, I don’t think we should do any other type of testing on a system while it is failing automated checks. We should never be paying humans to tell us alt text & form labels are missing. Such things are machine testable, and automated testing is cheap, quick, and efficient at finding very specific items in need of repair. It is, however, far from perfect and can often miss some very important issues. You cannot merely stop with automatic testing. This is where manual testing comes in. A skilled reviewer using a well-defined & structured methodology can close most of the gaps left by automatic testing. A highly skilled manual reviewer can also determine the metrics necessary to assist in prioritizing the remediation of the issues found. But once the automated testing and manual testing are done, then what? This is where use case testing and/or usability testing come in. Such testing, I feel, is the ideal way to supplement the automated and manual accessibility testing.
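The ordering above can be sketched as a simple gate. This is only an illustration of the rule, not a prescribed process: the function name and the dictionary of open issue counts are hypothetical. It also extends the "no other testing while automated checks are failing" rule to every phase, which is a stricter reading than the text requires.

```python
# The four testing types, in the order recommended in the text.
PHASES = ["automated", "manual", "use case", "usability"]


def next_phase(open_issues):
    """Return the earliest phase that has not been run or still has open issues.

    open_issues maps a phase name to its count of unresolved issues;
    a phase missing from the mapping has not been run yet.
    Returns None when every phase has been run and is clean.
    """
    for phase in PHASES:
        if phase not in open_issues or open_issues[phase] > 0:
            return phase
    return None


# Nothing has been run yet, so start with automated testing.
print(next_phase({}))
# Automated checks still fail, so do not move on to paying humans yet.
print(next_phase({"automated": 12}))
# Automated checks are clean, so manual testing is next.
print(next_phase({"automated": 0}))
```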

Why Usability Testing Should Be Last

Traditional usability testing of the type I described earlier is, in my experience, the most time consuming and expensive way of gathering accessibility testing data:

You need to have the necessary lab equipment and a location in which to test.
You must recruit the proper participants. This is easy to do by yourself for a one-off test but much more difficult if you do it frequently, which may necessitate hiring a recruiter.
You must pay the participants. Typical stipends are $50-$100 per participant and tests often have between 8 and 12 participants.
You need to have a test system that mirrors the participants’ test systems. Do they use JAWS or Window-Eyes or ZoomText or MAGic? What version? What operating system? What browser? Etc.
How do you get the participants to your test location – especially if they can’t drive and you’re not in a major metropolitan area? You may have to go to them, which would necessitate a portable lab.
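As a rough worked example, the stipend figures above can be turned into a budget range. The stipend and participant ranges come from the list; the number of test rounds per year is an assumption for illustration, the function name is made up, and the result covers stipends only, not recruiting, lab, travel, or staff time.

```python
def stipend_budget(rounds_per_year, participants=(8, 12), stipend=(50, 100)):
    """Return (low, high) yearly stipend cost in dollars.

    participants and stipend are (min, max) pairs taken from the text;
    rounds_per_year is a hypothetical input.
    """
    low = rounds_per_year * participants[0] * stipend[0]
    high = rounds_per_year * participants[1] * stipend[1]
    return low, high


# One round of testing, then a hypothetical four rounds per year.
for rounds in (1, 4):
    low, high = stipend_budget(rounds)
    print(f"{rounds} round(s): ${low} to ${high} in stipends")
```

A single round comes to $400 to $1,200 in stipends alone, and four rounds a year to $1,600 to $4,800, before any of the other costs in the list are counted.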

One time when I mentioned the above, someone made a snide comment that this all looked like reductio ad absurdum. They’re mostly right, I’ll admit. There are lots of ways to do cheap usability testing. Jakob Nielsen talked about Guerrilla HCI back in 1994. You don’t have to have a lab or equipment. You don’t have to go to the participant or even have them come to you; you can do it over web conferencing like GoToMeeting or WebEx. But if you’re going to do usability testing on a large scale (meaning multiple times per year) you do have to sort out how to recruit new participants each time, you do need to pay the participants, and you do have to design your tests, conduct the test sessions, and collate & analyze the data from your notes. All of this is time and money, and often significant time and money when done frequently.

Understanding how to test, when to test, and what to test for accessibility is something that I think often eludes even people who’ve been involved in accessibility for a while. This is because their focus is (and rightly so) the user, and therefore testing with real users seems to be the logical best choice for validating accessibility. I argue, however, that compared to the other types of testing for accessibility, usability testing is an inefficient means of reaching our end goal. Our end goal is not the results of the test but rather an accessible system. Because of the importance of accessibility, we should seek to find data accurately and efficiently. For that reason alone, usability testing should be done later in the lifecycle, after automated, manual, and even use case testing.

My company, AFixt, exists to do one thing: Fix accessibility issues in websites, apps, and software. If you need help, get in touch with me now!