The Short Version

Read this if you’re disinclined to read the entire list of specific WCAG Success Criteria and look at how each can be tested.
If someone were to ask me what I consider my biggest strength when it comes to accessibility, I’d say it is testing. I’ve been doing accessibility testing for the last 9 years of my career. I’ve used nearly every enterprise tool that exists for accessibility, been involved in developing them, developed my own, and have done hundreds of hours of manual code review, use case testing, and usability testing. As tedious and boring as some people make testing seem, I truly enjoy accessibility testing. I think of myself as a CSI, looking for clues left behind by developers. One thing I am particularly interested in is finding the most efficient ways to do testing.

The thing about automatic testing is that some things can be tested by machine quite reliably. For instance, either you have provided an alt attribute for IMG elements or you have not; either you have provided a LABEL for form fields or you have not. These things are easy to verify with machine testing, and they should be. I call these “Automatically Testable”. On the other hand, there are things a machine can test for, but the results need to be verified by a skilled human reviewer. For example, while machine testing can verify that the alt attribute exists for an image, it cannot tell you whether the value supplied for the alt attribute is a suitable alternative for the image. That requires a human. Automatic testing can look for common types of failures, such as alt text that is too long, too short, or contains typically ambiguous words. I call these “Manual Verification Required”. Finally, there are some items which are either too difficult or too subjective in nature for a machine to handle with any degree of reliability. I call these “Manual Only”.
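
To make the distinction concrete, here is a minimal sketch, in TypeScript, of how a checker might treat IMG elements. It is not taken from any particular tool; the length thresholds, the word list, and the function and type names are illustrative assumptions of mine, not WCAG rules.

```typescript
// Illustrative thresholds and word list: assumptions for this sketch, not WCAG rules.
const MAX_ALT_LENGTH = 100;
const AMBIGUOUS_ALT = ['image', 'photo', 'picture', 'graphic', 'spacer'];

type Finding = {
  element: string;
  category: 'Automatically Testable' | 'Manual Verification Required';
  message: string;
};

function checkImages(doc: Document): Finding[] {
  const findings: Finding[] = [];

  doc.querySelectorAll('img').forEach((img) => {
    // Automatically Testable: the alt attribute is either present or it is not.
    if (!img.hasAttribute('alt')) {
      findings.push({
        element: img.outerHTML,
        category: 'Automatically Testable',
        message: 'IMG element has no alt attribute.',
      });
      return;
    }

    // Manual Verification Required: the machine can only flag alt text that
    // looks questionable; a human must decide whether it is a suitable
    // alternative for the image. (alt="" is left alone here, since it may be
    // a legitimate marker for a decorative image.)
    const alt = (img.getAttribute('alt') ?? '').trim();
    const suspicious =
      alt.length > MAX_ALT_LENGTH ||
      (alt.length > 0 && alt.length < 4) ||
      AMBIGUOUS_ALT.includes(alt.toLowerCase());

    if (suspicious) {
      findings.push({
        element: img.outerHTML,
        category: 'Manual Verification Required',
        message: `alt="${alt}" should be reviewed by a human.`,
      });
    }
  });

  return findings;
}
```

The same split applies to form fields: whether a programmatic LABEL exists is automatically testable, while whether that label actually makes sense to a person requires a human.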

With the above in mind, I’ve undertaken an exercise to determine exactly what can be tested. Here is what I’ve found:

Testability of Best Practices by WCAG Level

WCAG Level Auto % Manual Ver. % Manual Only %
Level A 25% 29% 46%
Level AA 17% 41% 41%
Level AAA 23% 24% 53%

What does this mean?

In short, it means that relying solely on automatic testing is probably a bad idea. I still maintain that automatic testing has significant value due to increased efficiency, but it is very clear to me that one cannot rely on a tool alone to test for accessibility. A skilled reviewer is critical, and the more risk you have, the more important skilled humans become.

The data

These three tables detail how many items can be tested for using automated testing. Also shown are items which can be found with automated testing but require manual verification, and finally, those things which can only be reviewed manually. For a more detailed description, see “How I arrived at this data”, below.

Level A

S.C. Total BPs Auto % Manual Ver. % Manual Only %
1.1.1 26 31% 35% 35%
1.2.1 5 40% 0% 60%
1.2.2 3 0% 0% 100%
1.2.3 3 0% 0% 100%
1.3.1 33 18% 64% 18%
1.3.2 8 13% 75% 13%
1.3.3 3 0% 0% 100%
1.4.1 3 0% 33% 67%
1.4.2 4 0% 0% 100%
2.1.1 8 88% 13% 0%
2.1.2 2 0% 0% 100%
2.2.1 4 25% 0% 75%
2.2.2 2 0% 50% 50%
2.3.1 4 50% 50% 0%
2.4.1 4 0% 100% 0%
2.4.2 2 50% 50% 0%
2.4.3 5 0% 20% 80%
2.4.4 2 0% 50% 50%
3.1.1 1 100% 0% 0%
3.2.1 4 0% 75% 25%
3.2.2 3 33% 33% 33%
3.3.1 2 0% 0% 100%
3.3.2 6 50% 17% 33%
4.1.1 3 100% 0% 0%
4.1.2 13 23% 69% 8%

Total % by type, Level A

  • Automatic: 25%
  • Manual Verification: 29%
  • Manual Only: 46%

Level AA

S.C. Total BPs Auto % Manual Ver. % Manual Only %
1.2.4 1 0% 0% 100%
1.2.5 2 0% 0% 100%
1.4.3 1 100% 0% 0%
1.4.4 7 71% 14% 14%
1.4.5 2 0% 50% 50%
2.4.5 1 0% 100% 0%
2.4.6 4 50% 50% 0%
2.4.7 2 0% 100% 0%
3.1.2 1 0% 100% 0%
3.2.3 1 0% 100% 0%
3.2.4 4 0% 25% 75%
3.3.3 1 0% 0% 100%
3.3.4 2 0% 0% 100%

Total % by type, Level AA

  • Automatic: 17%
  • Manual Verification: 41%
  • Manual Only: 41%

Level AAA

S.C. Total BPs Auto % Manual Ver. % Manual Only %
1.2.6 1 0% 0% 100%
1.2.7 1 0% 0% 100%
1.2.8 1 0% 100% 0%
1.2.9 2 0% 0% 100%
1.4.6 1 100% 0% 0%
1.4.7 1 0% 0% 100%
1.4.8 8 50% 25% 25%
1.4.9 1 0% 0% 100%
2.1.3 6 83% 17% 0%
2.2.3 1 0% 100% 0%
2.2.4 2 50% 0% 50%
2.2.5 1 0% 0% 100%
2.3.2 4 50% 50% 0%
2.4.8 1 0% 0% 100%
2.4.9 4 50% 0% 50%
2.4.10 3 33% 67% 0%
3.1.3 1 0% 0% 100%
3.1.4 2 0% 100% 0%
3.1.5 1 100% 0% 0%
3.1.6 1 0% 0% 100%
3.2.5 6 17% 50% 33%
3.3.5 3 0% 33% 67%
3.3.6 2 0% 0% 100%

Total % by type, Level AAA

  • Automatic: 23%
  • Manual Verification: 24%
  • Manual Only: 53%

How I arrived at this data

The data presented above does not use or describe any information from any previous or current employer. Instead, I drew on my own empirical background in accessibility testing since 2003 and did an exercise much like the following:

  1. Create a list of all WCAG Success Criteria, organized by Level. There are 3 levels in WCAG: Level A, Level AA, and Level AAA.
  2. For each Success Criterion, create a list of the applicable Best Practices. In other words, answer this question: What does it really take to conform with this Success Criterion? Under Success Criterion 1.1.1, for example, I’ve created a list of 26 Best Practices that relate to that single Success Criterion. Others may have only one; most range between 2 and 6.
  3. For each Best Practice, determine whether it can actually be tested for using automatic means (a rough sketch of this tally appears after the list).
    1. If the answer is ‘yes’, then I marked it as an item under the ‘Auto’ column.
    2. If a tool can test for a specific Best Practice, but the results must be verified by a human, then I marked it in the ‘Manual Verification’ column.
    3. Finally, if the Best Practice is too complex or too subjective to be handled effectively by an automated tool, I marked it as ‘Manual Only’.
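
For the arithmetic behind the tables, here is a rough sketch, again in TypeScript and with hypothetical type and function names: each Best Practice is classified into one of the three buckets, a Success Criterion’s row is the share of its Best Practices in each bucket, and the per-level totals are simple averages of the per-criterion percentages, which is the aggregation that lines up with the totals shown above.

```typescript
// Hypothetical names; the classification itself is the judgment call described
// in the steps above. The code only does the counting.
type Bucket = 'auto' | 'manualVerification' | 'manualOnly';

interface BestPractice {
  successCriterion: string; // e.g. '1.1.1'
  bucket: Bucket;
}

const BUCKETS: Bucket[] = ['auto', 'manualVerification', 'manualOnly'];

// Percentage of a single Success Criterion's Best Practices in each bucket.
function criterionRow(bps: BestPractice[]): Record<Bucket, number> {
  const row = { auto: 0, manualVerification: 0, manualOnly: 0 };
  for (const bucket of BUCKETS) {
    const count = bps.filter((bp) => bp.bucket === bucket).length;
    row[bucket] = Math.round((count / bps.length) * 100);
  }
  return row;
}

// Per-level totals: the mean of the per-criterion percentages for that level.
function levelTotals(rows: Record<Bucket, number>[]): Record<Bucket, number> {
  const totals = { auto: 0, manualVerification: 0, manualOnly: 0 };
  for (const bucket of BUCKETS) {
    const sum = rows.reduce((acc, row) => acc + row[bucket], 0);
    totals[bucket] = Math.round(sum / rows.length);
  }
  return totals;
}
```

For example, a Success Criterion with 4 Best Practices, 2 of them machine-testable and 2 needing human verification, produces the 50% / 50% / 0% rows seen in the tables.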

Some Caveats

  1. This list does not take into account the capabilities (or lack thereof) of any particular tool. The only consideration I gave was whether I believe a tool should reasonably be able to test something effectively. Tools vary quite considerably in their capabilities, so as the saying goes: your mileage may vary.
  2. Some tool vendors may disagree with this list and assert that they can test for some of the items I show as ‘0%’. I would argue that the reliability of those tests would be so low that they shouldn’t be counted.
  3. The fact that an item can be tested for automatically doesn’t mean a tool can find every single instance of a violation. This is especially true with bleeding-edge technologies (WebGL, for instance).
  4. Your list of Best Practices may differ from mine and may be more or less rigorous. Mine, for example, are generally isolated to HTML, CSS, and client-side scripting. To get a head start on determining your own Best Practices, take a look at the WCAG Techniques documentation.

Why I created this resource

This list should not be taken as a swipe at tools and tool vendors. I’ve said numerous times that we should do automatic testing first. When done properly, automatic testing offers a degree of efficiency that cannot be matched by other methods. However, that doesn’t mean we should leave our compliance up to automated testing. As the tables above show, there are 9 WCAG Success Criteria (in Levels A and AA) that cannot be tested for in any meaningful manner using a tool. There are another 13 that can be tested for automatically but require a human to verify the results. Full compliance and risk mitigation always require the involvement of a skilled professional reviewer, even when you have a tool as well.

My company, AFixt, exists to do one thing: fix accessibility issues in websites, apps, and software. If you need help, get in touch with me now!