In Everything you know about accessibility testing is wrong (part 1) I left off talking about automated accessibility testing tools. It is my feeling that any tool absolutely must deliver on its promise to make the user more effective at the task they need the tool to perform. As a woodworker, I have every right to expect that my miter saw will deliver straight cuts. Further, if I set the miter saw’s angle to 45 degrees, I have the right to expect that the cut I make with the saw will be at an exact 45 degrees. If the saw does not perform the tasks it was designed for, or does not do so accurately and reliably, then the tool’s value is lost and my work suffers as a consequence. This is the case for all tools and technologies we use, and it has been the biggest failing of automated testing tools of any kind, not just those related to accessibility; security scanning tools, for example, are also prone to generating false results.
I’m not sure if this is their exact motivation, but it often seems as though accessibility testing tool vendors measure their tool’s value by the total number of issues it can report, regardless of whether those issues are accurate. In fact, nearly all tools on the market will tell you about things that may not actually be issues at all. In this 2002 evisceration of Bobby, Joe Clark says “And here we witness the nonexpert stepping on a rake.” He goes on to highlight examples of wholly irrelevant issues Bobby had reported. From this type of experience came the term “false positives”: reported issues that are inaccurate or irrelevant. False positives remain a favorite stick with which to beat accessibility testing tools.
It would be easy to dismiss false positives as the growing pains of a young industry, because nearly all tools of that era suffered from the same shortcoming. Unfortunately, the practice persists even today. For example, in the OpenAjax Alliance Rulesets, merely having an audio element on a page will generate nearly a dozen error reports telling you things like “Provide text alternatives to live audio” or “Live audio of speech requires realtime captioning of the speakers.” This practice is ridiculous. The tool has no way of knowing whether the media has audio at all, let alone whether that audio is live or prerecorded. Instead of reporting on actual issues found, the tool’s developer would rather saddle the end user with almost a dozen possibly irrelevant issues to sort out on their own. This kind of overly ambitious reporting does more harm than good, both for the individual website and for the accessibility of the web as a whole.
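The anti-pattern is easy to sketch. The rule below is my own illustration of the failure mode, not the actual OpenAjax Alliance code: it fires a battery of warnings on the mere presence of an audio element, with no evidence about the media itself.

```python
import re

def naive_audio_rules(html):
    """Illustration of the anti-pattern: warn about everything an <audio>
    element *might* be, rather than anything the tool can substantiate."""
    warnings = []
    if re.search(r"<audio\b", html, re.IGNORECASE):
        # The tool cannot know whether this audio is live, prerecorded,
        # speech, or music -- yet it reports warnings for every case anyway.
        warnings += [
            "Provide text alternatives to live audio",
            "Live audio of speech requires realtime captioning of the speakers",
            # ...plus roughly ten more of the same speculative sort.
        ]
    return warnings

print(naive_audio_rules('<audio src="theme-music.mp3"></audio>'))
```

A page with prerecorded theme music trips both “live audio” warnings, even though neither applies; a page with no audio element at all gets a clean bill of health from the very same rule.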
No automated testing tool should ever report an issue it cannot provide evidence for. Baseless reports like those I mentioned from the OpenAjax Alliance are no better than someone randomly pointing at the screen, saying “Here are a dozen issues you need to check out!”, then walking out of the room. An issue report is a statement of fact. Like a manually entered issue report, a tool’s report should answer very specifically what the issue is, where it is, why it is an issue, and who is affected by it. It should be able to tell you what was expected and what was found instead. Finally, if a tool can detect a problem, then it should also be able to make an informed recommendation of what must be done to pass a retest.
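As a sketch of what such a self-describing, evidence-backed report might carry, here is a hypothetical Python structure. The field names are my own invention for illustration, not any particular tool’s format:

```python
from dataclasses import dataclass

@dataclass
class IssueReport:
    """A hypothetical issue report that answers what, where, why, and who,
    states expected vs. found, and recommends a fix -- no evidence, no report."""
    what: str         # what the issue is
    where: str        # where it is (selector, source location)
    why: str          # why it is an issue (e.g. the success criterion violated)
    who: str          # who is affected by it
    expected: str     # what was expected
    found: str        # what was found instead
    remediation: str  # what must be done to pass a retest

report = IssueReport(
    what="Image has no text alternative",
    where='img#logo (index.html)',
    why="Violates WCAG 2.0 SC 1.1.1 (Non-text Content)",
    who="Screen reader users, who hear only the file name",
    expected="An alt attribute describing the image",
    found='<img id="logo" src="logo.png"> with no alt attribute',
    remediation='Add a descriptive alt attribute (or alt="" if purely decorative)',
)
```

Every field here is something the tool can state as fact from the markup it examined; a report that can only fill in the speculative equivalent of “this might be live audio” has no business being filed.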
False positives (or false negatives, or whatever we call these inaccurate reports) deliver none of that. By reporting issues that don’t exist, they confuse developers and QA staff, cause unnecessary work, and harm the organization’s overall accessibility efforts. I’ve observed several incidents where inaccurate test results caused rifts between QA testers and developers. In these cases, the QA testers believe the tool’s results implicitly. After all, why shouldn’t they expect the tool’s results to be accurate? As a consequence, QA testers log issues into internal issue tracking systems based on the output of their automated accessibility testing tool. Developers then must sift through each one, determine where the issue exists in the code, attempt to decipher the issue report, and figure out what needs to be fixed. When the issue report is bogus, whether through inaccuracy or irrelevance, it generates, at the very least, unnecessary work for all involved. Worse, I’ve seen numerous cases where bugs get opened, closed as invalid by the developer, and reopened after the QA tester retests, because the tool has again told them it is still an issue. Every minute developers and QA testers spend arguing over whether an issue is real is a minute that could be spent remediating issues that are valid.

Consequently, it is best either to avoid tools prone to such false reports or to invest the time required to configure the tool so that the tests generating them are squelched. By doing so, the systems under development are likely to become more accessible, and developers are less likely to brush off accessibility. In fact, I envision a gamification-like effect from this approach of reporting and fixing only real issues: a large number of these “definitively testable” accessibility best practices can be quick to fix, with minimal impact on the user interface.
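Squelching can be as simple as filtering findings by the rule that produced them. The sketch below assumes a hypothetical result format in which each finding carries a rule ID and the evidence behind it; real tools’ configuration mechanisms and output formats will differ:

```python
# Hypothetical output from an automated accessibility tool: each finding
# is tagged with the ID of the rule that produced it and the evidence found.
results = [
    {"rule": "img-alt-missing", "evidence": '<img src="logo.png">'},
    {"rule": "live-audio-captions", "evidence": None},  # speculative: no proof the audio is live
    {"rule": "label-missing", "evidence": '<input type="text" id="q">'},
]

# Rules that triage experience has shown to produce baseless reports.
SQUELCHED_RULES = {"live-audio-captions"}

def actionable(findings, squelched=SQUELCHED_RULES):
    """Keep only findings from trusted rules that carry actual evidence."""
    return [f for f in findings
            if f["rule"] not in squelched and f["evidence"] is not None]

for finding in actionable(results):
    print(finding["rule"])
```

The squelch list is an investment: it takes a round of triage to build, but once in place, every report that reaches a developer’s queue is one worth acting on.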
Over time, developers will instinctively avoid those errors as accessible markup becomes part of their coding style, and automated accessibility testing can remain part of standard development and QA practice, catching anomalous mistakes rather than instances of bad practice. That possibility can never materialize while developers are left trying to decipher which reported issues are or are not real problems, feeling like they’re chasing their tails.
Current automated accessibility testing happens in the wrong place, at the wrong time, and is done by the wrong people
The best way to avoid this is, as the popular refrain goes, to “test early and test often”. Usability and accessibility consultants worldwide frequently lament that their clients don’t do so. This website, for instance, happens to perform very well in search engines for the term “VPAT”, and about once a week I get a call from a vendor attempting to sell to a US government agency that has asked for a VPAT. The vendor needs the VPAT “yesterday”, and unfortunately, at that point, any VPAT they get from me is going to contain some very bad news that could have been avoided had they gotten serious about accessibility much earlier in the product lifecycle. In fact, as early as possible: when the first commit is submitted to version control, and when the first pieces of content are entered into the content management system. Testing must happen before deployment and before content is published.
Stay tuned for part 3 where I talk about critical capabilities for the next generation of testing tools.