Today I came across a post by Simon Harper titled Web Accessibility Evaluation Tools Only Produce a 60-70% Correctness, which is essentially a response to my earlier critique of a seriously flawed academic paper. I submitted a response on Simon's site, but I want to copy it here for my regular readers. One thing that specifically bothers me is that the responses continue to dodge the specific challenges I raise. You cannot claim something without evidence, and you cannot supply data for one thing and claim that it leads to additional, wholly unrelated conclusions. So, here goes:
Simon,
Good post, and thank you for the response. It is unfortunate, however, that you didn't read or respond to what I wrote. It is also unfortunate that the paper's authors have similarly chosen not to respond directly to my statements. The blanket response of "well, just replicate it" is an attempt to dodge my response and my [specific] criticisms of the paper (which, again, you admittedly haven't read). Furthermore, there's little use in attempting to perform the same experiments when the conclusions presented have nothing to do with the data.
You said:
“Web accessibility evaluation can be seen as a burden which can be alleviated by automated tools.”
Actually, they don’t say that.
“In this case the use of automated web accessibility evaluation tools is becoming increasingly widespread.”
No data is supplied for this at all.
“Many, who are not professional accessibility evaluators, are increasingly misunderstanding, or choosing to ignore, the advice of guidelines by missing out expert evaluation and/or user studies.”
No data is supplied for this at all.
“This is because they mistakenly believe that web accessibility evaluation tools can automatically check conformance against all success criteria.”
No data is supplied for this at all.
“This study shows that some of the most common tools might only achieve between 60 and 70% correctness in the results they return, and therefore makes the case that evaluation tools on their own are not enough.”
Of all the things you said, this is the only thing actually backed by the data from the paper. Literally everything else is a case of affirming the consequent.
The data they do present is very compelling and matches my own experience. The significant amount of variation between the tools tested was pretty shocking as well, and once you get past the unproven, hyperbolic claims, it is very interesting.
If this paper's authors were to gather and present actual data regarding usage patterns (re: the claim that "the use of automated web accessibility evaluation tools is becoming increasingly widespread"), then I wouldn't be so critical. As it stands, the data needed to substantiate this and similar statements simply isn't supplied.
Finally, I'd like to address the statement "evaluation tools on their own are not enough". As I say in my blog post, this is so obvious that it is hardly worth mentioning. No legitimate tool vendor claims otherwise. I've been working as an accessibility consultant for a decade. I've worked for, alongside, or in competition with all of the major tool vendors and have never heard any of them say that using their tool alone is enough. Whether end users think this or not is another matter. Again, it'd be great if the paper's authors had data to show this happening, since they claim that it is.
The implication of this paper is that because tools do not provide complete coverage, they should not be used. This is preposterous and, I believe, born from a lack of experience outside of accessibility and within modern software development environments. Automated testing, ranging from basic static code linting to unit testing to automated penetration testing, is the norm, and for good reason: it helps increase quality. But ask *any* number of skilled developers whether "passing" a check by JSHint means their JavaScript is good and you'll get a universal "No". That doesn't stop contrib-jshint from being the most downloaded Grunt plugin (http://gruntjs.com/plugins). Ask any security specialist whether using IBM's Rational Security is enough to ensure a site is secure, and they'll say "No". That doesn't diminish its usefulness as a *tool* in a mature security management program.
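To illustrate (a hypothetical sketch of mine, not something from the paper or from Simon's post): a minimal Gruntfile wiring up contrib-jshint looks something like this, and running it is routine in many JavaScript projects even though nobody mistakes a clean lint for correct code.

```js
// Gruntfile.js — a minimal, hypothetical lint-only setup using grunt-contrib-jshint.
// A clean `grunt jshint` run means only that JSHint found none of the issues it
// knows how to detect; it says nothing about whether the JavaScript is actually good.
module.exports = function (grunt) {
  grunt.initConfig({
    jshint: {
      all: ['Gruntfile.js', 'src/**/*.js']
    }
  });

  grunt.loadNpmTasks('grunt-contrib-jshint');
  grunt.registerTask('default', ['jshint']);
};
```

The value of a tool like this is in the issues it does catch, not in any claim of completeness; the same is true of automated accessibility checkers.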
Perhaps what we need most in terms of avoiding an “over-reliance” on tools is for people to stop treating them like they’re all-or-nothing.