Published: March 29, 2019

In the Accessibility and Usability Lab, when we test digital content for accessibility, we want to make sure our work has integrity and quality. As a result, we reflectively examine our process by considering what makes measurement strong in the realm of research. It is easy to make decisions based on evaluations that do not consider meaningful measurements. When purchasing toothpaste, I am likely to consider factors such as the cost, the brand, claims made on the packaging, and perhaps less consciously, the taste and color. However, the purpose of toothpaste is to improve oral health. None of the previous factors actually contribute to the effectiveness of a particular tube of toothpaste at doing what it is supposed to do. As a result, those measurements are not a valid evaluation of the toothpaste; a measurement is valid when it actually measures the intended target.

We want our process for evaluating accessibility to be open to critical inquiry and to be defensible, because that is a measure of its validity. If you were to arrive home to find your most valued possessions piled up around your neighbor’s living room, you might be understandably disturbed. However, if you had the opportunity to learn about the surrounding events and your neighbor’s choices, you might learn a water main broke on your side of the building, and your neighbor had time to rescue your belongings as the water rose towards your home. Maybe you think your neighbor should have just shut off the water, seeing as she works for public works in the city. Or maybe you wish she had also rescued the box of Grandpa’s writing you always meant to go through and read, but at least you will understand how things came to be the way they are, and you can evaluate if that was the best possible outcome.

In the same way, when we test for accessibility, we try to reduce the number of factors that will unexpectedly impact our results. We are aware that everything from the testers to the platforms we use to whether content has been updated can change how accessible we find content to be, but in the end, we do our best to take calculated approaches to our work and to be transparent about our process so others can understand how we find what we find. In the AUL, we carefully consider both the validity and transparency of our process. However, there is not necessarily an established standard that we have adopted for providing valid results, so we have developed, and will continue to develop, one that works for us.

Perhaps the most significant challenge to determining the validity of our evaluations has to do with the variety of definitions of the term “digital accessibility.” Since accessibility is the end goal, it is necessary to know exactly where that target stands. In the university world, my observation is that the most widely accepted definition of digital accessibility is the Web Content Accessibility Guidelines (WCAG) 2.1 (which until recently was 2.0). WCAG is a set of guidelines published by the World Wide Web Consortium (W3C), an organization responsible for establishing standards in how web content is created so as to increase the likelihood it will work across platforms and time. Since the W3C and WCAG tell people what digital content should look and behave like in general, it makes sense that they would be the standard for what is accessible. However, it has been our experience in the AUL that these criteria cannot fully capture the accessibility experience of digital content. Criteria might say that individual parts of a process need to function under particular situations, but it is harder to create a criterion capturing the overall problem that a multi-step process is confusing to navigate. Furthermore, the Department of Justice has determined that WCAG 2.0/2.1 is not the only standard for determining accessibility. The DOJ currently says that an entity can use other methods for making content accessible, such as providing a 24/7 staffed call center (DOJ Says Failure to Comply with Web Accessibility Guidelines Is Not Necessarily a Violation of the ADA). In the AUL, we have decided to use WCAG 2.1 to support what we identify as problematic, but we will not limit ourselves to the constraints of the WCAG criteria.

Looking past WCAG criteria, there are many other ways to define accessibility. Accessibility could refer to what someone can use given the proper technology and background knowledge, or it could refer to what is easy to use. There is a significant difference between the two definitions, because assistive technology contains different features based on the product, and within each piece of software, there are different tools available to a user based on their skill level. It is entirely reasonable to expect that one user might be able to find a way to make part of a website work, while another user cannot. The difference between the two is based entirely on the users’ background and skill level. Conversely, content can be organized and operate in ways that make it likelier that a user will be able to quickly and easily use it. Another way of looking at accessibility might include the variety of devices on which content works. One way some blind and low vision users have found to increase their access to content is to use the mobile version of content; the mobile version is frequently less cluttered and uses more basic elements. Also, users might prefer using tools available on a mobile device, but not on a laptop. When content works on multiple devices, it gives the user freedom to use what works best for them. It is important to consider the many ways of defining accessibility in order to understand how our definition determines the validity of the AUL’s evaluations.

In the AUL, we take into account both what the content owner intends the user to be able to do and the self-reported experience of blind and low vision users when we determine how accessible content is. The evaluation process begins with a conversation with a representative in charge of managing a product to learn what they intend for all users to be able to do with it. Then, at the very least, a blind user will review the product and write a report on their experience. If we do a more complete evaluation, a sighted person writes a script to test all the functions offered by the application, then observes the blind and low vision testers as they work through the script. That person then writes a detailed report based on the results of the test. Since accessibility is predicated on the needs of blind and low vision users, their experience is central to determining what is or is not accessible. We have developed this approach over the four years of the AUL’s existence through a combination of learning about what other groups, mostly accessibility experts in the university setting, are doing, and our own examination of what works. We argue that our method of evaluation is valid because we start by comparing our results to a widely accepted standard, WCAG 2.0/2.1. When we choose to ignore parts of WCAG, or we include issues not specified within WCAG, it compromises our ability to claim that our evaluations are valid because they are part of a commonly accepted standard. However, we argue that our work remains valid even where we do not adhere to WCAG. When we steer away from WCAG, we verify our findings by using a native user’s expert opinion. We use native users because we believe that once a tester has visually evaluated digital content, they can only approximate the blind or low vision experience; the firsthand experience of native users is therefore central to the validity of our work.

Reliability is the other concept usually paired with validity in research; it is the concept that illustrates whether the research is accurate and can be replicated. However, in qualitative research, where the goal is more to describe a variety of experiences in a specific context, we want to provide details that allow the reader to determine the level of confidence they place in the research process and the decisions made at each step. When testing content in the AUL, we want our process to be transparent, so those relying on our work can trust it.

In her book Qualitative Researching (2017), Jennifer Mason provides the following guidelines for increasing one’s confidence in a piece of qualitative research:

The research should be conducted systematically and rigorously

  • When a tester completes a review and writes up their findings, there is more variation inherent in the final report, due to the lack of structure in how they review content and in the format of the write-up. However, in a formal report, the testing follows a script, and the report is structured around the types of issues found during testing, relating them to the WCAG standards whenever possible and providing details to fully describe the issues encountered and the impact they have on assistive technology users. We want our work to be transparent, and we are attempting to make it more so in part through articles such as this one.

It should be strategic, flexible and contextual

  • Over the four years the AUL has conducted testing, we have developed an approach that thoroughly evaluates accessibility in a manner that takes into consideration our limited resources. We do our best to explain accessibility problems, considering that there are an infinite number of ways for a combination of factors to create new problems, and we are strategic in our approach. We have a solid base of familiar problems we can use in our evaluations, but we also do our best to adjust to new situations and include those in our work as well. Furthermore, our flexibility goes beyond what happens in each round of testing to include adapting to what people on campus need from digital content. With respect to contextuality, every evaluation begins with multiple conversations to determine what the test will look like, based on a variety of factors. There are some broad guidelines that shape each test, but we also know no two tests will be identical. We believe we can compare results from one test to another as far as broad themes such as how accessible the products were or the kinds of problems that came up, but we also know we cannot claim to apply our results broadly (for example, that all menus should take “this” form), because different factors come into play in each test.

The researcher is accountable for its quality and claims

  • The AUL is the only group on the CU Boulder campus (and one of few groups in the US) that bases its work on the expertise of native users, rather than on a more developer-centric expertise. We are also one of the few accessibility testing shops that does not rely on automated testing, and we are thus able to dive deeper into the user experience. We deliberately chose our focus, knowing it will shape the kind of information we’re able to provide to our audience. As a result, we excel at conveying the user experience, but we are oftentimes not able to recommend a solution to a problem. All of this being said, we stand behind our decision to build our expertise on accessibility and user experience components, providing what we can on the solutions side, while acknowledging we do not have the capacity to provide solutions to all the iterations of digital content and accessibility issues we find.

He/she should engage in critical scrutiny or active reflexivity

  • We attempt to be self-reflective in a variety of ways. At least once a year, we have a retreat to analyze what we are doing, how we are doing, and where we are going next. We also follow professional groups such as the WebAIM and Educause Accessibility listservs so we can stay abreast of what other professionals are doing. We attend a variety of conferences, such as Colorado Learning and Teaching with Technology (COLTT), Accessing Higher Ground (AHG), and the American Anthropological Association (AAA) annual meeting, to get a variety of perspectives, as well as to present our work. In addition, every test report that is produced is reviewed and critiqued by someone other than the author. All of the above interactions give us an opportunity to critically analyze what we do, as well as a bigger picture of why we are doing it.

The above points come together to create a research audit trail. The audit trail allows others to follow the steps taken and the decisions made around those steps in reaching the end product. It allows the reader to decide if the process has led to trustworthy results. There are many automated accessibility checker tools out there, attractive both for their ability to review thousands of pages on a regular basis and for the impression of providing an objective answer as to whether content is accessible. If an accessibility checker can provide a seemingly objective answer about accessibility, then the audience might not feel as though they have to look critically at the process behind the conclusions. Because we are identifying a quality that is intrinsically based on human experience, and since we are working in a realm where there is not an extensive body of research establishing best practices, it is important to look critically at the process.

In the AUL, we regularly compare our evaluation strategies with those of others doing similar work. We want to stay at the forefront of conducting meaningful evaluations, and we are always thinking critically to improve our work. I mentioned that we mostly base our testing on WCAG criteria, but we do not limit ourselves when we can present a more comprehensive report by going beyond WCAG criteria. WCAG is a tool we have available to us, and the ability to cite it strengthens our results, but our evaluations are stronger in that they are not limited to the dimensions of the discrete criteria laid out in WCAG. CU Boulder’s policy and standards currently identify WCAG as the definition of accessible content, but hopefully that will expand in the future, based on our experience testing for accessibility in the AUL.



Mason, J. (2017). Qualitative researching. Thousand Oaks: Sage.