Attention!

The content on this site is a materials pilot. It represents neither changes to existing policy nor pending new policies. THIS IS NOT OFFICIAL GUIDANCE.

Testing, testing, 1-2-3


Iterative development

Ask how the state approaches security, performance, and migration testing.

Ask how project leads interact with the testing process.

  • Bad: The team cannot answer what types of testing they are doing, only that they test at the end of the process.
  • Meh: The team can describe their testing approaches but testing is done by a siloed team.
  • Good: The team can demonstrate their testing approaches. Testing and development are done by the same team.

Iterative development

Ask how developers test changes before delivering a feature for review.

  • Bad: Developers don't test changes before adding them to the project.
  • Meh: Developers manually test features, but don't do automated tests.
  • Good: Developers can show how they run manual and automated tests before adding changes to the project.

What's this about?

What types of testing should a software project undergo and what does testing a project even mean? In this conversation, Princess Ojiaku, Matt Jadud, and Heather Battaglia talk about what kinds of tests are important and when they should happen. This conversation includes a guest appearance by SHOUTYBOX, the screen reader that lost its way. Bonus: a demo video of a screen reader.

Lesson outline

Active listening: A conversation on software testing (~1h, solo)

This lesson's material is a conversation between Princess Ojiaku, Matt Jadud, and Heather Battaglia. We talk about what kinds of tests are important and when they should happen. This conversation includes a guest appearance by SHOUTYBOX, the screen reader that lost its way. A full transcript (with a few "ums," "ahs," and flubs removed) is provided below, if you'd rather read the conversation.

A conversation about software testing.

Listen actively

As with previous conversations, listen or read actively:

  • Keep a thread running in the back of your mind regarding your own projects, and turn up your “bullshit filter.”
  • Pause the conversation whenever you hear something that makes you think about some aspect of a project you’re working on, something that makes you… wonder.
  • Reflect and take notes on the themes and ideas discussed to share with your group. In an active listening/reading process, it is this reflection, not the content itself, that is most valuable.

The process of listening actively takes time. You can listen closely on a first pass or have a first listen on a walk, then sit down and do a second listen where you take some notes. Do what works best for you.

A conversation about software testing

Matt Good morning, my name is Matt Jadud. I’m an innovation specialist at 18F. I work in the Engineering Chapter and I’m here with two colleagues, in particular my co-conspirator on the SO, MD course. Princess, would you mind introducing yourself?
Princess Hello, I’m Princess Ojiaku and I’m also an innovation specialist here at 18F in the Content Chapter…Design Chapter. I want to introduce our special guest. I’ll have her say hello.
Heather Hello, I’m Heather Battaglia. I’m the current director of engineering at 18F.
Matt Heather, Princess and I have been wrestling with the concept of testing. It seems like it’s not a small thing, and that’s what we want to have a bit of a conversation about here today. We want to focus it in a way that makes sense to our colleagues, the state officers, but we also want to put it in terms of your background and expertise. So would you mind sharing a bit more about your background for us?
Heather So my background is primarily in front-end development. I started out in journalism doing charts and graphs and all sorts of fun interactive things for some newspapers, and segued into civil service a few years ago. My primary focus is “front of the front-end,” as you might say. That includes things like accessibility, mobile testing, and browser compatibility, and working very closely with designers to make sure that things are working properly. I have experience across the stack, but I’ve primarily focused on those very “user-facing” pieces of applications.
Princess That’s awesome. So front-end just means that you just touch everything that the user sees directly?
Heather Yeah, I work in between where the user operates and where you might have a database that stores the information the user enters. So I’m in that in-between layer where I’m talking to your data store, and I’m also showing that data to the user in a way that makes sense to them and making sure they’re having a good experience, kind of passing information back and forth across the application.
Matt That in-between space you talk about: is that an easy place to be? And is it easy making things work right for human beings?
Heather I don’t think there is a space where one is making things that work for human beings that could be considered easy.
Matt That’s fair. So if it’s not easy, and we’re talking about testing, and you’re talking about this complex space… We’ve already got some material where we’re talking about data and data migration and what it means for that data to be robust and reliable. And now you’re talking about how we make things work for people… Would you mind giving us a bit of a sense of what testing feels like as you look in each of those directions? Imagine I’m standing in the middle, and I can look back at the database and forward to the people.
Heather Sure. Testing is one of those terms that gets thrown around software development a lot, and it can mean any one of dozens of different things depending on which piece of the application you’re focused on at any given time. For a couple of extreme examples, you can have unit tests, which test a tiny little piece of code, either on the front-end or the back-end, and will tell you if that tiny little piece of code is functioning the way it should. It will not tell you, however, whether code that relies on that piece of code is functioning correctly. For that, you want something called an integration test, which tests across the pieces of your application to see whether something functions, in its entirety, the way you expect it to. One of my favorite real-world pictures of this is the public bathroom where a hand dryer sits atop a paper towel dispenser, or atop a trash can full of used paper towels: when you move across that automatic sensor, the dryer goes off and blows paper towels everywhere. That’s an example of when an integration test might be really helpful, because both pieces might be functioning as you expect on their own, but they don’t function together as you would expect. So those are a couple of different types of testing. When you’re talking about back-end vs. front-end tests (and you can get more specific in there, too), on the back-end you can test things like whether a migration is happening in your database correctly. Those are definitely things you want to write tests for, to ensure those pieces of code are functioning correctly. On the front-end, you can write things like regression tests, which will indicate whether a change has had an unexpected visual impact on your user interface. You can also do things like accessibility testing, which really is not testing in the same sense as the others, because it’s mostly a manual process.
The first question I ask when I’m approaching a piece of software is: what needs to work regardless of what changes? What are the things that should be consistently true of this piece of software? Those are the things you want to test. How you test them depends on where you are in the application.
Princess Thank you. I love the bathroom example. It really ties it together. I have so many questions about that! You mentioned that there are manual tests and automated tests. Once you’re looking at a piece of software, you’re kind of determining what you need to test, and some things can be automated and some can’t. How do you know what’s what? Are there some types of testing that are usually automated, or some types that are always manual? And how would you know what to do and what to look for there?
Heather That’s a really good question. Generally, the closer you get to where the user is, the more manual testing you will have. On the back-end, data-layer side of things, it can be a lot easier to write very concrete tests that test very specific things, but that’s not how humans work. So the closer you get to the spot at which a human starts interacting with the system, the more likely it is that you’re going to need to supplement your automated testing with manual testing. That’s particularly true when you’re talking about things like cross-browser tests, accessibility, or testing for mobile devices. There are automated tests you can run for all of these things, but they’re never going to catch all of the edge cases your audience might encounter.
Princess That makes total sense because I think humans are probably a lot less predictable. You don’t know what people are going to do when they go to your site, or your application, or whatever and start trying to use it. So you might automate some things and find that that’s not how they use it at all.
Heather Yeah, something like a usability test, for instance, is really something that you have to run with someone in the room. There’s no way to automate that at all, because it’s so dependent on human behavior.
Matt I really like this thread. So when you talk about usability tests and accessibility tests, could we unpack that just a little bit? The reason I want to is that we’re talking about a context where we’re delivering IT and application services for Medicare and Medicaid. We’re talking about software systems that reach a huge swath of the population, and the people they need to reach need the services. So what does this notion of accessibility and usability start to look like? Can you say a bit more about what that space looks like?
Heather Sure. In general, if your software doesn’t work for the audience that you’re serving, your software doesn’t work. Usability tests and accessibility tests are some basic tools to help you make sure that the audience you’re trying to reach knows about, understands, and can use your software in a variety of different contexts. They may be really strange contexts, or you may not agree with how people are trying to use things, but that doesn’t mean their use cases aren’t valid. A good example of how not to make something usable is to rely on large amounts of instructional text. One of my favorite examples of this is any time you’ve installed a mobile app on your phone, and before you start it up, it wants to walk you through four or five different pages of “Here’s where things are! Here’s how you use this app!” That’s not a very usable app, because you shouldn’t need multiple screens of text explaining how the user should interact with the application. Those are the kinds of problems that usability testing can really help with. At its most basic, usability testing is a UX researcher sitting down with someone who might use the application, asking them some questions, asking them to complete some basic tasks, watching them use it, and taking notes on where they struggle, where they try to do something you haven’t enabled, where they guess wrong, or where they have expectations that don’t align with your software. Accessibility testing is a little bit different. It’s about making sure that folks with different physical and mental capabilities are able to interact with your app or your software. This can include anything from visual impairment, which is one of the most common test cases, to keyboard navigation, which is a large part of this for folks who don’t have the dexterity to use a mouse or a touchpad.
And there can also be colorblindness considerations, particularly when you’re doing something with maps and colored indicator dots. That was something I dealt with a lot in my previous life as a journalist: ensuring that colorblind folks could tell the difference between what a red dot and a green dot meant on a map. But accessibility testing is by and large not something you can automate all that well. There are browser extensions and tools you can use that will flag errors in your HTML, but they also flag a lot of false positives.
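As a rough illustration of what those automated tools do, and why they produce false positives, here is a minimal sketch using the `HTMLParser` class from Python’s standard-library `html.parser` module. It assumes one simple, hypothetical rule: every visible `<input>` needs a `<label for="...">` pointing at its `id`, or an `aria-label`. Real accessibility checkers apply many more rules, and apply them more carefully, than this.

```python
from html.parser import HTMLParser


class UnlabeledInputFinder(HTMLParser):
    """Hypothetical, deliberately naive check: every visible <input> needs a
    <label for="..."> that matches its id, or an aria-label attribute."""

    SKIPPED_TYPES = ("hidden", "submit", "button")

    def __init__(self) -> None:
        super().__init__()
        self.input_ids: list[str] = []
        self.labelled_ids: set[str] = set()
        self.inputs_without_id = 0

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label" and attributes.get("for"):
            self.labelled_ids.add(attributes["for"])
        elif tag == "input" and attributes.get("type") not in self.SKIPPED_TYPES:
            if attributes.get("aria-label"):
                return  # labelled directly on the element
            if attributes.get("id"):
                self.input_ids.append(attributes["id"])
            else:
                self.inputs_without_id += 1

    def problems(self) -> list[str]:
        """Ids of inputs no <label for=...> points at, plus a note per id-less input."""
        unlabeled = [i for i in self.input_ids if i not in self.labelled_ids]
        return unlabeled + ["<input> without an id"] * self.inputs_without_id


FORM = """
<form>
  <label for="first-name">First name</label>
  <input id="first-name" type="text">
  <input id="last-name" type="text">
  <label>Email <input id="email" type="email"></label>
</form>
"""

finder = UnlabeledInputFinder()
finder.feed(FORM)
finder.close()
print(finder.problems())  # ['last-name', 'email']
```

The checker correctly flags `last-name`, which has no label at all, but it also flags `email`, which is properly labelled by the `<label>` wrapped around it. A rule that is too literal produces exactly the kind of false positive that still needs a person, and a screen reader, to sort out.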
Princess So what kind of tools, or how do people with different abilities use the application? I know there’s screen readers for people who cannot see the screen. How do those work and do you make sure what you’re making can work with that?
Heather That’s a great question. Screen readers are new territory for a lot of folks who haven’t done that kind of testing before, and there are great resources for people who want to learn how to do it. Both Windows and Apple computers ship with built-in screen readers, and you can activate them from an accessibility menu if you want to try one out. Essentially, the way they work is they read the page out to the user and tell the user when they hit things like buttons, navigation, or inputs. Let’s say you’re filling out an address form. You will probably tab through it with your keyboard, because most visually impaired users rely on keyboard navigation. When you hit an input box, the screen reader will tell you what the label for that input box is and what type it is. So if you’re filling out an address form, the screen reader will tell you that you’ve hit the input box that requires your first name. There are also some interesting cases: for instance, if you type in a password and the input has the correct type associated with it, the screen reader will tell you that you’ve hit the password box and then will not read out what you type, as a security feature. As a general rule, when you type, the screen reader will read letters and words back to you as you type them, so you can tell if you need to go back and fix something.
Princess I can see why you wouldn’t want it to shout out your password!
Heather [Laughter] That was really concerning the first time I tried, and then realized, naturally, people have already thought about this. So, obviously it is really important to get the order of things correct, and to make sure that the screen reader is actually telling the user the correct thing about whatever they’ve loaded, or are interacting with. This can go spectacularly wrong, and you really want to catch that before it goes spectacularly wrong.
Matt Oooh. This sounds like the voice of experience.
Heather Well, the only way you get to be an expert at anything is by screwing it up a whole bunch of times. The most dramatic example I have of catching this was in my own code! (I will readily admit this is something I misarchitected from the beginning.) We were creating an incredibly complex, very interactive form with a lot of moving pieces, where different pieces of the form would appear or disappear depending on what the user entered. That’s a really challenging accessibility issue. When I went to test this with a screen reader, I realized that, through some fluke of the way we had built the app, it was shouting random things at the user when you loaded the page. You would load the page, the app would shout random dollar amounts at you, and then it would jump directly down to a group of yes/no input toggles and just yell “NO! NO! NO! NO!” about sixteen times, before it even told you what page you had landed on or what form you were looking at.
SHOUTYBOX [Fade to the computerized voice of Apple Voiceover] VOICEOVER ON SAFARI CMS HITEC. APD WINDOW. MONTH IS REQUIRED. DAY IS REQUIRED. YEAR MUST BE FOUR DIGITS. FOUR HUNDRED FIFTY FIVE THOUSAND DOLLARS. FOUR HUNDRED FIFTY FIVE THOUSAND DOLLARS. FOUR HUNDRED FIFTY FIVE THOUSAND DOLLARS. FOUR HUNDRED FIFTY FIVE THOUSAND DOLLARS. THREE HUNDRED THOUSAND DOLLARS. THREE HUNDRED THOUSAND DOLLARS. THREE HUNDRED THOUSAND DOLLARS. THREE HUNDRED THOUSAND DOLLARS. THREE HUNDRED AND SIX THOUSAND DOLLARS. THREE HUNDRED AND SIX THOUSAND DOLLARS. THREE HUNDRED AND SIX THOUSAND DOLLARS. ONE HUNDRED TWENTY-TWO THOUSAND DOLLARS. ONE HUNDRED TWENTY-TWO THOUSAND DOLLARS. ONE HUNDRED TWENTY-TWO THOUSAND DOLLARS. ONE HUNDRED TWENTY-TWO THOUSAND DOLLARS. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO. NO.
Heather That was bad. That was not a good experience for anybody involved. We called this the shoutybox afterwards, because it was essentially a misapplication of ARIA alerts. The screen reader assumes that whatever is in an alert is the most important thing for the user to know as soon as there is content there, so it will interrupt everything else the user is doing and yell at them. And that was what we had done.
Princess I do want to ask one follow-up question about that. It sounds like, with this application, the alert system for the screen reader was just going haywire. It’s like something you built saying “This is very important, read this out!”, but it was in overdrive. I’m wondering about testing… how would you test that? Is that something you just catch the first time you try it? Would you do it after it was released… which, obviously, would be a bad time to do it? [Laughter] How do you catch something like that, and when do you test?
Heather So, I waited a little bit too late to test on this. We hadn’t released it (thankfully), but I needed to go back and rework a bunch of things I had done that were all based on the assumption that it was going to work the way I expected it to… which it obviously did not. Generally, you want to be testing all the way through the development process. You want to start testing as early as makes sense. Usually that means whenever you have a piece of code that is complete in and of itself, you want to test that piece of code. It can be really hard to backfill those tests. If you wait until you’re ready to release an application before you write all of your tests, you’re going to find problems you didn’t know existed, and you’re going to have to fix them at the very end of the development cycle. Sometimes that can change assumptions that other pieces of the application are making. In my instance, I had to go rework three or four different input sets that had all assumed this was going to be a system that worked. If I had tested this at the beginning, when I had the first set done, I could have written the other pieces in a way that wouldn’t require fixing afterwards. This is true of all testing, whether you are talking about testing migrations for your database, testing the way your data interacts with your front-end inputs, or testing the way responsiveness works on mobile. The sooner you start testing things, the less work you will make for yourself in the long run, and the more robust you will be able to make things when they are finished.
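Heather’s advice to test each piece as soon as it is complete applies to database migrations too. The sketch below uses Python’s standard-library `sqlite3` module and an in-memory database to test a hypothetical migration, `migrate_split_name`, which splits a `name` column into `first_name` and `last_name`, right after it is written. The table, columns, and sample names are all invented for illustration.

```python
import sqlite3


def migrate_split_name(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: add first_name/last_name columns and fill
    them by splitting the existing `name` column at the first space."""
    conn.execute("ALTER TABLE applicants ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE applicants ADD COLUMN last_name TEXT")
    rows = conn.execute("SELECT id, name FROM applicants").fetchall()
    for row_id, name in rows:
        first, _, last = name.partition(" ")
        conn.execute(
            "UPDATE applicants SET first_name = ?, last_name = ? WHERE id = ?",
            (first, last, row_id),
        )
    conn.commit()


def test_migration_keeps_every_applicant() -> None:
    # A fresh in-memory database, so the test never touches real data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE applicants (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO applicants (name) VALUES (?)",
        [("Jane Doe",), ("Sam Lee",), ("Ananya",)],
    )
    conn.commit()

    migrate_split_name(conn)

    rows = conn.execute(
        "SELECT first_name, last_name FROM applicants ORDER BY id"
    ).fetchall()
    assert len(rows) == 3, "the migration must not drop applicants"
    assert rows[0] == ("Jane", "Doe")
    assert rows[2] == ("Ananya", ""), "single-word names must survive"
    conn.close()


test_migration_keeps_every_applicant()
print("migration test passed")
```

The single-word name is the kind of thing Heather’s question “what needs to work regardless of what changes?” points at: no applicant should lose data just because their name doesn’t fit the new shape.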
Princess It sounds like, as soon as you have something, you need to test it, so that you won’t need to go back and repeat work. And if you’re looking at how something is being tested (maybe a state is doing it), and they’re not testing at the beginning or continuously, then maybe that’s not the best way to do it.
Heather There’s also… you will catch problems sooner. This can be really important when you’re talking about the way data flows between different pieces of your application. You might write something that interferes with the way data flows all the way back to your data store without realizing it, because you’re not testing for it. You want to catch those problems as soon as possible, because they are going to change the assumptions that other pieces of code are making. They might break yet other pieces of code, and you won’t know that those are broken, because you don’t have tests for them. So, even more so than preventing future work for yourself, tests can catch bugs you don’t know about, which could otherwise break your app for your audience without you knowing it’s broken.
Matt Testing is… hmm… it sounds like it’s bigger than people might think. And if a developer or vendor is working in a waterfall model, it’s potentially easy for all of this testing to be pushed to the end. From what you’re saying, there’s no end to the potential dangers, challenges, or problems that might cause a vendor to come back and say “We need another year on the contract before we can ship.” So, if we were to wrap this up a bit: as a State Officer, what kinds of questions should I be asking? Are there questions around who is doing the testing? Questions around when? How would I wrap up these lessons learned, both in terms of that “standing in the middle, looking at the front and back end,” and in terms of the “many kinds of testing,” from unit testing, to integration testing, to accessibility and usability… any final thoughts?
Heather All engineers should be responsible for testing the code that they submit. There are some ways that, as a non-technical product owner, you can watch for signs that testing is being done. One of those is code coverage tools. Code coverage tools will tell you how much of the codebase is currently “covered” by a test. For the parts of your application that can have automated tests written for them, code coverage tools can give you a signal of how much of that code is covered by tests.
Princess Would that be like a percentage or something? Like, we’d say “100% of this is covered,” or 90%, or something like that?
Heather Right. It would be a percentage of the lines of code in the codebase. You will almost never have 100% coverage, because there will always be things you can’t test, or can’t write a test for in a way that a code coverage tool understands. When you’re working with a vendor, a quality assurance plan should typically include some indication of the code coverage percentage you expect. I believe most of the time we shoot for 90-some percent coverage, which is pretty high. But code coverage tools aren’t enough in and of themselves: they can tell you how much of your code is covered, but they won’t tell you if you have good tests. This is where you really need someone working with you, on the state side, who has enough technical depth to be able to assess the tests being written. That person can say “Yes! This is testing the right things!” or “No! This isn’t testing the right things!” To make this a little less abstract, it’s absolutely possible to write a test that is self-sufficient. If you are testing data flows, and the test you write to check whether the front-end and back-end can speak to each other relies only on itself, then the data flows in your application might change, and the test will still pass even though something is broken. This is because the test is not robust enough to adapt to the changes that have actually occurred. By having a technical person working with you to assess tests and their quality, you can catch tests that aren’t doing the job you need them to do earlier in the process. We normally refer to this kind of person as a “tech lead,” but as long as it is someone with some technical depth, you’re probably in good shape.
Princess Thank you, that was awesome! Thank you so much for sharing your expertise with us and the State Officers. This will be really helpful, helping them to know what to look for when they’re talking to (hopefully) a tech lead, or someone about testing in their state’s projects.
Matt Thank you. This was great.
Heather Thank you so much. I’m really excited that I was able to join you for this.

Bonus viewing: Screen readers (5m, solo)

If you’re curious and want to see a screen reader in action, this 5-minute video gives a good overview of how it’s used and issues that can crop up.

Sharing experience (30m, small group)

Meet with your small group and connect what you learned in this lesson to situations you’ve seen with your state projects. Consult the notes you took throughout the lesson and try to link them to a story that you can tell about a particular project. It’s probably useful to do some brainstorming on this before you meet with your small group to trade stories.

When you get together with your small group:

  1. Share your stories with each other.
  2. Figure out which ones are the best candidates for a case study or use case that would be helpful to share with other state officers.
  3. As a group, choose useful stories and write notes on how they link up with the concepts shared.
    • Include in your notes:
      • When did this story take place?
      • What were the events or background leading up to this story?
      • How did this story demonstrate an ideal or non-ideal situation?
      • What specific principles from the lesson does this story illustrate?
      • If the story shows an ideal, what were the conditions that made it work? How did it fit with the principles shared in the lesson?
      • If the story shows a non-ideal, what could have changed to make it better?
  4. Share these notes with the larger group when you meet.
  5. After discussion with the larger group, document these stories and their connections to the lesson to help other state officers understand how this lesson’s concepts apply to their work.

Discuss in community (1h, group)

You will need someone to volunteer to take some notes. Whoever was born after (but closest to) January 20th should be the note taker today.

  1. Check in. (5m timer) While people are arriving, check in with each other. How is everyone doing? Take a moment to share something positive from the week, either at work or at home.
  2. Centering. (3m timer) We jump from meeting to meeting and there’s nothing healthy about that. You will get more from the next hour if you’re here. A simple breathing exercise (breathe in on 4, hold 4, exhale 4, hold 4) is a good way to clear your mind and body. There are lots of resources online (4m20s) regarding simple centering exercises that you could investigate and use at the start of a group conversation.
  3. Focus. (1m timer) Take one minute to identify one or two insights this conversation led you to. Make a note or two in your notebook so you can be focused when you share out.
  4. States and vendors. (30m timer) As a group, first share out which aspects of the conversation you found to be most interesting in your reflections. Then, after you share out which dimensions of vendor management inspired reflection on your professional practice, go back around and take a minute or two each (round-robin) to share why those ideas triggered insight. This should take roughly 30 minutes total, and try and create space for everyone to share out.
  5. Transformation. (15m timer) Were there themes that you saw emerge from your insights? Commonalities across projects? Identify what you saw as a group. Then (and more importantly), do you have any thoughts about your process with states, and how you might transform your process so as to improve outcomes? The note taker should try and capture the group’s thoughts regarding themes and process transformation for sharing back out to the group/community.

In the guides

This lesson is the beginning of a journey. If you're interested in learning more, there's material in the 18F Derisking Guide that you'll want to check out.

From the Federal Field Guide:

From the State Software Budgeting Handbook:

Wrapup (5m, solo)

Take a few minutes to share your reflections on this lesson.