The study was conducted with 3 participants from the novice user group, representing a spectrum of user behaviors. We recommend testing with the novice audience because it helps determine how efficient the online pizza ordering experience would be for a new customer. We wanted to adapt the redesign to fit the needs of prospective customers of a major pizza chain who had little to no experience ordering pizza online. This gave us a better idea of what a prospective customer watching the NFL might expect when ordering pizza for their Super Bowl party. We chose 3 participants because Nielsen claims this is sufficient to identify the first 85% of site errors during testing. Keeping the number of participants to a minimum also preserved time and budget in case we needed another round of testing before the big game.
Since its establishment, a major pizza chain has been committed to quality. From fresh, hand-tossed dough to fresh-sliced veggies and 100% real meats, they have constantly strived to exceed expectations and to make a Better Pizza. Since their humble beginnings in Louisville, Kentucky, they have grown to become one of the largest pizza companies in the world. For 11 of the past 13 years, America has rated them No. 1 in customer satisfaction among all national pizza chains in the ACSI.
According to the 7th annual NFL sponsor awareness survey released by Turnkey Intelligence, the pizza chain is the brand most identified by avid National Football League fans as an NFL sponsor. The stakeholders are proud of their reputation and want to carry it into the new year. They have asked us for a redesign to provide the "best online pizza ordering experience in the world" to NFL fans for the 2014 Super Bowl. With only 2 months left until the big game, they are eager to make any changes necessary to accommodate their loyal customers.
The purpose of the study was to evaluate the end-to-end experience of the pizza chain's site for new and existing customers as they interacted with the online ordering process. Collecting this data provided the team with:
Summative evaluation is the preferred method when the "train has already left the station". The major pizza chain was well past the point of preliminary design ideas, where formative evaluation could have been valuable for designing the "best online pizza ordering experience in the world". At this phase it was arguably too late to make sweeping design changes, but it was exactly the point where real tasks could be integrated into testing, since the site had already gone live.
The only changes that should even be considered at this phase are fixes for major usability issues with the current product, which in our case is the online ordering service on the pizza chain's website. The only way to identify issues with the user experience was to observe participants and see whether they struggled through the basic tasks of ordering a pizza. The test therefore emphasized understanding why particular tasks were not effective.
Formative evaluation was eliminated because it would have focused on the skeleton of the site, which had already been developed, rather than concentrating on what already exists. Summative evaluation was chosen because it is much more cost effective and efficient, given that the site had only 2 months, November until the Super Bowl, to be changed. The chosen method gets right to the point and leaves both time and budget in reserve in case another round of testing is absolutely necessary to deliver the optimal redesign.
Summative evaluation was also the better choice because the developers needed greater insight at this point into how customers were currently carrying out the task of finding and ordering a pizza. The redesign goal was simple and straightforward: improve the ordering experience. Summative evaluation centers on the core usability measures of effectiveness, efficiency, and user satisfaction, making it the ideal candidate for identifying difficulties in operation.
The summative evaluation provided quantitative results that demonstrated substantial support for any changes crucial to the business. The objective results were far better suited to providing concrete outputs such as metrics, audience insights, and actionable improvements to guide the redesign process.
The study collected both qualitative and quantitative data to answer several research questions, including:
The team's mission was to research users in order to optimize the online customer experience when ordering pizza from the major pizza chain. With a small span of time remaining before the Super Bowl, testing 3 participants using the summative evaluation method was the optimal path to pursue. It provided the most efficient usability testing method for gaining valuable insights on the effectiveness of the site.
All participants were computer-literate users who liked pizza and had little to no experience ordering food online. Participant #1 and Participant #2 had no prior experience ordering pizza online, whereas Participant #3 had ordered pizza online at least once.
Three participants were scheduled over two testing dates. All three participants completed the test. Two participants were involved in testing on April 21st and one participant on April 24th. Of the three participants, one was male and two were female. Additionally, two out of the three participants used the Flash version of the site.
A screener questionnaire was developed by the team to filter through the volunteers and to identify the ideal candidates for the study.
Table 1. Screener questionnaire

| Question | Responses | Action |
|----------|-----------|--------|
| Do you like pizza? | Yes / No | Continue / Terminate |
| Are you available April 21st through 25th during business hours? | Yes / No | Continue / Terminate |
| Would you be able to commute to the Kent State University Usability Lab? | Yes / No | Continue / Terminate |
| Have you ever ordered anything online? | Yes / No | -- |
| Have you ever ordered food online? | Yes / No | Continue / Terminate |
| How often do you order pizza? | Weekly / Monthly / Yearly | -- |
| How do you currently order pizza? | Phone / Online | -- |
| If online: How many times have you ordered pizza online? | Once / 2-5 times / 5 or more | Once: Continue; 2-5 times: Terminate; 5 or more: Terminate |
| If not online: Would you ever consider ordering pizza online? | Yes / No | Yes: Continue; No: Terminate |
| Approximately how much time do you spend on the internet each week? | 0-5 hours | -- |
| Of those hours, how much time do you spend for work versus leisure? | -- | -- |
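The continue/terminate branches in the screener above can be expressed as a small decision function. This is a hypothetical sketch for illustration only: the function name, dictionary keys, and answer encoding are our assumptions, not part of the study materials.

```python
# Hypothetical sketch of the screener's continue/terminate logic (Table 1).
# The question wording and branch rules follow the questionnaire above;
# the function name and answer keys are assumptions for illustration.

def screen(answers: dict) -> bool:
    """Return True if the volunteer qualifies as a novice participant."""
    # Hard disqualifiers: motivation, availability, travel, and having
    # never ordered food online at all.
    for key in ("likes_pizza", "available", "can_commute", "ordered_food_online"):
        if not answers.get(key, False):
            return False

    if answers.get("ordered_pizza_online", False):
        # Novices may have ordered pizza online at most once.
        return answers.get("times_ordered_pizza_online", 0) <= 1

    # Never ordered pizza online: must at least consider doing so.
    return answers.get("would_consider_online", False)

# Example: a volunteer who has ordered food online but never pizza,
# and would consider ordering pizza online, qualifies as a novice.
volunteer = {
    "likes_pizza": True,
    "available": True,
    "can_commute": True,
    "ordered_food_online": True,
    "ordered_pizza_online": False,
    "would_consider_online": True,
}
print(screen(volunteer))  # True
```

A volunteer failing any of the hard disqualifiers, or reporting 2 or more online pizza orders, would be terminated under the same rules.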
The screener questionnaire (Table 1) was developed by the team with the novice user taken into account.
The initial question, "Do you like pizza?", was derived with Jared Spool's required qualification of "motivation" in mind. Someone who doesn't like pizza should be disqualified immediately because they will have no interest in the goal of the study, leading to invalid results. If they don't like pizza, when would they ever order it?
Our next requirements were time and location. It is necessary for the participant to be available during the testing hours as well as be able to travel to the testing facilities.
The question "Have you ever ordered anything online?" provides insight into how experienced the potential participant is with the internet and leads to the follow-up classifier question, "Have you ever ordered food online?" If they've ordered food online, they're experienced enough to use the major pizza chain's site, but not necessarily an expert behind the computer.
If they have experience ordering food online, the next questions are classifiers determining how many times they have ordered pizza online, or whether they have ever ordered pizza online at all. The team targeted novice users, defined as participants who have ordered food online but not pizza, or who have ordered pizza online only once.
Finally, the last questions give us a better idea of who would likely be using the internet to order pizza in the future, thus indicating their level of motivation. These final classifiers identify who is more likely to need to order pizza online again, and let us narrow the pool down to the final 5 users if we had more volunteers than anticipated.
The participants most fitting the criteria of the screener questionnaire were chosen to participate in the testing session which required the completion of several basic pizza ordering tasks on the pizza chain's site.
The team developed the tasks with novice users ordering pizza for a party in mind. Test participants would walk through the site just as they would if they were ordering pizza for a Super Bowl party. These basic steps provided insight into existing site issues.
To begin the ordering process, any visitor is required to enter a delivery address to get started. Ideally, most customers would prefer to sign up for coupons and deals in anticipation of savings and potential future orders. Whether for an elementary school's party or an NFL Super Bowl party, the group ordering guidelines would need to be applied to determine approximately how much food was needed to feed a large group. Plus, planning a party might have made the customer want to schedule the order in advance.
The next obvious step in the process would be to order the desired products, whether a pizza, pop, or a side; virtually everyone visits the pizza chain's site to place an order for pizza. Once the order was complete, the user would proceed to checkout and place the order. Entering special instructions for the driver is an optional site feature, but it could come in handy when ordering within an apartment complex or a party center.
Whether the party was a hit or a bummer, the user might want to leave positive remarks after an amazing experience, or provide feedback telling the pizza chain how they could improve the site.
The initial scenario and tasks created by the team were as follows:
Scenario: You are the room parent for your child's 5th grade class. The kids are having their promotion party next Thursday to celebrate their transition to middle school and you have been asked to order pizza for the party. To do this, please complete the following tasks:
After further discussion with stakeholders, the team changed the study tasks to fit the CEO's and CTO's preferences and to better align with the goals of the redesign. The team also figured 3 tasks were adequate to answer the most essential questions while fitting the testing time constraints. The initial tasks and the preferred tasks were merged.
At the core of participants' explorations of the major pizza chain's website were a series of "typical" tasks that party planners were likely to attempt while using the site. These tasks were intended to represent key site functions and features, and to facilitate participants' exploration of a range of the website's information mechanisms. The tasks finally decided upon for the study are as follows:
We are going to be looking at the majorpizza.com web site. Even if you are not a majorpizza fan, imagine that the people you are with are fans of majorpizza and that is where you will be ordering the pizza. There are a bunch of people at this party and you need to order 3 pizzas.
Tell me a little more about your experience ordering pizza from this site. Is it what you expected?
You want to sign up for deals and coupons but you don't want to register. You just want to give them your email. Can you do this and what do you think you will receive by email?
Your pizza arrives and it is terrible! Your driver was rude and you are really upset about what just happened. You call the local store but get nowhere. You need to contact the corporate office, how would you do that?
Overall, the major pizza chain's website was fairly well received by the study participants. They described the experience as both "not intuitive" (Participant #2) and "easy once you get the hang of it" (Participant #3), and stated that they "liked it for the most part," though "finding the specialty pizzas was disappointing" (Participant #1). Participants uniformly stated that they might consider using it again in their personal lives.
Let's begin with what the major pizza chain has done successfully within the ordering process design. All three participants instantly knew to use "Order Now". Next, they set their delivery address without any issues, then proceeded to choose "Build Your Own" or "Add and Customize" from the "Order Now" page. Choosing ingredients and adding to the cart caused no problems, although Participant #1 expected the site to remain on the "Create Your Own" page after adding her first pizza to the order. It didn't match her expectation that she could start adding another pizza directly from that page rather than returning to the main "Order Now" page to begin the process again. Additionally, Participant #3 used the non-Flash version and at first glance failed to find mushrooms as a topping because they were listed as "Baby Portabellos" rather than mushrooms, with no picture to ease identification.
Despite these slight bumps in the findings, no significant user frustration diminished the participants' delight with the pizza-building feature. Participant #1 exclaimed, "Wow, this is kinda awesome" during the pizza-building stage, while Participant #2 was impressed by the advanced customization options, which she had never seen on any other site.
However, despite the fair-to-good initial performance of the online ordering process, the team observed that the majority of users struggled to match the website's cognitive model with their own, to interpret nomenclature, and to locate information later in the process during specialty pizza ordering and the less common tasks.
For instance, two out of three users* found it especially difficult to find a specialty pizza, as demonstrated by skyrocketing click counts, confused expressions (Figure 1), and statements of frustration.
Participant #2 became visibly agitated when the session logged out mid-order after several minutes of searching for the specialty pizzas with no success.
*I would discount Participant #3's ease in finding the specialty pizzas, as she chose the major pizza chain of the study as her pizza place of choice in the icebreaker, implying previous ordering experience with the company.
Despite Participant #2's obvious frustration and Participant #3's prior experience ordering from the major pizza chain, both users were drawn towards "Special Offers" (Figure 1) in the main navigation bar when asked to order a specialty pizza. Specialty pizzas need to be clearly categorized and labeled, or they will continue to confuse users, who may become frustrated enough to leave the site if they come to redeem a coupon on a specialty pizza and never reach their goal.
Another implicit observation was the avoidance of junk mail when participants were directed to sign up for email deals and coupons. The first thing out of Participant #1's and Participant #2's mouths was the assumption that signing up would automatically lead to loads of spam. Finding "Email and Text Deals" itself was straightforward, as demonstrated by Participant #1's reaction: "'Email and Text Deals' did make it quite clear".
Fortunately, most of the nomenclature, navigation, and cognitive model difficulties that study participants encountered can be easily addressed.
While participants often struggled with the navigation and information architecture throughout the Web site, their problems were more frequent and prohibitive for tasks involving specialty pizzas and the fear of spam.
The study observations led the team to question:
Following are a series of articles written for my usability project.
Usability describes how easy a website's interface is to use, or how quickly and easily a user can accomplish their tasks on the site. Usability testing, in turn, is a subjective study of how easily the interface is used and how easily tasks can be performed. Put into perspective, the goal of usability testing is to eliminate ambiguities, errors, and user frustration by detecting these issues before the launch of the product, the company's website.
A usability study accomplishes these goals through conducted observation of users' behaviors and patterns of use on the website. Where traditional academic research requires a long study with many participants over time, usability testing can be done with several short studies and only a few participants over the course of the site's design and development.
As opposed to academic research studies, a representative sample of the population is not required. According to Jakob Nielsen, only 3-9 participants (3 participants to find 65% of problems, 5 to detect 80%, and 9 to detect 95%) can deliver results as useful as those from a large representative sample. The first few participants are enough to detect the majority of the site's errors and issues without investing in a larger, more costly, and more time-consuming study.
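The diminishing returns behind these percentages follow the commonly cited problem-discovery curve, found(n) = 1 - (1 - p)^n, where p is the probability that a single user encounters a given problem. As a rough sketch, assuming Nielsen's often-quoted average of p = 0.31 (real values vary by site and task):

```python
# Problem-discovery curve: the expected proportion of usability problems
# uncovered by n test users, found(n) = 1 - (1 - p)**n.
# p = 0.31 is an assumed per-user detection probability (the average
# Nielsen reports across studies); it is not a universal constant.

def problems_found(n_users: int, p: float = 0.31) -> float:
    """Expected fraction of problems found by n_users independent testers."""
    return 1 - (1 - p) ** n_users

for n in (3, 5, 9):
    print(n, round(problems_found(n), 2))
# Roughly 0.67, 0.84, and 0.96 -- close to the 65%/80%/95% figures above.
```

The curve flattens quickly, which is why each additional participant after the first few contributes less and less new information.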
With just a handful of participants, two computers, and a video camera, a single day is adequate for a meaningful usability study. Additional participants have yet to provide any substantial difference in results.
For a few hundred dollars instead of thousands, you gain enormous insight from practical usability research.
Formative evaluation helps to "form" the design for a product. It involves iteratively evaluating a product throughout the development process. The goal of formative testing is to detect and eliminate usability problems. The main purpose is "to immediately improve the design".
This comes in especially handy during development if you have the resources (developers, time, budget) to make changes as you go, or if there are questions holding up the process that need to be addressed before the team can move forward. Formative evaluation actually saves time and money by fixing problems before they become too costly to correct. It's better to nip a problem in the bud before it grows out of hand.
Methods that can be used for Formative Evaluation:
Summative evaluation, on the other hand, starts when the design of the product is near completion. Users judge the design against quantitative goals or competitive products. The purpose of summative evaluation obviously isn't to immediately improve the design, but "to evaluate versus diagnose" the design of the product.
Summative evaluation proves useful for establishing a baseline study on an already existing website, or for evaluating the usability of a competitor's site as a comparison. It can be beneficial for a working prototype that has already been fleshed out but requires testing. The summative approach also provides valuable insight into the current state of overall usability on any website.
The same methods of testing can be used for summative evaluation as are used for formative.
For a better understanding of the test's purposes and benefits and to reinforce the iterative design process, the product development lifecycle phases are aligned with the stages of evaluation:
The beginning of the Exploratory Lifecycle Phase of prototyping, design and testing ideally aligns with the Formative Evaluation process.
The Summative Evaluation process, by contrast, is best implemented toward the end of the Exploratory Lifecycle Phase.
A new form of evaluation, Verification, is introduced in the Validation Lifecycle Phase, which comes to life during final testing and the product launch stage.
I'm going to have to really practice to follow the Golden Rules of Moderating Usability Testing. I have a ways to go. It's quite frustrating to read something and think, "no problem, this is just common sense," but when it comes to actually doing it yourself, you realize, "wow, I really had no idea how hard it would be".
After several attempts at administering a usability test, I am further convinced patience is not my virtue. After listening to myself, I'm not sure if I provided too many assists as a result of my impatience. It didn't feel right throughout the process; I couldn't figure out if I was crossing the line between leading and enabling the user.
Another obstacle to overcome: Sagar didn't speak out loud as much as I would've liked, and I could sense his nervousness. It's almost as if I was feeding off of it, and I wasn't sure how to provide comfort and put him at ease. I made several attempts to ask questions to encourage him to think aloud, but I realize I need to stay focused on what questions or comments I make to nudge the user to be more vocal.
It was pretty fascinating, to say the least. It really opened my eyes to how not everyone thinks exactly like me. There were times when Sagar went to click somewhere that definitely wasn't where I would've started. 90% of life is perspective. I've always assumed everyone finds things online as easily as I do, looks under the same categories, and takes the same approaches. To read and talk about usability testing is one thing, but to actually conduct a session yourself is completely enlightening.
For a usability briefing, the number of pages viewed could be the most valuable quantitative measurement for testing whether the site's design delivers what users are seeking versus what they are actually finding when they visit. It provides the big picture of how much of the site is actually being viewed. The measurement could cut two ways for the usability evaluation.
It can point the study in the right direction by determining which pages users are actually viewing. The number of pageviews provides an idea of what information visitors are seeking. The data can inform restructuring of the site's content; for example, if the site is crowded, it can help determine which categories can be combined and into which specific pages. A site is more user-friendly if one page displays the information instead of 5, in turn reducing the number of clicks and preventing user frustration. The goal is not to make the user think, but to point them in the right direction toward what they want to find.
On the other hand, the data might be skewed: users may be viewing pages not because they want that particular information, but in an attempt to find what they came to the site for. This can be detected by monitoring the time spent on each page visited. If the time spent on the site is minimal compared to the number of pageviews, the measurement might even be eliminated from the study. For instance, if the majority of users visit 5 pages and bounce after 1 minute, pageviews wouldn't provide any benefit for the case.
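The skew check described above amounts to comparing total dwell time against total pageviews. This is a minimal sketch: the session fields and the 60-second threshold are assumptions chosen for illustration, not values from any analytics tool.

```python
# Sketch of the skew heuristic described above: when users view many
# pages but spend little time on each, high pageview counts likely
# signal searching (frustration), not genuine interest in the content.
# Session field names and the 60-second threshold are assumptions.

def pageviews_look_skewed(sessions, min_avg_seconds: float = 60.0) -> bool:
    """Flag pageview counts as unreliable when average dwell time is low."""
    total_views = sum(s["pageviews"] for s in sessions)
    total_seconds = sum(s["seconds_on_site"] for s in sessions)
    if total_views == 0:
        return False  # no data, nothing to flag
    return (total_seconds / total_views) < min_avg_seconds

# Five pages viewed in one minute: 12 seconds per page, well under the
# threshold, so pageviews here reflect hunting rather than engagement.
print(pageviews_look_skewed([{"pageviews": 5, "seconds_on_site": 60}]))  # True
```

In the example from the text (5 pages, a bounce after 1 minute), the check fires and the pageview metric would be discounted for the study.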
From selecting the number of participants to the evaluation method, our project pieced together perfectly. Here's a recap:
Scenario: Stakeholders want to optimize the online pizza ordering experience for a medium-sized pizza company before the Super Bowl. They want to provide their customers with the "best online pizza ordering experience in the world". With limited time to complete the redesign project, our team planned as follows:
Choose an Evaluation Method. The project team chose summative evaluation because "the train had already left the station". The pizza company's website had already been rolled out and was past the point of opinions on structural elements. If changes are necessary this late in the game, the team will have the allotted time and budget for another round of testing.
Decide on an Effective Number of Participants. The team chose 3 participants according to Nielsen's research: the first 3 participants identify 85% of the major mistakes, while the remaining 15% of errors don't affect the general population. Plus, choosing only 3 participants left room for another round of testing.

Design a Screener Questionnaire to Select Participants. Our focus was to choose novice users who had never ordered pizza online, or had done so only once. The idea driving this target group was the perspective of potential customers.
Develop Tasks to Represent Real Life. We developed a scenario to test how a user would approach ordering pizza for a party as the real situation will be with the NFL Super Bowl parties.
Conduct the Testing Sessions. The testing was performed one-on-one, moderator to user, and recorded with Screencast.
Analyze the Results. The vast amount of data had to be distilled into valuable information to report to the stakeholders.
Write a Report Summarizing the Results. The information was written up to inform the stakeholders of the team's plan of action and to identify any major mistakes that need to be addressed before the big game.