What is the difference between creating a casual game and shipping a hit title? Certainly there is a vast array of factors (including solid design, a talented development team, a sufficient production budget, and a strong marketing plan) that contribute to the success of any title. But one often-overlooked key is continued user testing throughout the entire production cycle. That’s not to say that it is as simple as building the game and testing it along the way. If it were that easy, then there would be less mediocre content available and higher conversion rates across the board. The concept of user testing is incredibly easy to grasp. The practice of doing it (and doing it well), however, is another story.
User testing is not rocket science, but doing it correctly so that the testing yields useful results is hard and requires multiple testing cycles. In a sense, it is similar to the continued stages of crash testing automotive companies go through while engineering and developing new cars. Engineers continually road test and crash vehicles, analyze the data, refine designs, re-engineer, and then retest.
Testing games follows the same process: develop a concept, test, develop a design, test, build the game, test, redesign and rebuild the game, crash test, then re-engineer and retest. This process can be applied to the development of any game title, be it an educational game, a core console game, or a casual downloadable game. However, in casual development this iterative process is especially important because of the end user. People who play and purchase casual games are discerning consumers, but they aren’t always the most sophisticated online gamers. They grew up on cards, boards, and perhaps Pong paddles; they weren’t born with fourteen-button control devices in hand. This crowd of end users requires additional cycles of diligent user testing to further refine, simplify, and perfectly balance UI and level design.
From its inception, PlayFirst has spent significant resources refining a formula for creating and launching hit games. We have come up with a five-phase research program in an attempt to turn consumer testing into more of a science. Our methodology looks like this: Informal Usability, Formal Usability, Internal PlayDate, First Peek, and Public Beta.
We see strong ROI from our testing methodology, and Casual Connect asked us to share our experience by defining and stepping through these five phases in an effort to explain how to put the theory into practice. The take-home message is: test often, test smart, re-engineer, and re-test.
Step One: Informal Usability
Testing isn’t cheap, and redesigning based on usability feedback is even more expensive. The entire idea of iterative testing is to find issues as quickly and cheaply as possible, so that the product being developed actually entertains the target market as much as the designer feels the game should. Therefore, it is beneficial to begin performing informal testing as early in the development cycle as possible; however, you must balance the need for early feedback with the need to make sure the game is ready to test. Testing too early may result in false negatives simply because the game mechanic just isn’t playable. For this reason, we try to use the First Playable as the marker for when to begin doing Informal Usability.
Objective: The goal of Informal Usability is to refine the game concept and design and then make any changes necessary to get the Pre-Alpha build ready for more formal usability testing.
Methodology: During Informal Usability we bring people from the target market into our office to spend 30 to 60 minutes playing the First Playable build. It is best when administered by an impartial third party. We have a Marketing Brand Manager run these test sessions, and in a pinch we’ll use the game’s Designer or Producer. The preferred method is to have a game build that includes tutorials in place so a user can just sit down and begin playing with very little instruction. If the game doesn’t have the necessary tutorial scaffolding, then we conduct the test sessions with minimal guidance and watch the users stumble along rather than directing them how to play. We use various San Francisco Bay Area websites to recruit users for Informal Usability. This can be done fairly easily in most metropolitan areas by posting game testing ads on various community sites. Prepare a well-crafted qualifier document to use for selecting test candidates, and then begin interviewing candidates to narrow down the final pool of testers. We offer a small monetary payment along with free game coupons for the service of performing the usability tests.
Time Period: Informal Usability should take place over two to six weeks building up to Pre-Alpha. That said, it is the one test phase that can and should continue throughout all of production up through Beta. Phases of user testing can also begin prior to First Playable; however, those earlier phases would be more like paper and prototype testing and should have a different set of goals and criteria related to teasing out potential design issues. (One we’re always looking to determine as quickly as possible: “Is it fun?”)
Best Practice: Be objective, ask open questions without leading the testers, take good notes, and apply the results to make game design changes. Also, be sure to know the audience. It is great to sample a large cross-section of users, but temper results that come from users outside of the core demographic. Ideally, the testing is focused on core users with people outside of the core group supplementing the testing. For example: Does your mom like you? Then she’s not a good tester. Friends and family are great for initial prototype testing, and potentially late cycle testing, but not usability testing. Finally, make sure not to recycle testers from the pool of candidates. It’s not a good practice to be in the business of training professional testers.
The Scariest Moment I’ve Ever Had at a Usability Session: It wasn’t seeing people cry (which has happened by the way), but hearing our Creative Director look a game designer in the eye and say “the big issue you have with your game is that you don’t have a game.” After many months of design and development, one can imagine this didn’t go over very well. Incidentally, after the usability test we spent six months putting “a game” into the game, and it paid off. The game performed below average in Usability but then had a 4.28 (out of 5) ranking in First Peek with 36% of users saying they would purchase the game. That’s great!
Step Two: Formal Usability
Building from the rounds of Informal Usability
and iterating on design changes, the next phase of
user testing is to take a solid and stable Pre-Alpha
build into a formal research center to conduct
Formal Usability studies.
Objective: Identify authentic consumer experiences with the first 45 minutes of game play. The take-away is a detailed report capturing user rankings with top 10 bullet lists of what is and isn’t working in the game, along with suggested solutions for addressing what isn’t working.
Methodology: Use a professional third-party facilitator to conduct formal tests that are recorded on DVD. Design and development teams are on-site (or patched in via video conference) watching the usability studies in real time. PlayFirst uses XEO Design (www.xeodesign.com) in Oakland, California, for most Formal Usability studies.
Time Period: The Usability Study is one intense day of six to eight individual one-hour study sessions. The entire Formal Usability phase takes about three weeks: Week One is the kickoff, writing the test plan and recruiting users; Week Two is the pilot test (dry run) and the Usability Study; and Week Three is Usability analysis and review.
Best Practices: All key design decision makers on the development team should be present for the Formal Usability study. This is the most critical part of Usability. Spending a day working together and watching real users play and respond to the game live and in person is invaluable. Have an open mind, and expect the unexpected. Regardless of the type of game, there is always something new to discover. Typically the biggest concerns turn out to be non-issues, and the features thought to be most locked down often have the biggest usability issues and require the most redesign work. The morning after the Usability Study, gather everyone together and debrief to thoroughly understand all of the issues before beginning to work on solutions. As a publisher, we have found that experiencing usability testing together with one or more members of the development team is a critical component of maintaining alignment through often difficult decisions surrounding goals, scope, budget, and schedule.
Step Three: Internal “PlayDate”
PlayFirst has a fairly rigorous QA Alpha test
cycle requiring a game to be feature-complete
with a representation of all functionality and no
missing assets before being approved for the
Alpha milestone. Once a game hits that milestone
it is then ready for an internal testing cycle we
call PlayDate.
Objective: Identify issues and red flags with game mechanic, design, and look and feel in preparation for the First Peek release.
Methodology: Employees at PlayFirst play the first hour of the game and fill out a survey with feedback.
Time Period: It is one hour of testing, completed by various people over the course of one or two days. Occasionally, though rarely, a second PlayDate is conducted later in development as a way to substantiate design changes.
Best Practice: The people working at PlayFirst come from a large and varied talent pool. We have found over time that the organization as a whole is very good at predicting sales performance of a game once it hits the market. Timing this phase of testing is key to ensure the next phase is a success and yields optimal testing data. What is most important about the PlayDate phase is to obtain tangible feedback that can be acted upon and implemented in preparation for the First Peek phase. If you are a developer with fewer than fifty employees, you might need to be creative in coming up with a cheap but “clean” testing pool. You could consider some combination of friends, family, and your most loyal end users.
Favorite All-time Quote from an Internal PlayDate: “I was starting to enjoy the game, but the headache-inducing clanking sound was so ear-piercing I couldn’t stand to play the game longer than three minutes.” Interesting note: This quote was specific to the PlayFirst title Mahjong Roadshow. Although we fixed this sound effect, the game never turned a corner. It performed at average or below at each phase, and its First Peek ranking is noted below. The game unfortunately never performed well once it hit the market. It’s an example of what can happen when you somewhat ignore the data telling you the game will be a miss with your target market.
Step Four: “First Peek”
Nothing is more eye-opening than reading
feedback from a thousand real users stating why
they hate a feature or why they love the game’s
audio, art, or story. Actually, the one thing that
is even more telling is seeing the real metrics
data capturing how users played the game. It is
very interesting to read survey feedback stating
one thing and then to review the metrics data
indicating the complete opposite.
Using PlayFirst’s Playground SDK as the
development architecture gives us the ability
to easily track this data. As a process, PlayFirst
has an analyst who works with the developer to
create a metrics dictionary detailing the specific
play session information we want to collect.
Then, working from the hooks within the SDK
framework, the developer is able to code the
metrics and easily build a First Peek version of
the game source.
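To make the idea of a metrics dictionary more concrete, here is a minimal Python sketch. It does not use the Playground SDK and is not PlayFirst's actual format; the event names, their fields, and the `record_event` and `unfinished_levels` helpers are all hypothetical. The point is only to show how an agreed vocabulary of play-session events might be enforced when logging, and how the resulting log could hint at where users got stuck.

```python
from collections import Counter

# Hypothetical metrics dictionary: each event name maps to the fields the
# analyst and developer agreed to collect. These names are placeholders for
# illustration and do not come from the Playground SDK.
METRICS_DICTIONARY = {
    "level_start": {"level"},
    "level_complete": {"level", "clicks", "seconds"},
    "session_end": {"minutes_played"},
}


def record_event(log, name, **fields):
    """Append an event to the log only if it matches the metrics dictionary."""
    if name not in METRICS_DICTIONARY:
        raise ValueError(f"unknown event: {name}")
    expected = METRICS_DICTIONARY[name]
    if set(fields) != expected:
        raise ValueError(f"{name} expects fields {sorted(expected)}")
    log.append({"event": name, **fields})


def unfinished_levels(log):
    """Per level, count how many more starts than completions were logged."""
    starts = Counter(e["level"] for e in log if e["event"] == "level_start")
    completes = Counter(e["level"] for e in log if e["event"] == "level_complete")
    return {
        level: starts[level] - completes[level]
        for level in starts
        if starts[level] > completes[level]
    }


# Example session: level 1 is finished, level 2 is started but never completed.
log = []
record_event(log, "level_start", level=1)
record_event(log, "level_complete", level=1, clicks=42, seconds=95)
record_event(log, "level_start", level=2)
print(unfinished_levels(log))  # {2: 1}
```

Rejecting events that are not in the dictionary is a design choice in this sketch: it keeps the collected data consistent with what the analyst asked for, at the cost of having to update the dictionary whenever a new measurement is needed.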
Objective: The business models for the casual download space are ever-evolving, but the core model is still focused on the 60-minute trial. For this reason, the main objective of First Peek is to finely tune the game for the 60-minute trial in preparation for the final version of the shippable game.
Methodology: A 60-minute, content-limited build is released to several thousand users in the PlayFirst beta community. Users fill out a survey at the end of the trial, and metrics data is collected and tabulated at the end of the First Peek phase.
Time Period: The First Peek version is made available for one week and during that period users can play the trial and submit feedback. A large bulk of feedback comes within the first few days, which then allows the development team to immediately begin analyzing data and begin considering changes.
Outcome: First Peek is the most telling phase of user testing. Two incredibly useful pieces of information are gleaned: quantitative data that shows how users actually played the game, including data points such as where users got stuck and how many click strokes were made to complete a level; and the qualitative feedback with overall exit survey rankings. The quantitative data is used for level tuning, sometimes level redesign, and game-play balancing. It is also used as a measurement of success, or failure, of specific game features. The survey rankings have great accuracy at predicting a game’s conversion rate once launched to the public. It is a one to five ranking system, with five ranking best. The data has proven that users ranking a five have a high probability of purchasing, and thus the total percentage of fives is a marker for a game’s potential performance. For example, someone may say “I love this game and I can’t wait to buy it,” but then will rank it a three. That user may download the game, but most likely will never purchase it. The chart below puts this ranking phenomenon into context by providing an inside look at how various games have performed in First Peek. Any time over 35% of people rate a game a five (out of five), it’s good. Anything above 40% is really good. On the other hand, anything below 30% isn’t great, and anything below 20% is bad.
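The percentage-of-fives reading described above is simple arithmetic, and a short Python sketch can make the bands explicit. The function names and the sample rankings are made up for illustration. The text gives no label for results between 30% and 35%, so the sketch reports that range as "in between" rather than guessing.

```python
def share_of_fives(rankings):
    """Return the percentage of exit-survey rankings that are a five."""
    if not rankings:
        raise ValueError("no rankings collected")
    return 100.0 * sum(1 for r in rankings if r == 5) / len(rankings)


def read_first_peek(pct_fives):
    """Map a percentage of fives onto the rough bands described in the text."""
    if pct_fives > 40:
        return "really good"
    if pct_fives > 35:
        return "good"
    if pct_fives < 20:
        return "bad"
    if pct_fives < 30:
        return "not great"
    return "in between"  # the text gives no label for 30% to 35%


# Made-up survey results, for illustration only.
rankings = [5, 5, 4, 3, 5, 2, 5, 4, 5, 3]
pct = share_of_fives(rankings)
print(pct, read_first_peek(pct))  # 50.0 really good
```

Note that only the count of fives matters here. A user who ranks the game a three or a four contributes nothing to the marker, which matches the observation that enthusiastic comments paired with a middling ranking rarely turn into purchases.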
Best Practice: At PlayFirst, First Peek is a little bit
like Groundhog Day in that depending on the
outcome it tells how far or how close a game
is to hitting a launch date. The critical business
decision is to use the data wisely to determine
how much more time and resources should
be put into a game. If a game has an average
ranking and a large number of users identify
a specific problem, then we must make a
business decision: Will the eventual rate of
conversion be sufficiently high to justify the
time and cost of “fixing” the problem?
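One rough way to frame that question is as a break-even check. The Python sketch below is a deliberately simplified illustration, not PlayFirst's decision model: the `fix_pays_off` function, its parameters, and every number in the example are hypothetical, and it ignores factors such as revenue share, schedule slip, and marketing timing.

```python
def fix_pays_off(trial_downloads, price, conv_now, conv_after_fix, fix_cost):
    """Compare the extra revenue a fix might bring with what the fix costs.

    conv_now and conv_after_fix are conversion rates expressed as fractions
    (for example, 0.015 for 1.5%). Returns (extra_revenue, worth_it).
    """
    extra_sales = trial_downloads * (conv_after_fix - conv_now)
    extra_revenue = extra_sales * price
    return extra_revenue, extra_revenue > fix_cost


# Hypothetical inputs, for illustration only.
extra, worth_it = fix_pays_off(
    trial_downloads=100_000,  # assumed number of trial players
    price=20.0,               # assumed purchase price
    conv_now=0.015,           # assumed conversion rate without the fix
    conv_after_fix=0.020,     # assumed conversion rate with the fix
    fix_cost=25_000,          # assumed cost of the extra development time
)
print(round(extra), worth_it)
```

The hard part in practice is estimating the conversion rate after the fix, which is where the First Peek rankings and the volume of users reporting the same problem come in.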
In addition to the potential financial impact
of addressing changes after First Peek, it is
equally important to make sure a game is
ready to go into First Peek. If we release into
First Peek a game that we know has a flaw, it
means we’ll have 500 to 1,000 users spending
time telling us about that flaw. It ends up
being a partial waste of time and the data collected is less valuable. Similarly, it’s critical
to ensure that specific features we want user
feedback on are in the game and functioning
properly. It may seem obvious, but we learned
this lesson the hard way. (For instance, if you
want to get reactions to voiceover dialogue,
make sure the audio is actually audible.)
Step Five: Public Beta
After months of development, four tough
phases of user testing, and the grueling QA cycles,
the game is ready to go live on www.playfirst.com.
This is when the game developers sit back, and
when the PlayFirst producers, marketing, sales,
and PR folks really kick into gear.
Objective: Track sales, watch forum posts, read reviews, pay attention to leader boards, and prepare for Channel Launch.
Methodology: Marketing rolls out the go-to-market launch plan, PR begins building a buzz, press begins reviewing the game, and then the game launches on the PlayFirst site. Teams immediately begin tracking performance.
Time Period: Public Beta continues for the first six weeks after the game launches on PlayFirst, after which the game begins going live on partner sites.
Best Practice: Pay close attention to performance. Watch what users are saying, track customer service reports for any odd issues, and be patient. There is a tendency after a game launches to overreact to what people are saying or to make gross assumptions from early sales reports. There are a few occasions when PlayFirst has made a design change to a game after launch and then re-launched prior to releasing it to the Channel. This is only done when the risk is low and confidence is high that the changes will improve conversion. For cases like these, PlayFirst games on PlayFirst.com have an updater technology built in to facilitate updates post-launch so that the entire consumer base is on the same version regardless of when they downloaded the game.
Conclusion
So is this methodology truly a success
formula? Well, after this process kicked into gear
in the second quarter of 2007, three of the twelve
games PlayFirst published in 2007 won Zeeby
awards (Diner Dash: Hometown Hero, Chocolatier,
and Dream Chronicles), and five others were huge
financial successes. By comparison, several of
the games that launched in the beginning of
2007—games that did not go through the full
five cycles—did not perform very well. Then
2008 was another breakout year for PlayFirst and
continued to yield great success with hits such as
Dream Chronicles 2: The Eternal Maze, Cooking Dash,
Wedding Dash 2: Rings Around the World, Pet Shop
Hop, Doggie Dash, Dairy Dash, Parking Dash, and
Nightshift Legacy: The Jaguar’s Eye. We believe that
our hit rate would not have been possible without
the insights derived from extensive consumer
testing and related development iterations. That
isn’t to say that everything is done to perfection
and that there isn’t any room for improvement,
but rather that there’s extraordinary value in
robust testing.
Of course, a phased approach to user testing is by no means original to the PlayFirst publishing model. Furthermore, the practice of such an approach will not guarantee a hit game. There are games that suffer from lack of proper focus in the early phases of testing, which results in poor ratings at the later phases of testing, which in turn leads to mediocre sales performance because proper time and resources weren’t applied to making improvements. It is hard to properly conduct the phased approach to user testing, and it takes tremendous collaboration between the design, development, and publishing teams. Making games is fun, but it is painful. Most importantly, it requires humility and a lot of laughter, and a willingness to change.