Getting the most out of your tests

Here are some general principles to use when figuring out how to structure your tests.

Learning goals dictate the type of test

Restaurant concepts often use some variation of three types of tests. Concepts usually choose a testing program based on the risk, the size of the opportunity, and how quickly they want to be in market. A solution that has been tested once can still be tested again, for example if it fails or if something significant about it changes.

Turbo tests

- Typically run in a few stores for a short period, such as a weekend or a week, to gain an early understanding of operational feasibility. Turbo tests are often used when the change is new to the organization or more significant, and you want some early calibration.

Ops tests

- Typically run in five to ten stores for roughly four weeks to gain a more robust understanding of the impact on operations and costs. Usually some level of customer input or engagement is gathered, which can also be factored into go/no-go decisions.

Market tests

- At this stage, companies often look for a set of stores that is representative of the system. The intent is to have a level of certainty about the impact on the whole organization should the solution be brought to market. Store counts are typically ten or more. Test duration is usually set so that enough of the customer base sees and experiences the change to better understand its impact on visit frequency.

Better ensuring read accuracy

As a variety of things can influence test results, measurement tools should help diminish or control “pollution” from:

- Inherent trends within the P&L
- Industry / competitive changes
- Seasonality

The standard structure in restaurants and retail is pre/post net of control (PPNOC). The illustrative example below demonstrates the basic structure. By incorporating both a pre-period and a control group, these sources of pollution can be mitigated. As the example shows, a test group viewed without a pre-period or control group appears to have generated a positive result (+3% in the test period, or +4% versus its own pre-period), but net of the control group's change the result is -1%.

	Pre-period	Test period	Change (test vs. pre)
Test group	-1%	3%	+4%
Control group	1%	6%	+5%
Test net of control			-1%

Note: percentages in this example represent change in revenue.
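The arithmetic behind the table can be sketched in a few lines of Python, using the example percentages above as inputs. The names `GroupResult` and `ppnoc` are illustrative only and do not come from any standard tool.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    """Revenue change (%) for one group in each period."""
    pre_period: float
    test_period: float

    @property
    def net_change(self) -> float:
        # Change from the pre-period to the test period, in percentage points
        return self.test_period - self.pre_period


def ppnoc(test: GroupResult, control: GroupResult) -> float:
    """Pre/post net of control: the test group's pre/post change
    minus the control group's pre/post change."""
    return test.net_change - control.net_change


test_group = GroupResult(pre_period=-1.0, test_period=3.0)
control_group = GroupResult(pre_period=1.0, test_period=6.0)

print(f"Test net change:     {test_group.net_change:+.1f} pts")                # +4.0 pts
print(f"Control net change:  {control_group.net_change:+.1f} pts")             # +5.0 pts
print(f"Test net of control: {ppnoc(test_group, control_group):+.1f} pts")     # -1.0 pts
```

Subtracting the control group's change removes movement that both groups experienced (trends, competitive shifts, seasonality), leaving the portion attributable to the test.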

For tests that run longer, the same comparison can be shown even more clearly as a period-over-period line chart, which helps reveal whether the effect is building or declining over time.
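A period-over-period view applies the same net-of-control logic week by week. The sketch below uses made-up weekly revenue changes (not data from any real test) and assumes the first four weeks are the pre-period. Each test week's value is the test-minus-control gap net of the average pre-period gap, which is the series that would be plotted on the line chart. The function name `weekly_net_of_control` is hypothetical.

```python
from statistics import mean

# Hypothetical weekly revenue change (%) for each group.
# The first PRE_WEEKS entries are the pre-period; the rest are the test period.
PRE_WEEKS = 4
test_weekly = [-1.2, -0.8, -1.1, -0.9, 1.5, 2.4, 3.1, 3.8]
control_weekly = [0.9, 1.1, 1.0, 1.0, 1.2, 1.3, 1.1, 1.4]


def weekly_net_of_control(test, control, pre_weeks):
    """Return the test-minus-control gap for each test week,
    net of the average gap observed during the pre-period."""
    gaps = [t - c for t, c in zip(test, control)]
    baseline = mean(gaps[:pre_weeks])  # typical gap before the change
    return [gap - baseline for gap in gaps[pre_weeks:]]


series = weekly_net_of_control(test_weekly, control_weekly, PRE_WEEKS)
for week, value in enumerate(series, start=1):
    print(f"Test week {week}: {value:+.2f} pts")
```

With these made-up numbers the net-of-control read rises every week, which on a line chart would show a building effect; a falling series would suggest a novelty effect that is fading.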

For a change to show up in this type of analysis, it must be large enough to register in the metric being evaluated. For example, if revenue is the metric, the change must be significant enough to materially move that line.
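One way to check whether a change is large enough to register is a minimum detectable effect (MDE) estimate. The source does not prescribe a method; the sketch below swaps in the standard two-sample power calculation, assuming stores behave independently, store-level changes are roughly normal, and a hypothetical store-to-store standard deviation of 3 percentage points. The 80% power default is a common convention, not a figure from the text.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(sd, n_test, n_control, confidence=0.95, power=0.80):
    """Smallest true difference in mean store-level change (same units as sd)
    that a two-sided comparison would detect with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)                      # about 0.84 at 80% power
    return (z_alpha + z_beta) * sd * sqrt(1 / n_test + 1 / n_control)


# Hypothetical: store-level pre/post revenue change has a standard
# deviation of 3 percentage points across stores.
for n in (5, 10, 30):
    mde = minimum_detectable_effect(sd=3.0, n_test=n, n_control=n)
    print(f"{n} test vs {n} control stores: MDE ~ {mde:.1f} pts")
```

If the expected impact of the change is smaller than the MDE for the planned store count, the test is unlikely to produce a clean read on that metric.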

Determining groups

Test groups should ideally be large enough for the type of test being run and representative of the larger group or system. The control group does not receive the change being tested.
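A simple way to build a comparable control group is to match each test store to a similar non-test store. The sketch below is a greedy one-to-one match on region and average pre-period weekly sales; the `Store` fields, the matching criteria, and the store data are all hypothetical, and real programs often match on more attributes (format, daypart mix, trade area).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    store_id: str
    region: str
    weekly_sales: float  # average pre-period weekly sales (hypothetical)


def match_controls(test_stores, candidates):
    """Greedy one-to-one matching: for each test store, pick the unused
    candidate in the same region with the closest pre-period sales."""
    used = set()
    matches = {}
    for test in test_stores:
        pool = [c for c in candidates
                if c.region == test.region and c.store_id not in used]
        if not pool:
            raise ValueError(f"No control candidate left for {test.store_id}")
        best = min(pool, key=lambda c: abs(c.weekly_sales - test.weekly_sales))
        used.add(best.store_id)
        matches[test.store_id] = best.store_id
    return matches


tests = [Store("T1", "South", 42_000), Store("T2", "West", 55_000)]
candidates = [
    Store("C1", "South", 40_500),
    Store("C2", "South", 47_000),
    Store("C3", "West", 54_200),
    Store("C4", "West", 61_000),
]
print(match_controls(tests, candidates))  # {'T1': 'C1', 'T2': 'C3'}
```

Greedy matching is easy to explain but order-dependent; with many stores, a global assignment method may give closer matches overall.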

Determining periods

Periods are also set based on the type of change. For example, if the change is expected to alter customer behavior, the test period needs to be long enough for customers to both notice the change and respond to it. Pre-periods should ideally sit close to the test period, be long enough to smooth out short-term fluctuations, and avoid specific seasonal events such as holidays.
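The pre-period guidance above can be sketched as a small date calculation: walk backward from the test start and collect the most recent full weeks that contain no holiday. The Monday-to-Sunday week definition, the holiday list, the 52-week lookback limit, and the function name `pre_period_weeks` are assumptions for illustration.

```python
from datetime import date, timedelta


def pre_period_weeks(test_start, n_weeks, holidays, max_lookback_weeks=52):
    """Return the n_weeks most recent full Monday-to-Sunday weeks before the
    test-start week that contain none of the given holiday dates, oldest first."""
    # Monday of the week in which the test starts
    week_start = test_start - timedelta(days=test_start.weekday())
    weeks = []
    for _ in range(max_lookback_weeks):
        week_start -= timedelta(weeks=1)
        week_end = week_start + timedelta(days=6)
        if not any(week_start <= h <= week_end for h in holidays):
            weeks.append((week_start, week_end))
            if len(weeks) == n_weeks:
                return list(reversed(weeks))
    raise ValueError("Not enough holiday-free weeks within the lookback window")


# Hypothetical test starting Monday, January 15, 2024
holidays = [date(2023, 12, 25), date(2024, 1, 1)]
for start, end in pre_period_weeks(date(2024, 1, 15), n_weeks=4, holidays=holidays):
    print(start, "to", end)
```

Skipping holiday weeks pushes the window further back, so the goals of staying close to the test period and avoiding seasonality pull in opposite directions; the lookback limit is where that trade-off is set.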

The test should be indicative of how/what would be brought into the field/market

The last test before a rollout must represent how the company intends to bring the change into the market. For example, if a company is running a market test and plans a systemwide rollout with marketing support, that marketing support needs to be included in the test as well.

Tests validate hypotheses rather than develop hypotheses

Tests are generally not meant to determine how best to do something. Ideally, solutions will have been thoroughly vetted for impact across key functional areas (operations, sourcing, supply chain, training, HR, etc.) before being brought into the field. While tests can be used to understand how well the proposed solution works for affected areas, the goal of a test is to assess the solution, not to use time in the field to actively work out how to deploy it.

For example, decisions about workflows, where an item is kept in storage and on the line, how long it is held, and so on should all be made before any in-store test.

Sample sizes

Restaurants will flex their sample size based on the level of risk involved. Generally, concepts operate at a 95% confidence level with a 5% or lower margin of error.
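For a proportion-type measure (for example, the share of surveyed guests who would order an item again), a 95% confidence level with a 5% margin of error is often translated into a sample size using Cochran's formula. The source does not say how these figures are applied, so treat this as one common interpretation: it assumes simple random sampling, uses p = 0.5 as the conservative default, and optionally applies a finite population correction. Revenue or store-count questions need a different calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size(confidence=0.95, margin_of_error=0.05, p=0.5, population=None):
    """Cochran's sample size for estimating a proportion within
    +/- margin_of_error at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction for small populations
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


print(sample_size())                  # 385
print(sample_size(population=2000))   # 323
```

Tightening the margin of error raises the required sample quickly, since n grows with the inverse square of the margin; this is one reason higher-risk tests flex toward larger samples.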

These should not be viewed as hard and fast rules but rather principles to help guide initial thinking for your tests and testing.