A Sense of Vision


The Development of Copy Testing Expertise

Extraordinary Effectiveness: a chance encounter between Pittsburgh’s All-Pro lineman, Mean Joe Greene, and a young boy who idolized him. August, 1979.


During the 1920’s and 1930’s, TV was no more than a laboratory curiosity; 40 years later, it was the dominant media form in the world. By the turn of the millennium, U.S. advertisers spent $120 billion in measured media, and television accounted for almost half (47% or $56.8 billion). That was more than twice the spending in second-place newspapers ($25.7 billion) and almost three times the spending in consumer magazines ($19.9 billion). During much of this time, G&R has been a leading research partner in helping companies learn to harness this powerful force. Over the years, G&R has innovated many of the research techniques that are now considered standards in the industry. We have tested more than 60,000 commercials and are credited with having helped shape the face of modern advertising messages. The understanding of how to apply television’s vast potential to particular marketing messages came by degrees, and the highlights of that progress are the topic of this brief review.


  1. Early Days
  2. Research Gets Serious
  3. Pre-Testing Becomes Practice
  4. The Physiological Dimension
  5. Proof is Delivered
  6. Advanced Testing Solutions
  7. Recap

Early Days

Back in the 1950’s, television was becoming the leading mass entertainment and marketing force, and shows were sponsored by single advertisers, such as

  • Milton Berle and Texaco
  • Danny Thomas and General Foods
  • The GE Theatre with Ronald Reagan

By then G&R was already well established, having developed the first commercial copy testing system in print. Recognizing TV’s vast potential along with the advertisers’ need to know how viewers were receiving and reacting to their television commercials, G&R pioneered television copy testing as well.

Central-location, forced-exposure testing was then the standard technique, and G&R’s “Mirror of America” was the first permanent and fully equipped interviewing center in America for doing such work. Located just outside of Princeton, NJ, it had an auditorium for 200 people, as well as complete facilities for group and individual interviewing. It was a facility where both advertising and research ideas could be born and examined.

Piel's Beer Story Board

The First TPT Tests: Pitchmen Bert & Harry for Piel’s Beer. September, 1963

Research Gets Serious

In the early 1960’s, as sole-sponsorship programming gave way to multiple sponsors and individual commercials, G&R introduced the first real-world copy testing service, called Total Prime Time (TPT). For its time, TPT was a research tour-de-force. Twenty-eight times a year – 4 times for each night of the week – G&R interviewed 1500 men and women and measured every commercial that appeared on the 3 networks during prime time.

As George Gallup explained:

TPT not only let us measure commercial effectiveness for both client and competitor, but it allowed us to determine the external effect of the medium and led to the first sales validation of the system where we found that change in commercial recall and change in commercial persuasion significantly correlated with change in sales, and that the two together were better able to predict change in sales than either alone.

TPT became a knowledge mother lode. For the first time, companies were able to establish the wide range of performance the commercials achieved. TPT showed that on a typical night the average difference between the most memorable commercial and least memorable in any product category was 6.4 to 1. Advertisers were able to empirically establish which commercials were stronger than others based on the quality of their content. The large number of commercials being tested allowed the factors most responsible for driving performance to be isolated and quantified.

TPT also introduced masses-of-data analysis to quantify the effect of the television medium on advertising performance. Companies learned from TPT research that a commercial’s performance was influenced more by the time within any night at which it ran than by the different nights of the week on which it aired. Commercials that aired earlier in the night performed better than those that aired later. Commercials in regular programs outscored commercials that ran in movies.

Pre-Testing Becomes Practice

As the marketing value of television messages began to be recognized and the costs of airing and producing commercials increased, more attention was paid to researching commercials before they aired and before finished versions of them were produced. As a result, in the early 1970’s G&R introduced In-View, the first real-world system to use invited viewing, which has since become the recruitment standard for most on-air copy testing. The test commercials were still exposed on-air and at-home, but people were recruited to watch individual programs rather than being interviewed after they had stumbled across them as a part of their normal viewing habits. The benefit of invited viewing was that viewers could be found more economically while remaining naive as to the purpose of the research. Additionally, advertisers could make economic spot buys rather than expensive national buys when they wanted to test commercials. A few years later, G&R used the advantages of In-View to become the first research company to test rough commercials on air. As a result, many advertisers expanded their testing activities.

Case History: AT&T

Version 1

AT&T Surprise

Version 2

AT&T Surprise - 2

Performance Comparison

AT&T Case Study Chart

AT&T produced a rough-cut storyboard presenting their phones as a great gift idea.

  • It opened with a couple showing what they gave Dad (George) before and discussing what to give Dad now;
  • The solution was a new AT&T phone, which was shown;
  • It ended with Dad receiving the phone.

The spot was tested in In-View and problems were detected. While Recall was above average and communication satisfactory, Persuasion was well below norm.

Playback by respondents showed that they were confused by both the opening gift discussion and the transition from the ‘before’ beginning to the ‘after’ ending, that many did not like the phone model selected, and that many felt Dad (George) did not seem particularly overjoyed with the gift.

Changes were made to the commercial…

  • The story line was simplified.
  • Several phone selections were shown.
  • George’s (Dad’s) reaction was made significantly more enthusiastic.

The revised spot was retested in In-View and found to have solid improvements.

Recall was maintained, Communication was richer, Persuasion doubled and negative comments were all but eliminated.

The Physiological Dimension

One of the methodological problems inherent in survey research is that it is based on self-reported responses. Among other things, this form of data gathering assumes that the respondent is being honest, is in touch with what he or she is thinking or feeling, can remember things accurately, and is not being influenced by the social setting of the interviewer/respondent relationship.

In recognition of this, efforts have been applied for many years to finding a physiological measure of commercial effect that is outside of the respondent’s control. Work has been done in such areas as

  • Pupil dilation
  • Skin resistance
  • Heart rate
  • Voice stress

Interest in this area was not new at G&R. As far back as 1939, George Gallup bought the improbable-sounding Ruckmick Affectometer to experiment with galvanic skin response. By and large, however, such measures have been shown to deal primarily with arousal and to be wanting when applied to understanding advertising effectiveness.

During the late 1970’s rapid advances were made in brain research, particularly in the area of affective response where researchers now believed they could separate positive responses from negative. On the strength of this, G&R teamed with the Rutgers School of Medicine to see whether or not brain research technology could be used to predict performance patterns of different commercials, as well as what differences might occur from multiple exposure.

The research was based on previously tested commercials covering a wide range of products. Each commercial was part of a pair that had widely divergent high-low scoring performance. The commercials were embedded in a new hour-long television program, which respondents saw individually while their brain waves were monitored at two-second intervals.

The findings were novel and led to some interesting conclusions:

  • The brain wave response was always higher on second viewing than on first.
  • The data clearly distinguished between program content and commercials.

But, as has since been replicated by others, analysis of the obtained brain wave patterns was disappointing from a copy testing perspective. The patterns were not able to distinguish between high and low commercials on recall. They showed some mild distinctions between high and low commercials in terms of persuasion, but only within secondary measures.

Study Details: G&R Brain Waves Study

Brain Waves


Brain waves were taken at four sites in the brain:

  • Left and right frontal areas, which picked up Alpha waves
  • Left and right temporal areas, which picked up Alpha, Beta and all other waves

Alpha waves generally dominate brain activity for a wide-awake person who is not paying attention to anything in particular. As the nervous system becomes aroused, the Alpha waves are diminished and replaced by Beta waves. At the time of this analysis, either a diminution of Alpha or an increase in Beta could be used to measure the process of the nervous system becoming activated.

While the inability of brainwaves to differentiate between strong and weak commercials was disappointing from a copy testing perspective, the study added credence to other observations about how commercials perform, including Herb Krugman’s observation that on first viewing, the viewer thinks, “What is this?” and on the second viewing, “What of it?” Under multiple exposure conditions,

  1. When a high-scoring commercial was exposed twice and the low-scoring commercial once, both recall and persuasion were higher for the high-scoring commercial.
  2. When the exposure pattern was reversed, the lower-scoring commercial had stronger recall and persuasion measures.

The study also lent further credence to the copy testing results: respondents who were re-contacted three days later and given the standard G&R Impact questioning gave responses that correlated highly with previously obtained results on both recall and persuasion.

Proof is Delivered

From the beginning, G&R has conducted a variety of independent and client-supported validation studies to demonstrate that its measures of real world consequence are able to discriminate among commercials that produce more or less sales.

During the 1950’s, the Advertising Research Foundation sponsored a study that compared G&R’s recall-based technique to Starch’s recognition-based technique, the practical consequence of which was the abandonment of recognition as a single measure evaluative copy testing technique.

During the 1960’s, G&R conducted the first large-scale validation study in television copy testing. By tracking buying habits using pantry audits and linking them to copy quality measurement from the TPT system, G&R empirically demonstrated the validity of recall and of persuasion for the first time.

During the 1970’s, testing techniques remained much as they had been, but with increased use of customized design, through added diagnostics, target group testing, and more advanced statistical techniques.

But it wasn’t until the 1980’s that the first independent study was conducted to determine the sales predictive validity of the various techniques and measures used in copy research. In 1982, the ARF initiated the Copy Research Validity Project. G&R was the only research company to sponsor the project, joining 29 leading advertisers, 11 major ad agencies and the 4-A’s as sponsors.

The final design involved…

  • 6 copy testing methods
  • 5 pairs of commercials with a known winner and loser in each pair based on actual in-market sales performance
  • 30 cells with 400 – 500 interviews in each cell
  • 12,000 – 15,000 total interviews
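The figures in the design list are consistent with one another, as a short calculation can show. The sketch below is illustrative only: the function name `design_size` is hypothetical, and it assumes that each of the 30 cells corresponds to one copy testing method applied to one pair of commercials, which the text implies but does not state outright.

```python
# Design parameters as listed in the text.
METHODS = 6                       # copy testing methods
COMMERCIAL_PAIRS = 5              # pairs with a known winner and loser
INTERVIEWS_PER_CELL = (400, 500)  # low and high interviews per cell


def design_size(methods, pairs, per_cell_range):
    """Return (cells, low_total, high_total) for a fully crossed design.

    Assumption (not stated in the source): one cell per
    method-by-pair combination.
    """
    cells = methods * pairs
    low, high = per_cell_range
    return cells, cells * low, cells * high


cells, low_total, high_total = design_size(
    METHODS, COMMERCIAL_PAIRS, INTERVIEWS_PER_CELL
)
print(f"{cells} cells, {low_total:,} to {high_total:,} total interviews")
```

Under that assumption, the calculation reproduces the 30 cells and the 12,000 to 15,000 total interviews given in the list.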

The results were both informative and surprising. Liking was found to be the most predictive measure, closely followed by recall. Perhaps more importantly, combinations of measures were found to be more predictive than any one measure alone.

The most powerful combination was Liking and Recall, which had a predictive index of 466, predicting 14 of the 15 possible results correctly.

The study showed that (1) advertising works by producing measurable incremental sales results; (2) content matters, since advertising that differs only as to content (same spending and media plan) produces different sales results; and (3) copy testing works, as copy testing measures discriminate between stronger and weaker business-building commercials.

The results of the study need to be viewed with some caution because of the homogeneity of the commercials upon which they are based (major established consumer packaged goods). But the scope of the project, its being anchored by sales data, and its independence, make its conclusions valuable and important.

Advanced Testing Solutions

As knowledge and technologies advance, so do opportunities to improve testing methods, and G&R has responded with new services to fit the various testing needs. In addition to In-View, which is still used by advertisers wishing to get the most real-world measure of a commercial’s performance, G&R offers three core television designs.


InTeleTest was the first copy testing service to utilize the inherent freedoms of personal recorders, including better sampling, more controlled exposure environment, and enhanced diagnostic batteries. In its unique design…

  • Commercials are embedded in a never-before-seen TV Pilot Program, just as they would appear on network TV.
  • Respondents are given the cassette or DVD and are asked to view the new program at home, in their normal manner, at their convenience.
  • The day after viewing, respondents are re-contacted and questioned about their reaction to the new program and about the commercials.
  • Questions include evaluative measures as well as full-sample diagnostics and other measures validated in the ARF study.

InTeleTest introduced five major improvements to evaluative copy testing.

  1. Sampling was improved. Advertisers were no longer geographically confined. InTeleTest went from 2 or 3 cities to 10 or more cities according to client needs.
  2. Context was better controlled. Both the programming and advertising material were tightly specified, unlike previous systems where the commercials were tested in whatever shows and in whatever pod positions that happened to be available.
  3. Costs were reduced. The research no longer required the purchase of airtime to test copy.
  4. Information was improved. New questions, both evaluative and diagnostic, were added.
  5. Full sample diagnostics became the norm. For the first time, re-exposure became practical in a real-world setting to give total sample supplemental support.

Case History: Lean Cuisine

Here’s an example of how InTeleTest uncovered and pinpointed a number of unexpected negative reactions to a Lean Cuisine commercial.

The commercial message was “More satisfying” – the answer to a problem inherently associated with diet foods. It featured a svelte woman in a bathing suit being ogled by a group of young men as she walked by. There were also shots of the package and the finished meal.

Performance Metrics

Lean Cuisine Case Study charts

Even though the commercial opened on the box to help establish brand identity, its fast scene cuts, weak audio-visual sync, and the woman who upstaged the product produced a lower than average recall score.

As a result, only 5% of respondents identified “Satisfying” as the main point, and 25% noticed the portion size featured in the ad.

Additionally, the spot scored well below norms on both persuasion and ad liking.

Further diagnostics revealed that the commercial was significantly below average on positive attributes (such as “Told me something new”) and 3 times higher on negative attributes (such as “In poor taste”).

When rated against an adjective checklist it was significantly below average on: Believable; Worth remembering; Convincing; True-to-life; Warm; Sensitive; Amusing; Imaginative. The commercial was above average on: Fast-moving; Lively; Phony; See too much; Irritating; Silly. Overall, the results were surprising… disappointing… consistent… and actionable.


One of the more interesting and significant developments in cognitive communications research theory is the Elaboration Likelihood Model (ELM) of Petty and Cacioppo. The crux of the ELM theory is that communications work not by what they tell us, but by the thoughts that they generate in us. These thoughts, the arguments and counter arguments, are what influence our underlying belief structures and dispose us to act. In ELM, there are two overlapping paths to persuasion. On the Central path, we think about the brand, and on the Peripheral path, we think about the advertising.

Of course, the idea that thinking is important is not a new concept in copy testing. G&R has used open-ended Idea Communication as a central part of its Impact system since its inception in the early 1950’s. The Leo Burnett ad agency did quite a bit of innovative work with it via their Viewer Response Profile in the early 1970’s. What ELM added is a sound conceptual foundation, which has strengthened the acceptance of this measurement dimension in copy research.

In the 1990’s, G&R decided to expand on its expertise in coding and analysis of open-ended responses by developing FasTrac, a forced-exposure, intercept system that uses in-person interviewing for data gathering. The system offers five main distinguishing features.

  1. The respondent is exposed to the actual stimulus.
  2. A unique battery of open-ended questions leads to a rich array of Thought Listings for ELM-type analysis.
  3. It includes all of the evaluative measures that were found to be most predictive of sales by the ARF’s Copy Research Validity Project.
  4. It is fast.
  5. It is relatively inexpensive.

FasTrac is geared for today’s demanding pre-testing standards and schedules when there is a need for face-to-face interviews and deep probing. Suitable for rough or finished ads or commercials, as well as other forms of audio/visual stimulus, FasTrac provides a full range of in-depth measures of communication and reactions, to enable actionable and timely decision-making by the advertiser and its agency.


Most recently, G&R has designed WebCheck, a suite of Internet-based copy research solutions that blend its sales-validated measures of communications effectiveness with the logistical advantages of today’s online technologies. Respondents are exposed to stimulus digitally and interviewed via online, self-administered questionnaires, which they can access at their convenience using a computer, tablet, or smartphone. Non-directed thought listings and multiple closed- and open-ended questions provide a rich battery of evaluative and diagnostic measures for deeper insights. Advanced proprietary software enables stimuli in all media to be presented securely across multiple platforms. Unique design features enable us to investigate the latest constructs in communication effectiveness, including priming, framing, advanced persuasion, and transportation.

The result is a comprehensive assessment of an ad’s ability to communicate its intended messages that is also fast and cost effective. WebCheck’s customizing options, such as target sampling, special diagnostics, clutter reel intrusiveness, strong normative benchmarks, and the application of advanced modeling techniques, precisely and actionably reveal the strengths and weaknesses of the test advertising.


AAAA – ARF Emotions in Advertising Project

CERA Panogram: CERA measurements of positive and negative responses to Budweiser’s “Wassup” commercial.

Since its inception, G&R has experimented with various methods for measuring emotion-based advertising response. More recently, G&R has been working with a new data-driven, physiological technique. After a four-year development phase, the result of this work is CERA (Continuous Emotional Response Analysis), a pioneering, multi-modal measurement system that uses facial EMG and paper-and-pencil measures to assess both the emotional and cognitive responses to media messages.

Traditional verbal and paper-and-pencil copy measures have been criticized as limited in their ability to assess the effectiveness of commercials that use emotion to achieve their advertising goals. Emotion is not primarily a language-based experience, and cognitive effort is required to put experience into words. Also, these verbal measures are retrospective, in that respondents have to think back to remember what they felt, and the reporting is susceptible to social demand influences. Respondents may not be able or willing to put into words their complete and accurate emotional response to an ad. Similarly, knob turning and picture/symbol sorting approaches are criticized for putting a cognitive filter on emotional response measurement. And early physiological tools have usually turned out to be limited measures of arousal rather than valence.

The CERA methodology consists of obtaining facial EMG activity measures during an uninterrupted viewing of a television program clip that includes pods of test commercials. Open-ended and closed-ended questions get at the degree of branding that takes place, the ideas the commercial communicates, and people’s reaction to specific content and executional elements, all of which can be compared against norms as in typical advertising research. Facial EMG (FEMG) is considered the gold standard for measuring emotional valence, the split-second, pre-cognitive “decision” we make about how we feel about a stimulus, including a brand.

CERA has raised the copy-testing bar for emotions-based measurement by providing precise, continuous quantitative measures and qualitatively rich diagnostics about a commercial’s effectiveness and emotional impact. For many advertisers, this may become one of their next communication imperatives.


Recap

G&R developed the first copy testing systems in both print and TV. Over the years, it has innovated many of the other research techniques that are now considered standard in the industry and are the basis of many competitive systems. With an unmatched experience base of 50 years and 60,000 tests, G&R offers one of the largest knowledge bases about advertising effectiveness in the world. We are committed to providing our clients with better tools and new understanding for a better sense of vision to help them in today’s very competitive marketplace.