Why people hate stats, but businesses love stats

The declining authority of statistics – and the experts who analyse them – is at the heart of the crisis that has become known as “post-truth” politics. And in this uncertain new world, attitudes towards quantitative expertise have become increasingly divided. From one perspective, grounding politics in statistics is elitist, undemocratic and oblivious to people’s emotional investments in their community and nation. It is just one more way that privileged people in London, Washington DC or Brussels seek to impose their worldview on everybody else. From the opposite perspective, statistics are quite the opposite of elitist. They enable journalists, citizens and politicians to discuss society as a whole, not on the basis of anecdote, sentiment or prejudice, but in ways that can be validated. The alternative to quantitative expertise is less likely to be democracy than an unleashing of tabloid editors and demagogues to provide their own “truth” of what is going on across society.

From a story in The Guardian.

I find it interesting that there is a broad rejection of statistics in the general population, but an almost over-reliance among business managers.

Managers are often paralyzed in situations bereft of data — unable to make a decision unless there is some data to support it. There is also a human tendency to construe data to support one’s own cause. It’s a kind of pseudo-intellectualism.

Contrast this with the general population, where it’s easier to operate by simply rejecting statistics outright. Though, to people’s credit, important data are often tied to confusing topics that simply aren’t worth the effort of understanding. Once you hear pundits debating the merits of measures you had accepted as gospel, it’s easy to tune it all out… “Why should I bother learning the nuances of the unemployment rate? Especially now that the methodology is becoming a partisan issue.”

Either way, I think anyone who works with data can do a better job of telling stories with it.

In many ways, the contemporary populist attack on “experts” is born out of the same resentment as the attack on elected representatives. In talking of society as a whole, in seeking to govern the economy as a whole, both politicians and technocrats are believed to have “lost touch” with how it feels to be a single citizen in particular. Both statisticians and politicians have fallen into the trap of “seeing like a state”, to use a phrase from the anarchist political thinker James C Scott. Speaking scientifically about the nation – for instance in terms of macroeconomics – is an insult to those who would prefer to rely on memory and narrative for their sense of nationhood, and are sick of being told that their “imagined community” does not exist.


A Case Against The 95% Confidence Level

Putting aside the controversy over misinterpreted p-values, I think it’s worth at least thinking about how the corporate market research world thinks about the use of statistical significance in general.

Most of us took a stats class in college and for some reason or another the 95% confidence level was the first CI we were presented with. I distinctly remember my econometrics professor telling us that the 95% level was completely arbitrary, and later on we would learn that statistical significance does not imply economic significance. And more confusing yet, that something could be economically significant without being statistically significant.

Basically, I’ve learned to stop paying attention to p-values in my world.

So, what happens when a client is testing two ads for a media buy?

For simplicity, say we are testing which ad — A or B — better incites respondents to purchase.

We set up a null hypothesis:

H0: P(A) = P(B), where P() is a function whose output is the proportion of respondents planning to purchase the product.

Alternate hypothesis:

Ha: P(A) > P(B) or P(A) < P(B)

We collect the data and calculate the p-value, which comes out somewhere above 0.05.
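For concreteness, here’s a minimal sketch of how such a p-value might be computed: a standard two-proportion z-test run on made-up counts (100 respondents per ad; statsmodels assumed, and none of these numbers come from a real study).

```python
# Rough sketch: two-proportion z-test on hypothetical counts.
from statsmodels.stats.proportion import proportions_ztest

purchases = [25, 24]    # respondents planning to purchase after seeing Ad A, Ad B
shown     = [100, 100]  # respondents shown each ad

z_stat, p_value = proportions_ztest(count=purchases, nobs=shown,
                                    alternative="two-sided")
print(f"P(A) = {purchases[0] / shown[0]:.0%}, P(B) = {purchases[1] / shown[1]:.0%}")
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")  # p comes out well above 0.05 for these counts
```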

Ok, great. So now what? The two ads are equal in their ability to incite potential customers? Is that really useful?

At the end of the day, all we can do is compare point estimates. If P(A) = 25% and P(B) = 24%, then, well, quite frankly, it doesn’t really matter what your p-value is. As the research provider, you either have to say:

Your p-value doesn’t clear 0.05, but we still think you should go with Ad A.

or

Your p-value doesn’t clear 0.05, so you need to do more research.

The latter kind of misses the point. It shows a lack of sympathy for the decision-making process, and a lack of awareness of the cost of research. Maybe p < 0.2 was good enough, considering how much the additional certainty would cost.

Either way, the case against a 95% confidence level has little to do with statistics and much more to do with researchers needing to stop hiding behind an academic threshold that is inappropriate for corporate research.

Why We Need to Stop Saying “Masterbranding” (Not for the obvious reason)

In Stand by Your Brand, Nielsen published the visualization below, plotting EquiTrend against RQ. Each dot represents a company-brand pair. The dot’s position on the vertical axis represents the brand score, EQ; its position on the horizontal axis represents the corporate/company reputation score, RQ.

The positive trend suggests that brand and reputation are closely linked. Hence the argument: “in our connected world, one corporate misstep can go viral and result in adverse impacts on products linked to the company through masterbranding.”

The link between reputation and brand is sensible and intuitive… all very informative. Except there’s one thing…

This word “masterbranding”: I’m very hung up on it.

First, of course, it sounds a little too much like masturbating. And it reminds me of IBM’s rebranding of Deep Thought II.

But more importantly, “masterbranding” suggests that the model of diversifying your brand portfolio and maintaining independence across the performance of your brand family no longer applies.

As I understand it, the argument seems to go something like this:

In a connected world, parent-brand family ties are more salient, so let’s not bother insulating Brand A from Brand B. People will figure out that both brands are part of the same company anyway.

I’m skeptical that the world has changed that much though.

If you want to find out who the parent company of Dove is, you can figure it out. That was true before social media, and before the internet. Just because the information is easier to come by doesn’t mean people will purposefully go searching for the family of brands under Unilever.

Even if we do assume that our “connected world” means consumers are now more aware of the hierarchy of brands, what incentive does a company have to unify their brands?

They would only want to do that if one of their brands has such a behemoth of reputational equity that they want to scale it up, so much so that they are willing to sacrifice a smaller brand just so the other can have a bigger reach.

But this is so seldom the case. Having several brands is a diversification strategy. Did Porsche sales see a dip during the Volkswagen emissions crisis? Probably a little, but certainly not as much as they would have under a unified marketing strategy. It comes down to isolating brands so that their performances are mostly independent; otherwise you’re putting too many eggs in one basket.

Think about it: Why is it that a bad Uber experience is potentially national news, but a bad Etsy seller is an isolated incident? Because Uber is positioned as a unified product and Etsy is a platform. I think Uber’s unified brand is its biggest reputational risk right now.

We live in an increasingly “platformed” world, and people understand that Joe Schmo’s racist rant on YouTube doesn’t diminish all the other good YouTube content. People will accept that many familiar companies are really not that different from Etsy; they’re just operating at a way, way bigger scale.

Why I’m all for Native Advertising

People are down on native advertising. Apparently, imposing native ads on your readers can be construed as deception.

But when I think about what differentiates native ads (say, an article about why some company is the best for consolidating student loans) from native content (an article about the state of student loans in America), I have a difficult time articulating those differences.

Both are media creatives that want my attention in the author’s self-interest. The article’s author wants me to keep reading his future publications, maybe follow him on Twitter, and so on. The ad writer probably just wants me to buy her product.

But I don’t see this as deception, per se. Perhaps there was some expectation that EVERY single article in the publication would be completely fact-based and unbiased. But that kind of filter is naive (whether you’re reading Buzzfeed or reading NYT).

The best point in favor of native ads is this: somebody is willing to pay to show me this message… it must be worth something. Imagine if the advertising were free; imagine how terrible the quality would be. When marketers pay to publicize content, it’s usually pretty good — see the Super Bowl, for example.

Finally, I admit that it’s a little deceptive, so let’s just accept that native ads are another step in the ongoing attempt to get consumers to click on stuff. It’s a cat-and-mouse game… some marketer figures out a clever new way to get people to pay attention, then people catch on; but by then some other marketer has found a new way to trick people. (Remember those “one weird trick” ads? They were really big a few years ago, but not so much anymore.)

Whatever it is, what comes next will likely rely on people’s need to find shortcuts. 

There’s No Such Thing as Social Media

I read a question the other day…

How much do you trust information from the following sources?

Social media

I don’t understand how a question like this could possibly be answered by a respondent. It implies that social media is some sort of sentient being that socializes information — some of which is true and some of which isn’t.

It’s like asking whether you trust information that you hear over the phone. I mean, social media is a set of platforms across which messages are conveyed. Yes, some social media platforms may have more “fake news” than others, but how can a respondent be asked to evaluate the veracity of a statement simply based on the medium through which it was sent? They can’t. And they shouldn’t.

The other part that irks me about these questions is that often you’ll see something like this:

Social media (e.g., Facebook, LinkedIn, Twitter)

Seriously, if this is your view of social media, then you really need to get with the times. Any marketer knows that the demographic makeup of the users of these sites is VASTLY different, and, thus, the content of the communications promulgated through different social media platforms is going to be VASTLY different. The corollary is that the trustworthiness of each platform will be evaluated differently.

A question like this is better framed as an “all else equal” question. For example:

If you heard/read [INFORMATION], from [PERSON], how much would you trust that information if you read/heard it on each of the following media/platforms?

Asking the question like this limits your scope a little bit, but how much scope are you really looking for here? I assume you just want a general sense of how some piece of information would be interpreted on different platforms… does it really matter what that piece of information is? Do you really need to generalize to the point that the question becomes unanswerable for the respondent?

A Reason NOT to Do B2B Research

If you’re an industrial manufacturer, then chances are your client-to-salesperson ratio is pretty low.

This is what really baffles me when I see businesses with this kind of ratio doing quantitative market research among their industrial clients.

Seriously!? Do you really not trust your sales team to get the information they need to do their job? Do you really think an anonymous survey will elicit better information than your salesperson sitting down with the buyer and asking them what they need to know?

It’s either that or your salespeople are not being incentivized properly.

Listen, if you’ve got a business with 5-10 clients per salesperson, think about how the aggregate data would actually be used. Are you just collecting data to make your boss feel better about your customer-sat figures? Then I suggest you think long and hard about whether the research is actually worth it.

You don’t do research to prove to someone else that your customers are happy. You do research to give the people who are the face of your company the tools they need to keep their clients happy. And frankly, an aggregated driver analysis showing which attributes factor most into your customers’ likelihood to purchase is really not doing much for your salespeople, who are interacting with their clients on a one-to-one basis.

If your salespeople can’t figure out how to make their clients happy, then maybe you need new salespeople. But seriously, I recommend you question whether aggregated data is going to be useful to the people for whom it really matters… or whether you’re just trying to make your boss happy. Either way, trust me, your boss would prefer bigger sales numbers to bigger customer-sat numbers.

Jeopardy!, The Princess Bride, and Game Theory

I like Jeopardy! It’s cerebral and I usually get home from work around the time it comes on. For the most part, the trivia aspect is fun, but there is some level of contestant interaction that gets exciting too.

The winner of Friday’s show (Jan 20, 2017) employed an absolutely brilliant strategy in final Jeopardy! It’s something that regular watchers of the show probably caught, but I thought it was cool and wanted to talk about it, so here goes.

Background:

Going into Final Jeopardy!, the three contestants’ scores were:

Neil (returning champ): $9,200
Cathy:  $1,200
Hardy: $12,200

Final Jeopardy! category: WOMEN SINGERS

Final Jeopardy! clue: What she calls her “Love of Many Colors album”, a 2016 release by this singer, is her first No. 1 country album in 25 years.

Assuming you’re familiar with how Final Jeopardy! works… you probably know that sometimes the leader ought to assume that their closest competitor will bet all of their money.

In the case above, it means that Hardy assumes Neil bet $9,200. If Neil is correct, then he will end up with $18,400.

Hardy ended up betting $6,201. We can surmise that he did, in fact, make two assumptions:

H-1. Neil would bet everything, doubling his score if correct
H-2. Neil would respond correctly

Why? Because if assumptions H-1 and H-2 above hold, then Neil will end up with $18,400. The only way Hardy can win at that point is if he is correct and bets at least $6,201. (How much above $6,201 may depend on how confident he is about the category).

This is where it gets interesting. In game theory, there is something called level-k thinking. The best way to describe level-k thinking is by example.

Imagine you’re playing Rock-Paper-Scissors against Bob.

You like Rock. It’s strong, easy to throw… you know it’s the best. So, you throw rock. That’s your level-0 strategy. It means that you did not consider AT ALL what your opponent might do.

But now you think, “what if Bob knows that I like to throw rock?” Now you’re at level-1: if Bob expects rock, he’ll throw paper, so you should throw scissors.

Level-2: Bob has figured out your level-1 reasoning and expects scissors, so you conclude he’ll throw rock, and you throw paper.

Then it can just go on and on, ad infinitum.

That’s roughly level-k thinking, give or take some details. The moral is that in a strategic game, you are apt to consider what your opponent thinks.
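If it helps to see that chain of best responses spelled out, here’s a toy sketch of the recursion (my own illustration, nothing formal):

```python
# Toy level-k reasoning for Rock-Paper-Scissors.
# Level 0: throw your favorite (rock). At each higher level, predict Bob's best
# response to your previous level, then best-respond to that prediction.
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # move -> what beats it

def level_k_move(k, level0="rock"):
    move = level0
    for _ in range(k):
        bobs_throw = BEATS[move]   # Bob best-responds to your level-(k-1) move
        move = BEATS[bobs_throw]   # you best-respond to Bob's predicted throw
    return move

for k in range(4):
    print(f"level {k}: you throw {level_k_move(k)}")
# level 0: rock, level 1: scissors, level 2: paper, level 3: rock, ...
```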

Back to Jeopardy! So, Hardy bets $6,201. Of course, Neil doesn’t know what Hardy bet; but he’s clever. What does Neil do?

He bases his strategy on two assumptions:

N-1. Hardy would bet based on assumptions H-1 and H-2
N-2. Hardy would respond incorrectly

What makes this so cool is that N-2 is really a corollary to N-1. Think about it, if Hardy bets $6,201, then the ONLY way Neil can win is if Hardy responds incorrectly. So, there’s no reason to operate under any other assumption. If Neil assumes Hardy will answer correctly, then Neil loses regardless of his bet.

If N-1 and N-2 hold, then Hardy ends up with $5,999. What then should Neil bet?

No more than $3,200. And, indeed, that’s exactly what he did. He was wrong, but he ended up with one more dollar than Hardy. Final scores:

Neil: $6,000 – Winner!
Hardy: $5,999
Cathy: $2,400

Now, nothing stopped Hardy from employing a higher k-level strategy of his own. He could have bet $6,199, anticipating that Neil would bet small based on N-1 and N-2. But he would have looked really silly losing by one dollar if Neil had just decided to bet his whole score.
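If you want to check the arithmetic yourself, here’s a quick sketch using only the scores and bets from the show (Cathy omitted for simplicity; the helper function is just my own shorthand):

```python
# Final Jeopardy! arithmetic for the Jan 20, 2017 game.
def final_score(score, bet, correct):
    """Add the wager if the response is correct, subtract it if not."""
    return score + bet if correct else score - bet

NEIL, HARDY = 9_200, 12_200

# What actually happened: both responded incorrectly.
print(final_score(NEIL, 3_200, False), final_score(HARDY, 6_201, False))  # 6000 5999

# Hardy's $6,201 bet only pays off if he is right: it tops Neil's best case by a dollar.
print(final_score(HARDY, 6_201, True), final_score(NEIL, 9_200, True))    # 18401 18400

# The counterfactual: Hardy bets $6,199 while Neil goes all-in and is right.
print(final_score(HARDY, 6_199, True), final_score(NEIL, 9_200, True))    # 18399 18400
```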

Plus, at some point, you just end up looking like Wallace Shawn in The Princess Bride.

PS: The correct Final Jeopardy! response was: Who is Dolly Parton?

 

Exactly How Are You Planning to Use that Question?

Q123 In the past six months, how have you gotten information about [COMPANY]?

I cringe when I see this question.

Seriously, if you think this question will give you ANY clue as to how your consumers, stakeholders, or whoever are consuming information about your company, then you’re probably not doing your job right.

First, the media landscape is simply too nebulous to give you discrete answer choices that make sense. I see Facebook and LinkedIn clumped together as a generic “social media” response… Do you really think the people who get information about a company on LinkedIn are the same as the people who get information on Facebook? And still, do you really think the INFORMATION across the two platforms is the same? You’re gonna get a story about Ford’s earnings report on LinkedIn and a story about human mobility on Facebook — how could you possibly be naive enough to clump those together?

Second, people don’t remember. There’s so much data out there: Facebook ads, Google AdWords, click rates… all things marketing folks know better than I do. All this data makes this kind of question moot. The actual behavioral data is out there if you’re willing to look.

The time on the survey is better spent elsewhere.

Two Brands, One Score

The other day I was reviewing survey results and found something a little peculiar.

The results for [Luxury brand] were the same as [Economy brand] on brand quality. Keep in mind, it’s the same company — think Tide vs. Gain, think Lexus vs. Toyota, etc. — but its lower-end brand was rated just as highly as its luxury brand.

It frustrates me when I see a researcher package this up and ship it to the client without really questioning what’s going on. And it frustrates me more when clients don’t question it.

 

Listen. It’s not wrong that a survey question yields the same results for two clearly distinct brands; but if a survey question is returning the same value, you need to evaluate what that survey question is actually measuring.

If you expect that a survey question would return different outputs for different brands, and it doesn’t, then get rid of the question or re-write it. Either way, it’s not providing the value that you are looking for, so there’s no sense in keeping it.

Or you can look at this another way and ask yourself: Why are these two brands being rated the same? Is there a latent concept being communicated that is not obvious on a first pass through the results? Maybe respondents are considering the cost of the brand, and really what is being measured is [BRAND QUALITY] / [PRICE]. Maybe the luxury brand is less well-known and non-responses are diluting the total score.

Either way, passive acceptance that “my luxury brand and economy brand have the same brand ratings” is an unacceptable analysis and, frankly, a waste of your market research dollars.

Significance Testing is not Magic

A ten-minute survey produces a lot of data. And if we’re trying to study differences across demographics, then you end up with a lot of comparisons. (A 20-question survey comparing men vs. women, three age ranges, and three income ranges yields 140 comparisons: one gender pair plus three age pairs plus three income pairs makes seven comparisons per question, times 20 questions.)

In a typical data table, these comparisons will be “tested” for significance – usually denoted by a little superscript next to the percentage in a cell.

But significance testing like this is misguided. Here’s why…

Every significance test implies a hypothesis was tested

Generating 140 t-statistics implies that we had 140 hypotheses that we wanted the data to help accept/reject. That’s simply never the case in corporate market research. Hypothesis development is a careful and painful process. It requires a theoretical understanding of the subject matter followed by a painstakingly detailed experimental design and execution.

Corporate market research is usually much more exploratory. Deadlines and budgets mean we can’t always design the study exactly how we want. The result is surveys that are typically a little longer than we wanted in the hope that something in there will have the answer we’re looking for.

5% of 140 is seven

Conducting mass hypothesis tests puts the researcher at risk of interpreting random differences as true differences. (https://xkcd.com/882/)

If you are testing at a 95% confidence level, then even when no true difference exists you’ll get a false positive (a type I error) 5% of the time. That’s really dangerous when you have 140 comparisons, because it means around seven of your “significant” differences may not be real.
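If that seems abstract, here’s a purely illustrative simulation (numpy and scipy assumed): run 140 t-tests on groups drawn from the same population, so that every “significant” result is by definition a false positive.

```python
# Simulate 140 comparisons where no true difference exists and count how many
# clear the p < 0.05 bar anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng()
n_tests, n_per_group = 140, 200

false_positives = 0
for _ in range(n_tests):
    group_a = rng.normal(size=n_per_group)  # both groups drawn from the
    group_b = rng.normal(size=n_per_group)  # same distribution
    _, p = stats.ttest_ind(group_a, group_b)
    false_positives += p < 0.05

print(false_positives)  # about 7 on average (5% of 140); varies run to run
```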

It doesn’t make the results any more scientific

Testing 140 hypotheses at once is like throwing a bowl of spaghetti against the wall and seeing what sticks. Putting on a lab coat doesn’t make what you did an “experiment” any more than computing 140 t-statistics does.

Survey questions are not independent

Experiments are supposed to be independent: the results of hypothesis 1 should not have any impact on the results of hypothesis 2. But suppose the survey results show men are more likely than women to do X, and later in the same results you find men are less likely to do Y. By computing two separate t-statistics, you’re implying that these results are independent of each other, even though the same respondents answered both questions.

Of course, there’s still value in the survey results. But a good researcher needs to recognize that a t-value of 1.96 or more doesn’t represent some magical boundary between reportable vs. non-reportable data.

Treat your data as a set, not an enormous combination of combinations.