There are three kinds of lies: lies, damned lies, and statistics :)
People can tell lies with words, but that doesn’t make words bad. By the same token, people can tell lies using statistics, but that doesn’t make statistics bad.
Statistics is like a foreign language. If you don’t study a foreign language, you won’t be able to comprehend it. By the same token, if you don’t study statistics, you won’t comprehend it.
The problem with statistical information is not just the fact that some people lie with it. The root of the problem is misinformation. You have heard the phrase “I know just enough to be dangerous”. Well, what happens when someone writes an article or gives a talk using statistics that they don’t know much about? There is a good chance they will use them incorrectly. They didn’t set out to tell a lie; they just reached a little beyond their ability and either failed to communicate some key information or communicated some information they should not have.
If you are reading an article or listening to a talk and the statistical information is abstruse, that should be a warning sign that you are in over your head. Think about some of the fine-print legalese you have seen in loan documents and such. You probably feel pretty leery about that information, unless you are an attorney and you understand the language.
I don’t think abstruse statistical information should be taken as a sign that someone is lying, any more than legalese should be assumed to be a lie. So, what can you do if you feel in over your head with some statistical information? Basically, “Trust, but Verify”, as Ronald Reagan used to say (http://bit.ly/4KMLhv).
I’m sure there are plenty of books on this subject, so I probably won’t do it justice. But as a professional statistician who has worked on thousands of research studies since I first became a statistical consultant in 1993, maybe there is some value in what I will toss out there for others to debunk, I mean comment on ;)
So, my list of the top 3 statistical debunking criteria is:
1. Source: Who or what is the source of the information? What is their agenda? What are their credentials? If the communicator has a hidden agenda, they may be more likely to use statistics to their advantage. If the author has a Ph.D., they might have a hidden agenda, but more than likely they want to protect their integrity, and I would expect them to be less likely to lie using statistics. If the author has nothing to lose, so to speak, and they stand to gain something if they persuade you to believe what they are saying, you might want to “Trust, but Verify”.
2. Telltale Signs: If you detect inaccuracies in some of the information from the source (author/speaker), that should be a warning sign that they are reaching beyond their ability and/or intentionally misinforming you. That should cast doubt on the integrity of the entire communication, including the statistics.
If a key piece of information is missing, like the sample size, that should be a warning sign. For example: “More than 75% of those surveyed agreed that the project was an utter disaster.” For all you know, the sample size was 4, which is hardly a representative sample of a population of 100,000 or whatever.
The communicator should know the audience and speak to their level of statistical understanding. If a person is giving a talk to a group of Ph.D. statisticians, they don’t have to stop and explain basic statistical concepts to prepare them for the rest of the talk. On the other hand, if the intended audience is expected to have only a basic familiarity with statistics (e.g., a statistics 101 course in college), then the speaker/author should take more time to explain the methodology so that the listener/reader can fully grasp what is being said. If the communicator appears to be “disconnected” from the audience, that should be a warning sign.
3. Methodology: A credible communicator with good intentions would clearly articulate the limitations of their claims. Every statistical study has limitations. A small sample size, for example, limits the ability to generalize the results to other settings. Or the sample might be biased. Or important variables (confounding variables) may have been omitted from the analysis, which could alter the findings.
The communicator should document the procedures they used to conduct the study. Or, if they are voicing an opinion and referring to statistics, they should cite the source of the statistics. Maybe they should also explain the methodology used by the source. Was the sample a fair representation of the population? Did the researchers use sound research methodology?
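To put rough numbers on the missing-sample-size warning sign from item 2, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two surveys are hypothetical, the function name wilson_interval is made up for this example, and the Wilson score interval is just one common way to express the uncertainty in a reported percentage (it also assumes a simple random sample, which a real survey may not be).

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Hypothetical helper for illustration; z=1.96 corresponds to
    roughly 95% confidence under a normal approximation.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    ) / denom
    return center - half_width, center + half_width


# Two made-up surveys that would both be reported as "75% agreed".
for agreed, surveyed in [(3, 4), (750, 1000)]:
    low, high = wilson_interval(agreed, surveyed)
    print(f"{agreed}/{surveyed}: plausible range {low:.0%} to {high:.0%}")
```

With 4 respondents the plausible range runs from roughly 30% to 95%, while with 1,000 respondents it narrows to roughly 72% to 78%. The headline percentage is the same in both cases, which is why a claim that leaves out the sample size tells you so little.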
Basically, what it comes down to is this: if you feel you are in over your head regarding some statistical information, you have to first ask, “Is it me, or is it them?” Maybe you just don’t understand the language well enough to know if the person is telling the truth or not. On the other hand, if the person has a shady reputation and they are trying to sell you some oceanfront property in Arizona, then they just might be telling a lie with statistics as well ;-)