Should we question Big Data?

Kaiser Fung is a professional statistician with over a decade of experience applying statistical methods to marketing and advertising businesses. His acclaimed blog, Junk Charts, pioneered the critical examination of data and graphics in the mass media. Kaiser is my guest next week on the podcast and this is an excerpt from it. Kaiser recently wrote his second book, Numbersense: How to Use Big Data to Your Advantage.

Joe: Before the podcast, we talked about John Paulos. He of course wrote the book A Mathematician Reads the Newspaper, about how he frames what we read in the newspaper. I thought your book did a great job of framing how we should be looking at Big Data without saying who is right or wrong, but putting it in our court, saying “We need to frame it.”

Kaiser: I think it is very important for people to realize, before they dig into the book, that I focus on a very specific aspect of Big Data which is not receiving enough attention right now. We keep hearing about Big Data as volumes of data and all kinds of new types of data. We are tracking everybody, every movement, and that is true and an important aspect of it. Much of that aspect is what I would call supply-focused: it is all about the people who are doing the data work, and the people who are tracking us. My book is focused on the consumption side. One consequence of having so much data is that there are going to be tons and tons of people who come to us with all kinds of arguments, and they are going to tell us that their argument is supported by such and such data.

Most of us who do data analysis, and probably even people who read data analysis, will realize that you can pretty much find data to support anything you want to say. So what is going to happen? We are going to have a lot of contradiction and confusion. There will be so much data analysis out there that we do not know what to think. In Numbersense, what I try to do is to give people, as you say, a framework to start thinking about how you would interpret all these things out there. If you have two sets of researchers who are telling you contradictory things, and they have their own data sets to support it, how do you tell which one is believable and which one is junk? As you alluded to, this is not an exercise in figuring out who is a hundred percent correct and who is not. Our problems are so complex, and the data sets, even though they are so voluminous, will never be complete. We will never be able to know for sure that you are right and he is wrong.

So, I mean, I encourage people to take a skeptical attitude, and to basically develop your own framework for interpreting the data analysis out there. I would expect that some people will probably not agree a hundred percent with everything that I have said in the book either. That is totally acceptable; that is part of the mentality of how you approach the interpretation of data analysis.

Joe: Well, that is really the whole point of it: who do we believe, and how do we analyze? If we want to talk about topical discussions, let us just talk about Edward Snowden for a minute. Should we embrace Big Data, or should we be somewhat skeptical and scared of it?

Kaiser: It is interesting because I just put up a blog post this morning. It is about Snapchat. I wrote that blog post, and at the very end, I cited Eric Schmidt, the past CEO of Google. He had a quotation about which I used to think, “Oh my god, this is so creepy.” He said something like “There is no privacy anymore. If you do not want anyone to know that you have done something, then you should not have done it in the first place.” I used to think of it as an example of how creepy these technology companies are, but I think all these revelations are essentially making me rethink what he actually said. No matter… whether we like it or not, the data is out there and somebody will collect it. It is extremely easy for some people to collect it; we just cannot avoid it. So I think he is just basically saying “If you take that as your starting point, then you should think about whether you should be doing things that you do not want other people to know.” It is a different thing from saying it is creepy.

Kaiser's first book was Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do.

About Kaiser: He holds an MBA from Harvard Business School, in addition to degrees from Princeton and Cambridge Universities. He is Vice President of Business Intelligence and Analytics at Vimeo, a high-quality video hosting platform for creative people. He previously worked at Sirius XM Radio, American Express, [X+1], Exodus Communications, and Sonus Networks. He is also an adjunct professor at New York University teaching practical statistics.