Why do we trust the numbers?

by Logan McLaughlin

Anyone who has done anything online in the past ten years has a distinct image in mind when they hear the term “analytics”. This isn’t to be confused with “the algorithm”; in a broad sense, analytics refers to quantitative data. Except it doesn’t. When most people say “analytics” in reference to online content or activity, what they are actually referring to are processed numbers, curated figures, and graphs that they feel empower them to make data-driven decisions. But these analytics are often simplified, highly visual, and presented as self-evident truths because they are designed to be that way. Analytics in this fashion operate at a surface level: they make the end user feel smart because they can look at a chart and immediately see the pattern.

This isn’t to talk down to or invalidate the work of data analysts who process raw data to find trends, or of those like myself who comb through various kinds of data, both quantitative and qualitative, to extract insights or answers to questions we or our employers may have. No, the purpose of this piece is to examine the perceived objectivity of analytics and interrogate why, even as a good baseline of understanding, analytics are often reductive and in some cases misleading.

I’m not going to sit here and rehash “How to Lie with Statistics”; instead I’m going to focus on the concept of reductionism. Reductionism is a term thrown around a lot in academia for the practice of describing complex phenomena in simpler, non-specific terms. A good example would be assuming that working in statistics has a lot to do with misleading and misdirecting, based solely on the title of the book “How to Lie with Statistics”. We all know that would be a gross oversimplification.

Reductionism is at the core of why we see analytics as self-evident. We as humans like when things are simple, and in modern contexts, especially online ones, we are so inundated with information that we are drawn to simplifying it to make it more digestible. Numbers and charts happen to be a great way of doing this, but they also carry the inherent bias that numbers = truth. Nobody likes to hear bad news or feel like data isn’t actionable, so we often spin things positively or present data in terms of goals. Charts and metrics have a distinct lean toward growth mindsets, providing projections with an eye toward the next major milestone.

Reductionism can also appear in how variables are labeled or calculated. Over the years we have collectively been trained to look for engagement as a function of impressions or view count. But the way view count is defined is fuzzy. A notable case occurred in 2019, when Facebook settled a lawsuit over its effort to drive revenue and content to the platform by defining a view of a video as 3 seconds of that video being played. Many of those videos played automatically as a user scrolled through their feed, so those 3 seconds were often enough to count people who ignored or scrolled past a video as “viewers”. Contrast this with YouTube, which does not include autoplays in view count, and instead counts a view only when a user intentionally starts a video and watches for more than 30 seconds.
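To make that gap concrete, here is a minimal sketch of how one and the same watch log yields very different counts depending on how a “view” is operationalized. The event format and field names are invented for illustration; only the 3-second and 30-second thresholds come from the definitions described above.

```python
# The same watch log, counted under two different "view" definitions.
# Event fields are hypothetical, not any platform's actual schema.

watch_events = [
    # (seconds_watched, user_clicked_play)
    (2, False),   # autoplayed while the user scrolled past
    (4, False),   # user paused on it briefly mid-scroll
    (45, True),   # user deliberately clicked play and watched
    (12, True),   # user clicked play but bailed early
]

# Definition A: any 3+ seconds of playback counts, autoplay included
# (the kind of definition at issue in the Facebook settlement).
views_a = sum(1 for secs, _ in watch_events if secs >= 3)

# Definition B: only an intentional start watched for 30+ seconds counts
# (closer to how this piece describes YouTube's approach).
views_b = sum(1 for secs, clicked in watch_events if clicked and secs >= 30)

print(f"Definition A: {views_a} views")  # 3 views
print(f"Definition B: {views_b} view")   # 1 view
```

Same audience, same behavior, a threefold difference in the headline metric.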

Facebook’s definition, and its obfuscation of that definition, led a wide variety of companies to go all in on pushing their content to Facebook because the metrics showed their content got more views there. Advertisers especially liked this because of the oft-asserted correlation between view counts and “reach”. This mass migration of content toward Facebook is often cited as one of the causes of the decline of independent video hosting sites that had previously monetized their modest audiences in a sustainable way. Rather than driving viewers to their own sites, where they had more sway over monetization, many publishers and creators at the time looked at the metrics and said “Facebook is where the views are”, while Facebook itself profited by charging content creators for access to their own communities and making money off its own ads. All of this simply because videos had more “views”, and therefore more reach, on Facebook.

“Reach” and “engagement” are terms that get hand-waved into an “if you’re in this business, you know what it means” category. Yet any search for a unified definition mostly lands you at the homepages of various analytics apps or data brokers, all of whom either provide fuzzy definitions of the concept or eschew a definition entirely in favor of “use our app and we’ll tell you”. Either way, the message is “math is hard, let us do it for you”. That is perhaps the most reasonable part of the pitch, but it doesn’t answer the question of what “reach” actually is, or why it’s valuable. From an academic standpoint we would say the variable was never operationalized: there is no commonly understood standard for what reach means as a variable or how to calculate it. This lack of definition means you have to figure out how each individual site or service operationalizes and measures reach, and why it is incentivized to use the definition it uses.
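As a hedged illustration of the operationalization problem, the sketch below computes “reach” in two plausible but incompatible ways from a single invented impression log. Neither definition belongs to any particular vendor; that absence of a shared standard is the point.

```python
# Two common but incompatible operationalizations of "reach" over
# the same impression log. The log format is invented for illustration.

impressions = [
    # (user_id, post_id)
    ("u1", "p1"), ("u1", "p1"), ("u1", "p2"),
    ("u2", "p1"),
    ("u3", "p2"), ("u3", "p2"),
]

# Operationalization 1: reach = total impressions served.
reach_as_impressions = len(impressions)

# Operationalization 2: reach = unique users who saw anything at all.
reach_as_unique_users = len({user for user, _ in impressions})

print(f"Reach (impressions):  {reach_as_impressions}")   # 6
print(f"Reach (unique users): {reach_as_unique_users}")  # 3
```

Both numbers are defensible, and a vendor picking one over the other can double the reported “reach” without anyone’s behavior changing.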

This is not to say that data and analytics are useless, but that we are seeing a movement toward data without hypotheses, or rather data bereft of personally relevant hypotheses. The main difference between a cool chart that is merely a neat factoid and data that drives decision making is the intent behind the data collection. While exploratory, open-ended research often casts a wide net, it still has defined parameters, such as the topic to be explored. It is easy to fall into the “more data is always better” fallacy, and this often goes hand in hand with collecting analytics devoid of any hypothesis. Knowing why you want to collect data is what makes that data powerful, yet we are inundated with the idea that analytics are both self-evident and essential, and that the decisions they help us make are foregone conclusions. Developing hypotheses first helps you operationalize your variables and define how you want to measure them. Defining your own parameters gets you closer to the sources of your own data, be it your community’s or customers’ sentiments, or a measure of the “success of an activation”. Defining your own data needs, testing your own hypotheses, and truly building an insights solution is a more labor-intensive process, but you’ll find it is well worth the effort.
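One lightweight way to put hypotheses first is to write down the question, the operationalized variable, and the decision rule before pulling any data. The sketch below is hypothetical scaffolding, not a prescribed template; every field and example value is an assumption for illustration.

```python
# A minimal "hypothesis first" record: what you expect, how you'll
# measure it, and what result would actually change your behavior.
from dataclasses import dataclass

@dataclass
class MetricSpec:
    hypothesis: str     # the claim you expect to hold, stated up front
    variable: str       # the operationalized measure
    definition: str     # exactly how the measure is calculated
    decision_rule: str  # the result that would change what you do

hosting_test = MetricSpec(
    hypothesis="Hosting videos on our own site retains paying viewers",
    variable="30-day returning-viewer rate",
    definition=(
        "unique viewers active in days 1-30 who return in days 31-60, "
        "divided by unique viewers active in days 1-30"
    ),
    decision_rule="if the on-site rate beats the platform rate, shift hosting",
)
```

If a metric can’t be written down this plainly before collection, it probably isn’t operationalized yet.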

-

SpaceTime Strategies specializes in working with clients to determine what goals they’re trying to reach and how they can leverage data and the gamer demographic to accomplish those goals. Once we’ve figured that out together, we create a plan to gather and organize the data you’re looking for. To talk with a gaming ethnographer about our services, contact us and mention you’re looking for information on research and data.