How to get the right data, at the right time: the data analytics process explained
In a volatile, uncertain, complex, and ambiguous (VUCA) environment, organizations and individuals constantly attempt to make sense of what’s going on around them. The business world is volatile because you can’t predict many of the challenges and unknown variables you will face, nor, in many cases, how long they will last. The world is uncertain because there are often many variables at play, and each one of them creates instability in some area of your organization.
Controlling all the variables is almost impossible. Some phenomena in your environment are complex (i.e., impossible to grasp completely). You can only deal with complexity to a certain extent by holistically analyzing the situation at hand and making some sense of it. In the organizational world, ambiguity plays a key role in every situation. Most of the time, you can’t pinpoint causal relationships among the many variables at play. And correlation is not causation, although believing the opposite can be comforting and self-serving sometimes.
Uncertainty is a constant truth of human and organizational existence. You can only control your reaction to the complexity of the world around you. Living an examined life requires the acceptance of things as they are while working diligently to do your best with what you have where you are. You can cope with VUCA through vision, understanding, clarity, and agility. You can establish a high-level vision for where you would like your business/product/life to be in ten years. That’s your north star that keeps you intrinsically motivated when navigating the utter complexity of the world. You can gain more understanding, clarity, and agility by striving to use fact-based data and the design thinking mindset to make decisions and decode some of the complexity embedded in reality.
In organizations, navigating complexity is one of the core jobs of data analytics teams. Data analytics is the discipline that provides understanding and clarity by defining the essential structures and the most relevant data points you need to turn raw data into reliable information for decision-making. You can define your data architecture and analytics by following a four-step process (Seiter, 2019):
Framing
The first step in data analytics projects is framing the problem. The initial problem is usually a business-related problem. Businesses are complex by definition, and there are tons of opportunities to improve the quality of decisions. Framing the problem means defining exactly what you are addressing, and from which unique point of view (i.e., the frame you choose).
As a startup owner, you may want to know how many customers subscribe to your product over a given period, what the churn rate is, and why some customers unsubscribe, so you can improve the product accordingly. That can be a basic business problem definition. Getting truthful insights into customer behavior helps leadership teams craft their products more intentionally, using real-life data rather than guesses.
For every business problem, there is an analytics problem. Analytics problems can be descriptive, prescriptive, or predictive, and the business problem determines which kind of analytics you need. Descriptive analytics provides data that simply describes the business situation as it is; there is no inference or recommendation attached. It is useful for seeing reality as it is, through data.
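To make this concrete, here is a minimal descriptive sketch in Python with pandas, using a made-up subscription table (the column names and values are hypothetical). It only counts sign-ups and churned customers and computes a crude overall churn rate; nothing is inferred or recommended.

```python
import pandas as pd

# Hypothetical subscription records: one row per customer, with the month
# they signed up and, if they churned, the month they left.
subs = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "signup_month": ["2023-01", "2023-01", "2023-02", "2023-02", "2023-03"],
    "churn_month":  [None, "2023-02", "2023-03", None, None],
})

# Descriptive view: sign-ups per month and churned customers per month.
signups_per_month = subs.groupby("signup_month")["customer_id"].count()
churned_per_month = (
    subs.dropna(subset=["churn_month"]).groupby("churn_month")["customer_id"].count()
)

# A crude overall churn rate: the share of customers who have churned so far.
overall_churn_rate = subs["churn_month"].notna().sum() / len(subs)

print(signups_per_month)
print(churned_per_month)
print(f"Overall churn rate: {overall_churn_rate:.0%}")
```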
Prescriptive analytics uses your data to provide actionable next steps. In this instance, your data is not just described: an algorithm extracts patterns in the data and suggests business decisions based on those patterns. Like a doctor writing a prescription, it recommends the most logical next business decision.
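As a toy illustration (not a real prescriptive engine), a prescriptive step might map the patterns found in your data to suggested actions. The thresholds and actions below are invented purely for the example.

```python
# A toy prescriptive rule: given a pattern extracted from the data (here, a
# customer's estimated churn risk and product usage), suggest the next
# business action. The thresholds and actions are illustrative, not advice.
def recommend_action(churn_risk: float, monthly_logins: int) -> str:
    if churn_risk > 0.7 and monthly_logins < 2:
        return "Offer a discounted annual plan and a personal onboarding call"
    if churn_risk > 0.7:
        return "Trigger an in-app survey to find the friction point"
    if churn_risk > 0.4:
        return "Send a tutorial email highlighting unused features"
    return "No intervention needed"

print(recommend_action(churn_risk=0.8, monthly_logins=1))
```

In a real system, the rules (or an optimization model) would be derived from the patterns the algorithm finds, not hard-coded by hand.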
Predictive analytics uses your data to make predictions about the future. This type of analytics provides scenario probabilities based on your past and present data points. Unlike prescriptive analytics, there is no decision recommendation here: you predict future patterns from the past, then use your business acumen to deduce the best decisions from the algorithm’s predictions.
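A minimal predictive sketch, assuming you already have historical customer features and churn labels, could look like the following. The features and numbers are made up; the model outputs a churn probability, not a decision.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features per customer: [months subscribed, logins last month,
# support tickets opened]; label 1 = churned, 0 = stayed.
X = np.array([
    [1, 0, 3], [2, 1, 2], [12, 20, 0], [8, 15, 1],
    [3, 2, 4], [24, 30, 0], [6, 5, 1], [1, 1, 5],
])
y = np.array([1, 1, 0, 0, 1, 0, 0, 1])

model = LogisticRegression().fit(X, y)

# The prediction is a scenario probability based on past data points;
# deciding what to do with it is still up to you.
new_customer = np.array([[2, 1, 3]])
print(f"Churn probability: {model.predict_proba(new_customer)[0, 1]:.2f}")
```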
Allocation
Now that you have framed the business and analytics problem, you can move on to allocation. This refers to understanding what data you need, the IT resources required to solve the challenge, and any human resources needed. The data you need depends on the business problem you defined in the previous step.
To get data on the people subscribing to your product and leaving your product after a few months, you need to identify the source of the data. Do you already have it stored somewhere? Or do you need to also develop the whole infrastructure to get the data into your database?
At this stage, you want to get a clear overall picture of your current and desired analytics situation. The process of developing your solution will depend highly on your current situation. If you already have the data somewhere, you will follow a different process than if you had to start from ground zero.
And the same goes for IT resource allocation, which will vary depending on the current state of your data. IT resource allocation means choosing the software and hardware you need to devise a solution to your business analytics problem. Alongside IT resources, you also want to define the human resources needed to implement and maintain the overall infrastructure.
Analytics projects do not end once a system has been developed. Things need to be maintained, and that is a whole job in and of itself. Failing to maintain a system properly can lead to low data quality and a broken data flow, making it impossible to draw useful conclusions from your data.
Analytics
Once you have defined the type of data and resources you need, it is time to take action: extract the data from the source and develop the database infrastructure. Your database is the single source of truth where you store and manage the data for your analyses and visualizations. A database is composed of relational or non-relational tables, depending on your data model.
At this stage, you want to create an entity-relationship diagram to map out your entire database structure and the tables it includes. You will then choose a statistical method suited to your question and data type (e.g., cluster analysis, regression analysis). You can read more about data processing types here.
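For instance, the entity-relationship diagram for the subscription example might translate into something like this hypothetical two-table schema (shown here with SQLite from Python; your actual tables and columns will differ):

```python
import sqlite3

# A minimal relational sketch: customers and their subscriptions,
# linked by a foreign key. Table and column names are illustrative only.
conn = sqlite3.connect("analytics.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER PRIMARY KEY,
    signup_date TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS subscriptions (
    subscription_id INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL REFERENCES customers(customer_id),
    plan            TEXT NOT NULL,
    start_date      TEXT NOT NULL,
    end_date        TEXT  -- NULL while the subscription is still active
);
""")
conn.commit()
conn.close()
```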
Once those two steps are complete, you will use an ETL process to get the data into your single source of truth (the database). ETL stands for “extract, transform, load”: you extract the data from the source; you transform the data to ensure it is structured (i.e., with clear headings and data types); you load the data into your data warehouse. Sometimes, the “transform” and “load” steps can happen together, thanks to the powerful tools developed in recent years. You can find a list of ETL tools here.
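As a rough sketch, an ETL step for the subscription example could look like this in Python, assuming a hypothetical CSV export as the source (the file name and column names are illustrative):

```python
import sqlite3

import pandas as pd

# Extract: pull the raw data from the source system (here, a CSV export).
raw = pd.read_csv("subscriptions_export.csv")

# Transform: enforce clear headings and data types so the data is structured.
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]
raw["start_date"] = pd.to_datetime(raw["start_date"])
raw["end_date"] = pd.to_datetime(raw["end_date"], errors="coerce")  # NaT = active

# Load: write the cleaned table into the single source of truth.
conn = sqlite3.connect("analytics.db")
raw.to_sql("subscriptions", conn, if_exists="replace", index=False)
conn.close()
```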
Presentation
Ultimately, business stakeholders do not care about all the steps above. The reason you develop a business analytics solution is to solve a business problem, and a business problem usually involves decision-making challenges. The key element stakeholders will look at is the visualization of the data you collected and cleaned up in the previous stage: dashboards displaying the KPIs (key performance indicators) that your stakeholders (the people interested in using your solution) have asked for.
Data visualization tools include Tableau, Google Data Studio, Power BI, and others. Your visualizations will only be as good as the quality of the data you collected during the previous stages. Garbage in, garbage out, as one often hears in data science. You need to ensure you do a good job in all the previous steps in order to truly benefit from the final visualization dashboards.
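As a bare-bones illustration of the idea (in practice a dashboard tool would do this for you), here is a tiny KPI chart in Python with matplotlib, using made-up churn numbers:

```python
import matplotlib.pyplot as plt

# A single KPI view: monthly churn rate over time. The numbers are invented
# for illustration; a real dashboard would read them from the database.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
churn_rate = [0.08, 0.07, 0.09, 0.06, 0.05, 0.05]

plt.figure(figsize=(6, 3))
plt.plot(months, churn_rate, marker="o")
plt.title("Monthly churn rate")
plt.ylabel("Churn rate")
plt.ylim(0, 0.12)
plt.tight_layout()
plt.savefig("churn_kpi.png")
```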
Some time ago, I wrote about the design thinking mindset. Design thinking can be used in data analytics projects, especially for the first three steps above. That’s because design thinking would enable you to involve stakeholders throughout the process (which could increase buy-in), ensure everyone is truly heard, and get real feedback from stakeholders during the entire analytics project (empathize, define, ideate, prototype, test). Read more about design thinking here.
There are many parallels between the analytics process and design thinking. To define the business problem, you need to empathize with your stakeholders and get to understand them. To define the best tools for the job, you need to ideate and compare them as rationally as possible. To get feedback from stakeholders along the way, you need to prototype visualizations. To iterate on the project and ensure every need is met, you need to test along the way.
So, applying design thinking to data analytics projects can be valuable. The core objective of data analytics is to support business decisions with real, unapologetic data; unlike much everyday decision-making, it leaves little room for subjectivity. Data is one of the most abundant resources in 21st-century organizations, and making sense of it requires discipline and a holistic understanding of the entire business through systems thinking.
If you find this post valuable, consider signing up for the weekly reflection here, my once-a-week newsletter with curated content and short reflections to live with intention.