In this post, my aim is to answer the question ‘What really is an aggregation and why do we need it?’. I am going to explain aggregation by helping you answer the following 2 questions:
- How do we make sense of Numbers?
- How do we make sense of Text?
Here is the gist of the video:
Aggregation is nothing but a technical name for a highly intuitive process that we all undertake when we are presented with a lot of information – when we have too much information, we tend to summarize it. So, that is what we are doing with numbers here – we are summarizing them into 4 typical summary values (Total, Min, Max & Average) to make sense of them without going through every single value.
Here is a text version of the video for those of you who prefer reading to watching – please note that this is NOT a transcript of the video.
If you are given a set of numbers, how would you make sense of it? Typically, you would find the total value of the set of numbers. For example, if you are given the salary information of all the employees in a company, you would start by looking at the Total Salary.
And how do you visualise numbers? Do you remember the ruler/scale that we used in our school days? We were trained to plot numbers on graph paper with axes representing numbers – that is how Tableau plots numbers to help you visualise them. So, whenever you visualise numbers, think of a ruler/scale.
Apart from total salary, what else do we talk about when it comes to salary? We compare average salaries between different industries; we wonder what the highest paid Tableau job would be; and we also talk about the lowest salary (the entry-level salary) in a data science job – those 3 (Average, Max and Min) are the 3 other ways we make sense of a set of numbers.
We humans, when presented with more than a handful of numbers, want to reduce them to a manageable size. That is why we resort to calculating these 4 values (Total, Min, Max and Average) from any data set, whether it has 10 records, 10 thousand records or even 10 million records. In most datasets, the first thing we want to know is the Total – that is why Total (SUM) is the default aggregation for numbers in Tableau (and in most other BI tools).
If I asked you to give a name to these 4 values (Total, Min, Max and Average), what name would you give? To make the naming process easier, let us summarize what we are doing here. The picture below shows how we take a set of salary numbers for 10 employees and arrive at the total value.
- We take the whole set of numbers and reduce it to one value.
- So, we get ‘ONE number’ which tells us something of value about the whole set of numbers.
Do you remember the assignments you had in school where you had to read a long story/novel and summarize it? What did we do there? We took a lot of information and reduced it to a few key pieces of information so that we knew the essence of the story/novel.
That is the same process going on here – we take a set of numbers and reduce it to a few summary values (total, min, max & average) that help us to get a picture of the entire data set without going through every single value.
So, summarising is a great name for the process and summary value is a great name for these 4 values (total, min, max & average). Aggregation is nothing more than a technical name for summarisation. There are other summary values like variance and standard deviation, but they take us into the realm of statistics and I want to keep this very simple. I work on a lot of BI/Analytics projects for my clients and I have rarely had to go beyond these 4 summary values.
Let us list the steps involved in summarising numbers:
- We get a set of input values
- We apply a function e.g. SUM, MIN, MAX, AVERAGE
- We get a single output – the summary value. If we want, we can plot this single value on a chart, but it does not add much value. Hence, we can just display the value in a simple table.
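If you like to see things in code, the three steps above can be sketched in a few lines of Python. This is only an illustration and not how Tableau works internally; the salary figures are made up for the example and the `salaries` and `summary` names are my own.

```python
from statistics import mean

# Step 1: a set of input values
# (hypothetical salaries for 10 employees, invented for this sketch)
salaries = [42000, 55000, 61000, 38000, 75000,
            49000, 52000, 88000, 45000, 67000]

# Step 2: apply a function to the whole set.
# Each function takes all the input values and returns ONE summary value.
summary = {
    "Total": sum(salaries),
    "Min": min(salaries),
    "Max": max(salaries),
    "Average": mean(salaries),
}

# Step 3: display the single outputs in a simple table
for name, value in summary.items():
    print(f"{name:<8}{value:>10,.0f}")
```

Notice that the code would look exactly the same if the list held 10 thousand or 10 million salaries: the input grows, but each function still hands back a single number.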
We started with 2 questions and we answered one of them: How do we make sense of Numbers? Now, I hope that the meaning of aggregation and the need for aggregation are clear to you. In the next post, we will answer the 2nd question: How do we make sense of Text?