Video

# Basics of Aggregation Part 2 – How do we make sense of Text?

It has been a long time since I have posted here – 4+ months. Number of external events happened that really threw my schedule into a disarray – the first and the biggest one of them was a cyclone that shook our city, Chennai (India), on 12 December 2016. This resulted in internet getting disconnected for a number of weeks and even when the connection came back, it was quite unstable. Then, we had a number of different political events happen which disrupted normal life. Hence, the flow I had in creating this series was disrupted. Once I have lost the flow, it was hard to come back. And project deadlines and international travel also made it difficult for me to get back to doing what I love.

Now, I had to spend a number of days trying to read/watch all that I have created to reorient myself. I really struggled to get started with the same energy and the ideas I had before. If any of you had read the book – Big Magic, you can understand the phenomenon I am going through better – once you start any creative endeavour, it is hard to resume if you take a break. Quite literally, the ideas go somewhere else and it is hard to bring them back. If you have not read the book, I suggest that you at least take a look at Ken Black’s short post where he talks about this book. By the way, Ken was the one who introduced this book to me and it helped me in so many ways to reflect consciously on the process of creating something – Thank you so much Ken.

Enough Stories – let us get back to the subject matter at hand. In this post, let us answer the 2nd of the following 2 questions that I introduced in the last post:

1. How do we make sense of Numbers?
2. How do we make sense of Text?

The last post answered the first question and this post will answer the second question.

Here is the gist of the video:

There are 3 ways to make sense of Text:

1. List the unique values
2. Count the number of unique values
3. Count the number of records.

When we are generating a report using any text value (which are called Dimensions in Tableau), then the default behaviour is to list the unique values. Counting the number of unique values (technically called as Distinct Count) is another thing we do often.

Here is a text-version of the video for those of you who prefer to read than to watch – please note that this is NOT a transcript of the video.

Let us go back to the sample data set and pick up a text value – Department. How do we summarize a text value like Department?

One way to summarize Department is to eliminate duplicates i.e. to list only unique values. So, we have taken 10 records with 10 values for departments and got a unique list of only 3 departments.

How else can we summarize Text? What if we just count them – How many Departments? How many Roles?, etc. This is a question we routinely ask – how many products we have, how many customers do we have. Products and Customers are text values and not really numbers – so, to arrive at a number, we need to count the unique values.

Pay attention – when we ask ‘How many departments’, we do NOT really want to count the number of occurrences (10 occurrences of departments), but only the distinct values (3 unique values) – this is a common mistake people do.

Let us summarize the 3 ways in which we we can make sense of Text:

1. List the unique values

2. Count the unique values

3. Count the number of records.

When we bring a Text value like Department into a report, the default action taken by any BI tool including Tableau is to list the unique values.

Just to recall our earlier discussion about how we can make sense of Numbers: There is only one way to make sense of Numbers: By Summarising/aggregating them to a single value using the typical summary functions like Sum (i.e. Total), Min, Max and Average.

When we bring Text and Number together in a report, you will see that default ways to make sense of Text and Number are applied : For text, the unique values are listed and for numbers, the values are summarised using Sum function thus giving us a total. Here, in this example below, we see the 3 departments listed with the total salary of all employees in that department.