For a while now, I have been hearing a lot about Alteryx, especially as a great ETL tool for Tableau, since Tableau lacks the ability to change the shape of the data (except for a few limited features like unpivoting and union that were added, if my memory is right, in the last year). Having the data in the right shape (or, as we call it, having the right data model) makes working in Tableau frictionless and productive. When encountering a tricky reporting requirement, especially one that does not seem easy to build in Tableau, most newcomers to Tableau try to write complex calculations or resort to hacks to solve the problem. People who are masters of the craft, like Joe Mako, often talk instead about changing the data model to make working in Tableau easier. Every real-life project we work on has a few data model issues that make data cleanup/data preparation necessary before bringing the data into Tableau. So, we have been on the lookout for good ETL tools that would fit our clients' budgets. I had ignored Alteryx for a number of reasons, which I will explain below, but I have now decided to take it for a test drive, inspired primarily by 2 people – Joe Mako and Ken Black. In this series, my intent is to capture my entire journey in learning and evaluating Alteryx as a tool for data preparation and data cleansing.
First, let me give you a quick summary of my experience in this space, so that you can understand the basis of my opinions/judgements. I got into the world of ETL/BI tools in 2002 when I was working on building a Master Data Management (MDM) solution. My company chose Informatica as the ETL tool and Business Objects/Web Intelligence as the reporting layer, and my job was to integrate these 2 tools into our product suite so that we could cut down product development time. My initial reaction to these 2 tools was not positive – I felt they inhibited our productivity, and as a hard-core programmer, I did not really like the drag-and-drop approach. But over time, I came to understand the value of these tools, as they enable a completely different paradigm – letting non-technical users do many of the data analysis tasks that originally required developers. More recently, a new buzzword has emerged to capture this value – self-service analytics.
Now, coming specifically to ETL tools, I have worked with (or managed projects which used) a number of them over the last 14 years. My impression of the enterprise-grade ETL tools like Informatica and DataStage is that they are not easy to use. They are great for use-cases where a large amount of data has to be moved between systems with well-defined schemas and there is plenty of time available to design, build, test, and deploy these workflows to production.
What I found in my experience was quite surprising – the really valuable data is never stored in the databases that these big ETL tools were designed to work with. That is a big statement – let me explain what I mean. Typically, the most important information in a company from a financial perspective is the budget – except for the small percentage of companies that have chosen an EPM (Enterprise Performance Management) solution like Hyperion or Anaplan, most companies rely on time-tested Excel. And most companies want to do a budget vs. actual review every month. The actuals are most likely stored in some ERP or other order-processing system, but the budget data is in Excel. None of the early leaders in BI tools, or even the early leaders in ETL tools, envisaged this possibility – so, just to bring in some simple Excel data, you had to write a complex ETL workflow that could take days (typically weeks, if I include the time needed for testing and moving to production). Needless to say, this was frustrating – hence most business analysts faced with this task did the obvious: they exported the data from their company-standard BI tool, brought it into Excel, and used VLOOKUP or some other manual joining operation to do the budget vs. actual analysis.
Apart from budgets, there is a ton of other important data maintained in Excel – the more important the data is to senior management, the more likely it is maintained in Excel, so that updates can happen in a typical management meeting. The advantage/problem with Excel data is that the structure can change very easily – end-users love this ability to change things as they want and ETL developers hate it – but it is reality.
But today, most tools support loading data from Excel, except for the industry heavyweights (is there some truth to the statement that elephants can't dance, despite the famous book by the IBM CEO claiming the contrary?). Tableau has been adding a lot of features in the last year to make working with Excel and CSV files much easier. But there are still a number of requirements we face on client projects which force us to look outside Tableau. I had ignored Alteryx because I have grown a bit skeptical of ETL tools – most of them are brittle and hard to use, especially in complicated real-life scenarios, despite whatever the marketing brochures/videos want us to believe. The second key reason for ignoring Alteryx was the hefty price tag (5000 USD per license per year seemed very high, especially for my Indian customers, who hesitate so much about buying Tableau at 2000 USD for a lifetime license).
However, as indicated above, 2 people played a key role in making me take another look at Alteryx. The first is Joe Mako. For those of you who don't know Joe Mako, he is a long-time Tableau Zen Master and Alteryx Grand Master. The 3 keywords I remember hearing from him about Alteryx are "visual", "precise control", and "easy-to-use" (I don't remember the details, as the conversation happened at least a year ago). Here is a great summary of Joe Mako's thoughts on Alteryx from another blogger: Alteryx and Tableau, yin and yang
The second person who inspired me about Alteryx is Ken Black, through his blog on Tableau and Alteryx. He is a data scientist who has been in the field for 30+ years and went through the grind of preparing data for analysis and presenting it without any of these sophisticated tools. His blog has a lot of insights about how his productivity has soared by combining Tableau and Alteryx. For me, his sheer enthusiasm about Alteryx really motivated me to get started. Here are the articles that took me by the hand and led me into the world of Alteryx:
- How and Why #Alteryx and #Tableau Allow Me to Innovate – Part 2 – Please don't miss the tremendously complex data science problem Ken Black describes in detail in the second part of the post – this is THE article that finally converted me to Alteryx! Thanks, Ken, for writing such a wonderful masterpiece!
- How To Achieve Better Data Comprehension, Part 2
- Proof of Why Alteryx Is Great Software
- How I Use #Alteryx To Simplify My #Tableau Applications
So, here are the scenarios I have in mind to test Alteryx with:
- How well can it handle Excel files and CSVs? I have a number of real-life requirements with all sorts of complexity. I will expand on these problems as I work on them in Alteryx.
- How well can it help us clean data, especially data with spelling mistakes? An interesting variation is the same item spelled in different ways because the original name is in a foreign language. A problem I have seen in real life is that the city 'Geneva' was spelled/stored in 80 different ways. 2 of them are valid spellings – 'Geneve' being the French version and 'Geneva' being the English version. Whenever a city name has its origin in another language, it is spelled in different ways in English by different people, and we want all of them to be treated as the same. The dataset I am going to test Alteryx with will come from my Indian clients.
- Can it help in geocoding locations that are not recognized by Tableau? For one of our Indian customers, 80% of the locations were not recognized by Tableau. Hence, we had to resort to using the Google API and R to fetch geocodes. But the process was very cumbersome, and based on my experience in the local Tableau User Group (ChennaiTUG), this is a question a lot of people ask me about.
- Can it process data which is stored in Excel but is not really tabular? Often, clients take reports from their existing systems and want to use them as the basis for their reporting. There is a piece of software that is very widely used in India but does not use any database – hence the only way to get the data out is by running reports, and it spits out reports which look good when printed. But extracting data from them has been very painful. I have given up on this, but I want to see whether Alteryx can help. If it can, then it is a HUGE benefit for our clients.
- One of our clients, who enables small manufacturers to put their products on e-commerce portals, works with scores of clients – let us say, 50. Every day, he sends out an Excel file to these 50 clients containing the details of the items sold, inventory, payments, etc. We are trying to automate this process with Tableau, but our client can't afford to make every one of his clients buy a Tableau Online account (priced at 500 USD per user, at that, in a country like India). I want to see whether Alteryx can provide an alternative, especially since it claims to support sharing the output in different ways.
- There have been a number of scenarios where we have written R code to do some transformation, or used a combination of ETL tools to achieve an end – for example, we had to take data from SurveyGizmo and use it in Tableau, but a number of issues stood in the way of the data being consumed in Tableau. I want to see whether I can read from JSON APIs (which are incidentally supported in Tableau 10.1, released just a few days ago), tidy up the data, and transpose it before bringing it into Tableau.
- Can Alteryx help me with scrambling the data from real-life projects so that sharing it on my blog becomes easier? Ken Black has already pointed out this possibility here, but my requirements are a bit different and I really want to see how this works.
- Last, but not least, I want to see how well Alteryx enables predictive analytics. I am just getting started in this area and have enrolled in the Udacity course on Predictive Analytics that was co-developed by Alteryx and Tableau. As part of my MBA in Finance at ISB, I took a number of advanced statistics courses and financial engineering courses which rely heavily on statistics, but I never had the opportunity to apply them in real-life projects. Yet in the last year alone, at least 3 of my Indian customers have asked me whether Tableau can help them with predictive analytics for estimating sales, inventory, etc. So, I will be relearning all the things I have almost forgotten (like linear regression, heteroskedasticity :), etc.) and will try to see how I can apply them on the test projects at Udacity first using Alteryx (and hopefully, on real projects later).
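To make the city-name cleanup scenario concrete, here is a minimal sketch in plain Python using the standard library's difflib. This is only an illustration of the fuzzy-matching idea – Alteryx's Fuzzy Match tool works differently, and the canonical city list and similarity cutoff below are my own assumptions, not from any real client dataset.

```python
import difflib

# Hypothetical canonical city names for illustration only.
CANONICAL = ["Geneva", "Zurich", "Chennai", "Mumbai"]

def standardize(name, canonical=CANONICAL, cutoff=0.6):
    """Map a possibly misspelled city name to its closest canonical form.

    Returns the best match above the similarity cutoff, or the input
    unchanged if nothing is close enough.
    """
    matches = difflib.get_close_matches(name, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else name

# A few of the many spellings a city like Geneva might arrive in:
for variant in ["Geneve", "Genva", "Genewa", "Geneva"]:
    print(variant, "->", standardize(variant))
```

In practice, the cutoff would need tuning per dataset – too low and distinct cities collapse together, too high and legitimate variants like 'Geneve' slip through.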
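The JSON tidying/transposing step mentioned above can also be sketched with the standard library, to show the target shape. The payload below is invented for illustration – the real SurveyGizmo response structure differs – but the idea of flattening nested answers into long-form rows (one row per response/question pair, the shape Tableau handles best) is the same.

```python
import json

# Invented sample payload loosely resembling a survey API response;
# real SurveyGizmo JSON has a different structure.
payload = json.loads("""
{
  "responses": [
    {"id": 1, "answers": {"q1": "Yes", "q2": "Blue"}},
    {"id": 2, "answers": {"q1": "No",  "q2": "Green"}}
  ]
}
""")

def tidy(payload):
    """Flatten nested answers into one row per response/question pair."""
    rows = []
    for resp in payload["responses"]:
        for question, answer in resp["answers"].items():
            rows.append({"response_id": resp["id"],
                         "question": question,
                         "answer": answer})
    return rows

rows = tidy(payload)
# 'rows' is now a flat, long-form table ready for a CSV or Tableau extract.
```

Alteryx's Transpose tool does essentially this pivot-to-long reshaping visually, without the hand-written loop.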
A note to the Alteryx team – I have high expectations, especially as I am coming to Alteryx after being influenced by Ken Black. Also, I have a background in software design, usability, etc. – hence, I tend to expect a lot from software which claims to consumerize analytics. And last, but not least, Tableau has really spoiled me with a design that keeps things simple yet enables super-productivity. So, I will be drawing constant comparisons between my Tableau experience and Alteryx. I hope I will enjoy the ride, and I hope other Tableau users who want to learn about Alteryx will benefit from my journey.