At the previous article we fixed what dashboard we will create based on my framework with Task, Stakeholder, What to Identify and KPI's. To review it let me put the framework here again.
Then let's go forward to data understanding and data preparation process today. You can get the dataset for this training in Kaggle. The URL is here.
After you download the data, you will see some csv files and json files. We will use only CAvideos.csv and CA_category_id.json for the sake of simplicity.
So let's take a look of the csv data in Tableau.
You will see there's something wrong. Let's take a look at category_id here. At the third record, wait, what's the "Rudy Mancuso" category? You will also find many blank columns. It looks messed up. How it could happen?
In conclusion, this is because come columns like title include comma in themselves. Basically a text file can be separated by comma so Tableau recognize the comma in these columns as a separator.
Then we have to edit Tableau's text properties like this.
As you saw the double quotation mark " was used to identify a series of strings in title, for example so we used " as a Text qualifier, then here we go, it's fixed.
Then you may feel that category_id is not useful because we want to know what the category is, because we stated that "we want to identify what category to go" in the process 3 of the framework: What to Identify.
We have CA_category_id.json it seems to have a description of category_id, but unfortunately I am not familiar with json file and Tableau don't allow us to add it on Text file.
Then I decide to use JSON to CSV converter. You can open json file with any text editor and copy and paste them into the converter, then you can download the csv file generated.
Then you can join the category_id csv file.
Finally we can see the data. I hided some columns that would be useless.
Now we have these columns here. Let's rename "snippet/title" to "Category". And as for the "trending_date", it is about when the video is picked up as a trending video, but the format is "year.day.month" that makes Tableau difficult to convert it into date format. In fact, if you change it into Date format with using "Change Data Type", it is converted as "Year/Month/Day" format taht is different from the original order.
So we need to do a small data prep. The new variable is here:
Trending Date as Date
DATEPARSE ( "yyyy/MM/dd", '20'+SPLIT([trending_date],'.',1) +'/'+ SPLIT([trending_date],'.',3) +'/'+ SPLIT([trending_date],'.',2) ))
The code above is basically arranging the trending_date into a correct date format in Tableau.
From the next post we will start creating the dashboard. See you then.