Breaking down a column in Pandas into a separate CSV for display in Tableau

2

My data is coming from a CSV, which should be visualized in Tableau.

However, the data contains the column category_list, which consists of values separated by a vertical bar (|).

Since Tableau can't handle arrays inside of attributes, I used Python (Pandas) to load the CSV and manipulate the data:

import pandas as pd
companies = pd.read_csv("companies.csv")

I assume that the category_list column needs to be broken down and stored into another CSV (containing the permalink (unique ID) and category pairs).

Something like this:

permalink,category
/organization/-qounter,Application Platforms
/organization/-qounter,Real Time
/organization/-qounter,Social Network Media
/organization/-the-one-of-them-inc-,Apps
/organization/-the-one-of-them-inc-,Games
/organization/-the-one-of-them-inc-,Mobile
/organization/1-4-all,Entertainment
/organization/1-4-all,Games
/organization/1-4-all,Software
/organization/1-800-publicrelations-inc-,Internet
/organization/1-800-publicrelations-inc-,Marketing
/organization/1-800-publicrelations-inc-,Media
/organization/1-800-publicrelations-inc-,Public Relations
/organization/1-mainstream,Apps
/organization/1-mainstream,Cable
/organization/1-mainstream,Distribution
/organization/1-mainstream,Software
...

How to achieve it?

Excerpt of the original CSV:

permalink,category_list,...
/organization/-qounter,Application Platforms|Real Time|Social Network Media,...
/organization/-the-one-of-them-inc-,Apps|Games|Mobile,...
/organization/1-4-all,Entertainment|Games|Software,...
/organization/1-800-publicrelations-inc-,Internet|Marketing|Media|Public Relations,...
/organization/1-mainstream,Apps|Cable|Distribution|Software,...
...

Peter Gerhat

Posted 2017-08-05T22:28:15.163

Reputation: 173

I'm voting to close this question as off-topic because it was cross posted – Sean Owen – 2017-08-06T10:54:39.473

@SeanOwen I've deleted the other post. The question fits better into data science and the answer below is correct, so please reopen it here. – Peter Gerhat – 2017-08-06T11:06:27.327

Answers

4

If you don't need other columns, here is a solution. It splits the column, stacks in vertically and combines with "permalink" column

df.set_index('permalink').category_list.str.split('|', expand=True).stack().reset_index('permalink').rename(columns={0:'category'})

permalink   category
0   /organization/-qounter  Application Platforms
1   /organization/-qounter  Real Time
2   /organization/-qounter  Social Network Media
0   /organization/-the-one-of-them-inc- Apps
1   /organization/-the-one-of-them-inc- Games
2   /organization/-the-one-of-them-inc- Mobile
0   /organization/1-4-all   Entertainment
1   /organization/1-4-all   Games
2   /organization/1-4-all   Software
0   /organization/1-800-publicrelations-inc-    Internet
1   /organization/1-800-publicrelations-inc-    Marketing
2   /organization/1-800-publicrelations-inc-    Media
3   /organization/1-800-publicrelations-inc-    Public Relations
0   /organization/1-mainstream  Apps
1   /organization/1-mainstream  Cable
2   /organization/1-mainstream  Distribution
3   /organization/1-mainstream  Software

Then you can save it to csv.

Andrey Lukyanenko

Posted 2017-08-05T22:28:15.163

Reputation: 356

0

If it is a one-time transformation, you can simply separate the field's contents in Excel itself. Execute the following steps:

  1. Open the file in MS Excel
  2. Go to the Data tab
  3. Click on the Text to Columns option
  4. Select Delimited
  5. Set | or any other desired character to be the field separator / splitter

However, this is only in case it is not a periodic transformation requirement. If so, @Andrey Lukyanenko's solution is a better one.

vsdaking

Posted 2017-08-05T22:28:15.163

Reputation: 226