Pandas throwing "Error tokenizing data. C error" while loading data sets from URL

3

I am trying to work on the Titanic competition to get hands-on experience with data science and machine learning. I tried to load the datasets from GitHub, but pandas threw the following error:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 32, saw 2

I tried to follow the advice of other SO users, so I added the skiprows=1 parameter to my pd.read_csv() calls to skip the first row, but it didn't work.

import pandas as pd

train_dataset = pd.read_csv("https://github.com/oo92/titanic-files/blob/master/train.csv", skiprows=1)
test_dataset = pd.read_csv("https://github.com/oo92/titanic-files/blob/master/test.csv", skiprows=1)
ground_truths = pd.read_csv("https://github.com/oo92/titanic-files/blob/master/gender_submission.csv", skiprows=1)

train_dataset.head()

Andros Adrianopolos

Posted 2019-05-09T09:30:21.670

Reputation: 322

Answers

2

The URL you are passing points to a GitHub repository page, which is an HTML webpage and not the CSV file itself, so pandas ends up trying to parse HTML. Click the 'Raw' button on GitHub and pass the resulting URL, which in your case is:

test = pd.read_csv('https://raw.githubusercontent.com/oo92/Titanic-Kaggle/master/test.csv')
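If you would rather not click through to the raw page for every file, the conversion can be sketched in a small helper. The names to_raw_url and load_github_csv are my own (hypothetical, not part of pandas), and this assumes the usual https://github.com/<owner>/<repo>/blob/<branch>/<path> URL layout:

```python
def to_raw_url(blob_url: str) -> str:
    """Turn a GitHub 'blob' page URL into its raw.githubusercontent.com form.

    Hypothetical helper: assumes the URL looks like
    https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    Anything else is returned unchanged.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        return blob_url
    # Split "<owner>/<repo>" from "<branch>/<path>" at the first "/blob/".
    owner_repo, branch_and_path = blob_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{branch_and_path}"


def load_github_csv(url: str, **kwargs):
    """Read a CSV hosted on GitHub, accepting either the page URL or the raw URL."""
    import pandas as pd  # imported here so to_raw_url works without pandas

    # Extra keyword arguments are passed straight through to pd.read_csv.
    return pd.read_csv(to_raw_url(url), **kwargs)
```

With that in place, something like train_dataset = load_github_csv("https://github.com/oo92/titanic-files/blob/master/train.csv") should fetch the actual CSV. Note that skiprows=1 is no longer needed, since the header row is part of the data file.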

Danny

Posted 2019-05-09T09:30:21.670

Reputation: 1 068

Thank you. But Spyder doesn't let me print the head with train_dataset.head(). I have to explicitly place it inside a print() call. – Andros Adrianopolos – 2019-05-10T01:26:07.167