Convert a list of lists into a pandas DataFrame

62

12

I am trying to convert a list of lists, which looks like the following, into a pandas DataFrame:

[['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'], 
['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], 
['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'], 
['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

I want each inner list to become one row of a DataFrame with four columns. What would be the best approach to this? Calling pd.DataFrame on it does not quite give me what I am looking for.

Aravind Veluchamy

Posted 2018-01-05T18:40:33.767

Reputation: 621

See this question on Stack Overflow: https://stackoverflow.com/questions/.../getting-list-of-lists-into-pandas-dataframe

– keramat – 2018-01-05T18:46:09.003

Answers

74

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

df = pd.DataFrame.from_records(data)
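When no column names are passed, the frame gets integer column labels. A minimal sketch for inspecting the result, reusing the same data (the print calls are only for illustration):

```python
import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'],
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'],
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

df = pd.DataFrame.from_records(data)

# Each inner list becomes one row; columns are labelled 0, 1, 2, 3 by default.
print(df.shape)          # (4, 4)
print(list(df.columns))  # [0, 1, 2, 3]

# The third column holds only integers, so pandas infers an integer dtype for it.
print(df[2].dtype)
```

Because the labels are plain integers, a column is selected with df[2] rather than df['2'].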

Emre

Posted 2018-01-05T18:40:33.767

Reputation: 9 953

12 You could refine it a bit more with:

pd.DataFrame.from_records(data, columns=['Team', 'Player', 'whatever-stat-is-that', 'position']) – Juan Ignacio Gil – 2018-01-11T10:14:28.443

1 Is there a way to specify the import more precisely? E.g. I want DataFrame["Team"] to refer to the first item of each sublist (i.e. data[i][0]) and DataFrame["Position"] to refer to the last item of each sublist (i.e. data[i][-1]). – Ivo – 2019-01-17T15:20:58.123

@Ivo: Use the columns parameter of DataFrame.from_records.

– Emre – 2019-01-17T21:27:43.977
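For picking fields by position, as asked in the comment above, one possible sketch is to build each column explicitly with a list comprehension. This is an illustration rather than the only way to do it; the column names "Team" and "Position" are taken from the comment.

```python
import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'],
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'],
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

# Map each column name to the position it should be read from:
# row[0] is the first item of a sublist, row[-1] the last one.
df = pd.DataFrame({
    "Team": [row[0] for row in data],
    "Position": [row[-1] for row in data],
})

print(df)
```

If all four fields are wanted anyway, naming them all through the columns parameter and then selecting df[["Team", "Position"]] should give the same two columns.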

21

Once you have the data:

import pandas as pd

data = [['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'], 
        ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], 
        ['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'], 
        ['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

You can create the DataFrame directly from it. Each inner list is already one row, so no transposition is needed (zip(data) would not transpose the data anyway; it only wraps each row in a 1-tuple):

df = pd.DataFrame(data, columns=["Team", "Player", "Salary", "Role"])

Another way is to set the column names afterwards:

df = pd.DataFrame(data)
df.columns = ["Team", "Player", "Salary", "Role"]
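The sample rows here still carry stray spaces, a trailing newline and literal double quotes around the player names. A sketch of one way to clean the text columns once the frame exists, assuming the column names Team, Player, Salary and Role:

```python
import pandas as pd

data = [['New York Yankees ', '"Acevedo Juan"  ', 900000, ' Pitcher\n'],
        ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'],
        ['New York Yankees ', '"Clemens Roger" ', 10100000, ' Pitcher\n'],
        ['New York Yankees ', '"Contreras Jose"', 5500000, ' Pitcher\n']]

df = pd.DataFrame(data, columns=["Team", "Player", "Salary", "Role"])

# Strip surrounding whitespace (including the trailing newline) from the text columns.
for col in ["Team", "Player", "Role"]:
    df[col] = df[col].str.strip()

# Remove the literal double quotes that wrap each player name.
df["Player"] = df["Player"].str.strip('"')

print(df)
```

The whitespace is stripped first so that the quotes end up at the very edges of the string, where the second strip can remove them.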

Paloma Manzano

Posted 2018-01-05T18:40:33.767

Reputation: 211

6

You can pass the list directly to the DataFrame constructor:

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data)

LUSAQX

Posted 2018-01-05T18:40:33.767

Reputation: 713

1

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

df = pd.DataFrame(data)

tharun___ data enthusiast

Posted 2018-01-05T18:40:33.767

Reputation: 96

1

This one was by far the simplest:

import pandas as pd

data = [['New York Yankees', 'Acevedo Juan', 900000, 'Pitcher'], 
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'], 
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'], 
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data)

Now, if the column names are the first list in the list of lists (data[0]), you can assign them as column headers in the DataFrame like so:

import pandas as pd

data = [['key1', 'key2', 'key3', 'key4'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'],
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'],
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

data = pd.DataFrame(data[1:], columns=data[0])
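The same header-row idea can be sketched with unpacking, which keeps the original list intact instead of overwriting it. The names rows, header and records, and the header values themselves, are only illustrative.

```python
import pandas as pd

rows = [['Team', 'Player', 'Salary', 'Role'],
        ['New York Yankees', 'Anderson Jason', 300000, 'Pitcher'],
        ['New York Yankees', 'Clemens Roger', 10100000, 'Pitcher'],
        ['New York Yankees', 'Contreras Jose', 5500000, 'Pitcher']]

# Split the first list (the header) from the remaining lists (the records).
header, *records = rows

df = pd.DataFrame(records, columns=header)

print(list(df.columns))  # ['Team', 'Player', 'Salary', 'Role']
print(len(df))           # 3
```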

GManAsg

Posted 2018-01-05T18:40:33.767

Reputation: 11