Merging two dataframes gives a memory error


I'm trying to merge two dataframes of size (13647309, 48)
and I'm using 32 memory.

df_train = train.merge(train_lag,on=['ncodpers','int_date'], how='left')

After I run this, it uses too much memory.
Is there a way to minimize memory usage when merging?


Posted 2019-02-14T02:26:19.030




The problem is that when you merge two dataframes, you need enough memory for both of them plus the merged result. There is a workaround from a Stack Overflow answer.
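
To make the memory claim concrete, here is a rough sketch that adds up the footprint of two inputs and their merged result. The toy frames, the `frame_bytes` helper, and the columns `a` and `b` are hypothetical stand-ins; only the key columns come from the question. It reports the size of the objects that are alive after the merge, not any temporary buffers pandas may allocate during it.

```python
import numpy as np
import pandas as pd

def frame_bytes(df: pd.DataFrame) -> int:
    """Total bytes held by a dataframe, including object columns."""
    return int(df.memory_usage(deep=True).sum())

# Toy stand-ins for the two inputs (hypothetical data, same key columns).
n = 1000
train = pd.DataFrame({
    "ncodpers": np.arange(n),
    "int_date": np.ones(n, dtype=np.int64),
    "a": np.random.rand(n),
})
train_lag = pd.DataFrame({
    "ncodpers": np.arange(n),
    "int_date": np.ones(n, dtype=np.int64),
    "b": np.random.rand(n),
})

merged = train.merge(train_lag, on=["ncodpers", "int_date"], how="left")

# All three objects are alive at once, so the footprint is their sum.
total_bytes = frame_bytes(train) + frame_bytes(train_lag) + frame_bytes(merged)
print(frame_bytes(train), frame_bytes(train_lag), frame_bytes(merged), total_bytes)
```

Running the same helper on the real frames before merging gives a rough lower bound on the memory the merge will need.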

What you can do is load only the first dataframe (the smaller one) into memory and then read the second in batches.

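import pandas as pd

# df_train is the smaller dataframe, already loaded in memory.
def preprocess(x):
    # Inner-join one chunk and append the matches; unmatched df_train rows are dropped.
    # (how='left' would instead write every df_train row once per chunk.)
    df = pd.merge(df_train, x, on=['ncodpers','int_date'], how='inner')
    df.to_csv("final.csv", mode="a", header=False, index=False)

# Stream the second file 1000 rows at a time.
reader = pd.read_csv("train_lag.csv", chunksize=1000)

for r in reader:
    preprocess(r)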
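
One caveat with merging chunk by chunk: a left merge that keeps the full in-memory frame on the left emits every one of its rows for each chunk. A minimal sketch that preserves the left-join result from the question streams the left table instead and keeps the right table in memory. The function name `chunked_left_merge`, the file names, and the toy data are hypothetical; it also assumes the right table fits in memory by itself.

```python
import os
import tempfile

import pandas as pd

def chunked_left_merge(left_csv, right_df, out_csv, on, chunksize=1000):
    """Left-join a CSV file against an in-memory frame, one chunk at a time."""
    first = True
    for chunk in pd.read_csv(left_csv, chunksize=chunksize):
        # Every row of the chunk is kept; unmatched rows get NaN on the right.
        merged = chunk.merge(right_df, on=on, how="left")
        # Write the header with the first chunk only, then append.
        merged.to_csv(out_csv, mode="w" if first else "a", header=first, index=False)
        first = False

# Toy data standing in for the real files (hypothetical values).
tmp_dir = tempfile.mkdtemp()
left_path = os.path.join(tmp_dir, "train.csv")
out_path = os.path.join(tmp_dir, "final.csv")

train = pd.DataFrame({
    "ncodpers": [1, 2, 3, 4, 5],
    "int_date": [1, 1, 1, 1, 1],
    "a": [10, 20, 30, 40, 50],
})
train.to_csv(left_path, index=False)

train_lag = pd.DataFrame({
    "ncodpers": [1, 3, 5],
    "int_date": [1, 1, 1],
    "b": [0.1, 0.3, 0.5],
})

chunked_left_merge(left_path, train_lag, out_path,
                   on=["ncodpers", "int_date"], chunksize=2)

# Compare against a single in-memory left merge of the same toy data.
result = pd.read_csv(out_path)
expected = train.merge(train_lag, on=["ncodpers", "int_date"], how="left")
```

On the toy data, `result` should match `expected`, since each left-hand row appears in exactly one chunk. Only one chunk, the right table, and that chunk's merge result are in memory at any time.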


