python - Is there a way to parallelize Pandas' Append method?
I have 100 xls files to combine into a single CSV file. Is there a way to improve the speed of combining them?
The issue with using concat is that it lacks the arguments that to_csv affords me:
    import glob
    import pandas as pd

    listoffiles = glob.glob(file_location)
    frame = pd.DataFrame()
    for idx, a_file in enumerate(listoffiles):
        print(a_file)
        data = pd.read_excel(a_file, sheetname=0, skiprows=range(1, 2), header=1)
        frame = frame.append(data)

    # save csv
    frame.info()
    frame.to_csv(output_dir, index=False, encoding='utf-8', date_format="%Y-%m-%d")
Using multiprocessing, you could read them in parallel with something like:

    import multiprocessing
    import pandas as pd

    dfs = multiprocessing.Pool().map(pd.read_excel, f_names)
and then concatenate them into a single one:
df = pd.concat(dfs)
You should check whether the first part is at all faster than:

    dfs = list(map(pd.read_excel, f_names))
YMMV - it depends on the files, disks, etc.