python - Is there a way to parallelize Pandas' Append method?


I have 100 xls files to combine into a single csv file. Is there a way to improve the speed of combining them together?

My issue with using concat is that it lacks the arguments that to_csv affords me:

```python
import glob
import pandas as pd

listoffiles = glob.glob(file_location)
frame = pd.DataFrame()
for idx, a_file in enumerate(listoffiles):
    print(a_file)
    data = pd.read_excel(a_file, sheetname=0, skiprows=range(1, 2), header=1)
    frame = frame.append(data)

# save csv
print(frame.info())
frame.to_csv(output_dir, index=False, encoding='utf-8', date_format="%Y-%m-%d")
```

Using multiprocessing, you can read them in parallel using something like:

```python
import multiprocessing
import pandas as pd

dfs = multiprocessing.Pool().map(pd.read_excel, f_names)
```

and concatenate them into a single one:

```python
df = pd.concat(dfs)
```
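Note that pd.concat returns an ordinary DataFrame, so the to_csv arguments from the question (index, encoding, date_format) still apply to the combined result. A minimal sketch, using small in-memory frames to stand in for the data read from the Excel files:

```python
import io

import pandas as pd

# Two small frames standing in for data read from two xls files
frames = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
]

# One concat at the end replaces repeated append calls
combined = pd.concat(frames, ignore_index=True)

# to_csv accepts the same arguments whether the frame came from
# append or from concat
buf = io.StringIO()
combined.to_csv(buf, index=False, date_format="%Y-%m-%d")
print(buf.getvalue())
```

A single concat at the end is also cheaper than appending inside a loop, since each append copies the whole accumulated frame.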

You should check if the first part is at all faster than:

```python
dfs = map(pd.read_excel, f_names)
```

YMMV - it depends on the files, disks, etc.
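The pieces above can be tied together end to end. This sketch swaps the answer's multiprocessing.Pool for a ThreadPool (reading files is mostly I/O-bound, and a thread pool avoids the pickling and `__main__`-guard requirements of process pools); the `data/*.xls` pattern, `read_one` helper, and `combined.csv` output name are all placeholders, and `sheet_name=` assumes a newer pandas than the question's `sheetname=`:

```python
import glob
from multiprocessing.pool import ThreadPool

import pandas as pd

def read_one(path):
    # sheetname= was renamed to sheet_name= in newer pandas versions
    return pd.read_excel(path, sheet_name=0, skiprows=range(1, 2), header=1)

# Hypothetical location of the 100 .xls files
f_names = glob.glob("data/*.xls")

# Read the files concurrently; map preserves the input order
with ThreadPool() as pool:
    dfs = pool.map(read_one, f_names)

# Concatenate once at the end instead of appending inside a loop
frame = pd.concat(dfs, ignore_index=True) if dfs else pd.DataFrame()
if not frame.empty:
    frame.to_csv("combined.csv", index=False, encoding="utf-8",
                 date_format="%Y-%m-%d")
```

Whether the pool actually helps depends on where the time goes: if parsing the xls files dominates, processes can win despite the pickling overhead; if disk I/O dominates, threads or even the plain serial map may be just as fast.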

