python - Is there a way to parallelize Pandas' Append method?
I have 100 xls files to combine into a single CSV file. Is there a way to improve the speed of combining them?
The issue with using concat is that it lacks the arguments that to_csv affords me:
    import glob
    import pandas as pd

    listoffiles = glob.glob(file_location)
    frame = pd.DataFrame()
    for idx, a_file in enumerate(listoffiles):
        print(a_file)
        data = pd.read_excel(a_file, sheetname=0, skiprows=range(1, 2), header=1)
        frame = frame.append(data)

    # save csv
    frame.info()
    frame.to_csv(output_dir, index=False, encoding='utf-8', date_format="%Y-%m-%d")
Using multiprocessing, you could read them in parallel with something like:

    import multiprocessing
    import pandas as pd

    dfs = multiprocessing.Pool().map(pd.read_excel, f_names)
and then concatenate them into a single one:
df = pd.concat(dfs)
You should check whether the first part is at all faster than:

    dfs = list(map(pd.read_excel, f_names))
YMMV - it depends on the files, disks, etc.