Reading a Large CSV File in Python Pandas
I have a large dataset, 4 GB, in CSV format. I don't need the whole dataset, only specific columns. Is it possible to read specific columns instead of the whole dataset using Python pandas? Would that increase the speed of reading the file?
Thanks in advance for any suggestions.
If you have 4 GB of memory, don't worry (the time it would take to program a less memory-intensive solution isn't worth it). Read the entire dataset in using pd.read_csv, then subset the columns you need. If you don't have enough memory and need to read the file line by line (i.e. row by row), modify this code to keep only the columns of interest in memory.
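For what it's worth, pd.read_csv can also restrict parsing to specific columns directly via its usecols parameter, and can stream the file in pieces via chunksize. A minimal sketch, where the file name and column names are placeholders for your own data:

import pandas as pd

# Read only the columns you need; pandas skips the rest,
# which cuts both memory use and parse time.
df = pd.read_csv("data.csv", usecols=["col_a", "col_b"])

# If even the selected columns don't fit in memory at once,
# stream the file in chunks and combine them afterwards.
chunks = pd.read_csv("data.csv", usecols=["col_a", "col_b"], chunksize=100_000)
df = pd.concat(chunks, ignore_index=True)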
If you have plenty of memory and the problem is instead that you have multiple files in this format, I recommend using the multiprocessing package to parallelize the task:
from multiprocessing import Pool

# Map your read-in function over the list of input files in parallel.
pool = Pool(processes=your_processors_n)
dataframes_list = pool.map(your_readin_func, [file1, file2, ..., filen])
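A fuller, runnable version of that idea might look like the following. The file names, column names, and worker count are placeholders, and the __main__ guard is needed for multiprocessing to work on platforms that spawn worker processes (e.g. Windows):

from multiprocessing import Pool

import pandas as pd

# Hypothetical reader: parse one CSV file into a DataFrame,
# keeping only the columns of interest.
def read_one(path):
    return pd.read_csv(path, usecols=["col_a", "col_b"])

if __name__ == "__main__":
    paths = ["part1.csv", "part2.csv", "part3.csv"]  # placeholder file names
    with Pool(processes=4) as pool:  # 4 workers; tune to your CPU count
        frames = pool.map(read_one, paths)
    combined = pd.concat(frames, ignore_index=True)

Each worker reads one file independently, so the speedup is roughly bounded by the number of files and by disk throughput.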