python - Pandas read csv out of memory
I am trying to manipulate a large CSV file using pandas. When I wrote this:
df = pd.read_csv(strfilename,sep='\t',delimiter='\t')
it raises "pandas.parser.CParserError: Error tokenizing data. C error: out of memory". wc -l indicates there are 13822117 lines. I need to aggregate on this CSV file's data frame. Is there a way to handle this other than splitting the CSV into several files and writing code to merge the results? Any suggestions on how to do that?
The input looks like this:
columns=[ka,kb_1,kb_2,timeofevent,timeinterval]
0:'3m' '2345' '2345' '2014-10-5',3000
1:'3m' '2958' '2152' '2015-3-22',5000
2:'ge' '2183' '2183' '2012-12-31',515
3:'3m' '2958' '2958' '2015-3-10',395
4:'ge' '2183' '2285' '2015-4-19',1925
5:'ge' '2598' '2598' '2015-3-17',1915
and the desired output looks like this:
columns=[ka,kb,errornum,errorrate,totalnum of records]
'3m','2345',0,0%,1
'3m','2958',1,50%,2
'ge','2183',1,50%,2
'ge','2598',0,0%,1
If the data set is small, the code below could be used, as provided in another answer:
df2 = df.groupby(['ka','kb_1'])['iserror'].agg({ 'errornum': 'sum', 'recordnum': 'count' })
df2['errorrate'] = df2['errornum'] / df2['recordnum']

ka  kb_1  recordnum  errornum  errorrate
3m  2345          1         0        0.0
    2958          2         1        0.5
ge  2183          2         1        0.5
    2598          1         0        0.0
(Definition of an error record: when kb_1 != kb_2, the corresponding record is treated as an abnormal record.)
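The `iserror` column used in the snippet above is not one of the input columns, so it has to be derived from this definition first. Here is a minimal sketch of how that might look, using a small hand-built frame with the sample rows in place of the real file. Newer pandas releases no longer accept the dict-renaming form of `agg` shown above, so this sketch uses keyword-style named aggregation instead.

```python
import pandas as pd

# Small stand-in for the real file, built from the sample rows in the question.
df = pd.DataFrame({
    'ka':   ['3m', '3m', 'ge', '3m', 'ge', 'ge'],
    'kb_1': ['2345', '2958', '2183', '2958', '2183', '2598'],
    'kb_2': ['2345', '2152', '2183', '2958', '2285', '2598'],
})

# A record counts as an error when kb_1 and kb_2 differ; store 0/1 so it can be summed.
df['iserror'] = (df['kb_1'] != df['kb_2']).astype(int)

# Named aggregation: one output column per keyword argument.
df2 = df.groupby(['ka', 'kb_1'])['iserror'].agg(errornum='sum', recordnum='count')
df2['errorrate'] = df2['errornum'] / df2['recordnum']
print(df2)
```

This only works when the whole file fits in memory, which is exactly what fails for the 13822117-line file.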
Based on your snippet in "out of memory error when reading csv file in chunk", this is what it could look like when reading line by line.
I assume that kb_2 is the error indicator:
groups = {}
with open("data/petajoined.csv", "r") as large_file:
    for line in large_file:
        arr = line.split('\t')
        # assuming this structure: ka,kb_1,kb_2,timeofevent,timeinterval
        k = arr[0] + ',' + arr[1]
        # first time this (ka, kb_1) pair is seen: start its counters at zero
        if k not in groups:
            groups[k] = {'record_count': 0, 'error_sum': 0}
        groups[k]['record_count'] += 1
        groups[k]['error_sum'] += float(arr[2])

# error rate per group, printed once the whole file has been read
for k, v in groups.items():
    print('{group}: {error_rate}'.format(group=k, error_rate=v['error_sum'] / v['record_count']))
This code snippet stores all the groups in a dictionary, and calculates the error rate after reading the entire file.
It will still encounter an out-of-memory exception if there are too many combinations of groups.
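Another way to keep memory bounded while staying in pandas is to read the file in chunks and add up the per-chunk group totals. The following is only a sketch. It uses the question's definition of an error (kb_1 != kb_2) rather than treating kb_2 as an indicator. The function name `aggregate_in_chunks`, the chunk size, the assumption that the file has no header row, and the in-memory sample standing in for the real file are all illustrative choices.

```python
import io
import pandas as pd

COLUMNS = ['ka', 'kb_1', 'kb_2', 'timeofevent', 'timeinterval']

def aggregate_in_chunks(source, chunksize=100_000):
    """Sum per-group error and record counts over chunks of a tab-separated file."""
    total = None
    reader = pd.read_csv(
        source, sep='\t', header=None, names=COLUMNS,
        usecols=['ka', 'kb_1', 'kb_2'],           # skip columns the aggregate does not need
        dtype={'ka': str, 'kb_1': str, 'kb_2': str},
        chunksize=chunksize,                       # yields DataFrames of at most this many rows
    )
    for chunk in reader:
        chunk['iserror'] = (chunk['kb_1'] != chunk['kb_2']).astype(int)
        part = chunk.groupby(['ka', 'kb_1'])['iserror'].agg(errornum='sum', recordnum='count')
        # Groups missing on one side are treated as 0 so totals accumulate correctly.
        total = part if total is None else total.add(part, fill_value=0)
    if total is None:
        raise ValueError('no rows were read from the source')
    total = total.astype(int).sort_index()
    total['errorrate'] = total['errornum'] / total['recordnum']
    return total

# Sample rows from the question, tab-separated, standing in for the real file.
sample = (
    "3m\t2345\t2345\t2014-10-5\t3000\n"
    "3m\t2958\t2152\t2015-3-22\t5000\n"
    "ge\t2183\t2183\t2012-12-31\t515\n"
    "3m\t2958\t2958\t2015-3-10\t395\n"
    "ge\t2183\t2285\t2015-4-19\t1925\n"
    "ge\t2598\t2598\t2015-3-17\t1915\n"
)
result = aggregate_in_chunks(io.StringIO(sample), chunksize=2)
print(result)
```

Only one chunk plus the running totals are held at a time, so peak memory depends on the chunk size and the number of distinct (ka, kb_1) groups, not on the number of lines. The caveat above still applies: with very many group combinations, the running totals themselves can grow large.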