python - pandas groupby with two keys
I spent a whole afternoon trying to implement this task and failed. I've got a pandas data frame like this:
    columns=[ka, kb_1, kb_2, timeofevent, timeinterval]
    0: '3m'  '2345'  '2345'  '2014-10-5'   3000
    1: '3m'  '2958'  '2152'  '2015-3-22'   5000
    2: 'ge'  '2183'  '2183'  '2012-12-31'  515
    3: '3m'  '2958'  '2958'  '2015-3-10'   395
    4: 'ge'  '2183'  '2285'  '2015-4-19'   1925
    5: 'ge'  '2598'  '2598'  '2015-3-17'   1915
What I want to produce is a new data frame grouped by "ka, kb_1", as below:
    columns=[ka, kb, errornum, errorrate, totalnum of records]
    '3m'  '2345'  0  0%   1
    '3m'  '2958'  1  50%  2
    'ge'  '2183'  1  50%  2
    'ge'  '2598'  0  0%   1
(Definition of an error record: when kb_1 != kb_2, the corresponding record is treated as an abnormal record.)
My code is this:
    df['iserror'] = (df['kb_1'] != df['kb_2']).astype('int')
    grouped2 = df.groupby(['ka', 'kb_1'])
    df_rst = pd.DataFrame()
    df_rst['ka'] = grouped2['ka'].all()
    df_rst['kb_1'] = grouped2['kb_1'].all()
    df_rst['errornum'] = grouped2['iserror'].transform(sum)
    df_rst['totalnum of records'] = grouped2.size()
    df_rst['soll_neq_letzt_error_rate'] = df_rst['errornum'].astype('float').div(df_rst['totalnum'].astype('float'), axis='index')
    df_rst.to_csv('rst.csv', index=False)
But the result is not what I wanted.
For instance, column kb_1 becomes True/False, and errornum becomes NaN. Can anyone explain why, and give a workable implementation? Thanks.
I'm not sure what you did, but I don't think you were far off.
    df2 = df.groupby(['ka','kb_1'])['iserror'].agg({ 'errornum': 'sum', 'recordnum': 'count' })
    df2['errorrate'] = df2['errornum'] / df2['recordnum']

             recordnum  errornum  errorrate
    ka kb_1
    3m 2345          1         0        0.0
       2958          2         1        0.5
    ge 2183          2         1        0.5
       2598          1         0        0.0
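On the "why" part of the question: `all()` on a group reports whether every value in that group is truthy, so it yields True/False rather than the key itself. `transform` returns one value per original row, indexed like the original frame, so it does not line up with a result indexed by (ka, kb_1); that mismatch is the likely source of the NaN values. Separately, passing a renaming dict to `agg` on a single column was deprecated in pandas 0.20 and raises an error from 1.0 onward. The following is a minimal sketch, assuming pandas 0.25 or newer and the sample rows from the question, that uses named aggregation instead; the variable names `per_row`, `per_group` and `summary` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "ka": ["3m", "3m", "ge", "3m", "ge", "ge"],
        "kb_1": ["2345", "2958", "2183", "2958", "2183", "2598"],
        "kb_2": ["2345", "2152", "2183", "2958", "2285", "2598"],
        "timeofevent": [
            "2014-10-5", "2015-3-22", "2012-12-31",
            "2015-3-10", "2015-4-19", "2015-3-17",
        ],
        "timeinterval": [3000, 5000, 515, 395, 1925, 1915],
    }
)

# A record is an error when kb_1 differs from kb_2.
df["iserror"] = (df["kb_1"] != df["kb_2"]).astype(int)

grouped = df.groupby(["ka", "kb_1"])

# transform() keeps the original row index (0..5): one value per row.
per_row = grouped["iserror"].transform("sum")

# sum() reduces to one value per (ka, kb_1) group.
per_group = grouped["iserror"].sum()

# Named aggregation: keyword = output column, value = aggregation.
summary = grouped["iserror"].agg(errornum="sum", recordnum="count")
summary["errorrate"] = summary["errornum"] / summary["recordnum"]

# Turn the (ka, kb_1) index back into ordinary columns.
summary = summary.reset_index()
print(summary)
```

Calling `reset_index()` at the end gives ka and kb_1 as regular columns, which is closer to the layout asked for in the question and makes `summary.to_csv('rst.csv', index=False)` keep the keys in the file.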