Python Pandas Create New Column with Groupby().Sum()
I am trying to create a new column from a groupby calculation. In the code below, I get the correct calculated values for each date (see group below), but when I try to create a new column (df['data4']) from it, I get NaN. I want the new column to hold the sum of 'data3' over all rows that share a date, applied to each row with that date. For example, 2015-05-08 appears in 2 rows (total 50+5 = 55), and I would like the new column to have 55 in both of those rows.
    import pandas as pd
    import numpy as np
    from pandas import DataFrame

    df = pd.DataFrame({'date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
                                '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
                       'sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
                       'data2': [11, 8, 10, 15, 110, 60, 100, 40],
                       'data3': [5, 8, 6, 1, 50, 100, 60, 120]})

    # per-date totals, indexed by date
    group = df['data3'].groupby(df['date']).sum()

    # this is the assignment that produces NaN
    df['data4'] = group
You want to use transform. It returns a Series whose index is aligned to the df, so you can then add it as a new column:
    In [74]:
    df = pd.DataFrame({'date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
                                '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
                       'sym': ['aapl', 'aapl', 'aapl', 'aapl', 'aaww', 'aaww', 'aaww', 'aaww'],
                       'data2': [11, 8, 10, 15, 110, 60, 100, 40],
                       'data3': [5, 8, 6, 1, 50, 100, 60, 120]})
    df['data4'] = df['data3'].groupby(df['date']).transform('sum')
    df

    Out[74]:
       data2  data3        date   sym  data4
    0     11      5  2015-05-08  aapl     55
    1      8      8  2015-05-07  aapl    108
    2     10      6  2015-05-06  aapl     66
    3     15      1  2015-05-05  aapl    121
    4    110     50  2015-05-08  aaww     55
    5     60    100  2015-05-07  aaww    108
    6    100     60  2015-05-06  aaww     66
    7     40    120  2015-05-05  aaww    121
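To see why the direct assignment in the question gives NaN, it may help to compare the index labels involved. The sketch below is only an illustration using a trimmed copy of the question's data; the names `direct`, `all_nan`, `via_transform` and `via_map` are made up for this example, and the `map` lookup is an alternative not mentioned in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05',
             '2015-05-08', '2015-05-07', '2015-05-06', '2015-05-05'],
    'data3': [5, 8, 6, 1, 50, 100, 60, 120],
})

# Plain sum(): one value per date, indexed by the date strings.
group = df['data3'].groupby(df['date']).sum()

# Column assignment aligns on index labels. df is labelled 0..7 and
# group is labelled by date, so no label matches and every row is NaN.
direct = df.copy()
direct['data4'] = group
all_nan = direct['data4'].isna().all()

# transform('sum') returns one value per original row, carrying df's index.
via_transform = df.groupby('date')['data3'].transform('sum')

# Alternative (assumption of this sketch): look up each row's date
# in the per-date totals.
via_map = df['date'].map(group)
```

Both `via_transform` and `via_map` can be assigned to `df['data4']` directly, since each has one entry per row of `df` under the same index labels.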