python - Pandas groupby: apply vs aggregate with missing categories


I'm running into an issue where pandas' groupby.apply and groupby.aggregate give different-shaped results when categorical data has missing values. aggregate retains all "known" categories, while apply keeps only the categories present in the data.

Here's a simplified example:

    import pandas as pd
    import numpy as np

    # `missing` has a 'b' category, but no data uses it.
    missing = pd.Categorical(list('aaa'), categories=['a', 'b'])
    dense = pd.Categorical(list('abc'))
    values = np.arange(len(dense))
    df = pd.DataFrame({'missing': missing, 'dense': dense, 'values': values})

    grouped = df.groupby(['missing', 'dense'])
    print(grouped.mean())
    print(grouped.agg(np.mean))
    print(grouped.apply(lambda chunk: np.mean(chunk)))

which prints

                   values
    missing dense
    a       a           0
            b           1
            c           2
    b       a         NaN
            b         NaN
            c         NaN
                   values
    missing dense
    a       a           0
            b           1
            c           2
    b       a         NaN
            b         NaN
            c         NaN
                   values
    missing dense
    a       a           0
            b           1
            c           2

Note that the last data frame is missing the NaN rows for missing = b. I understand why apply might do this (it chooses not to pass a group full of NaNs to the reduction function). The snippet above is only a toy example: I need to use apply to get the result I want.

Question: What's the best way to use apply to create an output whose shape matches the one returned by aggregate?

This is in need of an enhancement pull request; see here.

In general, you should use .mean(), which is the idiomatic way (and faster).
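
When apply really is required, one possible workaround is to reindex its result onto every combination of the known categories. The following is only a sketch under assumptions, not something taken from the linked enhancement: it assumes a pandas version that supports the `observed` argument of `groupby`, and the names `keys`, `applied`, `full_index`, and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: category 'b' is declared but never used.
missing = pd.Categorical(list("aaa"), categories=["a", "b"])
dense = pd.Categorical(list("abc"))
df = pd.DataFrame({"missing": missing, "dense": dense, "values": np.arange(3)})

keys = ["missing", "dense"]  # hypothetical helper name

# observed=True keeps only the groups present in the data, so the starting
# point is the same regardless of the default in the installed version.
applied = df.groupby(keys, observed=True)["values"].apply(lambda chunk: chunk.mean())

# Build every combination of the declared categories and reindex onto it;
# combinations with no data become NaN rows.
full_index = pd.MultiIndex.from_product(
    [df[k].cat.categories for k in keys], names=keys
)
result = applied.reindex(full_index).to_frame(name="values")
print(result)
```

The reindex step is what restores the rows for missing = b; the function passed to apply is still called only for groups that actually contain data.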

