python - Pandas groupby: apply vs aggregate with missing categories

May 15, 2010

I'm running into an issue where pandas' groupby.apply and groupby.aggregate give different-shaped results when categorical data has missing values. aggregate retains all "known" categories, while apply keeps only the categories that are present in the data.

Here's a simplified example:

    import pandas as pd
    import numpy as np

    # The `missing` column has a 'b' category, but no data uses it.
    missing = pd.Categorical(list('aaa'), categories=['a', 'b'])
    dense = pd.Categorical(list('abc'))
    values = np.arange(len(dense))
    df = pd.DataFrame({'missing': missing, 'dense': dense, 'values': values})

    grouped = df.groupby(['missing', 'dense'])
    print(grouped.mean())
    print(grouped.agg(np.mean))
    print(grouped.apply(lambda chunk: np.mean(chunk)))

Which prints:

                   values
    missing dense
    a       a           0
            b           1
            c           2
    b       a         NaN
            b         NaN
            c         NaN
                   values
    missing dense
    a       a           0
            b           1
            c           2
    b       a         NaN
            b         NaN
            c         NaN
                   values
    missing dense
    a       a           0
            b           1
            c           2

Note that the last data frame is missing the NaN rows for missing = b. I understand why apply might do this (it chooses not to pass a group full of NaNs to the reduction function). But the above snippet is a toy example: I need to use apply to get the result I want.

Question: what's the best way to use apply and create an output whose shape matches the one returned by aggregate?

This is in need of an enhancement (a pull request) in pandas to do this; see here.

In general, you should use .mean(), as it is the idiomatic way (and faster).
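Until such an enhancement exists, one possible workaround is to reindex the apply result against the full product of the known categories. The following is only a sketch under stated assumptions, not something taken from the answer itself: it assumes a pandas version that supports the `observed` keyword of `groupby` (passed explicitly here so that only the combinations present in the data are grouped), and `summarize` is a hypothetical stand-in for whatever function actually requires apply.

```python
import numpy as np
import pandas as pd

# Same toy data as in the question: 'b' is a known category with no rows.
missing = pd.Categorical(list("aaa"), categories=["a", "b"])
dense = pd.Categorical(list("abc"))
df = pd.DataFrame({"missing": missing, "dense": dense, "values": np.arange(3)})


def summarize(chunk):
    """Hypothetical reduction standing in for a function that needs apply."""
    return chunk.mean()


# observed=True (an assumption about the pandas version in use) groups only
# the category combinations that actually occur in the data.
applied = df.groupby(["missing", "dense"], observed=True)["values"].apply(summarize)

# Build every combination of the known categories, then reindex so that
# combinations without data appear as NaN rows.
full_index = pd.MultiIndex.from_product(
    [df["missing"].cat.categories, df["dense"].cat.categories],
    names=["missing", "dense"],
)
result = applied.reindex(full_index)
print(result)
```

The reindexed result has one row per (missing, dense) combination, with NaN for the combinations that have no data, which is the shape that aggregate produced in the question.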
