c++ - Why do max and min macros in a CUDA __global__ kernel not give the correct answer?
I am trying to write a simple CUDA function to blur images. I use my own max and min macros, defined as:
#define min(a, b) ((float)a > (float)b)? (float)b: (float)a
#define max(a, b) ((float)a > (float)b)? (float)a: (float)b
The relevant part of the __global__ kernel is:
float norm;
float sum = 0; // when the filter exceeds the border, norm is affected
int center = radius * filterwidth + radius;
int imgx = 0, imgy = 0;
for (int y = -radius; y <= radius; y++) {
    for (int x = -radius; x <= radius; x++) {
        imgx = min(max(x + absolute_image_position_x, 0), numcols-1);
        //imgx = min(numcols - 1, imgx);
        imgy = min(max(y + absolute_image_position_y, 0), numrows -1);
        //imgy = min(numrows-1, imgy);
        sum += (float) inputchannel[(imgy*numcols) + imgx] * filter[center + (y*filterwidth) + x];
    }
}
outputchannel[pos] = (unsigned char) sum;
But min and max do not give the correct answer when I debug. For example, min(max(10,0),100) gives 100.0f! I did not check each step to see why it is wrong. Later I changed to the CUDA math functions, and the results became right. Does anyone have an idea? Is there any restriction on using macros in a CUDA kernel?
Getting rid of the (float) casts to clear the clutter, your macros look like this:
#define min(a, b) (a > b)? b: a
#define max(a, b) (a > b)? a: b
And your example use (simplifying a few variable names):
imgx = min(max(x + aipx, 0), nc-1);
will expand to:
imgx = ((x + aipx > 0)? x + aipx: 0 > nc-1)? nc-1: (x + aipx > 0)? x + aipx: 0;
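To make the grouping of that expansion visible, here is a small host-side C++ sketch (plain C++, not device code). The names BAD_MIN, BAD_MAX, bad_clamp and explicit_grouping are hypothetical and exist only for this illustration; the macros are the unparenthesized ones from above, renamed so they do not clash with any library min/max.

```cpp
#include <cassert>

// The unparenthesized macros from above, under hypothetical names.
#define BAD_MIN(a, b) (a > b)? b: a
#define BAD_MAX(a, b) (a > b)? a: b

// Clamp written the same way as in the kernel.
inline int bad_clamp(int v, int hi) {
    int r = BAD_MIN(BAD_MAX(v, 0), hi);
    return r;
}

// The same expression with the compiler's grouping written out by hand:
// '>' binds tighter than '?:', so "0 > hi" becomes the else-branch of the
// inner conditional instead of the whole inner result being compared to hi.
inline int explicit_grouping(int v, int hi) {
    return ((v > 0) ? v : (0 > hi)) ? hi : ((v > 0) ? v : 0);
}

// Small self-check: both forms agree, and the clamp is wrong for 10.
inline void self_check() {
    for (int v = -5; v <= 200; ++v) {
        assert(bad_clamp(v, 100) == explicit_grouping(v, 100));
    }
    assert(bad_clamp(10, 100) == 100);  // expected 10 from a real clamp
}
```

With these definitions, any positive v makes the outer condition nonzero, so the result is hi regardless of whether v is actually larger than hi.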
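This is getting parsed incorrectly. The comparison > binds tighter than ?:, so the condition of the outer conditional is read as (x + aipx > 0) ? x + aipx : (0 > nc-1). That value is nonzero whenever x + aipx is positive, so the result is always nc-1, which is why min(max(10,0),100) gives 100. Try putting parentheses around each use of the macros' arguments, and around the whole expansion so the macro also behaves inside a larger expression:
#define min(a, b) (((a) > (b))? (b): (a))
#define max(a, b) (((a) > (b))? (a): (b))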
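As a rough comparison, the sketch below puts defensively parenthesized macros next to ordinary functions. It is host-side C++ under stated assumptions: SAFE_MIN, SAFE_MAX, clamp_macro and clamp_func are hypothetical names, and std::min/std::max stand in for whatever min/max functions are available in your device code.

```cpp
#include <algorithm>

// Parentheses around every argument and around the whole expansion
// (hypothetical names, chosen to avoid clashing with library min/max).
#define SAFE_MIN(a, b) (((a) > (b)) ? (b) : (a))
#define SAFE_MAX(a, b) (((a) > (b)) ? (a) : (b))

// Clamp v into [0, hi] using the parenthesized macros.
inline int clamp_macro(int v, int hi) {
    return SAFE_MIN(SAFE_MAX(v, 0), hi);
}

// Clamp v into [0, hi] using functions. Functions evaluate each argument
// exactly once and are not affected by operator precedence at the call site.
inline int clamp_func(int v, int hi) {
    return std::min(std::max(v, 0), hi);
}
```

Even the parenthesized macros still evaluate one of their arguments twice, so an argument with side effects (such as i++) would misbehave. That is one reason functions are usually the safer choice here.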