C++ - Performance of runif

April 15, 2011

I am working on a custom bootstrap algorithm for a specific problem, and since I want a large number of replicates, I care about performance. In this regard, I have some questions on how to use runif properly. I'm aware that I could run benchmarks myself, but C++ optimization tends to be difficult, and I also want to understand the reasons for any difference.

First question:

Is the first code block faster than the second?

    for (int i = 0; i < n_boot; i++) {
      new_random = runif(n);  // new_random is pre-allocated in the class
      // do something with the random numbers
    }

    for (int i = 0; i < n_boot; i++) {
      NumericVector new_random = runif(n);
      // do something with the random numbers
    }

It probably comes down to whether runif fills the left-hand side or whether it allocates and passes back a new NumericVector.
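The distinction can be illustrated outside Rcpp. The sketch below is a plain C++ analogy, not Rcpp's implementation: `draw_new` and `draw_into` are hypothetical helpers, and `std::vector<double>` with `std::mt19937` stand in for `NumericVector` and R's generator. A function that returns a vector has to allocate its result whether that result initializes a new variable or is assigned to an existing one; only a function that receives the destination as a parameter can reuse its storage.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a runif-style function that returns its result.
// It allocates a fresh buffer on every call.
std::vector<double> draw_new(std::size_t n, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::vector<double> out(n);
    for (double& x : out) x = unif(rng);
    return out;
}

// Hypothetical alternative that fills caller-owned storage in place.
// No allocation happens here; the buffer's size decides how many draws are made.
void draw_into(std::vector<double>& out, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (double& x : out) x = unif(rng);
}
```

With a buffer `buf` allocated once before the loop, `draw_into(buf, rng)` leaves `buf.data()` unchanged on every iteration, whereas `buf = draw_new(n, rng)` move-assigns a newly allocated buffer each time and releases the old one.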

Second question:

If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?

In case you are wondering, memory allocation takes up a sizable part of my processing time. I have reduced the runtime by 30% by optimizing other unnecessary memory allocations away, so it does matter.

I set up the following struct to try to represent your scenario accurately and to facilitate benchmarking:

    #include <Rcpp.h>
    #include <algorithm>  // std::generate_n
    // [[Rcpp::plugins(cpp11)]]

    struct runif_test {

      size_t runs;
      size_t each;

      runif_test(size_t runs, size_t each)
      : runs(runs), each(each)
      {}

      // first code block: assign to a pre-allocated vector
      void pre_init() {
        Rcpp::NumericVector v = no_init();
        for (size_t i = 0; i < runs; i++) {
          v = Rcpp::runif(each);
        }
      }

      // second code block: initialize a new local vector each iteration
      void post_init() {
        for (size_t i = 0; i < runs; i++) {
          Rcpp::NumericVector v = Rcpp::runif(each);
        }
      }

      // generate 1 draw at a time
      void gen_runif() {
        Rcpp::NumericVector v = no_init();
        for (size_t i = 0; i < runs; i++) {
          std::generate_n(v.begin(), each, []() -> double {
            return Rcpp::as<double>(Rcpp::runif(1));
          });
        }
      }

      // reduce overhead of pre-allocated vector
      inline Rcpp::NumericVector no_init() {
        return Rcpp::NumericVector(Rcpp::no_init_vector(each));
      }

    };

where I benchmarked the following exported functions:

    // [[Rcpp::export]]
    void do_pre(size_t runs, size_t each) {
      runif_test obj(runs, each);
      obj.pre_init();
    }

    // [[Rcpp::export]]
    void do_post(size_t runs, size_t each) {
      runif_test obj(runs, each);
      obj.post_init();
    }

    // [[Rcpp::export]]
    void do_gen(size_t runs, size_t each) {
      runif_test obj(runs, each);
      obj.gen_runif();
    }

Here are the results I got:

    R>  microbenchmark::microbenchmark(
        do_pre(100, 10e4)
        ,do_post(100, 10e4)
        ,do_gen(100, 10e4)
        ,times=100L)
    Unit: milliseconds
                     expr      min       lq      mean   median        uq       max neval
      do_pre(100, 100000) 109.9187 125.0477  145.9918 136.3749  152.9609  337.6143   100
     do_post(100, 100000) 103.1705 117.1109  132.9389 130.4482  142.7319  204.0951   100
      do_gen(100, 100000) 810.5234 911.3586 1005.9438 986.8348 1062.7715 1501.2933   100

    R>  microbenchmark::microbenchmark(
        do_pre(100, 10e5)
        ,do_post(100, 10e5)
        ,times=100L)
    Unit: seconds
                      expr      min       lq     mean   median       uq      max neval
      do_pre(100, 1000000) 1.355160 1.614972 1.740807 1.723704 1.815953 2.408465   100
     do_post(100, 1000000) 1.198667 1.342794 1.443391 1.429150 1.519976 2.042511   100

So, assuming I interpreted / accurately represented your second question,

If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?

with the gen_runif() member function, I think we can confidently say that this is not the optimal approach - it is ~7.5x slower than the other two functions.
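One likely reason for the gap is that every call to `Rcpp::runif(1)` builds a length-one vector just to extract a single value. The sketch below reproduces that pattern in plain C++ as an analogy only: `draw_vector`, `draw_scalar`, and the `allocations` counter are hypothetical, and `std::mt19937` stands in for R's generator. Both fill functions write the same values into a pre-allocated buffer, but the first creates one temporary vector per element.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Counts how many temporary vectors draw_vector has created (illustration only).
static std::size_t allocations = 0;

// Hypothetical vector-returning generator, analogous to calling runif(n).
std::vector<double> draw_vector(std::size_t n, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    ++allocations;
    std::vector<double> out(n);
    for (double& x : out) x = unif(rng);
    return out;
}

// Hypothetical scalar generator: returns one draw without building a vector.
double draw_scalar(std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(rng);
}

// Pattern used by gen_runif(): one length-1 vector per element.
void fill_via_vectors(std::vector<double>& buf, std::mt19937& rng) {
    std::generate(buf.begin(), buf.end(),
                  [&rng]() { return draw_vector(1, rng)[0]; });
}

// Scalar pattern: same buffer, no temporary vectors.
void fill_via_scalars(std::vector<double>& buf, std::mt19937& rng) {
    std::generate(buf.begin(), buf.end(),
                  [&rng]() { return draw_scalar(rng); });
}
```

In Rcpp itself, the scalar interface `R::runif(min, max)` plays the role of `draw_scalar`. Whether it beats the vectorized `Rcpp::runif(n)` for this workload would still need to be benchmarked.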

More importantly, to address your first question, it seems to be a little faster to just initialize and assign a new NumericVector to the output of Rcpp::runif(n). I'm certainly no C++ expert, but I believe the second method (assigning to a new local object) was faster than the first because of copy elision. In the second case, it looks as though two objects are being created - the object on the left of =, v, and a (temporary? rvalue?) object on the right side of =, which is the result of Rcpp::runif(). In reality though, the compiler will most likely optimize this unnecessary step out - which I think is explained in this passage from the article I linked:

When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type ... the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to.
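The quoted rule can be checked with a small instrumented type. This is a generic C++ sketch, not a statement about how `Rcpp::NumericVector` is implemented: `Tracker` and `make_tracker` are hypothetical names. Since C++17 the elision in the first case is guaranteed; under C++11 and C++14 it is permitted, and mainstream compilers perform it by default.

```cpp
// Hypothetical type that counts how often it is copied or moved.
struct Tracker {
    static int copies;
    static int moves;
    static int move_assignments;

    Tracker() {}
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }
    Tracker& operator=(const Tracker&) { ++copies; return *this; }
    Tracker& operator=(Tracker&&) noexcept { ++move_assignments; return *this; }
};

int Tracker::copies = 0;
int Tracker::moves = 0;
int Tracker::move_assignments = 0;

// Returns a nameless temporary, like the right-hand side of `v = runif(n)`.
Tracker make_tracker() { return Tracker(); }

// Initialization from a temporary: the object is built directly in `t`.
void init_from_temporary() {
    Tracker t = make_tracker();
    (void)t;  // silence unused-variable warnings
}

// Assignment to an existing object: the temporary is built first and then
// move-assigned, because the target already exists.
void assign_from_temporary(Tracker& existing) {
    existing = make_tracker();
}
```

Calling `init_from_temporary()` leaves both constructor counters at zero, while `assign_from_temporary()` records one move assignment. Elision removes the extra copy or move, but not whatever allocation the called function performs to build its result.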

This was, at least, how I interpreted the results. Hopefully someone more well-versed in the language can confirm / deny / correct this conclusion.
