C++ - Performance of runif
I am working on a custom bootstrap algorithm for a specific problem, and because I want a large number of replicates, I care about performance. In this regard, I have some questions on how to use runif properly. I'm aware that I could run benchmarks myself, but C++ optimization tends to be difficult, and I would also like to understand the reasons for any difference.
First question:
Is the first code block faster than the second?
    for (int i = 0; i < n_boot; i++) {
        new_random = runif(n);  // new_random is pre-allocated in the class
        // do something with the random numbers
    }
    for (int i = 0; i < n_boot; i++) {
        NumericVector new_random = runif(n);
        // do something with the random numbers
    }
It comes down to whether runif fills the left-hand side, or whether it allocates and returns a new NumericVector.
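The two possibilities can be sketched in plain C++, without Rcpp. This is only an illustration under stated assumptions: std::vector<double> and std::mt19937 stand in for NumericVector and R's random number generator, and fill_uniform, make_uniform and sum_reusing_buffer are hypothetical names that are not part of Rcpp. As far as the signature goes, Rcpp::runif(n) returns a NumericVector by value, so it corresponds to the second shape (make_uniform) rather than the first.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for a generator that fills caller-owned storage.
// No allocation happens here; `out` keeps its existing buffer.
void fill_uniform(std::vector<double>& out, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    for (double& x : out) {
        x = unif(rng);
    }
}

// Hypothetical stand-in for a generator that returns by value.
// Every call allocates a fresh buffer of n doubles.
std::vector<double> make_uniform(std::size_t n, std::mt19937& rng) {
    assert(n > 0);
    std::vector<double> out(n);
    fill_uniform(out, rng);
    return out;
}

// Bootstrap-style loop that reuses one buffer across all replicates.
double sum_reusing_buffer(std::size_t n_boot, std::size_t n, std::mt19937& rng) {
    std::vector<double> buf(n);  // allocated once, before the loop
    double total = 0.0;
    for (std::size_t i = 0; i < n_boot; ++i) {
        fill_uniform(buf, rng);  // overwrites the same storage in place
        for (double x : buf) {
            total += x;
        }
    }
    return total;
}
```

Pre-allocating only saves work when the generator has the first shape. With a generator that returns by value, assigning its result to an existing vector still pays for the allocation inside the call.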
Second question:
If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?
In case you are wondering, memory allocation takes up a sizable part of my processing time. I have reduced the runtime by 30% by optimizing other unnecessary memory allocations away, so it does matter.
I set up the following struct to try to represent your scenario accurately and to facilitate benchmarking:
    #include <Rcpp.h>
    #include <algorithm>  // std::generate_n
    // [[Rcpp::plugins(cpp11)]]

    struct runif_test {
        size_t runs;
        size_t each;

        runif_test(size_t runs, size_t each)
            : runs(runs), each(each) {}

        // Your first code block
        void pre_init() {
            Rcpp::NumericVector v = no_init();
            for (size_t i = 0; i < runs; i++) {
                v = Rcpp::runif(each);
            }
        }

        // Your second code block
        void post_init() {
            for (size_t i = 0; i < runs; i++) {
                Rcpp::NumericVector v = Rcpp::runif(each);
            }
        }

        // Generate 1 draw at a time
        void gen_runif() {
            Rcpp::NumericVector v = no_init();
            for (size_t i = 0; i < runs; i++) {
                std::generate_n(v.begin(), each, []() -> double {
                    return Rcpp::as<double>(Rcpp::runif(1));
                });
            }
        }

        // Reduce overhead of the pre-allocated vector
        inline Rcpp::NumericVector no_init() {
            return Rcpp::NumericVector(Rcpp::no_init_vector(each));
        }
    };
where I benchmarked the following exported functions:
    // [[Rcpp::export]]
    void do_pre(size_t runs, size_t each) {
        runif_test obj(runs, each);
        obj.pre_init();
    }

    // [[Rcpp::export]]
    void do_post(size_t runs, size_t each) {
        runif_test obj(runs, each);
        obj.post_init();
    }

    // [[Rcpp::export]]
    void do_gen(size_t runs, size_t each) {
        runif_test obj(runs, each);
        obj.gen_runif();
    }
Here are the results I got:
    R> microbenchmark::microbenchmark(
        do_pre(100, 10e4)
        ,do_post(100, 10e4)
        ,do_gen(100, 10e4)
        ,times=100L)
    Unit: milliseconds
                     expr      min       lq      mean   median        uq       max neval
      do_pre(100, 100000) 109.9187 125.0477  145.9918 136.3749  152.9609  337.6143   100
     do_post(100, 100000) 103.1705 117.1109  132.9389 130.4482  142.7319  204.0951   100
      do_gen(100, 100000) 810.5234 911.3586 1005.9438 986.8348 1062.7715 1501.2933   100
    R> microbenchmark::microbenchmark(
        do_pre(100, 10e5)
        ,do_post(100, 10e5)
        ,times=100L)
    Unit: seconds
                      expr      min       lq     mean   median       uq      max neval
      do_pre(100, 1000000) 1.355160 1.614972 1.740807 1.723704 1.815953 2.408465   100
     do_post(100, 1000000) 1.198667 1.342794 1.443391 1.429150 1.519976 2.042511   100
So, assuming I interpreted and accurately represented your second question,
"If both versions allocate a new vector, can I improve things by generating one random number at a time in scalar mode?"
with the gen_runif() member function, I think we can confidently say that this is not the optimal approach: by median time it was roughly 7.5x slower than the other two functions.
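One likely contributor to that gap is that every Rcpp::runif(1) call builds a length-one vector that is then converted back to a double. The plain C++ sketch below mirrors the pattern under stated assumptions: std::vector<double> and std::mt19937 stand in for NumericVector and R's generator, and draw_scalar, draw_boxed, fill_scalar and fill_boxed are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Scalar draw: returns a plain double, with no heap allocation.
double draw_scalar(std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    return unif(rng);
}

// Boxed draw: allocates a length-one vector to carry a single number,
// mimicking the pattern of asking a vector API for one value.
std::vector<double> draw_boxed(std::mt19937& rng) {
    return std::vector<double>(1, draw_scalar(rng));
}

// Fill a pre-allocated buffer one value at a time using scalar draws.
void fill_scalar(std::vector<double>& out, std::mt19937& rng) {
    std::generate_n(out.begin(), out.size(),
                    [&rng]() { return draw_scalar(rng); });
}

// Same values, but every element costs one temporary allocation.
void fill_boxed(std::vector<double>& out, std::mt19937& rng) {
    std::generate_n(out.begin(), out.size(),
                    [&rng]() { return draw_boxed(rng)[0]; });
}
```

Both fill functions produce the same numbers for the same seed; only the per-element overhead differs. In Rcpp itself, a scalar call such as R::runif(0.0, 1.0) would play the role of draw_scalar, though that variant was not part of the benchmark above.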
More importantly, to address your first question, it seems to be a little faster to just initialize and assign a new NumericVector to the output of Rcpp::runif(n). I'm certainly no C++ expert, but I believe the second method (assigning to a new local object) was faster than the first because of copy elision. In the second case, it looks as though two objects are being created: the object on the left of =, v, and a (temporary? rvalue?) object on the right side of =, which is the result of Rcpp::runif(). In reality though, the compiler will most likely optimize this unnecessary step out, which I think is explained in this passage from the article I linked:
"When a nameless temporary, not bound to any references, would be moved or copied into an object of the same type ... the copy/move is omitted. When that temporary is constructed, it is constructed directly in the storage where it would otherwise be moved or copied to."
This was, at least, how I interpreted the results. Hopefully someone more well-versed in the language can confirm / deny / correct this conclusion.
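The mechanism described in that passage can be observed with a small instrumented type. This is a sketch under stated assumptions: Tracked, make_tracked, init_from_temporary and assign_from_temporary are hypothetical names, and Tracked is not how NumericVector is implemented. It only shows that initializing a new object from a returned temporary involves no copy or move, while assigning to an existing object goes through an assignment operator.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical type that counts how it is constructed and assigned.
struct Tracked {
    static int copies;        // copy constructions and copy assignments
    static int move_ctors;    // move constructions
    static int move_assigns;  // move assignments

    std::vector<double> data;

    explicit Tracked(std::size_t n) : data(n) {}
    Tracked(const Tracked& o) : data(o.data) { ++copies; }
    Tracked(Tracked&& o) noexcept : data(std::move(o.data)) { ++move_ctors; }
    Tracked& operator=(const Tracked& o) {
        data = o.data;
        ++copies;
        return *this;
    }
    Tracked& operator=(Tracked&& o) noexcept {
        data = std::move(o.data);
        ++move_assigns;
        return *this;
    }
};

int Tracked::copies = 0;
int Tracked::move_ctors = 0;
int Tracked::move_assigns = 0;

// Returns a nameless temporary, like a generator that returns by value.
Tracked make_tracked(std::size_t n) { return Tracked(n); }

// Second pattern from the question: a new local object per iteration.
std::size_t init_from_temporary(std::size_t n) {
    Tracked v = make_tracked(n);  // constructed directly in v's storage
    return v.data.size();
}

// First pattern from the question: assign to an existing object.
std::size_t assign_from_temporary(std::size_t n) {
    Tracked v(n);
    v = make_tracked(n);  // cannot be elided; runs move assignment
    return v.data.size();
}
```

For a move-aware type like this one, the assignment costs a cheap move rather than a deep copy, so the sketch illustrates the rule without settling how much of the measured difference it explains.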