statistics - Is it possible to real-time update a value with Spark Streaming?
Let's assume we have a stream of double values and we want to compute the average every ten seconds. How can we have a sliding window that doesn't need to recompute the average, but instead updates it by, let's say, removing the values of the oldest ten seconds and adding the new ten seconds' values?
TL;DR: use reduceByWindow with both of its function arguments (jump to the last paragraph for the code snippet).
There are two interpretations of the question: a specific one (how do I get a running mean over one hour, updated every two seconds?) and a general one (how do I get a computation that updates state in a sparse way?). Here's the answer to the general one.
First, notice that there is a way to represent your data such that your average-with-updates is easy to compute, based on a windowed DStream: this represents your data as an incremental construction of the stream, with maximal sharing. But, as you noted, it is less efficient computationally to recompute the mean on each batch.
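For contrast, here is a minimal sketch of that recompute-per-batch variant (the stream name and the durations are illustrative assumptions, not part of the original answer):

import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.dstream.DStream

val myInitialDStream: DStream[Float] = ??? // hypothetical source stream

// window() keeps the last hour of batches, and reduce() re-aggregates all of
// them on every 2s slide – correct, but no work is shared between windows.
val recomputedMeans: DStream[Float] = myInitialDStream
  .map(x => (x, 1L))
  .window(Durations.seconds(3600), Durations.seconds(2))
  .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
  .map(p => p._1 / p._2)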
If you want an update of a complex stateful computation which is invertible, but you don't want to touch the stream's construction, then there is updateStateByKey – but there Spark doesn't help you in reflecting the incremental aspect of your computation in the stream; you have to manage it yourself.
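The answer stops at naming updateStateByKey, but as a rough sketch of what that route could look like (not from the original answer; the dummy key and the (sum, count) state shape are assumptions made for illustration):

// updateStateByKey needs a keyed stream, so fold everything under a dummy key.
val keyed: DStream[(Unit, (Float, Long))] =
  myInitialDStream.map(x => ((), (x, 1L)))

// Accumulate a global (sum, count). Note this state grows over the whole
// stream: evicting the batches that leave the window is left entirely to you –
// exactly the "manage it yourself" part. Requires ssc.checkpoint(...) as well.
val state: DStream[(Unit, (Float, Long))] =
  keyed.updateStateByKey { (newValues: Seq[(Float, Long)], old: Option[(Float, Long)]) =>
    val (s0, n0) = old.getOrElse((0f, 0L))
    Some((newValues.map(_._1).sum + s0, newValues.map(_._2).sum + n0))
  }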
Here, you do have something simple and invertible, and you don't have a notion of keys. You can use reduceByWindow with its inverse reduction argument, using the usual add/subtract functions that let you compute an incremental mean.
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.dstream.DStream

val myInitialDStream: DStream[Float] = ??? // your source stream of values

// Pair each value with a count of 1, so sums and counts slide together.
val myDStreamWithCount: DStream[(Float, Long)] =
  myInitialDStream.map((x) => (x, 1L))

// Add the (sum, count) of the batch entering the window.
def addOneBatchToMean(previousMean: (Float, Long), newBatch: (Float, Long)): (Float, Long) =
  (previousMean._1 + newBatch._1, previousMean._2 + newBatch._2)

// Subtract the (sum, count) of the batch leaving the window.
def removeOneBatchToMean(previousMean: (Float, Long), oldBatch: (Float, Long)): (Float, Long) =
  (previousMean._1 - oldBatch._1, previousMean._2 - oldBatch._2)

// 1h window sliding every 2s; thanks to the inverse function, each slide only
// touches the batches that enter and leave the window. Note this variant of
// reduceByWindow requires checkpointing to be enabled on the context.
val runningMeans = myDStreamWithCount.reduceByWindow(
  addOneBatchToMean, removeOneBatchToMean,
  Durations.seconds(3600), Durations.seconds(2))
You get a stream of one-element RDDs, each of which contains a pair (m, n), where m is the running sum over the 1h window and n is the number of elements in the 1h window. You can return (or map to) m/n to get the mean.
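For instance (a small usage sketch; print() is just a stand-in for whatever output operation you use):

// Divide the running sum by the running count to obtain the mean per slide.
val means: DStream[Float] = runningMeans.map { case (sum, count) => sum / count }
means.print()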