jvm - datastax: Spark job fails: Removing BlockManager with no recent heart beats


I'm using DataStax Enterprise 4.6. I created a Cassandra table and stored 2 crore (20 million) records in it. Reading the data from Scala works fine for a few records, but when I try to retrieve all 2 crore records I get the following error:

```
WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, 172.20.98.17, 34224, 0) with no recent heart beats: 140948ms exceeds 45000ms
15/05/15 19:34:06 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(c15759,34224) not found
```

Any help?

This problem is usually tied to GC pressure: a long garbage-collection pause stops the executor from sending heartbeats, so the master assumes it is dead and removes its BlockManager.

Tuning timeouts

Increase `spark.storage.blockManagerHeartBeatMs` so that Spark waits out the GC pause instead of declaring the executor dead.

SPARK-734 recommends setting `-Dspark.worker.timeout=30000 -Dspark.akka.timeout=30000 -Dspark.storage.blockManagerHeartBeatMs=30000 -Dspark.akka.retry.wait=30000 -Dspark.akka.frameSize=10000`.
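On Spark 1.x these properties can also be set programmatically on the `SparkConf` before the context is created. A minimal sketch, assuming the property names from the ticket above; the app name is just a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: raises the heartbeat/timeout values from SPARK-734 so that a
// long GC pause is not mistaken for a dead executor. Values are milliseconds
// (frameSize is in MB); tune them for your workload.
val conf = new SparkConf()
  .setAppName("cassandra-read") // placeholder name
  .set("spark.worker.timeout", "30000")
  .set("spark.akka.timeout", "30000")
  .set("spark.storage.blockManagerHeartBeatMs", "30000")
  .set("spark.akka.retry.wait", "30000")
  .set("spark.akka.frameSize", "10000")

val sc = new SparkContext(conf)
```

These must be set before the `SparkContext` is constructed; changing them afterwards has no effect.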

Tuning your job and the JVM

`spark.cassandra.input.split.size` - lets you change the level of parallelization of your Cassandra reads. Bigger split sizes mean more data has to reside in memory at the same time.

`spark.storage.memoryFraction` and `spark.shuffle.memoryFraction` - the fraction of the heap occupied by cached RDDs (as opposed to shuffle memory and Spark overhead). If you aren't doing any shuffles, you can increase the storage fraction. The Databricks folks suggest making it similar in size to the old generation.

`spark.executor.memory` - this depends on your hardware. Per Databricks it can be up to 55 GB. Make sure to leave enough RAM for C*, the OS, and the OS page cache. Remember that long GC pauses tend to happen on larger heaps.
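Putting those knobs together, a hedged sketch of a read job (the numeric values are illustrative examples to adapt to your hardware, not recommendations, and `ks`/`tbl` are placeholder keyspace/table names):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

// Sketch only. Size spark.executor.memory for your machine and leave
// headroom for Cassandra, the OS, and the page cache.
val conf = new SparkConf()
  .setAppName("cassandra-read") // placeholder name
  .set("spark.cassandra.input.split.size", "10000") // fewer rows per split => less resident in memory at once
  .set("spark.storage.memoryFraction", "0.6")       // heap share for cached RDDs
  .set("spark.shuffle.memoryFraction", "0.2")       // heap share for shuffles
  .set("spark.executor.memory", "8g")               // per-executor heap

val sc = new SparkContext(conf)

// Streams through the table split by split rather than pulling it
// into memory at once.
val rowCount = sc.cassandraTable("ks", "tbl").count()
```

With smaller splits each task holds less data, so the heap stays smaller and GC pauses stay short enough for heartbeats to get through.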

Out of curiosity, are you going to be extracting the entire C* table via Spark? What's the use case?

