jvm - datastax: Spark job fails: Removing BlockManager with no recent heart beats
I'm using DataStax 4.6. I created a Cassandra table and stored 2 crore (20 million) records in it. I'm trying to read the data using Scala. The code works fine for a few records, but when I try to retrieve all 2 crore records it shows the following error.
```
WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, 172.20.98.17, 34224, 0) with no recent heart beats: 140948ms exceeds 45000ms
15/05/15 19:34:06 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(c15759,34224) not found
```
Any help would be appreciated.
This problem is usually tied to GC pressure.
**Tuning timeouts**

Increase `spark.storage.blockManagerHeartBeatMs` so that Spark waits for the GC pause to end before declaring the executor dead.
SPARK-734 recommends setting `-Dspark.worker.timeout=30000 -Dspark.akka.timeout=30000 -Dspark.storage.blockManagerHeartBeatMs=30000 -Dspark.akka.retry.wait=30000 -Dspark.akka.frameSize=10000`
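The same timeouts can also be set programmatically instead of via `-D` flags. A minimal sketch, assuming the Spark 0.9/1.x property names used by the Spark bundled with DSE 4.x (verify the names against your exact version):

```scala
// Sketch: apply the SPARK-734 timeout recommendations through SparkConf.
// Property names are the old (Spark 0.9/1.x) ones; newer Spark renamed several of them.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("timeout-tuning")
  .set("spark.worker.timeout", "30000")                  // worker liveness timeout (ms)
  .set("spark.akka.timeout", "30000")                    // Akka communication timeout
  .set("spark.storage.blockManagerHeartBeatMs", "30000") // tolerate long GC pauses
  .set("spark.akka.retry.wait", "30000")                 // wait between Akka retries
  .set("spark.akka.frameSize", "10000")                  // max message frame size

val sc = new SparkContext(conf)
```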
**Tuning your jobs for your JVM**
`spark.cassandra.input.split.size` lets you change the level of parallelization of your Cassandra reads. Bigger split sizes mean more data has to reside in memory at the same time.
`spark.storage.memoryFraction` and `spark.shuffle.memoryFraction` control the amount of heap occupied by RDDs (as opposed to shuffle memory and Spark overhead). If you aren't doing any shuffles, you can increase the storage value. The Databricks folks suggest making it similar in size to the size of your OldGen.
`spark.executor.memory` depends on your hardware. Per Databricks it can go up to 55 GB. Make sure to leave enough RAM for C*, the OS, and the OS page cache. Remember that long GC pauses happen on larger heaps.
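Putting the job-tuning knobs above together, a minimal sketch (the specific values here are illustrative assumptions, not recommendations; size them against your own heap and OldGen):

```scala
// Sketch: tuning split size and memory fractions for a large Cassandra read.
// Values are placeholders -- adjust to your hardware and GC behavior.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("cassandra-full-scan")
  .set("spark.cassandra.input.split.size", "10000") // smaller splits -> less data in memory at once
  .set("spark.storage.memoryFraction", "0.6")       // heap share for cached RDDs
  .set("spark.shuffle.memoryFraction", "0.2")       // heap share for shuffle buffers
  .set("spark.executor.memory", "8g")               // leave headroom for C*, OS, page cache

val sc = new SparkContext(conf)
```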
Out of curiosity, are you going to be extracting the entire C* table with Spark? What's the use case?