I am trying to understand why the following occurred:
- I have a Docker container with YARN and Spark running fine, except that the container's clock was X hours behind what I wanted it to be. So when I ran `date` it returned a timestamp X hours behind the current one.
- I managed to fix the above by passing a `TZ` environment variable in the `docker run` command (`docker run -e TZ=...`), so when I type `date` I now get the correct timestamp.
- However, when I run `spark-submit` applications in YARN (cluster mode is yarn), the timestamp in the AM logs is still the wrong one (X hours behind).
- I managed to fix that by passing a timezone setting to the JVM in `spark-submit`: `--conf 'spark.executor.extraJavaOptions=-Duser.timezone'` and `--conf 'spark.driver.extraJavaOptions=-Duser.timezone'`.
- This tells me there was an issue with the JVM that YARN uses. However, when I tried to get the datetime from the Spark Scala shell, it returned the correct time (using `System.currentTimeMillis()`) without specifying any of the JVM settings from step 4 (see the snippet after this list).
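
A minimal sketch of the check I mean, run inside `spark-shell` (assuming no `-Duser.timezone` options were passed when launching it):

```scala
// Inspect the driver JVM's notion of "now" from spark-shell.
val nowMillis = System.currentTimeMillis()           // epoch millis, timezone-independent
val defaultTz = java.util.TimeZone.getDefault.getID  // the JVM's default timezone ID
val localNow  = new java.util.Date(nowMillis)        // rendered using the default timezone
println(s"tz=$defaultTz now=$localNow")
```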
Questions
- How can I tell which JVM is used for the containers launched by the YARN ApplicationMaster, and which JVM the Spark Scala shell uses? (See the sketch after this list.)
- Why are there different timestamps when running in the shell/bash versus via `spark-submit`?
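
To make question 1 concrete, here is a rough sketch of the comparison I have in mind (it assumes a running `spark-shell` where `sc` is defined; the property list is only illustrative):

```scala
// Print identifying properties of the current JVM: install path, version,
// vendor, and timezone configuration.
def jvmInfo(label: String): String = {
  val keys  = Seq("java.home", "java.version", "java.vendor", "user.timezone")
  val props = keys.map(k => s"$k=${System.getProperty(k)}").mkString(", ")
  s"[$label] $props, defaultTZ=${java.util.TimeZone.getDefault.getID}"
}

// The driver-side JVM (the shell itself).
println(jvmInfo("driver"))

// The executor-side JVMs (the containers that YARN launches).
sc.parallelize(Seq(1), 1)
  .map(_ => jvmInfo("executor"))
  .collect()
  .foreach(println)
```

If `java.home` differed between the driver and an executor, that would suggest the YARN containers are launched with a different JVM, presumably whatever `JAVA_HOME` resolves to inside the container environment.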