I am trying to understand why the following occurred:
- I have a Docker container with YARN and Spark running fine, except that the container's clock was X hours behind what I wanted it to be. So when I ran `date` it returned a timestamp X hours behind the current one.
- I managed to fix the above by passing a `TZ` environment variable in the `docker run` command (`docker run -e TZ=...`), so when I type `date` I now get the correct timestamp.
- However, when I run `spark-submit` applications in YARN (cluster mode is yarn), the timestamp in the AM logs is still the wrong one (X hours behind).
- I managed to fix that by passing a timezone setting to the JVM in `spark-submit`: `--conf 'spark.executor.extraJavaOptions=-Duser.timezone'` and `--conf 'spark.driver.extraJavaOptions=-Duser.timezone'`.
- This tells me there was an issue with the JVM that YARN uses. However, when I tried to get the datetime from the Spark Scala shell, it returned the correct time (using `System.currentTimeMillis()`) without specifying any of the JVM settings from step 4 (see the snippet after this list).
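
A minimal sketch of the check I mean, run inside `spark-shell` (assuming no `-Duser.timezone` options were passed when launching it):

```scala
// Inspect the driver JVM's notion of "now" from spark-shell.
val nowMillis = System.currentTimeMillis()           // epoch millis, timezone-independent
val defaultTz = java.util.TimeZone.getDefault.getID  // the JVM's default timezone ID
val localNow  = new java.util.Date(nowMillis)        // rendered using the default timezone
println(s"tz=$defaultTz now=$localNow")
```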
Questions
- How can I tell which JVM is used for the containers launched by the YARN ApplicationMaster, and which JVM the Spark Scala shell uses? (See the sketch after this list.)
- Why are there different timestamps when running in the shell/bash versus via `spark-submit`?
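
To make question 1 concrete, here is a rough sketch of the comparison I have in mind (it assumes a running `spark-shell` where `sc` is defined; the property list is only illustrative):

```scala
// Print identifying properties of the current JVM: install path, version,
// vendor, and timezone configuration.
def jvmInfo(label: String): String = {
  val keys  = Seq("java.home", "java.version", "java.vendor", "user.timezone")
  val props = keys.map(k => s"$k=${System.getProperty(k)}").mkString(", ")
  s"[$label] $props, defaultTZ=${java.util.TimeZone.getDefault.getID}"
}

// The driver-side JVM (the shell itself).
println(jvmInfo("driver"))

// The executor-side JVMs (the containers that YARN launches).
sc.parallelize(Seq(1), 1)
  .map(_ => jvmInfo("executor"))
  .collect()
  .foreach(println)
```

If `java.home` differed between the driver and an executor, that would suggest the YARN containers are launched with a different JVM, presumably whatever `JAVA_HOME` resolves to inside the container environment.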