spark-submit seems to require two-way communication with a remote Spark cluster in order to run jobs.

This is easy to configure between plain machines (10.x.x.x talks to 10.x.x.x and back), but it becomes confusing when Docker adds an extra layer of networking (172.x.x.x goes out through 10.x.x.x to the remote 10.x.x.x, and the replies then have to find their way back to 172.x.x.x through 10.x.x.x somehow).

Spark adds an extra layer of complexity with its SPARK_LOCAL_IP and SPARK_LOCAL_HOSTNAME configuration parameters for the client.
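
For reference, the client-side knobs I mean are roughly these (the addresses, host names, and job file are placeholders, not my real values):

# Placeholder values; the question is what these should point at when the
# client runs inside a Docker container.
export SPARK_LOCAL_IP=172.17.0.2         # the container's 172.x.x.x address?
export SPARK_LOCAL_HOSTNAME=client-box   # or a name the cluster can resolve?
spark-submit --master spark://10.0.0.10:7077 job.py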

How should Docker networking be configured to allow this?

1 Answer

You can run the Docker containers in host network mode. In your compose file you can add the following config:

services:
  worker0:
    container_name: container0
    cpuset: 0-4
    entrypoint: /entrypoint.sh
    environment:
        - SPARK_MASTER=10.34.221.247
        - RAM=16g
        - NUM_WORKERS=5
        - SHUFFLE_PORT=7338
    expose:
        - 7000-64000
    image: 10.47.7.214/spark-worker
    mem_limit: 16g
    # Host networking shares the host's network stack with the container,
    # so Spark binds to the host's 10.x.x.x address instead of a 172.x.x.x one.
    network_mode: host
    tty: true
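
If you want to sanity-check host networking without compose first, a rough docker run equivalent of the service above (same image and settings as in the compose file, the rest is standard docker run flags) would be:

# With host networking the container shares the host's network stack,
# so no port publishing is needed.
docker run --rm -it \
  --network host \
  --cpuset-cpus 0-4 \
  --memory 16g \
  -e SPARK_MASTER=10.34.221.247 \
  -e RAM=16g \
  -e NUM_WORKERS=5 \
  -e SHUFFLE_PORT=7338 \
  --entrypoint /entrypoint.sh \
  10.47.7.214/spark-worker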

I am still facing issues with this config, though. The job starts normally, but eventually the driver running in Docker fails to connect to the executors. You can at least try this for now.
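
If the failure is specifically the executors not being able to call back into the driver, one general Spark-level setting worth experimenting with (not specific to this image) is pinning the driver's callback address and ports at submit time, so the executors have a fixed, reachable target. The IP, ports, and job file below are placeholders:

# spark.driver.host must be an address the executors can actually reach,
# and the two pinned ports must be open on that machine.
spark-submit \
  --master spark://10.34.221.247:7077 \
  --conf spark.driver.host=10.34.221.100 \
  --conf spark.driver.port=45000 \
  --conf spark.driver.blockManager.port=45001 \
  job.py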