
The problem I am having is that Kafka does not use the environment variables passed in from docker-compose. It just uses the default server.properties file. I know this is by design (or lack thereof), but why bother passing in the environment variables if they aren't going to be used? I have scoured the web as well as other Docker image sources, and it looks like Confluent ships code that reads the environment variables and then builds a custom server.properties file. Seems like a huge duplication of effort! Am I missing something?
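As far as I can tell, the convention those images follow is purely mechanical: drop the KAFKA_ prefix, lowercase, and turn underscores into dots. So, if my understanding is right:

KAFKA_NODE_ID=1             becomes  node.id=1
KAFKA_PROCESS_ROLES=broker  becomes  process.roles=broker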

Goal: I want to pass basic config vars from docker-compose.yml to my custom Docker image (it needs to be rhel8-based) and have Kafka use that config. For example: cluster ID, node ID, storage path, ports, just the basics needed to run Kafka. Why is this so hard? What am I doing wrong?

I've managed to take our RHEL8 ubi8-jdk21 minimal docker image and create a myproject/kafka image from it:

In the Dockerfile, I download the latest Kafka release, then extract it to e.g. /opt/kafka:

FROM myproject/ubi-openjdk21:ubi8
...

RUN curl -fL "${DOWNLOAD_URL}" -o "kafka_${KAFKA_VERSION_LONG}.tgz" \
 && curl -fL "${VERIFY_URL}" -o "kafka_${KAFKA_VERSION_LONG}.tgz.sha512" \
 && sha512sum "kafka_${KAFKA_VERSION_LONG}.tgz" > zip.checksum \
 && sha512sum -c zip.checksum > checksum.result

#TODO finish the checksum validation process: verify against the downloaded
# .sha512 file (zip.checksum is self-generated, so -c always reports OK) and
# grep for OK in the result file; see the sketch after this Dockerfile

RUN tar -xzf "kafka_${KAFKA_VERSION_LONG}.tgz" \
 && mv "kafka_${KAFKA_VERSION_LONG}" kafka

...
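A sketch of how I expect to finish that TODO, assuming Apache's .sha512 files use the gpg --print-md layout ("filename: GROUPED UPPERCASE HEX") rather than the plain sha512sum format:

# compare our computed digest with the published one
RUN computed=$(sha512sum "kafka_${KAFKA_VERSION_LONG}.tgz" | awk '{print $1}') \
 && published=$(cut -d: -f2 "kafka_${KAFKA_VERSION_LONG}.tgz.sha512" | tr -d ' \n' | tr '[:upper:]' '[:lower:]') \
 && [ "$computed" = "$published" ] || { echo "checksum mismatch" >&2; exit 1; }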

Then, in the entrypoint.sh script, I know I need to call:

./bin/kafka-storage.sh format --standalone -t "$CLUSTER_ID" -c config/server.properties
./bin/kafka-server-start.sh config/server.properties
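Fleshed out only slightly, the entrypoint I have in mind looks like this (a sketch; it assumes Kafka is installed at /opt/kafka and that CLUSTER_ID arrives via the environment):

#!/usr/bin/env bash
set -euo pipefail
cd /opt/kafka

# --ignore-formatted makes restarts idempotent: format is a no-op when the
# storage dir already holds this cluster's metadata
./bin/kafka-storage.sh format --ignore-formatted -t "$CLUSTER_ID" -c config/server.properties

# exec so the JVM becomes PID 1 and receives docker stop's SIGTERM
exec ./bin/kafka-server-start.sh config/server.properties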

I am attempting to reuse the cluster setup from the Apache Kafka image on Docker Hub: https://hub.docker.com/r/apache/kafka

Sample docker-compose.yml:

services:
  controller-1:
    image: apache/kafka:latest
    container_name: controller-1
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  controller-2:
    image: apache/kafka:latest
    container_name: controller-2
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  controller-3:
    image: apache/kafka:latest
    container_name: controller-3
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0

  broker-1:
    image: apache/kafka:latest
    container_name: broker-1
    ports:
      - 29092:9092
    environment:
      KAFKA_NODE_ID: 4
      KAFKA_PROCESS_ROLES: broker
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,PLAINTEXT_HOST://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-1:19092,PLAINTEXT_HOST://localhost:29092'
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
    depends_on:
      - controller-1
      - controller-2
      - controller-3

... (see link for the full file)

But I need to pass the storage dir and cluster ID in. I tweaked the docker-compose.yml:

...

  broker-3:
    image: myproject/kafka:${CONTAINER_VERSION}
    container_name: broker-3
    restart: on-failure:5
    ports:
      - 49192:9092
    environment:
      KAFKA_NODE_ID: 6
      KAFKA_LOG_DIRS: '/storage_dir'
      KAFKA_PROCESS_ROLES: broker
      ...
      CLUSTER_ID: '${CLUSTER_ID}'
    depends_on:
      - controller-1
      - controller-2
      - controller-3
    volumes:
      - kafka_data_6:/storage_dir

volumes:
  kafka_data_1:
  kafka_data_2:
  ...
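Compose fills in the ${CONTAINER_VERSION} and ${CLUSTER_ID} interpolations from an .env file next to docker-compose.yml; mine is along these lines (placeholder values):

# .env
CONTAINER_VERSION=1.0.0
# generate once with: bin/kafka-storage.sh random-uuid
CLUSTER_ID=REPLACE_WITH_GENERATED_UUID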

1 Answer


I couldn't find what I was looking for; I don't think it exists. So, basically, I am parsing the environment and writing the Kafka config myself in entrypoint.sh:

...

STANDALONE=

# overwrite the default config on every boot
rm -f config/server.properties

# loop through all environment variables starting with KAFKA_
for var in "${!KAFKA_@}"; do
    # make lowercase
    svar=${var,,}
    # get rid of the "kafka_" prefix
    svar=${svar/kafka_/}
    # change underscores to dots
    svar=${svar//_/.}

    # controller nodes get the --standalone format flag; broker nodes do not (??)
    if [[ $var == "KAFKA_PROCESS_ROLES" && ${!var} == "controller" ]]; then
        STANDALONE=--standalone
    fi

    # debug
    printf '%s=%s\n' "$svar" "${!var}"

    # append to the generated config file
    printf '%s=%s\n' "$svar" "${!var}" >> config/server.properties
done

./bin/kafka-storage.sh format $STANDALONE -t "$CLUSTER_ID" -c config/server.properties
exec ./bin/kafka-server-start.sh config/server.properties
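For the broker-3 service above, the loop generates a config/server.properties along these lines (order follows the environment, so it may vary):

node.id=6
log.dirs=/storage_dir
process.roles=broker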

My understanding of Kafka is basic, and my understanding of Docker is only a bit better, so I may be doing all kinds of things wrong here, as evidenced by the fact that my new Kafka cluster seems unstable! But at least it is kinda up and running. I will update this as I make progress.