The problem I am having is that Kafka does not use the environment variables passed in from docker-compose; it just uses the default server.properties file. I know this is by design (or lack thereof), but why bother passing in environment variables if they aren't going to be used? I have scoured the web as well as other Docker image sources, and it looks like Confluent has code that reads the environment variables and then builds a custom server.properties file. That seems like a huge duplication of effort! Am I missing something?
Goal: I want to pass basic config vars from docker-compose.yml to my custom Docker image (it needs to be RHEL8-based) and have Kafka use them: cluster ID, node ID, storage path, ports, just the basics needed to run Kafka. Why is this so hard? What am I doing wrong?
I've managed to take our RHEL8 ubi8-jdk21 minimal Docker image and build a myproject/kafka image from it.
In the Dockerfile, I download the latest Kafka release and extract it to e.g. /opt/kafka:
FROM myproject/ubi-openjdk21:ubi8
...
RUN curl -L ${DOWNLOAD_URL} -o kafka_${KAFKA_VERSION_LONG}.tgz \
    && curl -L ${VERIFY_URL} -o kafka_${KAFKA_VERSION_LONG}.tgz.sha512 \
    && sha512sum kafka_${KAFKA_VERSION_LONG}.tgz > zip.checksum \
    && sha512sum -c zip.checksum > checksum.result
# TODO finish the checksum validation process; grep for OK in the result file
RUN tar -xzf kafka_${KAFKA_VERSION_LONG}.tgz \
    && mv kafka_${KAFKA_VERSION_LONG} kafka
...
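As an aside on that TODO: `sha512sum kafka_*.tgz > zip.checksum` followed by `sha512sum -c zip.checksum` only checks the tarball against a digest computed from the same file, so it always passes. Apache's `.sha512` files are produced with `gpg --print-md SHA512` (grouped, uppercase hex), a format `sha512sum -c` can't parse directly. A sketch of a helper that normalizes and compares the digests instead (the function name is mine, not from any image):

```shell
#!/bin/sh
# Hypothetical helper to finish the checksum TODO above.
# Normalizes the gpg --print-md style .sha512 file (grouped, uppercase
# hex after "filename:") and compares it to the computed digest.
verify_sha512() {
  tarball=$1
  shafile=$2
  # Strip whitespace/newlines, take everything after the colon, lowercase it.
  expected=$(tr -d ' \t\n' < "$shafile" | cut -d: -f2 | tr 'A-F' 'a-f')
  # sha512sum prints "<digest>  <filename>"; keep just the digest.
  actual=$(sha512sum "$tarball" | cut -d' ' -f1)
  [ "$expected" = "$actual" ]
}
```

In the Dockerfile this could replace the `sha512sum -c` step, e.g. `verify_sha512 kafka_${KAFKA_VERSION_LONG}.tgz kafka_${KAFKA_VERSION_LONG}.tgz.sha512 || exit 1`.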
Then, in the entrypoint.sh script, I know I need to call:
./bin/kafka-storage.sh format --standalone -t "$CLUSTER_ID" -c config/server.properties
./bin/kafka-server-start.sh config/server.properties
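From what I can tell, nothing in Kafka itself reads KAFKA_* variables; images like apache/kafka and Confluent's ship a launcher script that converts them into a server.properties file before start-up, and a custom image has to do the same in its entrypoint. A minimal sketch of that translation (the function name and the naive underscore mapping are my assumptions; the real images also handle escape sequences such as double/triple underscores):

```shell
#!/bin/sh
# Hypothetical entrypoint helper: map every KAFKA_* environment variable
# to a server.properties entry by stripping the KAFKA_ prefix,
# lowercasing, and replacing '_' with '.'
# (e.g. KAFKA_NODE_ID=6 becomes node.id=6).
kafka_env_to_props() {
  props=$1
  : > "$props"  # truncate/create the properties file
  env | grep '^KAFKA_' | while IFS='=' read -r name value; do
    key=$(printf '%s' "${name#KAFKA_}" | tr '[:upper:]' '[:lower:]' | tr '_' '.')
    printf '%s=%s\n' "$key" "$value" >> "$props"
  done
}
```

entrypoint.sh would then call something like `kafka_env_to_props /opt/kafka/config/server.properties` before the kafka-storage.sh format and kafka-server-start.sh lines above.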
I am attempting to reuse the cluster setup from the Apache Kafka image on Docker Hub: https://hub.docker.com/r/apache/kafka
Sample docker-compose.yml:
services:
  controller-1:
    image: apache/kafka:latest
    container_name: controller-1
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
  controller-2:
    image: apache/kafka:latest
    container_name: controller-2
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
  controller-3:
    image: apache/kafka:latest
    container_name: controller-3
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_PROCESS_ROLES: controller
      KAFKA_LISTENERS: CONTROLLER://:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
  broker-1:
    image: apache/kafka:latest
    container_name: broker-1
    ports:
      - 29092:9092
    environment:
      KAFKA_NODE_ID: 4
      KAFKA_PROCESS_ROLES: broker
      KAFKA_LISTENERS: 'PLAINTEXT://:19092,PLAINTEXT_HOST://:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-1:19092,PLAINTEXT_HOST://localhost:29092'
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@controller-1:9093,2@controller-2:9093,3@controller-3:9093
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
    depends_on:
      - controller-1
      - controller-2
      - controller-3
... (see link for the full file)
But I need to pass the storage dir and cluster ID in, so I tweaked the docker-compose.yml:
...
  broker-3:
    image: myproject/kafka:${CONTAINER_VERSION}
    container_name: broker-3
    restart: on-failure:5
    ports:
      - 49192:9092
    environment:
      KAFKA_NODE_ID: 6
      KAFKA_LOG_DIRS: '/storage_dir'
      KAFKA_PROCESS_ROLES: broker
      ...
      CLUSTER_ID: '${CLUSTER_ID}'
    depends_on:
      - controller-1
      - controller-2
      - controller-3
    volumes:
      - kafka_data_6:/storage_dir
volumes:
  kafka_data_1:
  kafka_data_2:
  ...
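For reference, Compose substitutes ${CONTAINER_VERSION} and ${CLUSTER_ID} from the shell environment or from a .env file next to docker-compose.yml. A minimal sketch with placeholder values (the cluster ID is just an example, not mine):

```
# .env (placeholder values)
CONTAINER_VERSION=1.0.0
CLUSTER_ID=5L6g3nShT-eMCtK--X86sw
```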