Simple question, but so far very difficult to answer... =-[

I am trying to deploy OpenShift (OKD) 4.5 or 4.7 by following the "Guide: Installing an OKD 4.5 Cluster"; the problem appears at its "Starting the control plane nodes" section.

I'm creating the cluster as a UPI (User Provisioned Infrastructure) bare-metal installation, with the nodes running as KVM virtual machines.

PROBLEM:

  • Version 4.5

The master node cannot finish installation after reboot. It keeps showing the following error...

[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers

For version 4.5 we use "Fedora CoreOS 32.20200715.3.0".

  • Version 4.7

The master node cannot finish installation after reboot. It keeps showing the following error...

[  543.933709] ignition[505]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #112
[  543.939340] ignition[505]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": EOF

For version 4.7 we use "Fedora CoreOS 34.20210518.3.0".


I've waited hours and hours and the master nodes are still in the same situation. What can I do to resolve this?

Thanks! =D


MORE INFORMATION:

See if this helps...

This output occurs in okd_master_3 (10.3.0.7)....

[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers

Connecting to okd_master_2 (10.3.0.6) from okd_services (10.3.0.14)...

NOTE: okd_master_2 (10.3.0.6) was able to boot (it reached the login screen).

[root@okd_services ~]# ssh core@10.3.0.6
The authenticity of host '10.3.0.6 (10.3.0.6)' can't be established.
ECDSA key fingerprint is SHA256:1xdq65g0ljnZYR6uXHaXW6EsxO3u6X268s4Z9Kfq0ng.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.3.0.6' (ECDSA) to the list of known hosts.
Fedora CoreOS 32.20200629.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

Pinging the okd_bootstrap (10.3.0.4) from okd_master_2 (10.3.0.6)...

[core@localhost ~]$ ping 10.3.0.4
PING 10.3.0.4 (10.3.0.4) 56(84) bytes of data.
64 bytes from 10.3.0.4: icmp_seq=1 ttl=64 time=0.973 ms
64 bytes from 10.3.0.4: icmp_seq=2 ttl=64 time=0.801 ms
64 bytes from 10.3.0.4: icmp_seq=3 ttl=64 time=0.373 ms
64 bytes from 10.3.0.4: icmp_seq=4 ttl=64 time=0.647 ms
^C
--- 10.3.0.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3032ms
rtt min/avg/max/mdev = 0.373/0.698/0.973/0.220 ms

Calling the problematic URL from okd_master_2 (10.3.0.6)...

[core@localhost ~]$ curl -kv https://api-int.mbr.okd.local:22623/config/master
*   Trying 10.3.0.14:22623...
* Connected to api-int.mbr.okd.local (10.3.0.14) port 22623 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=api-int.mbr.okd.local
*  start date: Jun 16 23:52:22 2021 GMT
*  expire date: Jun 14 23:52:23 2031 GMT
*  issuer: OU=openshift; CN=root-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x561ed249aa40)
> GET /config/master HTTP/2
> Host: api-int.mbr.okd.local:22623
> user-agent: curl/7.69.1
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 500 
< content-length: 0
< date: Thu, 17 Jun 2021 14:55:43 GMT
< 
* Connection #0 to host api-int.mbr.okd.local left intact
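
The curl call above resolves api-int.mbr.okd.local to 10.3.0.14, so the HTTP 500 arrives through the load balancer and it is not obvious which backend produced it. A minimal sketch for asking each machine config server backend directly, assuming Python 3 is available on okd_services; the names MCS_BACKENDS, backend_urls, probe and probe_all are made up for this illustration, and certificate verification is disabled in the same way as curl -k:

```python
import ssl
import urllib.error
import urllib.request

# Backend addresses copied from the machine config server backend in haproxy.cfg.
MCS_BACKENDS = {
    "okd-boostrap": "10.3.0.4",
    "okd-master-1": "10.3.0.5",
    "okd-master-2": "10.3.0.6",
    "okd-master-3": "10.3.0.7",
}


def backend_urls(backends, port=22623, path="/config/master"):
    """Build one direct URL per backend, bypassing the load balancer."""
    return {name: f"https://{ip}:{port}{path}" for name, ip in backends.items()}


def probe(url, timeout=5.0):
    """Return a short status string for one URL (like curl -k, no cert check)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, timeout, TLS failure, and similar.
        return f"error: {err}"


def probe_all(backends=MCS_BACKENDS):
    """Probe every backend and return {name: status string}."""
    return {name: probe(url) for name, url in backend_urls(backends).items()}
```

Calling probe_all() from okd_services and printing the result would show, per backend, whether it answers at all and with which status; a backend that is not up yet is expected to show up as an error rather than an HTTP status.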

INFRASTRUCTURE:

Virtual machines...

NAME           ROLE                   OS              IP          MAC
okd_boostrap   bootstrap              Fedora CoreOS   10.3.0.4    52:54:00:07:80:62
okd_master_1   master                 Fedora CoreOS   10.3.0.5    52:54:00:7d:97:70
okd_master_2   master                 Fedora CoreOS   10.3.0.6    52:54:00:6e:52:85
okd_master_3   master                 Fedora CoreOS   10.3.0.7    52:54:00:a3:65:d9
okd_worker_1   worker                 Fedora CoreOS   10.3.0.8    52:54:00:e3:7c:fb
okd_worker_2   worker                 Fedora CoreOS   10.3.0.9    52:54:00:20:ec:4f
okd_services   DNS/LB/web/NFS         CentOS 8        10.3.0.14   52:54:00:3a:fd:a2
                                                         10.2.0.18   52:54:00:92:ce:78
okd_pfsense    firewall/router/DHCP   FreeBSD         10.3.0.2    52:54:00:d8:27:82
                                                         10.2.0.19   52:54:00:ac:82:7d

  • OKD_LAN: "10.3.0";
  • EXT_LAN: "10.2.0".

Some acronyms...

  • DNS - Domain Name System;
  • LB - Load Balancing;
  • Web - Web Server;
  • NFS - Network File System.

Network layout...

           ...→.[N]WAN/EXT_LAN([R]dhcp).←... (10.2.0.0/24)
           ↓                               ↓
          [I]WAN/EXT_LAN                  [I]WAN/EXT_LAN
  [V]OKD_PFSENSE([R]dhcp)                 [V]OKD_SERVICES
          [I]OKD_LAN                      [I]OKD_LAN
           ↑                               ↑
           .........→.[N]OKD_LAN.←.......... (10.3.0.0/24)
                       ↑
      ...................................
      ↓                ↓                ↓
     [V]OKD_BOOSTRAP  [V]OKD_MASTER_1  [V]OKD_WORKER_1
                      [V]OKD_MASTER_2  [V]OKD_WORKER_2
                      [V]OKD_MASTER_3

  • [N] - Network;
  • [R] - Network Resource;
  • [I] - Network Interface;
  • [V] - Virtual Machine.


CONFIGURATION FILES:

BIND 9 (DNS):

. db.10.3.0

$TTL    604800
@   IN  SOA okd-services.okd.local. admin.okd.local. (
        6       ; Serial
        604800  ; Refresh
        86400   ; Retry
        2419200 ; Expire
        604800  ; Negative Cache TTL
)

; Name servers - "NS" records.
    IN  NS  okd-services.okd.local.

; Name servers - "PTR" records.
14  IN  PTR okd-services.okd.local.

; OpenShift container platform cluster - "PTR" records.
4   IN  PTR okd-boostrap.mbr.okd.local.
5   IN  PTR okd-master-1.mbr.okd.local.
6   IN  PTR okd-master-2.mbr.okd.local.
7   IN  PTR okd-master-3.mbr.okd.local.
8   IN  PTR okd-worker-1.mbr.okd.local.
9   IN  PTR okd-worker-2.mbr.okd.local.
14  IN  PTR api.mbr.okd.local.
14  IN  PTR api-int.mbr.okd.local.

. db.okd.local

$TTL    604800
@   IN  SOA okd-services.okd.local. admin.okd.local. (
        1       ; Serial
        604800  ; Refresh
        86400   ; Retry
        2419200 ; Expire
        604800  ; Negative Cache TTL
)

; Name servers - "NS" records.
    IN  NS  okd-services

; Name servers - "A" records.
okd-services.okd.local.         IN  A   10.3.0.14

; OpenShift container platform cluster - "A" records.
okd-boostrap.mbr.okd.local.     IN  A   10.3.0.4
okd-master-1.mbr.okd.local.     IN  A   10.3.0.5
okd-master-2.mbr.okd.local.     IN  A   10.3.0.6
okd-master-3.mbr.okd.local.     IN  A   10.3.0.7
okd-worker-1.mbr.okd.local.     IN  A   10.3.0.8
okd-worker-2.mbr.okd.local.     IN  A   10.3.0.9

; OpenShift internal cluster IPs - "A" records.
api.mbr.okd.local.              IN  A   10.3.0.14
api-int.mbr.okd.local.          IN  A   10.3.0.14
*.apps.mbr.okd.local.           IN  A   10.3.0.14
etcd-0.mbr.okd.local.           IN  A   10.3.0.5
etcd-1.mbr.okd.local.           IN  A   10.3.0.6
etcd-2.mbr.okd.local.           IN  A   10.3.0.7
cons-okd.apps.mbr.okd.local.    IN  A   10.3.0.14
oauth-okd.apps.mbr.okd.local.   IN  A   10.3.0.14

; OpenShift internal cluster IPs - "SRV" records.
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-0.mbr
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-1.mbr
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-2.mbr

. named.conf.local

zone "okd.local" {
    type master;
    file "/etc/named/zones/db.okd.local"; // Zone file path.
};

zone "0.3.10.in-addr.arpa" {
    type master;
    file "/etc/named/zones/db.10.3.0"; // 10.3.0.0/24 subnet.
};

. named.conf

//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS server
// as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the configuration
// located in /usr/share/doc/bind-{version}/Bv9ARM.html .

options {
    listen-on port 53 { 127.0.0.1; 10.3.0.14; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    recursing-file "/var/named/data/named.recursing";
    secroots-file "/var/named/data/named.secroots";
    allow-query { localhost; 10.3.0.0/24; };

    // - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
    // - If you are building a RECURSIVE (caching) DNS server, you need to enable
    // recursion.
    // - If your recursive DNS server has a public IP address, you MUST enable access
    // control to limit queries to your legitimate users. Failing to do so will cause
    // your server to become part of large scale DNS amplification attacks. Implementing
    // BCP38 within your network would greatly reduce such attack surface.
    recursion yes;

    forwarders {
        8.8.8.8;
        8.8.4.4;
    };

    dnssec-enable yes;
    dnssec-validation yes;

    // Path to ISC DLV key.
    bindkeys-file "/etc/named.root.key";

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };
};

zone "." IN {
    type hint;
    file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/named/named.conf.local";
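
A minimal sketch for checking that the forward records in db.okd.local resolve as intended, assuming it runs on a machine that uses 10.3.0.14 as its DNS server. EXPECTED lists only a few of the "A" records from the zone file above, and the names EXPECTED and check_forward are invented for this example:

```python
import socket

# A subset of the "A" records from db.okd.local: name -> address it should resolve to.
EXPECTED = {
    "api.mbr.okd.local": "10.3.0.14",
    "api-int.mbr.okd.local": "10.3.0.14",
    "okd-boostrap.mbr.okd.local": "10.3.0.4",
    "okd-master-1.mbr.okd.local": "10.3.0.5",
    "okd-master-2.mbr.okd.local": "10.3.0.6",
    "okd-master-3.mbr.okd.local": "10.3.0.7",
}


def check_forward(expected, resolve=socket.gethostbyname):
    """Return {name: what the resolver gave back} for every name that does not match."""
    problems = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError as err:
            # socket.gaierror is a subclass of OSError.
            got = f"lookup failed: {err}"
        if got != want:
            problems[name] = got
    return problems
```

An empty dictionary from check_forward(EXPECTED) would suggest that the forward records are being served as written; the resolve parameter exists only so that a different lookup function can be substituted.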

HAProxy (load balancer):

. haproxy.cfg

#---------------------------------------------------------------------
# Global settings.
#---------------------------------------------------------------------
global
    maxconn 20000
    log /dev/log local0 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    daemon
    # Turn on stats unix socket.
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# Common defaults that all the "listen" and "backend" sections will use if not designated
# in their block.
#---------------------------------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 300s
    timeout server 300s
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 20000

listen stats
    bind :9000
    mode http
    option forwardfor except 127.0.0.0/8
    stats enable
    stats uri /

frontend okd_k8s_api_fe
    bind :6443
    default_backend okd_k8s_api_be
    mode tcp
    option tcplog

backend okd_k8s_api_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:6443 check
    server okd-master-1 10.3.0.5:6443 check
    server okd-master-2 10.3.0.6:6443 check
    server okd-master-3 10.3.0.7:6443 check

frontend okd_machine_config_server_fe
    bind :22623
    default_backend okd_machine_config_server_be
    mode tcp
    option tcplog

backend okd_machine_config_server_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:22623 check
    server okd-master-1 10.3.0.5:22623 check
    server okd-master-2 10.3.0.6:22623 check
    server okd-master-3 10.3.0.7:22623 check

frontend okd_http_ingress_traffic_fe
    bind :80
    default_backend okd_http_ingress_traffic_be
    mode tcp
    option tcplog

backend okd_http_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:80 check
    server okd-worker-2 10.3.0.9:80 check

frontend okd_https_ingress_traffic_fe
    bind *:443
    default_backend okd_https_ingress_traffic_be
    mode tcp
    option tcplog

backend okd_https_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:443 check
    server okd-worker-2 10.3.0.9:443 check

OpenShift (OKD) "*.yaml" files:

. htpasswd_provider.yaml

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret

. install-config.yaml

apiVersion: v1
baseDomain: okd.local
metadata:
  name: mbr

compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0

controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

platform:
  none: {}

fips: false

pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAA<SKIPPED>QbAKPwwhdCkTpd8= root@okd_services.my_domain.com.br'

. registry_pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 45Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 10.3.0.14

UPDATE:

. netstat -natup output...

[root@okd_services ~]# netstat -natup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      906/sshd            
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      929/named           
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd           
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      1742/dnsmasq        
tcp        0      0 10.3.0.14:53            0.0.0.0:*               LISTEN      929/named           
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      929/named           
tcp        0      0 10.2.0.18:22            10.2.0.3:44536          ESTABLISHED 1854/sshd: root [pr 
tcp        0      0 10.3.0.14:52252         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:52134         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:42222         10.3.0.8:443            SYN_SENT    4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.6:51962          ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:52130         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.6:51946          ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:40530         10.3.0.9:443            SYN_SENT    4572/haproxy        
tcp        0    196 10.2.0.18:22            10.2.0.3:44538          ESTABLISHED 5000/sshd: root [pr 
tcp        0      0 10.2.0.18:45472         10.2.0.5:389            ESTABLISHED 878/sssd_be         
tcp        0      0 10.3.0.14:51970         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:54056         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.2.0.18:33328         147.75.69.225:80        TIME_WAIT   -                   
tcp        0      0 10.3.0.14:6443          10.3.0.5:39976          ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.5:52462          ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:41396         10.3.0.7:22623          SYN_SENT    4572/haproxy        
tcp        0      1 10.3.0.14:41964         10.3.0.9:80             SYN_SENT    4572/haproxy        
tcp        0      1 10.3.0.14:60674         10.3.0.7:6443           SYN_SENT    4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.5:40024          ESTABLISHED 4572/haproxy        
tcp        0      0 10.2.0.18:43394         109.205.222.4:80        TIME_WAIT   -                   
tcp6       0      0 :::22                   :::*                    LISTEN      906/sshd            
tcp6       0      0 ::1:953                 :::*                    LISTEN      929/named           
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd           
tcp6       0      0 :::8080                 :::*                    LISTEN      1131/httpd          
tcp6       0      0 :::53                   :::*                    LISTEN      929/named           
udp        0      0 192.168.122.1:53        0.0.0.0:*                           1742/dnsmasq        
udp        0      0 10.3.0.14:53            0.0.0.0:*                           929/named           
udp        0      0 127.0.0.1:53            0.0.0.0:*                           929/named           
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1742/dnsmasq        
udp        0      0 10.3.0.14:68            10.3.0.2:67             ESTABLISHED 893/NetworkManager  
udp        0      0 10.2.0.18:68            10.2.0.2:67             ESTABLISHED 893/NetworkManager  
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd           
udp        0      0 127.0.0.1:323           0.0.0.0:*                           857/chronyd         
udp6       0      0 :::53                   :::*                                929/named           
udp6       0      0 :::111                  :::*                                1/systemd           
udp6       0      0 ::1:323                 :::*                                857/chronyd
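
The SYN_SENT rows in the output above are connections that HAProxy opened towards a backend and that have not been answered yet. A minimal sketch for pulling those rows out of a netstat -natup dump, assuming the column layout shown above; SAMPLE holds three rows copied from that output, and the function name stuck_backends is invented for this example:

```python
# Three rows copied from the netstat output above.
SAMPLE = """\
tcp        0      0 10.3.0.14:52252         10.3.0.4:6443           ESTABLISHED 4572/haproxy
tcp        0      1 10.3.0.14:42222         10.3.0.8:443            SYN_SENT    4572/haproxy
tcp        0      1 10.3.0.14:41396         10.3.0.7:22623          SYN_SENT    4572/haproxy
"""


def stuck_backends(netstat_text, program="haproxy"):
    """Return the sorted foreign addresses that `program` is still waiting on (SYN_SENT)."""
    stuck = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        state, owner = fields[5], fields[6]
        if state == "SYN_SENT" and owner.endswith("/" + program):
            stuck.add(fields[4])
    return sorted(stuck)


print(stuck_backends(SAMPLE))
```

Run against the full output, this would list every backend address and port that HAProxy could not reach at that moment, which might help separate nodes that are simply not up yet from ones that should already be answering.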

Thanks! =D

Eduardo Lúcio