
I've been working on (and off) a deployment of OpenStack over the past few months (nearly a year), and I've come across a number of issues during the deployment, most of which came down to either a bad switch configuration or a bad configuration in the heat templates.

I've been able to complete a successful deployment of OpenStack multiple times from a fresh install; however, while preparing the Overcloud with projects, I was unable to create an instance. The output of "openstack compute service list" shows the nova-compute service as down:

openstack compute service list
+----+----------------+----------------------+----------+---------+-------+----------------------------+
| ID | Binary         | Host                 | Zone     | Status  | State | Updated At                 |
+----+----------------+----------------------+----------+---------+-------+----------------------------+
|  1 | nova-conductor | controller-0.host.cp | internal | enabled | up    | 2021-04-20T20:43:03.000000 |
|  2 | nova-scheduler | controller-0.host.cp | internal | enabled | up    | 2021-04-20T20:43:01.000000 |
| 12 | nova-compute   | compute-0.host.cp    | nova     | enabled | down  | 2021-04-20T09:47:52.000000 |
+----+----------------+----------------------+----------+---------+-------+----------------------------+

I had also attempted a scale-out with one additional compute node; it's not present in the list above or in the "hypervisor list", but it is visible in a "server list" run from the undercloud node:

openstack server list
+--------------------------------------+--------------+--------+-----------------------+----------------+-----------+
| ID                                   | Name         | Status | Networks              | Image          | Flavor    |
+--------------------------------------+--------------+--------+-----------------------+----------------+-----------+
| 5cb29129-7ce8-439a-b00b-3868d5a9aa74 | compute-1    | ACTIVE | ctlplane=10.128.0.136 | overcloud-full | baremetal |
| 58c3d587-d2a8-4601-87a7-3fd3d32a78b6 | controller-0 | ACTIVE | ctlplane=10.128.0.5   | overcloud-full | baremetal |
| 288dde8f-5664-42b2-b9f4-333992964dde | compute-0    | ACTIVE | ctlplane=10.128.0.75  | overcloud-full | baremetal |
+--------------------------------------+--------------+--------+-----------------------+----------------+-----------+

I've since carried out two fresh installs, and I'm now faced with the following issue on all compute services that are intended to connect to the Controller node:

2021-04-23 22:28:37.891 7 ERROR nova keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.127.2.8:5000/v3/auth/tokens: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
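For context, that trace comes from the nova-compute log on the compute node. This is roughly where I've been looking; the container name and log path assume a standard containerized (TripleO) layout, so adjust if yours differs:

# On the compute node (containerized TripleO layout assumed)
sudo podman ps --filter name=nova_compute
sudo tail -f /var/log/containers/nova/nova-compute.log

# From a node with overcloud credentials, confirm the service is registered but down
openstack compute service list --service nova-compute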

A manual curl from the compute node to the keystone endpoint yields the following (expected) output:

curl http://10.127.2.8:5000/v3/auth/tokens
{"error":{"code":401,"message":"The request you have made requires authentication.","title":"Unauthorized"}}

I don't believe the network stack is the cause of this issue; it seems to be something else. I'd appreciate any assistance with this.

Deployment information:

  • Controller nodes: 1
  • Compute nodes: 2 deployed, 4 introspected
  • OS: CentOS Stream 8 (both undercloud and overcloud)
  • Networking:
      • 4 interfaces: 1 primary, 2-port bond (OVS + LACP), 1 storage port
      • 2 Juniper EX3400s clustered (LACP configured on the bonded ports)

Let me know if any further information is required.

EDIT:

Here is a TCP dump from both Compute and Controller, outlining the transaction of the call to keystone: https://pastebin.com/ADT4RCun
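For reference, the capture was taken roughly along these lines on both nodes; INTERNAL_API_IFACE is a placeholder for whichever interface/VLAN carries the internal API network:

# Capture the keystone exchange to a pcap file (interface name is a placeholder)
sudo tcpdump -i INTERNAL_API_IFACE -nn 'host 10.127.2.8 and tcp port 5000' -w keystone.pcap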

1 Answer


After looking over the TCP dump that I had added to the question, I noticed that all packets below a length of 1500 bytes went through successfully; anything above that was dropped.
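A quick way to confirm a path MTU mismatch like this from the compute node is to ping the keystone endpoint with the don't-fragment flag set, using payloads on either side of the 1500-byte boundary (the sizes below are just illustrative):

# 1400-byte payload fits within a standard 1500-byte MTU - should succeed
ping -c 3 -M do -s 1400 10.127.2.8

# 8972-byte payload + 28 bytes of IP/ICMP headers = 9000 - fails while any hop is still at 1500
ping -c 3 -M do -s 8972 10.127.2.8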

As part of the NIC configuration I had set the MTU to 9000, so from the servers' point of view every interface had jumbo frames enabled. The matching configuration was never applied on the switches.
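The server side of that is easy to confirm, for example:

# On the compute/controller nodes: print each interface with the MTU it ended up with
ip -o link show | awk '{print $2, $4, $5}'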

From the nic configs:
    - type: ovs_bridge
      name: bridge_name
      dns_servers:
        get_param: DnsServers
      members:
      - type: ovs_bond
        name: bond0
        mtu: 9000
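For completeness, the bond members in the template carry the same MTU; the member entries below are an illustrative reconstruction rather than a verbatim copy of my templates:

      - type: ovs_bond
        name: bond0
        mtu: 9000
        members:
        - type: interface
          name: nic2
          mtu: 9000
          primary: true
        - type: interface
          name: nic3
          mtu: 9000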

From the switch side:

> show interfaces ae5
Physical interface: ae5, Enabled, Physical link is Up
  Interface index: 231, SNMP ifIndex: 713
  Link-level type: Ethernet, MTU: 1514, Speed: 2Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Minimum links needed: 1, Minimum bandwidth needed: 1bps
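The fix on the Junos side is to raise the MTU on the aggregated Ethernet interface; 9216 is a common jumbo-frame value on these switches, but check the maximum your platform supports before committing:

configure
set interfaces ae5 mtu 9216
commit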

After increasing the MTU on the aggregated Ethernet interfaces, the response from the keystone service was received successfully. Now I'll need to make sure the MTU is raised on all non-aggregate ports on the switch as well.