I am experiencing performance issues with the HBSDFCDriver in my OpenStack Cinder deployment. The periodic get_volume_stats operation consistently runs far longer than expected, triggering warnings about potential backend instability and causing significant delays in volume creation and other volume operations.
Example log entries
Here are some example log entries highlighting the problem:
2024-08-01 23:55:12.266 24 WARNING cinder.volume.manager [req-**** - - - - -] The HBSDFCDriver volume driver's get_volume_stats operation ran for 32.0 seconds. This may indicate a performance problem with the backend which can lead to instability.
2024-08-01 23:56:23.536 24 WARNING cinder.volume.manager [req-**** - - - - -] The HBSDFCDriver volume driver's get_volume_stats operation ran for 43.2 seconds. This may indicate a performance problem with the backend which can lead to instability.
2024-08-02 00:01:12.423 24 WARNING cinder.volume.manager [req-**** - - - - -] The HBSDFCDriver volume driver's get_volume_stats operation ran for 32.1 seconds. This may indicate a performance problem with the backend which can lead to instability.
2024-08-02 00:02:14.724 24 WARNING cinder.volume.manager [req-**** - - - - -] The HBSDFCDriver volume driver's get_volume_stats operation ran for 34.4 seconds. This may indicate a performance problem with the backend which can lead to instability.
2024-08-02 00:07:24.671 24 WARNING cinder.volume.manager [req-**** - - - - -] The HBSDFCDriver volume driver's get_volume_stats operation ran for 104.4 seconds. This may indicate a performance problem with the backend which can lead to instability.
The performance issue seems to affect the stability of the backend and has led to prolonged times for volume creation and other volume-related operations.
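As a first step toward quantifying the problem, the reported durations can be pulled straight out of the cinder-volume log and summarized; a minimal Python sketch (the regex is based on the warning format quoted above):

```python
import re
from statistics import mean

# Regex matching the cinder-volume warnings quoted above.
PATTERN = re.compile(r"get_volume_stats operation ran for (\d+\.\d+) seconds")

def stats_durations(log_lines):
    """Return the reported get_volume_stats durations, in seconds."""
    return [float(m.group(1))
            for line in log_lines
            if (m := PATTERN.search(line))]

# Usage, e.g. against the whole log file:
#   with open("/var/log/cinder/volume.log") as f:
#       durs = stats_durations(f)
#   print(len(durs), min(durs), round(mean(durs), 1), max(durs))
```

Tracking min/mean/max over time (and whether spikes correlate with volume operations) helps distinguish a constantly slow backend from intermittent contention.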
Configuration details
- Volume Driver: HBSDFCDriver
- Backend: Hitachi storage G700
Cinder configuration
[DEFAULT]
enabled_backends = storage-g700-san03
backend_native_threads_pool_size = 200
zoning_mode = fabric
storage_availability_zone = nova
cinder_internal_tenant_project_id = e37541e1e71740cf97d651a0ef0073c6
cinder_internal_tenant_user_id = e077e3038bec47b3921237235ea0d555
[storage-g700-san03]
volume_driver = cinder.volume.drivers.hitachi.hbsd_fc.HBSDFCDriver
volume_backend_name = storage-g700-san03
san_ip = 10.0.0.91
san_api_port = 23451
san_login = admin
san_password = 'xxx'
hitachi_storage_id = 886000418244
hitachi_snap_pool = 0
hitachi_pool = 1
hitachi_target_ports = CL1-A, CL2-A, CL3-A, CL4-A, CL5-A, CL6-A, CL7-A, CL8-A
hitachi_group_create = True
hitachi_rest_disable_io_wait = True
use_multipath_for_image_xfer = True
hitachi_zoning_request = True
image_volume_cache_enabled = True
[fc-zone-manager]
zone_driver = cinder.zonemanager.drivers.brocade.brcd_fc_zone_driver.BrcdFCZoneDriver
fc_san_lookup_service = cinder.zonemanager.drivers.brocade.brcd_fc_san_lookup_service.BrcdFCSanLookupService
fc_fabric_names = Site_SANDirector01, Site_SANDirector02
zoning_policy = initiator-target
brcd_sb_connector = HTTP
enable_unsupported_driver = True
[Site_SANDirector01]
fc_fabric_address = 10.0.1.101
fc_fabric_user = admin
fc_fabric_password = fake_pass1
fc_fabric_port = 22
zone_activate = True
zone_name_prefix = openstack_director01_
zoning_policy = initiator-target
fc_southbound_protocol = SSH
[Site_SANDirector02]
fc_fabric_address = 10.0.1.102
fc_fabric_user = admin
fc_fabric_password = fake_pass2
fc_fabric_port = 22
zone_activate = True
zone_name_prefix = openstack_director02_
zoning_policy = initiator-target
fc_southbound_protocol = SSH
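One mitigation that may relieve pressure while the root cause is investigated is polling backend stats less often. This assumes your Cinder release supports the `backend_stats_polling_interval` option (a [DEFAULT] option with a 60-second default); check your release's configuration reference before relying on it:

```ini
[DEFAULT]
# Assumption: backend_stats_polling_interval is available in your Cinder
# release (default 60 s). Raising it reduces how often the expensive
# get_volume_stats call hits the array, at the cost of staler data
# for the scheduler.
backend_stats_polling_interval = 300
```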
Steps taken
- Enabled detailed logging: increased the log level to DEBUG to capture more details.
- Reviewed configuration: ensured relevant configuration settings are in place.
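To narrow down which backend call dominates the 30–100 s, one approach is to temporarily wrap the driver's internal REST calls with a timing decorator and read the per-call breakdown from the DEBUG log. This is a generic sketch, not part of the driver; the wrapped method names in the usage comment are hypothetical, so substitute the actual methods from the hbsd_* modules:

```python
import functools
import logging
import time

LOG = logging.getLogger(__name__)

def timed(func):
    """Log the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            LOG.debug("%s took %.1f s", func.__name__,
                      time.monotonic() - start)
    return wrapper

# Hypothetical usage inside the driver's REST client module, e.g.:
#   client.get_pool = timed(client.get_pool)
#   client.get_ports = timed(client.get_ports)
```

If one REST call (e.g. fetching pool or port information) accounts for most of the elapsed time, that points at the array's REST service rather than Cinder itself.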
Questions
- Has anyone encountered similar performance issues with the HBSDFCDriver?
- Are there any known optimizations or configurations that can help mitigate this issue?
- What additional steps can I take to troubleshoot and resolve this performance problem?
- Are there specific logs or metrics that I should focus on to identify the root cause?