I have some tasks I only want to run on machines that have NVIDIA GPUs. Is there a good way in Puppet to determine whether a specific agent has an NVIDIA GPU? In bash I can check whether /usr/bin/nvidia-smi exists, but I'm not sure how to do this in Puppet. If there's a better way to do it in bash as well, please let me know.
2 Answers
You should create a custom fact that either checks for the existence of /usr/bin/nvidia-smi (if that's sufficient), with something like:
Facter.add(:nvidia_gpu) do
  confine :kernel => 'Linux'
  setcode do
    FileTest.executable?('/usr/bin/nvidia-smi')
  end
end
or, to be more thorough, checks whether a particular PCI device exists (if the GPU shows up as one), using either the output of lspci or by walking the /sys/bus/pci directory.
In your Puppet manifests, you can then use the value of $facts['nvidia_gpu'] to control what you do.
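For the more thorough variant, a minimal sketch of the sysfs approach might look like the following. It assumes the usual Linux layout in which each entry under /sys/bus/pci/devices exposes `vendor` and `class` files, that NVIDIA's PCI vendor ID is 0x10de, and that PCI base class 0x03 marks display controllers. The helper name `nvidia_gpu_present?` and its parameters are made up for this illustration and are not part of Facter.

```ruby
# Sketch only: the helper name and its parameters are hypothetical.
# Returns true if any PCI device under sysfs_root is an NVIDIA display device.
def nvidia_gpu_present?(sysfs_root = '/sys/bus/pci/devices',
                        vendor_id = '0x10de',   # PCI vendor ID assigned to NVIDIA
                        class_prefix = '0x03')  # PCI base class 0x03 = display controller
  Dir.glob(File.join(sysfs_root, '*')).any? do |dev|
    vendor_file = File.join(dev, 'vendor')
    class_file  = File.join(dev, 'class')
    # Skip entries that do not expose both attribute files.
    next false unless File.readable?(vendor_file) && File.readable?(class_file)

    File.read(vendor_file).strip.downcase == vendor_id &&
      File.read(class_file).strip.downcase.start_with?(class_prefix)
  end
end

# Register the fact only when this file is actually loaded by Facter.
if defined?(Facter)
  Facter.add(:nvidia_gpu) do
    confine :kernel => 'Linux'
    setcode { nvidia_gpu_present? }
  end
end
```

Passing a different directory as `sysfs_root` makes the helper easy to exercise against a fake device tree, and the check does not depend on any NVIDIA tooling being installed.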
One can modify the pci_devices fact to detect whether a GPU is installed in the computer. It uses lspci instead of looking for toolkits, so it works before any toolkit is present and can therefore be used to install the toolkits with Puppet.
# Copyright: Pieter Lexis <pieter@kumina.nl>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# There are no dependencies needed for this script, except for lspci.
# This script is only tested on Debian (Lenny and Squeeze), if you
# have any improvements, send a pull request, ticket or email.
#
# The latest version of this script is available on github at
# https://github.com/kumina/fact-pci_devices
def add_fact(fact, code)
  Facter.add(fact) { setcode { code } }
end

case Facter.value(:operatingsystem)
when /Debian|Ubuntu/i
  lspci = "/usr/bin/lspci"
when /RedHat|CentOS|Fedora|Scientific|SLES/i
  lspci = "/sbin/lspci"
else
  lspci = ""
end
# We can't do this if we don't know the location of lspci
# (FileTest.exist? is used because the exists? alias was removed in Ruby 3.2)
if !lspci.empty? and FileTest.exist?(lspci)
  # Create a hash of ALL PCI devices, the key is the PCI slot ID.
  # { SLOT_ID => { ATTRIBUTE => VALUE }, ... }
  slot = ""
  # After the following loop, devices will contain ALL PCI devices and the info returned from lspci
  devices = {}
  %x{#{lspci} -v -mm -k}.each_line do |line|
    if not line =~ /^$/ # We don't need to parse empty lines
      splitted = line.split(/\t/)
      # lspci has a nice syntax:
      # ATTRIBUTE:\tVALUE
      # We use this to fill our hash
      if splitted[0] =~ /^Slot:$/
        slot = splitted[1].chomp
        devices[slot] = {}
      else
        # The chop is needed to strip the ':' from the string
        devices[slot][splitted[0].chop] = splitted[1].chomp
      end
    end
  end
  # To create your own facts, edit the following code:
  raid_counter = 0
  raidcontrollers = []
  gpus = {}
  scsicontrollers = {}
  devices.each_key do |a|
    case devices[a].fetch("Class")
    when /^RAID/
      # ignore AHCI "fake" RAID, because we don't use it
      if devices[a].fetch('Driver') != "ahci"
        add_fact("raidcontroller_#{raid_counter}_vendor", "#{devices[a].fetch('Vendor')}")
        add_fact("raidcontroller_#{raid_counter}_model", "#{devices[a].fetch('SDevice')}")
        raid_counter += 1
        raidcontrollers.insert(-1,"#{devices[a].fetch('Driver')}")
      end
    when /^3D/
      if gpus.key?("#{devices[a].fetch('Device')}")
        gpus["#{devices[a].fetch('Device')}"]['count'] += 1
      else
        gpus["#{devices[a].fetch('Device')}"] = {
          'count' => 1,
          'vendor' => "#{devices[a].fetch('Vendor')}",
        }
        # Driver might not be defined
        if devices[a].key?('Driver')
          gpus["#{devices[a].fetch('Device')}"]['driver'] = "#{devices[a].fetch('Driver')}"
        end
      end
    when /.*SCSI controller.*/
      if scsicontrollers.key?("#{devices[a].fetch('Device')}")
        scsicontrollers["#{devices[a].fetch('Device')}"]['count'] += 1
      else
        scsicontrollers["#{devices[a].fetch('Device')}"] = {
          'count' => 1,
          'vendor' => "#{devices[a].fetch('Vendor')}",
          'driver' => "#{devices[a].fetch('Driver')}"
        }
      end
    end
  end
  add_fact("raidcontrollers", raidcontrollers.join(","))
  add_fact("gpus", gpus)
  add_fact("scsicontrollers", scsicontrollers)
end
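To answer the original question with this fact, the gpus hash still has to be reduced to a yes/no value. The sketch below shows one way that might be done, assuming the hash has the shape built by the script above (device name mapped to 'count', 'vendor' and optionally 'driver'). The helper name `nvidia_in?` is invented for this example. Note also that the script only collects devices whose class starts with "3D"; cards that drive a display are typically reported by lspci as "VGA compatible controller", so the `when /^3D/` branch may need widening for those.

```ruby
# Sketch only: nvidia_in? is a hypothetical helper, not part of Facter.
# gpus is expected to look like:
#   { DEVICE_NAME => { 'count' => N, 'vendor' => VENDOR, 'driver' => DRIVER } }
def nvidia_in?(gpus)
  # to_s guards against a missing 'vendor' key.
  gpus.any? { |_device, info| info['vendor'].to_s =~ /nvidia/i }
end

# Derive a boolean fact from the structured gpus fact when running under Facter.
if defined?(Facter)
  Facter.add(:nvidia_gpu) do
    setcode { nvidia_in?(Facter.value(:gpus) || {}) }
  end
end
```

A manifest can then branch on $facts['nvidia_gpu'] in the same way as with the simpler nvidia-smi check.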