Welcome to the latest installment of Arrikto’s Kubeflow tips and tricks blog! In a simple Q&A format, we aim to provide tips and tricks for the intermediate to advanced Kubeflow user. Ok, let’s dive in.
Is there a way to auto-stop notebooks that are idle for a long time, such as overnight? We are looking to reduce resource usage.
Yes, in fact it is a setting in Kubeflow, but it is not enabled by default.
It’s called “notebook culling”, and Benjamin Tan wrote a great article about it. You can set it up so that it is enabled by default on a fresh install. But assuming you already have an instance that you want to apply it to, here are the instructions.
Kubeflow 1.5 will update how idleness is calculated, so if you upgrade from 1.4 to 1.5, expect a slight change in behavior.
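For an existing install, the change usually comes down to a few environment variables on the notebook controller. Here is a minimal sketch, not the official procedure. It assumes the controller runs as the deployment notebook-controller-deployment in the kubeflow namespace and reads ENABLE_CULLING, CULL_IDLE_TIME and IDLENESS_CHECK_PERIOD (both in minutes). Check those names against your Kubeflow version first. The culling_command helper is hypothetical and only prints the command, so nothing touches your cluster until you decide to run it.

```shell
#!/bin/bash
# Sketch: build the kubectl command that turns on notebook culling.
# Assumptions (verify for your version): deployment name, namespace, and the
# three environment variable names below.

culling_command() {
    local idle_minutes="${1:-60}"   # cull notebooks idle longer than this
    local check_minutes="${2:-5}"   # how often idleness is re-checked
    printf 'kubectl set env deployment/notebook-controller-deployment -n kubeflow ENABLE_CULLING=true CULL_IDLE_TIME=%s IDLENESS_CHECK_PERIOD=%s\n' \
        "$idle_minutes" "$check_minutes"
}

# Print the command for review before running it by hand.
culling_command 60 5
```

If your manifests feed these variables from a ConfigMap, editing that ConfigMap and restarting the controller may be the cleaner route.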
Credit: Question Keith Adler – Community Slack, Answer: Alexandre Brown and Benjamin Tan
What’s the best way to get Metadata from an experiment?
This is the sort of question that would get blocked as ‘subjective’ on Stack Overflow and probably cost you some karma points. But this is a blog and I am OP and Moderator.
There was a lot of chat on this. The community was drifting towards the view that MLFlow’s metadata tracker is best, and then my friend and co-author Boris Lubinsky chimed in:
“MLFlow works great for demos, but I am afraid, it’s not scalable enough for large scale implementations. In general, in my opinion, metadata management was always a weakest link in Kubeflow.”
I trust Boris a lot and would agree (especially since I’ve not done much with metadata tracking personally). So MLFlow, but don’t count on it at scale. Also, a user named Timos chimed in later that it is worth watching KServe as they may be developing better tracking soon.
Credit: Question Андрей Ильичёв, Answer: Compilation from Community and Boris
How do I enable GPUs on MiniKF local?
Without reservation, the answer is “You shouldn’t be using Vagrant.” Period. Full stop. You should run MiniKF on GCP or AWS. Personally, I prefer GCP.
…
But I have to use it locally because
The issue isn’t MiniKF, it’s the default way Vagrant is set up: it can’t access your GPUs. The solution is to run the VM through libvirt instead of VirtualBox, because libvirt CAN see your GPUs. I’m just going to copy-paste Steven’s answer from community Slack since it is very thorough.
Steven says:
Here’s a more detailed explanation
First of all, these are instructions for Linux. My current setup is a headless machine with 2 GPUs. With this setup, when MiniKF is running the GPUs are “detached” from the main OS and only MiniKF can access them. This may not fit your current use case.
I followed some of the instructions I found here to enable IOMMU https://github.com/bryansteiner/gpu-passthrough-tutorial#—-tutorial
1. IOMMU Setup
- Enable IOMMU and CPU virtualization in the machine BIOS
- Enable IOMMU in the boot kernel parameters (in my case the system uses grub2, so I edited /etc/default/grub to add amd_iommu=on iommu=pt, then regenerated the config with grub2-mkconfig -o /boot/grub2/grub.cfg)
- Reboot
- Verify that IOMMU is correctly enabled: dmesg | grep -i -e DMAR -e IOMMU
- Find the IOMMU groups of your GPUs and their hardware info. (This outputs all your devices; search for VGA or NVIDIA to quickly find the values we’re looking for.)
#!/bin/bash
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
- In my case both cards are already in isolated IOMMU groups, so no patching is needed. We will refer to these values multiple times throughout the process.
IOMMU Group 16 08:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2204] (rev a1)
IOMMU Group 16 08:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
IOMMU Group 47 43:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2204] (rev a1)
IOMMU Group 47 43:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
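If the dmesg check or the group listing comes back empty, a quick sanity check is whether the kernel parameters from the grub step made it onto the running kernel's command line. The sketch below reads /proc/cmdline and looks for the two AMD parameters used above (on Intel systems the first would be intel_iommu=on). The has_kernel_param helper is a hypothetical name introduced here, not a system tool.

```shell
#!/bin/bash
# Sketch: confirm the IOMMU kernel parameters are active on the running kernel.

has_kernel_param() {
    local param="$1" cmdline="$2"
    # Pad with spaces so only whole, space-separated parameters match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Empty if /proc/cmdline is unavailable (e.g. not on Linux).
cmdline="$(cat /proc/cmdline 2>/dev/null)"
for param in amd_iommu=on iommu=pt; do
    if has_kernel_param "$param" "$cmdline"; then
        echo "found:   $param"
    else
        echo "missing: $param"
    fi
done
```

A "missing" line after a reboot usually means the grub config was not regenerated, or was written to a different path than the one your system boots from.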
2. libvirt setup (https://github.com/bryansteiner/gpu-passthrough-tutorial#—-part-2-vm-logistics)
- Enable hook support for qemu:
wget 'https://raw.githubusercontent.com/PassthroughPOST/VFIO-Tools/master/libvirt_hooks/qemu' -O /etc/libvirt/hooks/qemu
chmod +x /etc/libvirt/hooks/qemu
- Create the config file /etc/libvirt/hooks/kvm.conf with the device addresses. The addresses are the ones displayed when executing the IOMMU group script, rewritten in libvirt's naming, e.g. 08:00.1 -> pci_0000_08_00_1
## Virsh devices
VIRSH_VIDEO_1=pci_0000_08_00_0
VIRSH_AUDIO_1=pci_0000_08_00_1
VIRSH_VIDEO_2=pci_0000_43_00_0
VIRSH_AUDIO_2=pci_0000_43_00_1
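The address rewrite used for the kvm.conf entries above is mechanical, so it can be scripted to avoid typos. This is a small sketch that assumes every device sits in PCI domain 0000, as in the listing from the IOMMU group script. The to_virsh_name helper is a hypothetical name, not part of libvirt.

```shell
#!/bin/bash
# Sketch: convert an lspci-style address (08:00.1) into the libvirt node
# device name used in kvm.conf (pci_0000_08_00_1).
# Assumption: PCI domain 0000 for all devices.

to_virsh_name() {
    local addr="$1"                        # e.g. 08:00.1
    # Replace every ':' and '.' with '_' and add the domain prefix.
    echo "pci_0000_${addr//[:.]/_}"
}

to_virsh_name 08:00.0   # prints pci_0000_08_00_0
to_virsh_name 43:00.1   # prints pci_0000_43_00_1
```

To cross-check the results against what libvirt itself reports, compare them with the output of virsh nodedev-list --cap pci.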
- Create bind hook /etc/libvirt/hooks/qemu.d/minikf_defau