Yeah, the problem I have right now is that I only have one GPU (a 2080 Ti) on the node, so the GPU resource "nvidia.com/gpu" is 1. That makes it harder to use the replica count to scale a process up. I also have to change the deployment strategy to "Recreate" instead of a rolling update; otherwise the K8S controller runs into an error because there is no free GPU to schedule the new pod onto.
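Roughly what I mean, as a minimal sketch (the container name and image are just placeholders):
spec:
  replicas: 1                   # only one GPU on the node, so only one replica fits
  strategy:
    type: Recreate              # a rolling update would need a second GPU for the new pod
  template:
    spec:
      containers:
      - name: gpu-app           # placeholder name
        image: my-gpu-image     # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1   # the single GPU gets reserved for this pod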
As you said, there's a way to share it. I was thinking about doing something similar but haven't found time to look into it. How did you make a single GPU shared across different pods? I'm talking about CDI (Container Device Interface), by the way. I guess using just a RuntimeClass should allow sharing, but I'm wondering whether it's possible to share the same GPU with CDI too? Maybe just use a fractional number in the GPU request? I'll find time to try it later.
What I'd also like to find is a web dashboard that shows the same kind of information as nvidia-smi. Then I could use the same nodeSelector to deploy it to the right node without allocating any of the GPU to it. Haven't found one yet.
As I mentioned, I used Node Feature Discovery to label the node with the GPU. It appears (and you can verify this with the nvidia-smi command-line tool) that multiple processes can share a GPU just like they can share a CPU; the thing to watch out for is those processes collectively allocating more than the physical VRAM. So I don't specify a GPU resource at all (since, as you mentioned, it would have to be 1) and instead use the nodeSelector to make sure the workload lands on the labeled node.
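Roughly, the deployment spec ends up looking like this (a sketch with my assumptions spelled out: the image is just an example, the runtimeClassName line is only needed if the NVIDIA runtime isn't your cluster default, and the env var is one way to expose the GPU through the runtime when no nvidia.com/gpu resource is requested):
spec:
  template:
    spec:
      runtimeClassName: nvidia                         # assumption: skip if nvidia is already the default runtime
      nodeSelector:
        nvidia.com/gpu.product: "NVIDIA-GeForce-RTX-3060"
      containers:
      - name: ollama                                   # example workload
        image: ollama/ollama
        env:
        - name: NVIDIA_VISIBLE_DEVICES                 # assumption: expose the GPU via the NVIDIA
          value: "all"                                 # container runtime instead of the device plugin
        # note: no nvidia.com/gpu entry under resources, so the device plugin
        # doesn't reserve the whole GPU for this single pod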
A few months ago I set up an Nvidia GPU on a K3S node similar to what you did, and I've got a few tips. While setting a resource limit of 1 Nvidia GPU like you did works, it allocates the entire GPU to that one deployment even when it could be shared. The rabbit hole I went down uses Node Feature Discovery in general and then the Nvidia flavor of it (https://github.com/NVIDIA/k8s-device-plugin/tree/main/docs/gpu-feature-discovery) to label the node as having a GPU. I can then use a node selector to target it, such as:
spec:
  nodeSelector:
    kubernetes.io/arch: amd64
    nvidia.com/gpu.product: "NVIDIA-GeForce-RTX-3060"
This allows me to have two deployments sharing the GPU at the same time. The other thing I did was to taint the GPU node so that regular deployments wouldn't land on it, and then add a toleration for that taint to the GPU-using deployments. Of course, if I do want anything else to be able to deploy to that node, I can add the toleration there too. I've got Ollama and Stable Diffusion deployed at the same time; this works because they unload their GPU usage when idle, but of course you don't want to run both at once with heavy AI models. BTW, I've got a very similar setup for a Google Coral node on a Raspberry Pi (my cluster is a mix of Intel and Arm nodes). I also use Ansible to set most of this up because I'm too lazy to do it by hand.
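For anyone wanting to copy the taint/toleration part, a rough sketch (the node name and the taint key/value are just examples, not my exact ones):
# Taint the GPU node so ordinary workloads stay off it
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule

# In each GPU-using deployment, tolerate that taint and target the node by its GFD label
spec:
  template:
    spec:
      tolerations:
      - key: "gpu"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"
      nodeSelector:
        nvidia.com/gpu.product: "NVIDIA-GeForce-RTX-3060"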