Thanks @m2Giles for the patient assistance!
n.b. This is a work in progress. Feedback is welcome!
The procedure below will allow you to create a distrobox with TensorFlow and CUDA installed for running machine learning / artificial intelligence workflows. You will also be able to connect to the distrobox from Visual Studio Code using ssh (`127.0.0.1`, port `2222`). From there, the fact that you're working in a container will be transparent to you.
Prerequisites
An installation of Bluefin-dx-nvidia.
An Nvidia GPU. (Tested on an RTX A4500.)
Resources
Download the three files in the gist on my GitHub.
- `nvbox.ini` – the 'assemble' file used to create the distrobox.
- `check-nvidia-cuda` – a bash script that tests for the installed libraries libcuda.so, libcudnn, and libcudart. It also checks for nvcc, then runs nvidia-smi to show the installed versions of the Nvidia drivers.
- `tensorflow_mnist_test` – checks that the GPU is available, then creates a basic model to test the ability to train a model using the MNIST dataset.
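For orientation, here is a sketch of what an assemble file for this kind of setup can look like. The `nvbox.ini` from the gist is the one to actually use – among other things it also sets up the ssh server on port 2222, which this sketch omits; the key names are distrobox-assemble options, but the values here are illustrative only:

```ini
# Illustrative sketch only -- use the nvbox.ini from the gist.
[nvbox]
image=nvcr.io/nvidia/tensorflow:23.12-tf2-py3
pull=true
nvidia=true
home=~/.local/share/distrobox/nvbox
```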
Creating the Distrobox
From the folder where you downloaded the files:
distrobox assemble create --file nvbox.ini
This will download the blobs for the TensorFlow image at `nvcr.io/nvidia/tensorflow:23.12-tf2-py3`, then create and start the container.
The home folder for the distrobox is `~/.local/share/distrobox/nvbox`. This avoids polluting your own home folder.
The contents of your ~/.ssh folder will have been copied to the distrobox’s home folder. This allows password-less login to the ssh server running in the distrobox.
Testing Libraries
Enter the distrobox:
distrobox enter nvbox
You should see your prompt change to `nvbox%`.
You will be in your user’s home folder (not the distrobox’s home folder).
Change directory to the folder where you downloaded the files from the gist, then run the library checks:
./check-nvidia-cuda
You should see messages about libcuda.so and the other libraries being installed, the version number of `nvcc`, and the information from `nvidia-smi`. Your GPU should be listed.
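If you're curious what such a check involves, this sketch performs the same kind of tests – it is not the actual `check-nvidia-cuda` from the gist, which may differ in detail:

```shell
#!/usr/bin/env bash
# Sketch of the checks check-nvidia-cuda performs (illustrative, not the
# gist's script): look for the CUDA libraries in the linker cache, then
# report the compiler and driver versions.
for lib in libcuda.so libcudnn libcudart; do
    if ldconfig -p 2>/dev/null | grep -q "$lib"; then
        echo "$lib: installed"
    else
        echo "$lib: NOT FOUND"
    fi
done

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc: NOT FOUND"
fi

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi
else
    echo "nvidia-smi: NOT FOUND"
fi
```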
Testing Tensorflow
Again from the folder where you downloaded the gist files, run:
./tensorflow_mnist_test
You will see many informational messages from TensorFlow (they carry the date, time, and an `I` for Information). You should also see your GPU listed, such as:
2024-01-05 16:10:06.596935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1883] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 15098 MB memory: -> device: 0, name: NVIDIA RTX A4500, pci bus id: 0000:01:00.0, compute capability: 8.6
Epoch 1/6
2024-01-05 16:10:07.678879: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f51006a2170 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-01-05 16:10:07.678909: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA RTX A4500, Compute Capability 8.6
Then, there will be six epochs listed as the model trains. Finally, the last few lines will be something like:
Epoch 6/6
469/469 [==============================] - 0s 940us/step - loss: 0.0614 - sparse_categorical_accuracy: 0.9824 - val_loss: 0.0801 - val_sparse_categorical_accuracy: 0.9745
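For a quicker sanity check that doesn't train anything (my own shortcut, not one of the gist files), you can ask TensorFlow directly which GPUs it sees from inside the distrobox:

```shell
# Quick check: list the GPUs TensorFlow can see. Inside nvbox this
# should print one GPU device; the fallback message just means
# TensorFlow isn't importable in the current environment.
python3 -c 'import tensorflow as tf; print("GPUs:", tf.config.list_physical_devices("GPU"))' \
    2>/dev/null || echo "TensorFlow is not importable here"
```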
If all the above works, you should be ready to test connecting with VSCode.
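Before involving VSCode, a plain ssh connection is worth a smoke test (replace `yourname` with your username; the flags just keep a failed attempt from hanging or prompting):

```shell
# Confirm the distrobox's ssh server answers on port 2222.
# BatchMode avoids interactive prompts; ConnectTimeout bounds the wait.
ssh -o BatchMode=yes -o ConnectTimeout=3 -p 2222 yourname@127.0.0.1 true \
    && echo "ssh to nvbox OK" || echo "could not reach nvbox on port 2222"
```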
VSCode via SSH
Open VSCode, open the palette (Ctrl+Shift+P), then type ssh conn
Select Remote-SSH: Connect to host
Select + Add New SSH Host...
Enter ssh yourname@127.0.0.1:2222
Select /home/yourname/.ssh/config
You will see a message at the lower right that the host has been added. Click Open Config.
The host will look like this:
Host 127.0.0.1
HostName 127.0.0.1
Port 2222
User yourname
I like to change the host name to something more intuitive:
Host nvbox
HostName 127.0.0.1
Port 2222
User yourname
and save the file.
Now open the command palette again with Ctrl+Shift+P.
Select Remote-SSH: Connect to host
You should see your host listed, which you can select.
From here, you should be able to run Python notebooks and scripts using Tensorflow.
SSH Strict Host Checking
The first time you ssh into a host, you are given the opportunity to add the host to your `~/.ssh/known_hosts` file. If you recognize the host, you will want to add it.
If you rebuild the distrobox later, its ssh "identity" will change. When you attempt to ssh into the host, you will see a warning about the host's identity changing, and that someone may be doing something nefarious. The message will also give you the line number of the offending host within the `~/.ssh/known_hosts` file.
To fix this, just use your favorite editor to open `~/.ssh/known_hosts` and delete the line. The next time you ssh into the host, you will be asked to add the host's identity as a new host, and everything will be back to normal. (h/t @m2Giles for pointing this out.)
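Alternatively, `ssh-keygen` can remove the stale entry for you. Because the distrobox's ssh server runs on a non-default port, the host is recorded in the bracketed `[host]:port` form:

```shell
# Drop the stale key for the distrobox's ssh server from known_hosts.
# Hosts on non-default ports are stored as "[host]:port", so quote it.
if [ -f "$HOME/.ssh/known_hosts" ]; then
    ssh-keygen -R "[127.0.0.1]:2222"
fi
```

ssh-keygen also backs up the old file as `known_hosts.old` in case you remove the wrong entry.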
Final Thoughts
Home
As briefly mentioned above, the home folder for the distrobox is `~/.local/share/distrobox/nvbox`. This makes it easy to copy files from your host home folder (`/home/yourname`) to the distrobox's home folder.
I also add a final `cd` command in my .zshrc so that when I'm given the shell prompt, I'm in `~/.local/share/distrobox/nvbox` rather than `~/`.
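That final `cd` can be guarded so it only fires inside the container. Distrobox exports `CONTAINER_ID` inside its containers, which makes a convenient test (the exact guard below is my own habit, not from the gist):

```shell
# In .zshrc: start new shells in the distrobox home instead of the
# mounted host home. CONTAINER_ID is set by distrobox inside its
# containers, so this is a no-op on the host.
if [ -n "$CONTAINER_ID" ] && [ -d "$HOME/.local/share/distrobox/nvbox" ]; then
    cd "$HOME/.local/share/distrobox/nvbox"
fi
```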
Installing other software
Since the distrobox is based on Ubuntu, after you `distrobox enter nvbox`, you can use `apt` to install software.
I have a `~/.local/bin` folder in my distrobox where I put things like:
- starship – this binary might work for you.
- atuin – this binary might work for you.
- direnv – `apt install direnv`
- chezmoi – perhaps this binary