Hypervisor, "solid OS" and Bluefin?

This has come up in a number of different ways, but after I learned that Bluefin isn’t really immutable, and that the various atomic/cloud-native distros have such subtly different ways of solving the same problem, I concluded it is best to just not touch anything. I realize there are a number of reasons for this: you cannot delete /var, as in theory you should be able to if it were immutable, but fstab and other things are there.

Lately, after installing a Quadlet (Ollama) and Incus via Just, I ran into all kinds of issues. It worked fine until an update, and then I had to manually launch the web-ui. I saw all kinds of dbus/SELinux errors running systemd. I am pretty sure it is related to me installing ssh/gpg and my own keyring under my home directory, as I saw a lot of bad signatures when looking at the RPM installs (I have not modified the rpm-ostree at all).

In any case, these apps are not sandboxed, or if they are, they are using an overlay which isn’t apparent; rpm-ostree -v shows no user overlays and I cannot. Going from a reformat on the first 40 ISO (bluefin-dx-nvidia:latest with Optimus and CUDA) to whatever it is at now, I realized it doesn’t delete things on updates but seems to just be additive. That led me to create a new user and diff our home directories, which showed significant differences. Who knows what got updated nightly, or by me (I guess by accidentally using my own keyring), outside of home that is no longer relevant. It is also super hard to debug (ostree fsck reports corruption out of the gate).

Given that I travel quite a bit and often have unstable Internet connections, SSHing into a homelab or cloud provider and rebuilding the entire OS is pretty much off the table.

I remember Microsoft having similar issues with the Xbox when trying to turn it into a media console where you could pause a game and enter another app. Rather than buy a new MacBook Pro and run this in a VM with some Ansible scripts (my Dell is top of the line already and I don’t like Windows), would it be possible to have a hypervisor essentially running everything, use Pop!_OS or whatever distro on top, and then rebuild or customize Bluefin from there?

This seems like overkill, but if you had told me you’d need a hypervisor to run Mario Bros when the NES came out, I’d have looked at you funny. I don’t know if the Xbox media center (or whatever they call it when you pause games) is technically a VM as much as the game engine itself; I believe there’s some nuance that’s not really relevant. The idea is that there’d be something super stable underneath: an OS you didn’t really develop on, except to invoke an ISO builder or simply install Bluefin fresh, so you didn’t have to worry about solving all the complexity.

I think the core problem is that you have devs like me who use Bluefin as ephemeral anyway (dot files in git, source in git, I can get up and running quickly), while it’s also aimed at normal desktop users and especially gamers, who seem to love it.

The side effect is you could probably load things like “sidecars” a la K8S instead of jumping through hoops with things like Cockpit.

This seems like how things like K8S and Xbox solved a lot of problems: assume there’d always be another issue, so pick something core that will always work and go from there. It doesn’t even have to be a hypervisor; it could be something like ZFSBootMenu or maybe a more friendly flavor of Proxmox.

Brainstorming here but as I see it there’s a couple big issues.

  1. Boot gets “jacked” with double entries and the weird /boot and /boot/efi split due to BTRFS. That is solved, but for some reason I no longer have bootc, I think because initramfs was maybe installed with NVIDIA at one point? That’s kind of the problem with constant updates: they’re great, but then you’re 8 updates ahead and notice something missing because it’s rarely used. But if you boot from something really solid like vSphere or Proxmox, or something more friendly and transparent that I’m not aware of, that goes away.

  2. New user creation problems go away as I would say the Bluefin variant at least and probably Aurora attracts a majority of devs who use it not caring or expecting their data to be there and just using it for the amazing toolset.

  3. Error reporting is horrible and largely upstream, and is solved by a fresh install. Rebasing also sort of works as long as you’re not switching from GNOME to KDE, but since it’s not truly immutable, a rebase is neither a “fresh install” nor simply keeping the home directory, which is very confusing.

I used to be closer to the metal but over the years my age might be showing, there might be another solution. I’m just proposing that Bluefin can be a daily driver but I’ve seen well financed teams eventually get around problems by taking routes like these and freeing up some upstream things to mature and the toolkit/error reporting to improve.

Honestly, I was vaguely a part of it when Go/K8S/containers came out, and they had that “oh, this works great, but uh, I have to do all this work to get it going and it constantly breaks” kind of thing. I’m open to ideas; maybe another variant solves this, but what I like about Bluefin is that it doesn’t have a strict anti-commercial ethos or mantra that cuts other communities apart.

If there’s a free version of vSphere or someone has a better idea, let me know. I was just thinking of something hypervisor-like that’s rock solid, lets you switch into another OS to do email or rebuild (or lets the hypervisor do it!), and is capable of doing this all offline or with a poor hotel Internet connection.

I just never have installed this in a laptop setting but I’m going to guess anything with a game that lets you pause and enter another menu probably took the “VM Suspend” approach the XBox team did so there’s got to be something out there. That also let them essentially rebase and lets them and other machines become backwards compatible from my understanding.

I wish I had a week in a homelab I could dedicate to this, but all I can say is that’s what I’ve seen go out the door from a high level in my professional experience.

I don’t understand what problem you’re trying to solve. Is the issue that the ollama just setup was broken? Those are just systemd service units and containers, so they run the same way as they do on any other Linux.

I’m sorry I should be more clear.

(1) I just use ujust from the home directory
(2) Anything I install would be through Flatpak/brewfiles
(3) Somehow installing ollama/webui (through ujust) worked fine on first install. Afterwards webui (in reality the dockerd service itself) stopped working, either after a nightly update or basically when I came back in the morning.
(4) Restarted, couldn’t figure it out; random SELinux/Dbus errors with no mention of the problem.
(5) I had set up a GPG keyring and SSH per GitHub’s instructions, which would’ve been the only other operation I did after a full format.
(6) Creating a new user, doing a diff on the directories, noticing quite a few changes. I first did the ujust cleanup and then the diff task. Reboot.
(7) Except for machineid and the keys that weren’t the ones I created (e.g., were generated or came from /etc/usr) I matched the directories.
(8) Removed the webui and ollama image, network and containers from podman and docker.
(9) Haven’t tried reinstalling them yet. But most errors cleared out. The only odd one I see I can’t figure out:
A lot of freedesktop stuff that might not be an error at all:

May 17 09:11:51 fedora gdm[6285]: Gdm: GdmDisplay: Session never registered, failing
May 17 09:11:51 fedora dbus-broker-launch[5342]: Activation request for ‘org.freedesktop.Accounts’ failed.
May 17 09:11:51 fedora dbus-broker-launch[5342]: Activation request for ‘org.freedesktop.Accounts’ failed.
May 17 09:11:51 fedora gdm[6285]: Gdm: Failed to list cached users: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: Could not activate rem>
May 17 09:11:51 fedora systemd[1]: gdm.service: Main process exited, code=exited, status=1/FAILURE
May 17 09:11:51 fedora systemd[1]: gdm.service: Failed with result ‘exit-code’.

This one, which is the only outright failure (I didn’t think Google was in the tree, but I guess it is):

May 16 19:09:21 fedora rpm-ostree[9758]: Error during transfer: Curl error (6): Couldn’t resolve host name for https://dl.google.com/linux/linux_>
May 16 19:09:21 fedora rpm-ostree[9758]: Downloading: https://dl.google.com/linux/linux_signing_key.pub
May 16 19:09:21 fedora rpm-ostree[9758]: Error during transfer: Curl error (6): Couldn’t resolve host name for https://dl.google.com/linux/linux_>
May 16 19:09:21 fedora rpm-ostree[9758]: Downloading: https://dl.google.com/linux/linux_signing_key.pub
May 16 19:09:21 fedora rpm-ostree[9758]: Error during transfer: Curl error (6): Couldn’t resolve host name for https://dl.google.com/linux/linux_>
May 16 19:09:21 fedora rpm-ostree[9758]: Downloading: https://dl.google.com/linux/linux_signing_key.pub
May 16 19:09:21 fedora rpm-ostree[9758]: Error during transfer: Curl error (6): Couldn’t resolve host name for https://dl.google.com/linux/linux_>
May 16 19:09:21 fedora rpm-ostree[9758]: Error while downloading: Curl error (6): Couldn’t resolve host name for https://dl.google.com/linux/linu>
May 16 19:09:21 fedora rpm-ostree[9758]: Txn UpdateDeployment on /org/projectatomic/rpmostree1/default failed: Updating rpm-md repo 'google-chrom>

And this:

May 17 09:07:13 fedora systemd-logind[5381]: Failed to open ‘/boot//loader/entries’: Remote address changed

When I tried running “bootc” from “update-ng” it fails with command not found. Yet it is installed:

✦ ❯ rpm-ostree search bootc

===== Name Matched =====
bootc : Bootable container system
systemd-bootchart : Boot performance graphing tool

I am on:

State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: last run 1h 39min ago
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/bluefin-dx-nvidia:latest (index: 0)
Digest: sha256:dd48b42a755bdc023c41de6e572638cc223436a5fef1e8d56c24e02b4971d6fe
Version: 40.20240513.0 (2024-05-16T22:00:43Z)
Commit: 9568e2fe96137ce7a5b2f20c6dfd34ec4ea513c3273173f1f06855f97284b2f0
Staged: no
StateRoot: default

Again, I have Cuda and Optimus configured and seem to be running properly:
NVIDIA T1200 Laptop GPU
NVIDIA Driver Version: 550.78

I guess what I’m saying is that fundamentally I seem to have to format and reinstall to get things working, and/or I ran a command in just that wasn’t ready for prime time. Or NVIDIA is doing something. I can try installing ollama with webui again if that’d help?

I was just thinking a “restore OS” like you’d run on vSphere might solve some issues: if I could just say “restore” and it’d use Ansible to install again, then since the install is only 7 minutes or so, it seems like not a big deal.

Edit: I started this off with the first ISO for F40. I wanted to sign commits in GitHub, forgetting you can do it over SSH, but the GPG keyring was in my home directory and not in /etc/skel or wherever it pulls the keys from. I didn’t touch ANYTHING outside of home unless a just command did it. Of course, if I ran manual updates (without bootc, just the ujust update one), I read the RPMs would be signed with my keyring, which might have confused dbus or a service into seeing a wrong signing key? Or it could be a weird kernel/NVIDIA thing.

I do have a week in my homelab now, so I’ll just record everything I do to a log file and see if I can get to a spot where I can recreate it. I strongly believe it is weird NVIDIA stuff mucking up the kernel, or signing issues I inadvertently caused.

The point of the post was that I run into these problems and thought it’d be easier to just rebuild than to pinpoint the error, because the logging isn’t helping. Thanks for your quick response. Hopefully I just did something stupid; I need to dig deeper into how the system actually works.
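
Journal output like the excerpts above is noisy mostly because the same message repeats many times. As a rough sketch (not anything Bluefin ships), repeated lines can be collapsed and counted so the distinct failures stand out. It assumes journalctl's default short output format, as in the pasted logs; the function name `summarize` and the file name `journal.txt` are made up for illustration.

```python
import re
from collections import Counter

# Matches journalctl's default short output, for example:
#   "May 16 19:09:21 fedora rpm-ostree[9758]: message text"
LINE_RE = re.compile(
    r"^\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2}\s"  # timestamp
    r"\S+\s"                                   # hostname
    r"(?P<ident>[^\[:\s]+)(?:\[\d+\])?:\s"     # process name, optional [pid]
    r"(?P<msg>.*)$"                            # the message itself
)


def summarize(lines):
    """Return (count, process, message) tuples, most frequent first."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that are not in the short format are skipped
            counts[(match["ident"], match["msg"])] += 1
    return [(n, ident, msg) for (ident, msg), n in counts.most_common()]


# Example (hypothetical file saved from journalctl):
#   with open("journal.txt") as fh:
#       for n, ident, msg in summarize(fh):
#           print(f"{n:4d}  {ident}: {msg}")
```

The input file could come from something like `journalctl -b --no-pager > journal.txt`. Timestamps and PIDs are deliberately dropped so that identical messages group together.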

OK, so you seem to be conflating a bunch of unrelated things. If you want to manage the ollama service, you use systemctl like this: systemctl status --user ollama, and then you can stop and restart the service from there. There’s nothing in our units that I can see that has anything to do with Docker.
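
To make that concrete, here is a small sketch that collects the usual commands for inspecting a user-level systemd unit. The unit name `ollama` comes from the command above; the helper names `user_unit_commands` and `inspect` are invented for illustration, and the sketch assumes the service runs as a user unit rather than a system one.

```python
import subprocess


def user_unit_commands(unit):
    """Build argv lists for checking on a user-level systemd unit."""
    return {
        "status": ["systemctl", "--user", "status", unit, "--no-pager"],
        "logs": ["journalctl", "--user", "-u", unit, "-n", "50", "--no-pager"],
        "restart": ["systemctl", "--user", "restart", unit],
    }


def inspect(unit):
    """Run only the read-only commands; restarting is left to the caller."""
    commands = user_unit_commands(unit)
    for name in ("status", "logs"):
        print(f"$ {' '.join(commands[name])}")
        # check=False: a failed or inactive unit makes systemctl exit non-zero,
        # which is expected while debugging and should not raise here.
        subprocess.run(commands[name], check=False)


# Example: inspect("ollama")
```

Looking at the status and the last 50 log lines of the unit first usually narrows things down faster than reading the whole system journal.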

The google.com URL is something you seem to have added, and it’s failing, so that’s the cause of that error and probably why your upgrade is stuck. Did you enable a Google Chrome repo or something?
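
One way to check is to look through the repo definitions for anything that points at that host. The sketch below assumes the repo files live in /etc/yum.repos.d and are plain INI-style .repo files; the function names are hypothetical, and it only reports what it finds without changing anything.

```python
import configparser
from pathlib import Path


def repos_referencing(host, repo_text):
    """Return (repo_id, enabled) for each repo section that mentions host."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    hits = []
    for section in parser.sections():
        values = " ".join(parser[section].values())
        if host in values:
            # A repo with no "enabled" key is treated as enabled.
            enabled = parser.getboolean(section, "enabled", fallback=True)
            hits.append((section, enabled))
    return hits


def scan(repo_dir="/etc/yum.repos.d", host="dl.google.com"):
    """Print every repo under repo_dir that references host."""
    for path in sorted(Path(repo_dir).glob("*.repo")):
        for section, enabled in repos_referencing(host, path.read_text()):
            state = "enabled" if enabled else "disabled"
            print(f"{path}: [{section}] is {state}")


# Example: scan()
```

If an enabled repo shows up, that would explain why the update transaction tries to fetch the Google signing key and fails when the host name cannot be resolved.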