The Curse of the Early Adopter
I was going to start working on voice AI for the upcoming Hell on $5 A Day podcast, but I got stopped short. My equipment was too good, or more to the point, too new.
Earlier this year, I invested in replacing my gaming desktop from early 2021 with a new system sporting an Nvidia 5080 card. Until recently, I only used LM Studio with it for text-based inference, and it seemed pretty good and speedy(ish). But when I started trying local solutions for text-to-image generation like Invoke AI Community Edition or voice-to-voice conversion like Replay, they wouldn’t run.
PyTorch and CUDA 12.8 are just friends right now
Compute Unified Device Architecture (CUDA) is Nvidia’s technology for massively parallel computing across the thousands of cores on their cards. With each new GPU architecture, Nvidia updates CUDA. The 50xx line of video cards uses the new Blackwell architecture, so AI applications that want to take advantage of these cards need Blackwell support. Both apps in question (Invoke and Replay) use the popular Python library PyTorch, a widely used machine learning framework originally developed by Meta but now shepherded by a non-profit. PyTorch uses CUDA to get the most out of Nvidia cards.
The problem is that CUDA 12.8 is the first release to offer Blackwell support, while PyTorch’s stable releases only support up through CUDA 12.6. PyTorch has incorporated 12.8 support in its development branch, so if you install a nightly build instead of an official release, you can get Blackwell support — but whether that’s feasible depends on how your application incorporates PyTorch. Invoke is basically a set of Python scripts that run a web server to provide the user interface. It installs its own copy of Python, and you can replace its copy of PyTorch with a nightly build. Replay is an encapsulated app with its own scripts too, but I have yet to find a workaround to swap in a nightly build of PyTorch.
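If you try the nightly-swap route yourself, here’s a minimal sanity check (a sketch, not anything official) you can run inside an app’s bundled Python environment to see whether the installed PyTorch build actually ships Blackwell kernels. The sm_120 architecture tag for consumer Blackwell cards is my assumption; the helper just looks for any compiled architecture at compute capability 12.0 or newer:

```python
def supports_blackwell(arch_list):
    """Return True if any compiled architecture is compute capability 12.0+.

    Entries from torch.cuda.get_arch_list() look like "sm_90" or "compute_90";
    consumer Blackwell (RTX 50xx) is assumed to appear as "sm_120".
    """
    caps = []
    for arch in arch_list:
        num = arch.split("_")[-1]
        if num.isdigit():
            caps.append(int(num))
    return any(c >= 120 for c in caps)


if __name__ == "__main__":
    try:
        import torch
        print("PyTorch:", torch.__version__, "| CUDA build:", torch.version.cuda)
        print("Compiled architectures:", torch.cuda.get_arch_list())
        print("Blackwell kernels present:", supports_blackwell(torch.cuda.get_arch_list()))
    except ImportError:
        print("PyTorch is not installed in this environment")
```

At the time of writing, PyTorch publishes nightly wheels built against CUDA 12.8 under a cu128 tag on its download index, so the swap itself amounts to a `pip install --pre` of torch from that index inside the app’s own Python environment.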
That said, on their Discord, one of the Replay devs said that as soon as PyTorch officially drops a version with CUDA 12.8 support, they’ll cut a new release ASAP.
The blessing and the curse of open source
Free Libre Open Source Software (FLOSS) has changed the world. Whether you need a free OS you can hack (Linux), want to make 3D movies (Blender), or need a database capable of supporting tons of concurrent users (MariaDB, PostgreSQL, etc.), FLOSS has got you covered. If you’re building software, it’s quite likely that your language runtimes, editor, debugger, and package management system (as well as many packages) are all FLOSS. That is the blessing… “standing on the shoulders of giants.”
FLOSS product leadership and developer corps are made up primarily of volunteers, plus a few people paid by non-profits and a few paid by corporations to work on the parts of a project that also advance the corporations’ interests. And that means that stuff happens when it happens. The most recommended Photoshop alternative, the GNU Image Manipulation Program (GIMP), released version 3.0 only after years of work; I used to call GIMP 3.0 “the Duke Nukem Forever of graphics software.” So the fact that Blackwell support hasn’t become official in PyTorch is not shameful or embarrassing. It’s just how it is, and it’ll get there when it gets there. If you want it to happen faster, you can donate talent (help write code, docs, etc.), money, or equipment. Harassing the maintainers is just a way to get banned from a community and potentially be the straw that breaks the camel’s back (makes a maintainer quit).
So here I sit… I can do this on my older laptop with discrete Nvidia graphics, but it’ll be slooower and more RAM-constrained. Still, I can start building a workflow there that I can easily translate to my desktop once PyTorch with CUDA 12.8 officially releases and gets picked up by these apps. And once that happens, the laptop can be a backup. It’s been a bit annoying to hit this speedbump. First world problems, I know, but sort of interesting from a supply chain perspective.