You can already run Linux apps using Termux and Termux-X11, and I’d say the performance would be better than in this demo, because the demo runs in a virtual machine with its own kernel, whereas with Termux you’re running your apps directly on top of Android’s Linux kernel. Also, you don’t have the overhead of running ChromeOS on top of Android.
In the footnotes they mention GPT-3.5. Their argument for not testing GPT-4 was that it was paid, so most users would be using 3.5, which is already factually incorrect because the new GPT-4o (which they don’t even mention) is now free. Finally, they didn’t mention GPT-4 Turbo either, which is even better at coding than GPT-4.