Researchers from tech company Virtuals Protocol have published a paper on a new text-to-video AI model, MarioVGG, which can simulate Super Mario Bros. footage from basic text inputs (thanks, Ars Technica).
The model was fed over 737,000 Super Mario Bros. frames, showing Nintendo's prized plumber across 32 different levels with varying degrees of success and failure (141 wins and 139 losses, according to GitHub). From these images, and the order in which they are arranged, the AI model "learns" how commands such as "jump" and "run" play out on-screen, and is then capable of simulating those commands in video form, physics and all.
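For the technically curious, the core idea is action-conditioned frame prediction: show the model a frame, tell it which button was held, and train it to predict what comes next. The paper itself describes a far more capable video-generation model; the toy network below is only a minimal sketch of that training signal, and every name, shape and number in it is our illustrative assumption, not MarioVGG's actual code.

```python
# Minimal sketch of action-conditioned next-frame prediction, the idea
# behind MarioVGG as described above. Architecture, sizes and data here
# are illustrative stand-ins, not the paper's real model.
import torch
import torch.nn as nn

ACTIONS = ["run right", "run right and jump"]  # the two commands the paper used
H, W = 48, 64                                  # the downscaled resolution

class NextFramePredictor(nn.Module):
    """Given a context frame and an action id, predict the next frame."""
    def __init__(self, n_actions: int = len(ACTIONS)):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, 16)
        self.net = nn.Sequential(
            nn.Linear(H * W + 16, 256),
            nn.ReLU(),
            nn.Linear(256, H * W),
        )

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([frame.flatten(1), self.action_emb(action)], dim=1)
        return self.net(x).view(-1, H, W)

# One toy training step: from (frame, action, next frame) triples alone, the
# model "learns" which pixels change when a command such as "jump" is applied.
model = NextFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
frame = torch.rand(8, H, W)                       # stand-in for gameplay frames
action = torch.randint(0, len(ACTIONS), (8,))     # stand-in for logged inputs
target = torch.rand(8, H, W)                      # stand-in for the frames that followed
loss = nn.functional.mse_loss(model(frame, action), target)
opt.zero_grad(); loss.backward(); opt.step()
```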
The Virtuals Protocol paper showcases the model in action via a series of short videos which, from a distance, do look very similar to the iconic NES platformer. The company highlighted a selection of these videos on Twitter, claiming, "The era of infinite interactive worlds is here".
While the model is capable of recreating select Mario moves, this is hardly a one-to-one simulation. To keep things simple, the researchers focused on only two inputs, "run right" and "run right and jump". The resolution was cut from the NES's 256×240 down to a much smaller 64×48, and the output contains only a fraction of the frames fed in (seven generated frames for every 35 ingested), so things are far from silky smooth.
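To put that resolution cut in perspective, here's the arithmetic. The Pillow resize below is purely our illustration of the downscaling step (the library choice is an assumption, not the paper's pipeline):

```python
# Dropping from the NES's 256x240 to the model's 64x48 discards
# roughly 95% of the pixels in every frame.
from PIL import Image

frame = Image.new("RGB", (256, 240))            # stand-in for a captured frame
small = frame.resize((64, 48), Image.BILINEAR)  # the model's working resolution
print(small.size)                               # (64, 48)
print(1 - (64 * 48) / (256 * 240))              # 0.95, i.e. 95% of pixels gone
```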
It's not all that fast, either. The single RTX 4090 graphics card used in the research could only produce a six-frame video sequence every six seconds. And while the final frame of one sequence can be used as the first frame of the next, getting closer to something that resembles a full level, the researchers admit the approach is "not practical and friendly for interactive video games" for the time being.
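That chaining trick is simple enough to sketch. In the Python below, generate_clip is a hypothetical stand-in for the model (not MarioVGG's real interface); the point is only that each new clip is seeded with the previous clip's last frame:

```python
# Hedged sketch of stitching short generated clips into a longer sequence,
# as the researchers describe. `generate_clip` is a hypothetical callable,
# not MarioVGG's actual API.
from typing import Callable, List

Frame = list  # stand-in type for a single 64x48 frame

def chain_clips(
    generate_clip: Callable[[Frame, str], List[Frame]],
    first_frame: Frame,
    actions: List[str],
) -> List[Frame]:
    """Generate one clip per action, seeding each with the last frame so far."""
    video: List[Frame] = [first_frame]
    for action in actions:
        clip = generate_clip(video[-1], action)  # last frame becomes the new seed
        video.extend(clip)
    return video

# Usage with a dummy generator that just repeats the seed frame six times:
dummy = lambda frame, action: [frame] * 6
longer = chain_clips(dummy, [[0] * 64] * 48, ["run right", "run right and jump"])
print(len(longer))  # 13 frames: 1 seed + 2 clips of 6
```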
On top of all that, the results are packed with glitches. A closer look at the above videos reveals Mario changing colours on the fly, morphing into enemies, gliding through normally impassable objects and occasionally disappearing completely. Official Mario this ain't.
And yet the researchers are not giving up hope that a model such as this could be used for game development in the future. "While replacing game development and game engines completely using video generation models might still not be practical and plausible at the moment," the paper concludes, "we show that it is possible and an option with just a limited set of data on a single game domain".
An AI being able to work out the cause-and-effect relationship between user input and on-screen gameplay is a mind-blowing concept, but that closing note about "replacing game development" being a possibility leaves a sour taste.
As if you needed a reminder, 2024 has been one of the industry's worst years for game developer layoffs, with studios large and small cutting staff to reduce costs. An AI tool that can accurately replicate gameplay may still be some way off, but if the technology continues to progress at this rate, how it slots into current working practices will become an increasing cause for concern in the coming years.
Just last week, Bayonetta 3 voice actor Jennifer Hale said that AI is "coming for us all" as negotiations around the ongoing SAG-AFTRA strike turned to its uses in video game acting work.