
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation by GaggiX

6 Comments

  • Post Author
    ZeroCool2u
    Posted April 19, 2025 at 2:03 pm

    Wow, the examples are fairly impressive and the resources used to create them are practically trivial. Seems like inference can be run on previous-generation consumer hardware. I'd like to see throughput stats for inference on a 5090 too at some point.

  • Post Author
    Jaxkr
    Posted April 19, 2025 at 3:02 pm

    This guy is a genius; for those who don’t know, he also brought us ControlNet.

    This is the first decent video generation model that runs on consumer hardware. Big deal, and I expect ControlNet pose support soon too.

  • Post Author
    IshKebab
    Posted April 19, 2025 at 3:15 pm

    Funny how it really wants people to dance. Even the guy sitting down for an interview just starts dancing sitting down.

  • Post Author
    fregocap
    Posted April 19, 2025 at 3:37 pm

    Looks like the only motion it can do… is dance.

  • Post Author
    WithinReason
    Posted April 19, 2025 at 3:58 pm

    Could you do this spatially as well? E.g., generate the image top-down instead of all at once?

  • Post Author
    modeless
    Posted April 19, 2025 at 4:06 pm

    Could this be used for video interpolation instead of extrapolation?


© 2025 HackTech.info. All Rights Reserved.
