A hypernetwork estimates the parameters $\{\mathbf{b}_1, \mathbf{W}_2\}^{(i,j)}$ of pixel-wise, local neural heat fields. The phase shifts $\mathbf{b}_1$ operate on globally learned components, before thermal activations scale each component depending on its frequency and the desired scaling factor. The components are then linearly combined using the coefficients $\mathbf{W}_2$, resulting in an appropriately blurred, continuous local neural heat field.
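The pipeline in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not the authors' code: the frequency bank, the per-pixel `(b1, w2)` values, the helper names, and the choice of a Gaussian blur whose width equals one output pixel are all assumptions. The thermal activation relies on one fact fixed by physics: under the heat equation $u_t = \kappa \nabla^2 u$, a sinusoid of frequency $\boldsymbol{\omega}$ decays by $e^{-\kappa \lVert\boldsymbol{\omega}\rVert^2 t}$ after time $t$, so high-frequency components are attenuated more than low-frequency ones.

```python
import math


def thermal_activation(z, omega, t, kappa=1.0):
    """sin(z) damped as the heat equation damps a plane wave of frequency omega.

    Under u_t = kappa * laplacian(u), sin(omega . x + b) evolves into
    exp(-kappa * |omega|^2 * t) * sin(omega . x + b).
    """
    norm_sq = omega[0] ** 2 + omega[1] ** 2
    return math.exp(-kappa * norm_sq * t) * math.sin(z)


def diffusion_time(scale, kappa=1.0):
    """Hypothetical mapping from upscaling factor to diffusion time.

    Assumption: coordinates are in input-pixel units, so an output pixel spans
    1/scale, and we want a Gaussian blur whose std equals that footprint.
    The heat kernel after time t has variance 2 * kappa * t per axis.
    """
    sigma = 1.0 / scale
    return sigma ** 2 / (2.0 * kappa)


def eval_local_field(offset, frequencies, b1, w2, t, kappa=1.0):
    """Evaluate one pixel's local field at a continuous offset (dx, dy).

    frequencies: K global (wx, wy) vectors shared by every pixel.
    b1: K phase shifts for this pixel (hypernetwork output).
    w2: K mixing coefficients for this pixel (a single channel here).
    """
    total = 0.0
    for omega, b, w in zip(frequencies, b1, w2):
        z = omega[0] * offset[0] + omega[1] * offset[1] + b  # phase-shifted component
        total += w * thermal_activation(z, omega, t, kappa)  # damp, then combine linearly
    return total


def query(params, frequencies, x, y, scale, kappa=1.0):
    """Sample the continuous image at (x, y), given in input-pixel coordinates.

    params[i][j] = (b1, w2) holds pixel (i, j)'s parameters; pixel (i, j) is
    assumed to be centred at (j + 0.5, i + 0.5).
    """
    rows, cols = len(params), len(params[0])
    i = min(max(math.floor(y), 0), rows - 1)  # pixel whose local field we evaluate
    j = min(max(math.floor(x), 0), cols - 1)
    b1, w2 = params[i][j]
    offset = (x - (j + 0.5), y - (i + 0.5))
    return eval_local_field(offset, frequencies, b1, w2, diffusion_time(scale, kappa), kappa)


# Usage with made-up numbers: 3 global frequencies, a 2 x 2 grid of identical pixels.
frequencies = [(1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
params = [[([0.0, 0.5, 1.0], [1.0, 0.5, 0.25]) for _ in range(2)] for _ in range(2)]
print(query(params, frequencies, 0.7, 1.2, scale=4.0))
```

Because each damped sinusoid is an exact solution of the heat equation, any linear combination of them is one too. That is why, in this sketch, blurring for a given scaling factor reduces to a per-component multiplication rather than an explicit convolution.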
jiggawatts
The learned frequency banks reminded me of a notion I had: Instead of learning upscaling or image generation in pixel space, why not reuse the decades of effort that has gone into lossy image compression by generating output in a psychovisually optimal space?
Perhaps frequency space (discrete cosine transform) with a perceptually uniform color space like UCS. This would allow models to be optimised so that they spend more of their compute budget outputting detail that's relevant to human vision. Color spaces that split brightness from chroma would allow increased contrast detail and lower color detail. This is basically what JPG does.
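The two JPEG ingredients mentioned above, a luma/chroma split and a block-wise DCT, can be sketched in pure Python. This is illustrative only and not a proposed model architecture: it uses JFIF's full-range YCbCr as a stand-in for a perceptually uniform space like UCS, an orthonormal 8x8 DCT-II, and plain 2x2 averaging as one common way to do 4:2:0 chroma subsampling. The function names are hypothetical. A model predicting coefficients in this representation would emit one full-resolution luma plane plus two quarter-resolution chroma planes, which is where the "lower color detail" saving comes from.

```python
import math

N = 8  # JPEG block size


def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion used by JFIF (0..255 in, 0..255 out)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr (JFIF coefficients)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


def subsample_2x2(plane):
    """Average each 2x2 neighbourhood: quarter-resolution chroma (4:2:0)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]


def _alpha(k):
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(pos, freq):
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block):
    """Orthonormal 2D DCT-II of an N x N block (list of lists)."""
    return [[_alpha(u) * _alpha(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)]
            for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (orthonormal 2D DCT-III)."""
    return [[sum(_alpha(u) * _alpha(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)]
            for x in range(N)]


# Usage: a flat grey block puts all of its energy into the DC coefficient.
grey = [[100.0] * N for _ in range(N)]
coeffs = dct2(grey)
print(round(coeffs[0][0], 6), round(abs(coeffs[1][1]), 6))
```

Real JPEG encoders then quantize high-frequency coefficients more coarsely than low-frequency ones; that step, where most of the psychovisual tuning lives, is omitted here.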
Hizonner
Where are the ground truth images?
WhitneyLand
Seems like a nice result but wouldn’t have hurt for them to give a few performance benchmarks. I understand that the point of the paper was a quality improvement, but it’s always nice to reference a baseline for practicality.
flerchin
I'd like to see the results in something like Wing Commander Privateer.
adhoc32
Instead of training on vast amounts of arbitrary data that may lead to hallucinations, wouldn't it be better to train on high-resolution images of the specific subject we want to upscale? For example, using high-resolution modern photos of a building to enhance an old photo of the same building, or using a family album of a person to upscale an old image of that person. Does such an approach exist?
seanalltogether
I would love to see this kind of work applied to old movies from the 30s and 40s like the Marx Brothers.
nthingtohide
DLSS will benefit greatly from research in this area. DLSS 4 uses transformers.
DLSS 3 vs DLSS 4 (Transformer)
https://www.youtube.com/watch?v=CMBpGbUCgm4
flufluflufluffy
Was anyone else expecting infinitely zoomable pictures from that title? I am disappoint