Authors: Dr. Paul Scotti (Princeton Neuroscience Institute, MedARC), Dr. Tanishq Mathew Abraham (Stability AI, MedARC)
Example images reconstructed from human brain activity.
Introduction
MedARC, in collaboration with researchers at Princeton Neuroscience Institute, Ecole Normale Supérieure, PSL University, University of Toronto, and the Hebrew University of Jerusalem, along with EleutherAI and Stability AI, is proud to release its collaborative paper on MindEye.
MindEye is a state-of-the-art approach that reconstructs and retrieves images from fMRI brain activity. Functional magnetic resonance imaging (fMRI) measures brain activity by detecting changes in oxygenated blood flow. It is used to study which parts of the brain are involved in different functions and to help evaluate treatments for brain conditions. MindEye was trained and evaluated on the Natural Scenes Dataset [1], an offline fMRI dataset collected from human participants, each of whom agreed to spend up to 40 hours inside the MRI machine viewing a series of static images, each shown for a few seconds.
This is the first preprint released by MedARC since its public launch, and the paper is currently undergoing peer review. MedARC is a Discord-based research community supported by Stability AI that is building foundation generative AI models for medicine using a decentralized, collaborative, and open research approach.
Method & Results
MindEye achieves state-of-the-art performance across both image retrieval and reconstruction. That is, given a sample of fMRI activity from a participant viewing an image, MindEye can either identify which image out of a pool of possible image candidates was the original seen image (retrieval), or it can recreate the image that was seen (reconstruction).
To achieve both retrieval and reconstruction with a single model trained end-to-end, we adopt a novel approach: two parallel submodules, one specialized for retrieval (trained with contrastive learning) and one for reconstruction (using a diffusion prior).
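As a rough illustration of this two-submodule design (not the authors' actual implementation; all layer sizes, names, and dimensions below are hypothetical), the idea can be sketched as a shared backbone over flattened voxels feeding a contrastive head for retrieval and a diffusion-prior-style head for reconstruction:

```python
import torch
import torch.nn as nn

class MindEyeSketch(nn.Module):
    """Minimal sketch of a two-submodule design (hypothetical sizes/names).

    A shared MLP maps flattened voxels to a latent space. A contrastive head
    produces embeddings used for retrieval, while a second head (standing in
    for a diffusion prior) produces embeddings that would condition an image
    generator for reconstruction.
    """

    def __init__(self, num_voxels: int = 15000, clip_dim: int = 768):
        super().__init__()
        # Shared backbone: flattened voxel pattern -> latent vector
        self.backbone = nn.Sequential(
            nn.Linear(num_voxels, 4096),
            nn.GELU(),
            nn.Linear(4096, clip_dim),
        )
        # Retrieval submodule: projection trained with a contrastive loss
        self.retrieval_head = nn.Linear(clip_dim, clip_dim)
        # Reconstruction submodule: stand-in for a diffusion prior that maps
        # brain latents toward image embeddings consumed by a generator
        self.reconstruction_head = nn.Sequential(
            nn.Linear(clip_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, voxels: torch.Tensor):
        latent = self.backbone(voxels)
        retrieval_emb = nn.functional.normalize(self.retrieval_head(latent), dim=-1)
        recon_emb = self.reconstruction_head(latent)
        return retrieval_emb, recon_emb
```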
Each unique image in the dataset was viewed three times, for three seconds at a time. Corresponding fMRI activity (flattened spatial patterns across 1.8mm cubes of cortical tissue called “voxels”) was collected for each image presentation. fMRI activity was averaged across the three viewings of the same image and input to MindEye to retrieve and reconstruct the original image.
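A minimal sketch of that averaging step, assuming the voxel responses are stored as an array with one row per image and one slice per repeat (the array shape and placeholder data here are illustrative, not the dataset's actual layout):

```python
import numpy as np

# Hypothetical layout: responses for the 3 repeats of every image,
# shape (n_images, 3 repeats, n_voxels). Placeholder random data.
betas = np.random.randn(100, 3, 15000).astype(np.float32)

# Average over the repeat axis to get one voxel pattern per image,
# which is what the model takes as input.
voxels_per_image = betas.mean(axis=1)  # shape (n_images, n_voxels)
```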
Overall MindEye schematic, depicting the retrieval and reconstruction submodules alongside an independent low-level perceptual pipeline meant to enhance reconstruction fidelity.
Retrieval
For retrieval, MindEye finds the exact (top-1) match in a pool of test samples with >90% accuracy for both image and brain retrieval, outperforming previous work, which reported <50% retrieval accuracy. This suggests that MindEye brain embeddings retain fine-grained, image-specific signal.
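Conceptually, image retrieval scores a brain embedding against every candidate image embedding (e.g., CLIP embeddings of the test images) and checks whether the true image ranks first; brain retrieval is the symmetric case. The function below is a hedged sketch of that evaluation, not the paper's exact protocol, and assumes both sets of embeddings are already L2-normalized and row-aligned:

```python
import torch

def top1_retrieval_accuracy(brain_emb: torch.Tensor, image_emb: torch.Tensor) -> float:
    """Fraction of samples whose own image is the nearest candidate.

    brain_emb: (N, D) embeddings predicted from fMRI, L2-normalized.
    image_emb: (N, D) embeddings of the N candidate images, L2-normalized.
    Row i of each tensor corresponds to the same trial/image.
    """
    sims = brain_emb @ image_emb.T            # (N, N) cosine similarities
    top1 = sims.argmax(dim=1)                 # best-matching candidate per brain sample
    targets = torch.arange(len(brain_emb))    # correct match is the diagonal
    return (top1 == targets).float().mean().item()

# Brain retrieval (finding the matching brain sample for a given image)
# would score sims.T the same way.
```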