One of the most distinctive features of Apple silicon chips is that they have two types of CPU core: E (Efficiency) cores, which are energy-efficient but slower, and P (Performance) cores, which normally run much of the code in the apps we use. Apps don’t decide directly which cores they run on; that’s a privilege of macOS. Instead, they register their interest by setting a Quality of Service, or QoS, which is then taken into account when their threads are scheduled. With the introduction of Game Mode in Sonoma, CPU scheduling can now work differently, with E cores being reserved for the use of games. This article looks at another atypical situation: running a macOS virtual machine (VM) assigned a set number of virtual cores. How does macOS Sonoma handle that?
CPU cores
M-series chips contain different types and numbers of CPU cores, depending on the model:
- Base models contain 4 E and 4 P cores; for the M1, those are Icestorm and Firestorm respectively, while the M2 has Blizzard and Avalanche. These are configured in two clusters, one containing the E cores, the other the P cores. Within each cluster, all cores are run at the same frequency, and they share L2 cache.
- Pro and Max models contain 2-4 E and 6-8 P cores, in one E cluster and two P clusters, again running at a common frequency within the cluster, and sharing L2 cache.
- Ultra models contain 4 (M1) or 8 (M2) E and 16 P cores, their clusters having common frequencies and L2 caches.
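If you want to confirm the layout of your own Mac, core counts for each type can be read from sysctl. Here’s a minimal Swift sketch; on Apple silicon, hw.perflevel0 describes the P cores and hw.perflevel1 the E cores (available from macOS 12 onwards):

```swift
import Darwin

// Sketch: query the host's CPU core layout via sysctl.
// hw.perflevel0 covers P cores, hw.perflevel1 covers E cores.
func sysctlInt32(_ name: String) -> Int32 {
    var value: Int32 = 0
    var size = MemoryLayout<Int32>.size
    sysctlbyname(name, &value, &size, nil, 0)
    return value
}

print("P cores:", sysctlInt32("hw.perflevel0.physicalcpu"))
print("E cores:", sysctlInt32("hw.perflevel1.physicalcpu"))
```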
In broad terms, allocation of executable threads to different cores depends on:
- The QoS assigned to each thread. Those given low values for background and maintenance tasks, including Time Machine backups and Spotlight indexing, are run on E cores, and there’s no external control to reschedule them to P cores. Those with higher QoS can be run on either P or E cores, and will normally be run on P cores when those are available, although external controls such as the taskpolicy command can restrict them to E cores alone (see the Swift sketch below).
- Clustering. Because all cores in a cluster run at the same frequency, it’s more efficient to recruit additional cores in the same cluster, which will already be running at a raised frequency, than a core in a different cluster, whose frequency may then need to be increased. Clusters are normally recruited in sequence too: in Macs with two P clusters, threads are normally run on the first of those (P0), and only when additional cores are required is the second (P1) recruited. Running with a light load, P0 is therefore likely to be more active than other P clusters, which may spend long periods idling.
- Order within a cluster. Although this is applied more loosely than the other principles, there’s an observed tendency to recruit cores in order within a cluster. Where the P0 cluster contains cores P2, P3, P4 and P5, threads tend to be loaded onto P2 first, and P5 last. However, running threads are often relocated within a cluster, following which P2 may be left idle while P5 bears the brunt.
Thread scheduling and dispatch contain many more subtleties, but those influences normally dominate what you see in practice.
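As an illustration of how an app registers its interest, here’s a minimal Swift sketch (the queue labels are hypothetical examples): work submitted at background QoS is eligible only for E cores, while user-interactive work will normally be dispatched to P cores when they’re available.

```swift
import Foundation

// Minimal sketch: the app declares QoS; macOS decides the cores.

// Background QoS (raw value 9): confined to the E cores.
let maintenanceQueue = DispatchQueue(label: "co.example.maintenance",
                                     qos: .background)
maintenanceQueue.async {
    // long-running maintenance work, run on E cores only
}

// User-interactive QoS (raw value 33): normally run on P cores
// when they're available, but can fall back to E cores.
let urgentQueue = DispatchQueue(label: "co.example.urgent",
                                qos: .userInteractive)
urgentQueue.async {
    // priority user task
}
```

The QoS values of 9 and 33 used in the tests below correspond to those .background and .userInteractive classes respectively.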
Virtual CPU cores
Under macOS lightweight virtualisation, virtual machines (VMs) are allocated a number of virtual CPU cores, all of which are of the same type, equivalent to a P core. There is thus no option to allocate threads running in the VM to E cores on the host.
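That fixed allocation is set by the host app when it configures the VM. A minimal sketch using Apple’s Virtualization framework (the values here are arbitrary, and the rest of the configuration is omitted):

```swift
import Virtualization

// Sketch: fixing the number of virtual cores for a lightweight VM.
// All vCPUs are of a single type; there is no property to assign
// them to E or P cores on the host.
let config = VZVirtualMachineConfiguration()
config.cpuCount = 4                          // four identical vCPUs
config.memorySize = 8 * 1024 * 1024 * 1024   // 8 GiB
// ... platform, boot loader, storage and other devices omitted ...
```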
This is illustrated in the following examples, shown in Activity Monitor’s CPU History window. Two different loads were applied to the M1 Max shown here: the two long peaks seen on the E cores were floating-point threads at a QoS of 9, for a background task, while the shorter peaks on four P cores were the same threads at a QoS of 33, for priority user tasks.
When the same sequence was run on a VM with 4 virtual CPU cores, the QoS made no difference to the performance or core allocation seen in the VM.
On the host, both thread loads were run similarly on the first cluster of P cores. This might suggest that the host simply hands over the cluster of P cores to be scheduled and dispatched by the VM. If you rely on what you’re shown in Activity Monitor, you could be tempted to reach that conclusion.
Methods
To assess core allocation for VMs, I used a Mac Studio M1 Max running Sonoma 14.0. A Sonoma 14.0 VM was built using Viable, and run during the tests using that virtualisation app. Test loads were applied from the VM using my app AsmAttic, each thread running 5 × 10^8 tight loops of purposeless assembly code performing floating-point arithmetic using registers alone, as I detailed here. Each thread typically takes about 3.2 seconds to complete, and serves here as an in-core load, not a benchmark. During each of those runs, powermetrics was run on the host to log cpu_power measurements over 0.1 second sampling periods. Frequency and active residency for all the cores were then extracted from those samples.
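For anyone wanting to apply a similar load without AsmAttic, the following is a rough Swift approximation of the test (an assumption on my part, not the assembly code actually used): each thread performs 5 × 10^8 register-bound floating-point operations at a chosen QoS. Host-side sampling can be run with a command along the lines of sudo powermetrics --samplers cpu_power -i 100, where -i gives the sampling period in milliseconds.

```swift
import Foundation

// Rough approximation of the in-core test load: a tight
// floating-point loop intended to stay within registers.
func runLoad(qos: QualityOfService) {
    let thread = Thread {
        var x = 1.0
        for _ in 0..<500_000_000 {
            x = x * 1.000000001 // simple floating-point work
        }
        print("done:", x) // keeps the loop from being optimised away
    }
    thread.qualityOfService = qos
    thread.start()
}

runLoad(qos: .background)      // QoS 9, as in the background test
runLoad(qos: .userInteractive) // QoS 33, as in the priority test
```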