Topic: Performance issues on M1 mac (arm64) with multicore rendering

Hello,

Fantastic work Modartt on the Pianoteq 7.1.0 release! It's really great to have native arm64 support. I am enjoying Pianoteq a lot.

I would like to report a performance issue on an M1 Mac. In short, the performance index shows large fluctuations, probably due to the heterogeneous cores in the M1 chip. Here are some screenshots: https://imgur.com/a/f5OcBgr

---------------------

- arm64 native, multicore rendering: Performance index 62..135 (warning: large fluctuations) https://i.imgur.com/y6pJdVc.png
- arm64 native, singlecore rendering: Performance index around 143 https://i.imgur.com/X7dls9p.png
- Rosetta 2, multicore rendering: Performance index 75..98 https://i.imgur.com/7PdOk8a.png
- Rosetta 2, singlecore rendering: Performance index around 103 https://i.imgur.com/DfmP7lj.png

In the above case, I used a layered instrument consisting of two piano layers (which is of course more CPU-intensive). With a single-layer instrument, however, it feels more stable (performance index around 100..138 with multicore rendering). FYI I am using an M1 Mac with 8 GB RAM.
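To put a number on "large fluctuations", here is a minimal sketch that compares the multicore ranges quoted above. The metric (width of the range divided by its midpoint) is my own choice for illustration, not something Pianoteq reports.

```python
# Performance-index ranges quoted in this post, as (low, high).
readings = {
    "arm64 multicore, two layers": (62, 135),
    "arm64 multicore, one layer": (100, 138),
    "Rosetta 2 multicore, two layers": (75, 98),
}


def relative_spread(low, high):
    """Width of the range divided by its midpoint (0 means perfectly steady)."""
    midpoint = (low + high) / 2
    return (high - low) / midpoint


for name, (low, high) in readings.items():
    print(f"{name}: {relative_spread(low, high):.0%}")
```

By this measure the native two-layer case swings by roughly 74% of its midpoint, against roughly 32% for one layer and 27% under Rosetta 2.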

What is strange is that with multicore rendering enabled in native arm64 mode, the performance fluctuates a lot. I guess this is because the M1 has four high-performance Firestorm cores and four energy-efficient Icestorm cores, and things are much slower on the Icestorm cores. However, the gap is not as big when Pianoteq is running under Rosetta 2's x86_64 emulation. I feel that when multicore rendering is used, Pianoteq should use only the high-performance cores (I am not sure this is configurable from an application) to attain better stability.

In my experience, in native arm64 mode, sound rendering is generally more stable with multicore rendering disabled (and the performance index is higher). However, when the audio load is very high, I often hear crackles, since only one core is used. Overall, the setting that worked most stably for me is Rosetta 2 + multicore rendering.


Thank you!

Last edited by joelw (16-01-2021 00:14)

Re: Performance issues on M1 mac (arm64) with multicore rendering

I also have this issue.

Pianoteq 7.2.1 ARM native
MacOS 11.2.3

Reaper ARM native

VST3 version of Pianoteq

Pianoteq is basically unusable at 48 kHz with a 64- or 128-sample buffer.
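For context on how tight those settings are, here is a small sketch of the time available to render each buffer. It is plain arithmetic (buffer size divided by sample rate) and the function name is just illustrative.

```python
def buffer_deadline_ms(buffer_samples, sample_rate_hz):
    """Time in milliseconds the audio thread has to fill one buffer."""
    return 1000.0 * buffer_samples / sample_rate_hz


for size in (64, 128):
    print(f"{size} samples at 48 kHz: {buffer_deadline_ms(size, 48000):.2f} ms")
```

That is about 1.33 ms for 64 samples and 2.67 ms for 128, so a render thread that lands on a slower core for even one buffer can plausibly miss the deadline and crackle.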

My quad-core i5 Windows machine from 2013 does way better.

I had read that the Apple silicon Macs were supposed to be fast for many audio plugins. I feel reasonably sure that this is an optimisation issue rather than a fundamental problem. I hope Modartt can sort it out soon.

Last edited by blackthorn (06-04-2021 21:47)

Re: Performance issues on M1 mac (arm64) with multicore rendering

This seems to be an issue on ARM chips with big.LITTLE cores. I had the same wide fluctuations and "fragile" audio performance on the Odroid N2+ when it was switching between the high/low performance cores. I was able to stabilize the performance index by assigning Pianoteq to the 4x high-performance cores.
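For anyone on Linux who wants to try the same thing, here is a minimal sketch of pinning a process to a chosen set of cores using Python's `os.sched_setaffinity` (the command-line equivalent is `taskset`). The core indices are an assumption: check `lscpu` or `/proc/cpuinfo` on your own board to see which ones are the fast cores. This API does not exist on macOS, where the function below simply returns None.

```python
import os

# Assumption: indices of the high-performance cores on this board.
# Verify on your own system before relying on these numbers.
ASSUMED_FAST_CORES = {2, 3, 4, 5}


def pin_to_cores(pid, wanted):
    """Restrict process `pid` (0 = the calling process) to the cores in `wanted`.

    Returns the affinity set actually in effect afterwards, or None when the
    platform has no affinity API or none of the wanted cores is available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)
    target = set(wanted) & allowed
    if not target:
        return None
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)
```

Calling `pin_to_cores(pianoteq_pid, ASSUMED_FAST_CORES)` with the process ID of a running Pianoteq (`pianoteq_pid` is a placeholder) would apply it. Changing another user's process requires the appropriate permissions.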

Unfortunately macOS users cannot assign CPU cores to an application, but it is possible in Logic Pro (Preferences > Audio > Processing Threads). Try running Pianoteq in Logic Pro, set to use specific cores, and see if you get better performance.

Another way to stabilize the Pianoteq performance index on a Mac is to disable all the "helper" programs that Apple loads by default. I deleted all Widgets from the Notification Center, and that helped - it even sped up the startup time. Other things like disabling Wi-Fi / Bluetooth can also help.

Notes:
I also noticed that CPUs with hyper-threading seem to exhibit this behaviour as well (performance index with a wide range). In hyper-threaded CPUs the "virtual cores" seem to act like the "low-performance" cores in ARM, but the effect is not as noticeable - perhaps because virtual and real cores share the same cache? (I don't know, just guessing.)

Last edited by Groove On (07-04-2021 03:52)

Re: Performance issues on M1 mac (arm64) with multicore rendering

Hmm, this gave me a clue about something I should have tried before:

In Reaper I tried changing the "Allow live FX multiprocessing" setting under Preferences > Audio > Buffering.

I first changed it from 8 to 4 cores with no effect.

I then turned it off entirely and the problem is fixed. Pianoteq now performs well.

Pianoteq works OK with or without multiprocessing enabled in the plugin, but the system load shown is lower with multicore on.

So it seems this is a conflict over which part of the system allocates processors?

Turning "Allow live FX multiprocessing" off also improved performance in Kontakt running through Rosetta.

Re: Performance issues on M1 mac (arm64) with multicore rendering

Hi - I must be missing something. I upgraded to v7 ... the latest iteration of that as of a week or two ago (pianoteq_stage_setup_v721) ... because I'm keen to stay native M1 and avoid Rosetta 2 - I know Rosetta 2 works well but I'm trying to avoid the mix and match approach. Anyway ... on installing I received a message that Rosetta 2 is required (I have, deliberately, not installed Rosetta 2 yet).

So is pianoteq_stage_setup_v721 really M1 native? Is the installer not native? I know what the official doc says ... but why am I getting this message?

I feel dumb raising this so if it is a dumb question, please know that I am feeling it - cheers.

Re: Performance issues on M1 mac (arm64) with multicore rendering

Groove On wrote:

This seems to be an issue on ARM chips with big.LITTLE cores. I had the same wide fluctuations and "fragile" audio performance on the Odroid N2+ when it was switching between the high/low performance cores. I was able to stabilize the performance index by assigning Pianoteq to the 4x high-performance cores.

Unfortunately macOS users cannot assign CPU cores to an application, but it is possible in Logic Pro (Preferences > Audio > Processing Threads). Try running Pianoteq in Logic Pro, set to use specific cores, and see if you get better performance.

Another way to stabilize the Pianoteq performance index on a Mac is to disable all the "helper" programs that Apple loads by default. I deleted all Widgets from the Notification Center, and that helped - it even sped up the startup time. Other things like disabling Wi-Fi / Bluetooth can also help.

Notes:
I also noticed that CPUs with hyper-threading seem to exhibit this behaviour as well (performance index with a wide range). In hyper-threaded CPUs the "virtual cores" seem to act like the "low-performance" cores in ARM, but the effect is not as noticeable - perhaps because virtual and real cores share the same cache? (I don't know, just guessing.)

There are macOS APIs that allow the developer to specify QoS levels which will guarantee that a certain task is run on the higher performance cores (if possible). Whether Modartt have implemented them I can’t say, but I’d be really surprised if not.
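To make the QoS idea concrete, here is a rough sketch that calls the real macOS function `pthread_set_qos_class_self_np` through `ctypes` to mark the calling thread as user-interactive. Python is used only for illustration; a plugin would do this from native code, and audio hosts usually manage their real-time threads themselves. As far as I understand it, a QoS class is a scheduling hint rather than a hard core assignment. Nothing here describes what Pianoteq actually does.

```python
import ctypes
import sys

# Value of QOS_CLASS_USER_INTERACTIVE in the macOS header <sys/qos.h>.
QOS_CLASS_USER_INTERACTIVE = 0x21


def request_user_interactive_qos():
    """Ask macOS to treat the calling thread as user-interactive work.

    Returns True if the request was accepted, False on other platforms
    or if the call reports an error.
    """
    if sys.platform != "darwin":
        return False
    # CDLL(None) exposes the symbols already loaded into this process,
    # which on macOS includes the pthread functions from libSystem.
    libsystem = ctypes.CDLL(None)
    set_qos = libsystem.pthread_set_qos_class_self_np
    set_qos.argtypes = [ctypes.c_uint, ctypes.c_int]
    set_qos.restype = ctypes.c_int
    # Second argument is the relative priority within the class (0 = default).
    return set_qos(QOS_CLASS_USER_INTERACTIVE, 0) == 0


if request_user_interactive_qos():
    print("QoS request accepted")
else:
    print("QoS request not applied (not macOS, or the call failed)")
```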

I do wish there was a bit more transparency around scaling and performance from the devs though.