Topic: Pianoteq 8 on Raspberry Pi 5

Following my previous experience with a Pi 4B, I'm now building a new device using a Pi 5 and the DAC Pro. With this more powerful board and a different OS, I wonder whether the developers' recommendations for improving performance, and the others that I read in this forum, are still valid:
1. Elevate the privileges of the audio group with:

/etc/security/limits.conf
...
@audio - rtprio 90
@audio - nice -10
@audio - memlock 500000

2. Use cpufrequtils to switch the CPU governor to performance mode; I read that someone recommends cpupower instead.
3. Disable Ethernet turbo mode:

/boot/firmware/cmdline.txt
... smsc95xx.turbo_mode=N

I removed this setting because it was giving me problems with the Wi-Fi connection.
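
A quick way to see which of the three tweaks above are actually in place is sketched below. The function names and the overridable *_FILE variables are illustrative assumptions, not part of the recommendations themselves; the default paths are the usual Linux locations and may differ on a given system.

```shell
#!/bin/sh
# Hypothetical helpers: report the state of the three tweaks listed above.
# Paths default to the usual system files but can be overridden.
LIMITS_FILE="${LIMITS_FILE:-/etc/security/limits.conf}"
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"
GOVERNOR_FILE="${GOVERNOR_FILE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"

# Print the uncommented @audio entries from limits.conf.
audio_limits() {
    grep -E '^[[:space:]]*@audio[[:space:]]' "$LIMITS_FILE" 2>/dev/null
}

# Print the governor currently active on CPU 0, or "unknown".
current_governor() {
    cat "$GOVERNOR_FILE" 2>/dev/null || echo unknown
}

# Print "set" if a turbo_mode parameter appears on the kernel command line.
turbo_mode_state() {
    if grep -q 'smsc95xx\.turbo_mode=' "$CMDLINE_FILE" 2>/dev/null; then
        echo set
    else
        echo "not set"
    fi
}
```

Calling audio_limits, current_governor and turbo_mode_state after a reboot shows what was really applied, which is easier than re-reading each file by hand.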

For my installation on the Pi 4B I used to build a real-time (RT) patched kernel, which reduces latency even more. With the newer kernels the patch is already rolled in and the process is even easier. I will test whether the patched kernel makes as much difference on the Pi 5.
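
To confirm that the booted kernel really is the RT build, the version string printed by uname -v can be inspected. This is a minimal sketch under the assumption that RT kernels advertise PREEMPT_RT (or PREEMPT RT on older builds) in that string; is_rt_kernel is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a kernel version string (as printed by
# `uname -v`) advertises the real-time preemption model.
is_rt_kernel() {
    case "$1" in
        *PREEMPT_RT*|*"PREEMPT RT"*) return 0 ;;
        *)                           return 1 ;;
    esac
}

if is_rt_kernel "$(uname -v)"; then
    echo "running kernel reports real-time preemption"
else
    echo "running kernel does not report real-time preemption"
fi
```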

If you have any other recommendations I'll be glad to test them. At the end I will produce another tutorial to bring everything together.

Last edited by jimmi (18-02-2025 08:07)

Re: Pianoteq 8 on Raspberry Pi 5

I completed my tutorial and published it here. Any comments are welcome.

Re: Pianoteq 8 on Raspberry Pi 5

How much difference do these optimizations make over the standard installation? How do you test them?

Re: Pianoteq 8 on Raspberry Pi 5

levinite wrote:

How much difference do these optimizations make over the standard installation? How do you test them?

I'm testing just by ear; I do not have any special equipment for it. If you can suggest a better, software-based testing option, it would be welcome.

The RT kernel really makes the difference between a very audible latency and a hardly detectable one. As for the suggestions from this site, numbered as in my first post:

  1. I cannot detect any evident difference in latency, and there are no disturbances in the sound. However, I'm not an expert player, so I cannot exclude that some problems may arise with more intense use.

  2. Without cpufreq, a lot of crackling disturbs the sound.

  3. No apparent difference using this option. It was created for the Pi 3, and I'm not even sure it is still applicable to the Pi 5; I cannot find any module or overlay with that name.

Re: Pianoteq 8 on Raspberry Pi 5

As a basic test of efficiency, you could try the "time" command. The idea is that the fewer resources used for the task, the better. As an example:

time ./Pianoteq\ 8 --headless --play-and-quit
To compensate for caching, run this at least twice initially.

How often are the patched versions updated? I am curious whether your patched version uses the "numa" optimizations, which give a boost, especially for multithreaded tasks. Check for numa options with:
cat /proc/cmdline
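
Since the kernel command line can be long, a small filter makes the NUMA parameters easier to spot. This is only a sketch; numa_options is a hypothetical helper name, and the optional file argument exists purely so that it can be tried on a saved copy of the command line.

```shell
# Hypothetical helper: print only the NUMA-related kernel parameters,
# one per line. Reads /proc/cmdline unless another file is given.
numa_options() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -i 'numa'
}

numa_options || echo "no numa parameters on the kernel command line"
```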

Anyway, good luck with your project; I am sure many have found it useful.

Re: Pianoteq 8 on Raspberry Pi 5

jimmi wrote:

I completed my tutorial and published it here. Any comments are welcome.

Ciao Roberto.

I don't have a Pi5 at the moment, so I can not test your instructions, but they look really well written, comprehensive and accurate.

Thanks for doing that!

Where do I find a list of all posts I upvoted? :(

Re: Pianoteq 8 on Raspberry Pi 5

levinite wrote:

As an example:
time ./Pianoteq\ 8 --headless --play-and-quit
To compensate for caching run this initially at least twice.

First run:

real    0m33,830s
user    0m16,832s
sys     0m0,902s

Second run:

real    0m32,116s
user    0m16,765s
sys     0m0,840s

levinite wrote:

I am curious if your patched version uses the "numa" optimizations which gives a boost, especially for multithreaded tasks.

This is a new project; I just compiled it for the first time using the latest kernel, 6.14 rc3, with the RT patch rolled in. The kernel uses a NUMA policy; reading /proc/cmdline I found:

numa_policy=interleave

In the kernel .config file the variables are set as follows:

CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_NUMA=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_NUMA_MEMBLKS=y
CONFIG_NUMA_EMU=y
CONFIG_GENERIC_ARCH_NUMA=y
CONFIG_OF_NUMA=y
# CONFIG_DMA_NUMA_CMA is not set

The documentation says that there are 6 behavioral modes: default, bind, preferred, interleaved, preferred many, weighted interleave. I wonder whether I could get even better performance using another mode.

dv wrote:

I don't have a Pi5 at the moment

My instructions should be applicable to the Pi 4 as well, except for the kernel build, which can be adjusted by following the manual.

Last edited by jimmi (23-02-2025 02:51)

Re: Pianoteq 8 on Raspberry Pi 5

BTW same test on a Pi4B:

real    0m34,622s
user    0m47,624s
sys     0m4,530s

A large difference, but the latency in Pianoteq is comparable to the Pi 5. This test does not look very significant.

Re: Pianoteq 8 on Raspberry Pi 5

I know of no way to measure latency directly in Pianoteq; however, I'm sure a fully preemptible real-time kernel would decrease it. The time command gives a more objective measure of how much CPU time is used for a particular task, so with it we can directly compare at least most Linux systems. How does Pianoteq run under, say, Fedora, Ubuntu*, Arch etc.? Which desktop is best, and how much does it matter? Do I use PipeWire, JACK or something else?
In most cases we can even compare Pianoteq versions. All else being equal, efficiency is important because almost everything depends on CPU speed and availability, and the kernel determines which task or thread actually gets to run and for how long.

With that in view would you mind running the following:

<code>
sudo cpupower frequency-set -r -g performance
./Pianoteq\ 8 --version
time ./Pianoteq\ 8 --preset "Shigeru Kawai SK-EX Ryuyo" --play-and-quit --headless
sudo cpupower frequency-set -r -g ondemand
</code>
I suspect your timings should be better with your rt-patch.
I am running an updated PiOS system. Results as follows:
real    0m36.989s
user    0m21.927s
sys    0m1.461s
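
The set governor / run / restore sequence above can be wrapped so that the previous governor is put back even when the timed command fails. This is a minimal sketch: with_governor is a hypothetical helper, the sysfs path is the usual cpufreq location, and falling back to ondemand when the current governor cannot be read is an assumption.

```shell
# Hypothetical wrapper: run a command under a chosen governor, then restore
# whichever governor was active before.
GOVERNOR_FILE="${GOVERNOR_FILE:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor}"
# Command used to switch governor; the governor name is appended as the last
# argument. Replace it with "sudo cpufreq-set -g" when using cpufrequtils.
SET_GOVERNOR="${SET_GOVERNOR:-sudo cpupower frequency-set -r -g}"

with_governor() {
    wanted="$1"
    shift
    # Remember the current governor (assumption: ondemand if unreadable).
    previous=$(cat "$GOVERNOR_FILE" 2>/dev/null || echo ondemand)
    $SET_GOVERNOR "$wanted" >/dev/null || return 1
    "$@"
    rc=$?
    $SET_GOVERNOR "$previous" >/dev/null
    return $rc
}

# Example (not executed here):
#   with_governor performance ./Pianoteq\ 8 --headless --play-and-quit
```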


* https://documentation.ubuntu.com/pro/pr...me_kernel/

Re: Pianoteq 8 on Raspberry Pi 5

I did some more tests and now I'm really confused.

I realized that I did not change the CPU governor in my previous test. I'm using cpufrequtils, not cpupower (Q: is there any difference?). In my case the commands are as follows:

# sudo cpufreq-set -g performance
# ./Pianoteq\ 8\ STAGE/arm-64bit/Pianoteq\ 8\ STAGE --version
Pianoteq STAGE version 8.4.1/20250204 -- http://www.modartt.com/pianoteq Copyright (c) 2025 Modartt.

# time ./Pianoteq\ 8\ STAGE/arm-64bit/Pianoteq\ 8\ STAGE --preset "Shigeru Kawai SK-EX Ryuyo" --headless --play-and-quit

real    0m37.491s
user    0m32.700s
sys     0m1.315s

# sudo cpufreq-set -g ondemand

This is surprising, because it looks worse than the test done with the governor set to ondemand, unless I'm misreading these numbers. In fact I did another test with the governor set to ondemand, and the results were closer to what I got last time, though not the same:

real    0m37.818s
user    0m28.599s
sys     0m1.048s

It looks like this test is not repeatable, unless it needs additional preparation.

I did not mention that I also tried installing a vanilla Bookworm with the RT 6.1.0 kernel package available in the distribution. The results (by ear) were better than with the standard kernel, but not as good as those with the 6.14.0 rc3 kernel I compiled myself.

Last edited by jimmi (25-02-2025 06:12)

Re: Pianoteq 8 on Raspberry Pi 5

The results should be reasonably repeatable, and I expect performance to "win" in most cases, but computers are awfully complex devices and no user program has exclusive control. Keep in mind that "time" includes loading, initializing and unloading Pianoteq. Is it different for the trial version? You might try an interactive, real-time comparison of "performance" and "ondemand" using top. I like to get the PID of Pianoteq and then use "top -H -p PID" to show Pianoteq's threads.
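
For a non-interactive snapshot of the same per-thread view, the counters under /proc can be read directly. The sketch below is an assumption-laden alternative to top rather than a replacement for it: thread_cpu_ticks is a hypothetical helper, and it prints the accumulated user+system clock ticks of each thread, not a live percentage.

```shell
# Hypothetical helper: for every thread of a process print
#   <thread id> <user+system clock ticks so far> <thread name>
thread_cpu_ticks() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        [ -r "$task/stat" ] || continue
        tid="${task##*/}"
        name=$(cat "$task/comm" 2>/dev/null)
        # Drop "pid (comm) " first so a name containing spaces cannot shift
        # the fields; utime and stime (fields 14 and 15 of stat) then become
        # fields 12 and 13.
        ticks=$(sed 's/^.*) //' "$task/stat" | awk '{ print $12 + $13 }')
        printf '%s\t%s\t%s\n' "$tid" "$ticks" "$name"
    done
}

# Example (not executed here), with the PID found as for top:
#   thread_cpu_ticks PID
```

Running it twice a few seconds apart and comparing the tick counts shows which threads did the work in between.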

edit:
My Performance vs Ondemand

Performance:
real    0m36.990s
user    0m20.580s
sys    0m2.984s

OnDemand:
real    0m37.005s
user    0m24.971s
sys    0m2.531s

Last edited by levinite (25-02-2025 23:05)

Re: Pianoteq 8 on Raspberry Pi 5

Since last week I have had a Stage license, which I'm using for my tests. Today I tried again and the results were more logical:
ondemand

real    0m37.213s
user    0m27.061s
sys     0m1.020s

performance

real    0m37.189s
user    0m23.840s
sys     0m0.876s

I also ran the test with top; the load of course was not stable, so I tried to catch the highest value:
ondemand
https://7girello.in/wp-content/uploads/2025/02/Picture4.png
performance
https://7girello.in/wp-content/uploads/2025/02/Picture1.png

In ondemand the load is considerably higher, even though I cannot understand exactly why.

Last edited by jimmi (26-02-2025 11:15)

Re: Pianoteq 8 on Raspberry Pi 5

Very interesting, thank you. I'm also running on a Pi5 with the DAC Pro (what used to be the IQaudio).

I think your sample rate (and consequently buffer size) settings are needlessly high. Since you have an internal sample rate of 48 kHz, there's little point in having a host sample rate other than 48 kHz. That would allow you to drop the buffer size down to 128 samples (2.7 ms) or 192 samples (4 ms), and you'll probably find there's no longer any need for the RT kernel. It should also allow you to increase the polyphony to 256 without any adverse effects (at least these settings work for me).
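
The millisecond figures above are simply the buffer size divided by the sample rate. A small sketch of that arithmetic follows; buffer_ms is a hypothetical helper name, and the result is the duration of one buffer only, not the full key-to-sound latency.

```shell
# Hypothetical helper: duration of one audio buffer in milliseconds.
#   $1 = buffer size in samples, $2 = sample rate in Hz
buffer_ms() {
    LC_ALL=C awk -v n="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", n * 1000 / rate }'
}

buffer_ms 128 48000     # 2.67
buffer_ms 192 48000     # 4.00
buffer_ms 768 192000    # 4.00
```

The last line shows that 768 samples at 192 kHz lasts exactly as long as 192 samples at 48 kHz.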

Re: Pianoteq 8 on Raspberry Pi 5

jari_42 wrote:

...since you have an internal sample rate of 48kHz there's little point in having a host sample rate other than 48kHz...

Jari_42, this is the kind of thing that is trivial for some but not for me, as I'm not familiar with these parameters. Thanks a lot for this precious hint.

Re: Pianoteq 8 on Raspberry Pi 5

Jimmi wrote:

In ondemand the load is considerably higher, even though I cannot understand exactly why.

If you are talking about "%cpu": the faster the CPU gets the job done, the smaller the share of real time it has to spend on it.
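
A rough illustration of that point, with made-up numbers: the same amount of work takes a larger share of the wall-clock time when the clock is lower. The 30 billion cycles, the two clock speeds and the 37 s duration are assumptions chosen only for the example, and busy_percent is a hypothetical helper.

```shell
# Illustrative arithmetic only: share of wall-clock time a core is busy when
# a fixed amount of work runs at a given clock speed.
#   $1 = work in billions of cycles, $2 = clock in GHz, $3 = wall-clock seconds
busy_percent() {
    LC_ALL=C awk -v work="$1" -v ghz="$2" -v wall="$3" \
        'BEGIN { printf "%.1f\n", 100 * work / (ghz * wall) }'
}

busy_percent 30 2.4 37   # 33.8
busy_percent 30 1.5 37   # 54.1
```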

Jari_42 wrote:

I think your sample rate (and consequently buffer size) settings are needlessly high...

I agree: no need to change the sample rate, and a smaller buffer and higher polyphony may work, but if he uses morphing or layering he may want to change it back. Organteq already requires more resources than Pianoteq, and some presets overload my Pi 5. Smaller buffers always require more resources. Personally, I choose the largest buffer that allows for an acceptable latency. Using his kernel patch might allow him to keep the buffer larger.

Re: Pianoteq 8 on Raspberry Pi 5

levinite wrote:

but If he uses morphing or layering he may want to change it back.

My use of Pianoteq is very basic: I own a Stage licence, and I play for myself or a small audience and to accompany my voice, with no sophisticated use or sound research at the moment. Therefore I believe the suggestion fits perfectly, as long as it does not affect the sound of the standard Pianoteq instruments.
I reduced the sample rate to 48k and the buffer to 192; now, if I leave the local control of my piano on, I cannot distinguish the sound generated internally from the one generated by Pianoteq.

levinite wrote:

Smaller buffers always require more resources

In fact, with these new settings the CPU usage of Pianoteq increased to 92%, and time also gives worse results:

real    0m37.720s
user    0m28.913s
sys     0m1.167s

The effort to compile the kernel is minor, so it is still worthwhile. I'll try different settings to find the best for me.

It is different for my brother, who has a more professional use. He owns a Pro licence with an internal sample rate of 192k, and an Organteq licence. What I understood from him is that the difference between 48k and 192k matters mainly for recording purposes. He says that under certain conditions the Pi 5 fits his needs perfectly.

Last edited by jimmi (27-02-2025 03:26)

Re: Pianoteq 8 on Raspberry Pi 5

levinite wrote:

I agree, no need to change sample rate and a smaller buffer and higher polyphony may work but If he uses morphing or layering he may want to change it back. Organteq already requires more resources than Pianoteq and some presets overload my pi5. Smaller buffers always require more resources. Personally, I choose the largest buffer which allows for an acceptable latency. Using his kernel patch might allow him to keep the buffer larger.

Agreed, I don't use morphing or layering but that would significantly increase the resources required.

jimmi wrote:

The effort to compile the kernel is minor, so it is still worthwhile.

Do you still see an improvement with the RT kernel using a 48kHz sample rate and 192 buffer?

Re: Pianoteq 8 on Raspberry Pi 5

jari_42 wrote:

Do you still see an improvement with the RT kernel using a 48kHz sample rate and 192 buffer?

It reduces the CPU load and allows a faster response; that is one point. When I have time I'll compare the performance of the two kernels.

There is another point I would like to clarify: from the tests mentioned in this thread, it seems there is no advantage in reducing the sample rate and buffer size in terms of resource use and latency. The combination 192 kHz/768 samples gives approximately the same result as 48 kHz/192 samples. Is there any other reason to prefer one over the other?

Re: Pianoteq 8 on Raspberry Pi 5

jimmi wrote:

It reduces the CPU load and allows a faster response; that is one point. When I have time I'll compare the performance of the two kernels.

That's great to know, thank you very much.

jimmi wrote:

There is another point I would like to clarify: from the tests mentioned in this thread, it seems there is no advantage in reducing the sample rate and buffer size in terms of resource use and latency. The combination 192 kHz/768 samples gives approximately the same result as 48 kHz/192 samples. Is there any other reason to prefer one over the other?

You're doing more computation than necessary with 192 kHz/768 samples, as you're upsampling from the internal sample rate of 48 kHz to 192 kHz, and you're unlikely to hear any difference. As you say, though, there are some applications such as recording where a higher sample rate may be desirable for mastering/postprocessing headroom.

Re: Pianoteq 8 on Raspberry Pi 5

command:
time ./Pianoteq\ 8 --preset "Shigeru Kawai SK-EX with Strings Pad" --play-and-quit --headless

Multicore rendering max:
real    0m23.490s
user    0m23.813s
sys    0m1.475s
25.288 user+sys

real    0m23.492s
user    0m23.421s
sys    0m2.269s
25.69 user+sys

Multicore rendering on:
real    0m23.589s
user    0m20.982s
sys    0m1.105s
22.087 user+sys

real    0m23.490s
user    0m20.953s
sys    0m1.187s
22.14 user+sys
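
The user+sys totals above can be computed straight from the output of time. The sketch below uses a hypothetical cpu_seconds helper; it reads the three-line real/user/sys report on standard input and also accepts the comma decimals seen in some of the earlier results.

```shell
# Hypothetical helper: sum the "user" and "sys" lines of a `time` report
# (read from standard input) and print the total CPU time in seconds.
cpu_seconds() {
    LC_ALL=C awk '
        $1 == "user" || $1 == "sys" {
            t = $2
            gsub(",", ".", t)      # accept comma decimals from some locales
            sub("s$", "", t)       # drop the trailing "s"
            split(t, p, "m")       # p[1] = minutes, p[2] = seconds
            total += p[1] * 60 + p[2]
        }
        END { printf "%.3f\n", total }
    '
}

printf 'real 0m23.490s\nuser 0m23.813s\nsys 0m1.475s\n' | cpu_seconds   # 25.288

# In bash the report can be piped in directly (not executed here):
#   { time ./Pianoteq\ 8 --headless --play-and-quit ; } 2>&1 | cpu_seconds
```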

Should we now use multicore on instead of multicore max???
I don't know if it's a general case, as it seems to happen mostly when Pianoteq is stressed more. My working theory is that some threads must wait longer to access the CPU. More testing needs to be done; it could be just an anomaly. Can someone check for similar results using the real-time kernel?

Re: Pianoteq 8 on Raspberry Pi 5

jari_42 wrote:

That's great to know, thank you very much.

I switched between the standard kernel and the RT-enabled kernel, and I can say that with your settings the difference is negligible. Even with the buffer size increased to 384, the RT kernel has no significant advantage, so I think I will also switch back to the standard kernel for my STAGE version. Maybe the advantage is only visible with the Pi 4 and/or the PRO version at 192 kHz; I will keep testing when possible.

levinite wrote:

Should we now use multicore on instead of multicore max???

I did not know that the multicore option could be set to anything other than max. The options are not mentioned in the README file or in the manual, and do not even appear in the "help" text.
I did some tests too (each test was repeated at least twice). With the standard kernel and the output set to 48 kHz and 192 samples I got larger differences, and looking at the CPU usage of Pianoteq I found a difference of about 30 percentage points between the two multicore rendering modes:
multicore max

real    0m37.803s
user    0m29.324s
sys     0m2.349s

CPU 98% max

multicore on

real    0m37.826s
user    0m23.190s
sys     0m0.442s

CPU 68% max

Changing to RT kernel there are comparable differences but with much lower stress levels:
multicore max

real    0m36.989s
user    0m21.089s
sys     0m0.847s

CPU 66% max

multicore on

real    0m36.985s
user    0m15.407s
sys     0m0.310s

CPU 44% max

Playing music, I did not hear any disturbance. I think I'll keep using the on mode. Can anyone point out any disadvantages of doing so?

Last edited by jimmi (05-03-2025 03:58)

Re: Pianoteq 8 on Raspberry Pi 5

jimmi wrote:

Playing music, I did not hear any disturbance. I think I'll keep using the on mode. Can anyone point out any disadvantages of doing so?

Ciao Roberto,

From these results it's absolutely clear that the realtime kernel with multicore on is the way to go. I had no doubts about it, but it's nice to see it confirmed in practice by your experiments, so thank you for doing them and for posting the results here. Why did I have no doubts?

RT vs regular kernel: I'm sure this is clear to you since you started this conversation, but for others: the RT kernel is built so that the "responsiveness" of particular applications is given priority. This is obviously what we want for playing music: Pianoteq running all the time at full throttle, even if some other background service (e.g. checking for updates, which most Linux distros do from time to time) has to take a very, very distant back seat.

multicore on vs max: on tries to maximize throughput by putting processes on the most suitable core (which can sometimes be a non-free core if cache locality makes that more advantageous); max tries to maximize core utilization by scheduling processes in the most "spread out" way across the cores of the machine. IIRC, max also enables hyperthreading, which means it uses more software threads than there are physical ones. I don't have a good grasp of the Raspberry architecture, but on high-end Intel and AMD this often "dirties" the cache, so cache coherence and cache locality can slow things down, and even more so with hyperthreading. It can be a good solution in some very limited circumstances, but rarely so, and on all my high-performance machines it is never used (hyperthreading is off and multicore is on, not max).

Re: Pianoteq 8 on Raspberry Pi 5

I did another timing experiment. I used the same timing command line as before, but first set the CPU frequency to 1.5 GHz max (i.e. max = min). The results show the same multicore on "efficiency", but there is also clicking that is not heard when using multicore max. I don't know why, but it seems multicore max is better at avoiding clicking issues, at least on the Pi 5. Why offer multicore on? Perhaps it "plays better" when other programs are also running. I don't know.

Re: Pianoteq 8 on Raspberry Pi 5

After repeating the tests on a Pi 4 I have to contradict myself: multicore max is way better than on. The results of the tests are as follows:
multicore max

real    0m39,374s
user    1m21,624s
sys     0m4,306s

multicore on

real    0m52,045s
user    0m53,704s
sys     0m1,596s

In max mode some clicks or crackles appear, but in on mode the sound is badly interrupted and distorted, and the device becomes unusable even for light playing. The RT kernel may improve the latency, but it apparently makes this problem worse: in max mode the disturbances increase with the RT kernel (!). It looks like the system is already at its limit; I think the Pi 4 is not a good choice for intensive use.

I tested the Pi 5 again, trying to stress it with a lot of notes and the sustain pedal fully pressed: in on mode, after some time disturbances may appear, but in max mode I could not make it falter even with my best effort.

Last edited by jimmi (07-03-2025 05:47)

Re: Pianoteq 8 on Raspberry Pi 5

Very cool to encounter a couple of other Linux/RPi5 users of Pianoteq on this forum!

I also did a similar build last year, but decided to use Fedora Linux instead of Raspberry Pi OS to get a newer PipeWire stack with improved JACK support in place. It's running quite well for me at 48 kHz, after doing some of the usual optimizations (realtime scheduling/priorities, etc.).

As the DAC I chose the Scarlett 2i2 4th gen, mainly due to this excellent project. I CAD'd up and 3D-printed a custom black chassis for it to make it fit in more nicely with the beautiful VPC1, and another one for the RPi5.

Last edited by Eike (Today 00:27)