Topic: Hyperthreading

Can disabling hyperthreading ("virtual CPUs") speed up the response of Pianoteq?

This question came up recently in the thread "Silent laptop advice".

I tried to get an impression with the following quick test:

To get a value for the overall latency, I clicked PTQ's virtual keyboard with the mouse (velocity fixed at 100) and recorded both the click noise and the sound from the PC speaker with the notebook's internal microphone.

Then I switched off hyperthreading temporarily by starting the kernel with the option "nosmt", which stands for no simultaneous multithreading, and recorded the same PTQ sound again.
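
For anyone repeating this, here is a minimal sketch of how one could confirm that SMT is really off after booting with "nosmt". It assumes a Linux kernel that exposes the SMT state under /sys/devices/system/cpu/smt (not every kernel version does). The function name smt_state and the sysfs_root parameter are made up for this illustration.

```python
from pathlib import Path


def smt_state(sysfs_root="/sys/devices/system/cpu/smt"):
    """Return (control, active) read from the kernel's SMT sysfs files.

    control is the raw string from the "control" file, active is True
    when the "active" file contains "1". Returns None if the files are
    missing (assumption: older kernel or no SMT support).
    """
    root = Path(sysfs_root)
    try:
        control = (root / "control").read_text().strip()
        active = (root / "active").read_text().strip() == "1"
    except OSError:
        return None
    return control, active


if __name__ == "__main__":
    print(smt_state())
```

The lscpu output further down shows the same thing in the "Thread(s) per core" line.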

The time span between the mouse click and the piano sound, measured in Audacity, is the latency (example screenshot below).
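
I read the values off by hand in the wave editor. Just as a rough sketch, the same measurement could be automated along these lines. The threshold, the guard time after the click and the function name click_to_sound_ms are all assumptions for illustration, not what was actually used here.

```python
def click_to_sound_ms(samples, rate, threshold=0.1, guard_ms=5.0):
    """Estimate the delay between two onsets in a mono recording.

    samples   -- sequence of floats in the range -1.0..1.0
    rate      -- sample rate in Hz
    threshold -- absolute level that counts as "sound" (assumed value)
    guard_ms  -- time skipped after the click so that the click itself
                 is not mistaken for the piano onset (assumed value)
    Returns the delay in milliseconds, or None if an onset is missing.
    """
    def first_above(start):
        # Index of the first sample at or above the threshold.
        for i in range(start, len(samples)):
            if abs(samples[i]) >= threshold:
                return i
        return None

    click = first_above(0)
    if click is None:
        return None
    guard = int(rate * guard_ms / 1000)
    note = first_above(click + guard)
    if note is None:
        return None
    return (note - click) * 1000.0 / rate
```

A real recording with room noise would need a more careful onset detector. This only shows the idea of measuring the distance between two threshold crossings.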

With hyperthreading on (smt) I got an average overall latency of 17.1 ms (19, 14, 17, 17, 15, 22, 20, 13).

With hyperthreading turned off (nosmt) the average was 18.9 ms (18, 16, 22, 13, 16, 22, 19, 20, 24).

I found no significant difference in latency in this experiment.
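
To put a number on "no significant difference", here is a small sketch that runs Welch's unequal-variance t-test on the two series above. Choosing Welch's test is an assumption made for this illustration and not part of the original experiment. It uses only the Python standard library.

```python
from math import sqrt
from statistics import mean, variance

# Measured overall latencies in ms, copied from the two runs above.
smt = [19, 14, 17, 17, 15, 22, 20, 13]
nosmt = [18, 16, 22, 13, 16, 22, 19, 20, 24]


def welch_t(a, b):
    """Return (t, degrees of freedom) for Welch's t-test of b against a."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    t, df = welch_t(smt, nosmt)
    print(f"mean smt = {mean(smt):.1f} ms, mean nosmt = {mean(nosmt):.1f} ms")
    print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

With these numbers t comes out at roughly 1.1 with about 15 degrees of freedom, well below the value of about 2.13 needed for significance at the 5 % level (two-sided).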

Some technical details, if you are interested:

With smt:
---------
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 78
Model name:            Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Stepping:              3
CPU MHz:               2799.902
CPU max MHz:           2800.0000
CPU min MHz:           400.0000
BogoMIPS:              4800.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0-3

Without smt ("nosmt"):
----------------------
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0,1
Off-line CPU(s) list:  2,3
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 78
Model name:            Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Stepping:              3
CPU MHz:               2800.048
CPU max MHz:           2800.0000
CPU min MHz:           400.0000
BogoMIPS:              4800.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0,1
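
If you want to compare the two lscpu dumps by script instead of by eye, a sketch like the following would do. The helper names parse_lscpu and smt_enabled are made up for this illustration. The field names are the ones shown in the output above.

```python
def parse_lscpu(text):
    """Turn `lscpu` output into a dict of field name -> value string."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only; values may contain more colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def smt_enabled(fields):
    """True if lscpu reports more than one thread per core."""
    return int(fields["Thread(s) per core"]) > 1


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(["lscpu"], capture_output=True, text=True).stdout
    info = parse_lscpu(out)
    print("SMT enabled:", smt_enabled(info))
    print("On-line CPUs:", info.get("On-line CPU(s) list"))
```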

It is an office notebook with the Pianoteq trial v6.5.3 installed just for this test. The only optimisation was setting the scaling_governor to "performance" on each CPU.
The test sound was Steinway D Prelude at 44.1 kHz with a buffer size of 64 samples, for better latency than the default 512 (!) samples. The OS is Debian Linux "Stretch".
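
To show why the buffer size matters so much, here is the arithmetic for the duration of a single audio buffer as a tiny sketch. The function name buffer_ms is made up for this illustration.

```python
def buffer_ms(frames, sample_rate_hz):
    """Duration of one audio buffer in milliseconds."""
    return frames * 1000.0 / sample_rate_hz


if __name__ == "__main__":
    for frames in (64, 512):
        ms = buffer_ms(frames, 44100)
        print(f"{frames:4d} samples @ 44.1 kHz = {ms:.2f} ms")
```

At 44.1 kHz, 64 samples last about 1.45 ms and 512 samples about 11.61 ms. This is only the length of one buffer, not the overall latency measured above.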

Example latency measured in the wave editor:
https://i.postimg.cc/jw11GrYm/smt-vs-nosmt-44-1k-Hz-64samples.png

Last edited by groovy (18-07-2019 09:33)

Re: Hyperthreading

I don't believe that turning hyperthreading on or off will affect latency; what it can affect is performance. By the way, there is an extensive post by someone who tested several CPUs and showed that Pianoteq only took significant advantage of the first two physical cores; hyperthreading and more than two cores had less impact on performance. Thus, it is better (or was at that time) to have two fast cores than four slow ones.
If I find this post, I will paste the link here...

Edit: this is the post https://www.forum-pianoteq.com/viewtopic.php?id=4149

Last edited by marcos daniel (18-07-2019 22:45)

Re: Hyperthreading

marcos daniel wrote:

Thus, it is better (or was at that time) to have two fast cores than four slow ones.

Better in respect of real-world latency? That's the question.

Minimum latency is not necessarily reflected in benchmark scores.

Which has better latency: two native CPUs, or two CPUs emulating four (slower?) virtual CPUs, aka "Simultaneous MultiThreading"?

An OpenBSD maintainer (follow my link above) indicated that the additional multithreading layer could have a negative impact on system "performance".

So I ran my real-world experiment and found no difference in latency between 2 cores and 2 cores with SMT (= 4 "virtual" CPUs). But this is just one sample.

Re: Hyperthreading

I am almost convinced that latency has more to do with the audio interface and drivers; CPU power will give you more polyphony but not lower latency. I did not find a significant difference in latency between an AMD Turion M500 and an Intel Pentium 2020M. I've lost my latency measurements, but although the Intel was newer and faster, latency was about the same.
I don't know how familiar you are with maths, but latency could be predicted quite accurately by the following equation:
latency = 3 * (Pianoteq shown latency) + 4 ms
or something like this, where the Pianoteq shown latency is what you read when configuring the buffer size and sample rate (Pianoteq shown latency = buffer size / sample rate). I'm sure about the slope of 3 but not about the intercept; anyway, it wasn't much different between those two processors. Those measurements were made by recording the Pianoteq sound on one channel and the digital piano's sound on the other, then measuring the distance between the two sounds, very similar to what you did. The driver was ASIO4ALL and the sound was generated by the internal sound card. A USB sound card performed much worse.
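
As a sketch, the rule of thumb above can be written as a small function. The slope of 3 and the uncertain 4 ms intercept come from the post itself. The function and parameter names are made up for this illustration, and the formula was only observed with that particular driver and sound card.

```python
def predicted_latency_ms(buffer_size, sample_rate_hz,
                         slope=3.0, intercept_ms=4.0):
    """Rule-of-thumb overall latency in ms.

    shown latency = buffer size / sample rate (what Pianoteq displays).
    The intercept is a rough guess, so treat the result as an estimate.
    """
    shown_ms = buffer_size * 1000.0 / sample_rate_hz
    return slope * shown_ms + intercept_ms


if __name__ == "__main__":
    for size in (64, 128, 256, 512):
        est = predicted_latency_ms(size, 44100)
        print(f"{size:4d} samples -> about {est:.1f} ms")
```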
Now I own a much better laptop; if you are curious, I can measure it and see how it compares (if I find the old measurements).

Re: Hyperthreading

Hey marcos,

This is not a thread about latency in general; there are already a few that go into more depth.

It's just about a single parameter: hyperthreading.

ceteris paribus

Cheers

Re: Hyperthreading

Not to doubt the results of your experiment, but I think the methodology could use improvement.

Depending on what else was running and what had run recently, the first measurement could have been slowed by other tasks/processes and/or their residue.
The second experiment started on a relatively "clean" machine.

I would therefore do both experiments several times on clean reboots.
There is still likely to be variation, something you can ponder.


So with hyperthreading turned ON the two lowest latency measurements were 13 and 14?
With it OFF they were 13 and 16?

Last edited by aandrmusic (21-07-2019 15:34)

Re: Hyperthreading

Both (smt and nosmt) have been done on clean re-boots. Thanks for your hint.

Re: Hyperthreading

I was just imagining what if Modartt could design/update a new sound engine for a processor like this:

https://www.amazon.com/Intel-i9-9980XE-...amp;sr=8-7

18 cores at 4.4 GHz each...
36 threads...

They would be able to compute much more detail for ultra-hyper-realistic sounds.

Last edited by Beto-Music (22-07-2019 19:19)

Re: Hyperthreading

I would imagine that you would get a latency so low that the
notes you're playing would sound even before you pressed the keys

Beto-Music wrote:

I was just imagining what if Modartt could design/update a new sound engine for a processor like this:

https://www.amazon.com/Intel-i9-9980XE-...amp;sr=8-7

18 cores at 4.4 GHz each...
36 threads...

They would be able to compute much more detail for ultra-hyper-realistic sounds.

Re: Hyperthreading

Yes, if the core sound generator is not redesigned for ultra-fast processors, only the latency will change.
That's why I said that, for this to work, the core sound generator would have to be redesigned to allow more precision through extra calculation. But it's just an idea, since at the moment there are not enough users with such processors to justify the investment.

I was one of the first, if not the first, to imagine and suggest using non-real-time calculation (audio export, for example) to let Pianoteq use a lot more computation and more precision, creating piano sounds perfect enough to silence any critic. Studios, for example, could play the piano in real time (while the artist performs) and later render it using algorithms that require heavier computation to produce perfect piano sounds. Even if it took 1 hour to render 5 minutes of performance, it would be worthwhile.

The problem is the high cost of redesigning the core sound generator. Another problem is that people could get used to the perfect rendered sound and start to spot more defects in real-time playing mode. But today Pianoteq 6.5 is already very good, close to the real thing.

MrRoland wrote:

I would imagine that you would get a latency so low that the
notes you're playing would sound even before you pressed the keys

Beto-Music wrote:

I was just imagining what if Modartt could design/update a new sound engine for a processor like this:

https://www.amazon.com/Intel-i9-9980XE-...amp;sr=8-7

18 cores at 4.4 GHz each...
36 threads...

They would be able to compute much more detail for ultra-hyper-realistic sounds.

Last edited by Beto-Music (23-07-2019 18:54)

Re: Hyperthreading

I don't think AVERAGE latency is what you should be looking at.
Maximums and minimums may be interesting, but predictability, i.e. deviation, is far more significant, and I think you need a LOT more than 5 measurements to get at that.

{Not good at stats, but I have a grasp of it}
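
For what it's worth, here is a minimal sketch that looks at the spread instead of the average, using the latencies listed in the first post. The function name spread and the choice of sample standard deviation as the "deviation" measure are assumptions for this illustration.

```python
from statistics import mean, stdev

# Overall latencies in ms as listed in the first post.
runs = {
    "smt": [19, 14, 17, 17, 15, 22, 20, 13],
    "nosmt": [18, 16, 22, 13, 16, 22, 19, 20, 24],
}


def spread(values):
    """Summarise a series: count, min, max, mean and sample std deviation."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),
    }


if __name__ == "__main__":
    for name, values in runs.items():
        s = spread(values)
        print(f"{name:6s} n={s['n']} min={s['min']} max={s['max']} "
              f"mean={s['mean']:.1f} stdev={s['stdev']:.2f}")
```

With so few values the standard deviations (roughly 3.1 ms and 3.5 ms) are themselves quite uncertain, which supports the point about needing many more measurements.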