For any audio system, be it an analogue electronic circuit, a digital circuit, a whole computer or a physical waveguide, there will be some time lag between the instant at which the signal enters the system and the instant at which it exits. This happens for many reasons, from the finite propagation speed of sound waves to the AD/DA conversion times, from the time required by the operating system (if we are using a computer) to complete a task to any hysteresis of the system. Latency is very often considered an important parameter of audio systems, especially of computer setups for audio purposes (which we refer to as computer audio systems, including any sound-card and operating system). More precisely, it is usually agreed that latency should not be audible, that is, users shouldn’t be able to hear any delay between their actions, which cause the input signal/event to be generated, and the corresponding sound output. However, this is not always the case.
But first of all, when is latency not audible?
According to Fastl & Zwicker, the temporal resolution of the human hearing system is around 2 ms. It may then seem appropriate to define a lowlatency audio system as one that does not introduce a latency greater than 2 ms. However, even if this temporal resolution always applies, when complex sounds (and not the test stimuli used in psychoacoustic experiments) are heard, the perceptual consequences may not justify such a harsh requirement. In fact, raising the latency above 2 ms may produce a broad range of different perceptions before it produces a clearly audible delay. The reason is that the perceptual attributes of audible stimuli are linked to their objective properties in a nonlinear way. It may happen that the perceived timbre is different, or that the rhythmic patterns feel different, all depending on the temporal and frequency patterns of the stimulus and on whether or not we can hear the direct sound as well. For these reasons, by the way, the information about the temporal resolution is extrapolated experimentally from other perceptual phenomena, like temporal masking patterns. In other words, even if latency is objectively a delay, it is not always perceived as a delay. These are among the odd consequences of the nonlinearity of our hearing system, which evolved to gather information from all the different objective attributes of a signal and not just to map them all (see Part 3 for a more detailed discussion on latency perception). All these facts are very hard to model and only superficially understood. It is therefore hard to find an upper limit for a lowlatency audio system based only on (partial) psychoacoustic evidence, given that we require that the user should not hear delays. I therefore propose a different approach, based on everyday experience.
Imagine you are playing an electro-acoustic guitar through an amplifier and a loudspeaker. Suppose also that the sound pressure level of the wave emitted by the loudspeaker is not high enough to completely mask the direct sound from the instrument. This may happen while practicing, for example. Usually, you will have your amplifier some meters away from the center of your head, let’s say at a distance between 1 and 3 meters. Assume also that the air in the room is homogeneous and in equilibrium at a temperature of 20 °C. As we are using a low amplifier gain to keep the instrument itself audible, we can assume linear behavior of the air. Putting these assumptions together, the phase speed of sound will be equal to 343 m/s, independent of frequency. Thus, if we call $t$ the time required by the wave to travel from the speaker to our head, $d$ the distance covered and $c$ the speed of sound, we can write:
$$t = \frac{d}{c}$$
and we can calculate the times required to cover 1 and 3 meters respectively. These times are approximately 3 ms and 9 ms. We are used to playing with an amplifier up to 3 meters away (and maybe even much more) without hearing a delay between our instrument and the sound from the speaker. We could then think that 9 ms is an appropriate upper limit. However, it is preferable to push the audio system a little further, as we are probably used to playing in a room. In fact, when referring to our experience, we must not forget that the perception of a possible delay due to acoustic latency can be made difficult by the degree of diffusion and reverberation of the sound field, which is always present in ordinary rooms because of reflections, even if the time lag would otherwise be large enough to notice. In other words, the real audio system involved is the amplifier + loudspeaker + room, the last subsystem introducing, among other effects, reverberation, which tends to blur latency. Also, we must take into account that if we set up a certain latency for an audio system and then attach external loudspeakers to it, we introduce a further bit of latency, due to the distance of the speakers from our head, that we must compensate for. Since sound needs 2 ms to travel 0.7 m, a distance likely to be similar to that of our monitors from our head, we may define the upper limit as 9 ms - 2 ms = 7 ms.
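The arithmetic in this paragraph can be checked with a few lines of Python. This is only a minimal sketch under the assumptions stated in the text (sound travelling at 343 m/s, independent of frequency); the function and variable names are mine and purely illustrative.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at 20 °C as assumed in the text


def travel_time_ms(distance_m, speed=SPEED_OF_SOUND):
    """Time in milliseconds for sound to cover distance_m metres."""
    return 1000.0 * distance_m / speed


# Distances discussed in the text: monitors (0.7 m), amplifier at 1 m and 3 m.
for d in (0.7, 1.0, 3.0):
    print(f"{d:.1f} m -> {travel_time_ms(d):.1f} ms")

# Upper limit: the 3 m travel time minus the share taken by the monitors.
limit_ms = travel_time_ms(3.0) - travel_time_ms(0.7)
print(f"upper limit: about {limit_ms:.0f} ms")
```

The unrounded values are slightly below the 3 ms, 9 ms and 7 ms quoted in the text, which is consistent with those figures being rounded rules of thumb.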
Summing up, we might introduce these rule-of-thumb definitions:
An audio system is said to be practically lowlatency when the full chain latency it introduces is less than 7 ms.
An audio system is said to be psychoacoustically lowlatency when the full chain latency it introduces is less than 2 ms.
These are not official definitions, and they do not reflect psychoacoustic evidence on latency perception (which appears to be scarce; an example is reviewed in Part 3). However, they are effective in my experience and seem to make sense to me. A psychoacoustically lowlatency system is preferable and would be the ultimate lowlatency system: it ensures that a delay cannot be heard. However, for practical purposes, 7 ms is OK. Note that I used the term full chain latency. This is the latency measured between the instant the signal enters the input(s) of the audio system and the instant it exits the output(s). For computer audio systems, this cannot be calculated a priori, as it involves time lags that can be known (like the ones introduced by the operating system and DSP buffering) and others that cannot be known without a freaking huge effort, like the time the converters of the sound-card need to convert the signals or other lags arising at the hardware-firmware level. It is indeed much easier to measure this time lapse. A technique is shown in a linked article at the end of the post.
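To make the distinction between the known and the unknown lags a bit more concrete, here is a rough sketch in Python. It assumes the usual relation that a buffer of N frames at a given sample rate holds N / rate seconds of audio, and it only covers that buffering share: converter and firmware lags are not modelled, so the result is at best a lower bound on the full chain latency. The function names, the "frames per period" and "periods" parameters and the example numbers are all hypothetical, chosen for illustration.

```python
PSYCHOACOUSTIC_LIMIT_MS = 2.0  # rule-of-thumb threshold from the text
PRACTICAL_LIMIT_MS = 7.0       # rule-of-thumb threshold from the text


def buffering_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Known buffering lag: total buffered frames divided by the sample rate."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz


def classify(full_chain_ms):
    """Label a measured full chain latency using the two definitions above."""
    if full_chain_ms < PSYCHOACOUSTIC_LIMIT_MS:
        return "psychoacoustically lowlatency"
    if full_chain_ms < PRACTICAL_LIMIT_MS:
        return "practically lowlatency"
    return "not lowlatency"


# Hypothetical settings: 64 frames per period, 2 periods, 48 kHz.
lower_bound_ms = buffering_latency_ms(64, 2, 48000)
print(f"buffering alone: {lower_bound_ms:.2f} ms")

# Hypothetical measured full chain latency of 5.5 ms.
print(classify(5.5))
```

If the buffering share alone already exceeds 7 ms, the system cannot meet the practical definition whatever the hardware does; if it does not, only a measurement of the full chain can settle the question.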
OK, but what are these practical purposes?
Well, it depends. If I want to use my computer to play on stage with the goal of processing my signal, then the latency must be as low as possible. This is, indeed, a practical purpose that requires lowlatency operation. But if I just need to master a song, lowlatency is not needed at all. I may need low latency when recording, if the monitoring is supplied by my computer audio system as well… But I won’t need it if the monitoring is supplied by a parallel output of the preamplifier I use to feed the signal into the sound-card, as is often the case. Latency, contrary to what most people believe, is not a central attribute defining the quality of a computer audio system. In fact, it is important only for a few practical purposes! However, it should be noted that a computer audio system able to work stably according to a particular definition of lowlatency behavior will also operate stably when the latency is raised (… usually…). For these reasons, if and when I review some Linux audio setup, I will check whether it is able to reach lowlatency operation, so that readers can understand whether the system fits their needs. So, here are the facts:
- Latency is important only when the output must be supplied quickly, without noticeable lags.
- The human hearing system (made of all the systems from the outer ear all the way up to the parts of the brain involved in creating the perception of sound) has a temporal resolution of around 2 ms.
and here are the myths:
- Lowlatency operation is always important and an audio system that has a big latency sucks.
- A lag of 20 ms is the limit for lowlatency operation.
With regard to the last one, the limit is subjective: every human being has different ears and therefore a different limit. However, as the resolution is, on average, ten times smaller than 20 ms, I expect the limit to be smaller than this value on average.
For related and very interesting reading, I link this page.
For Part 2 visit here.