In a previous post I talked about how a low-latency audio system could be defined starting from (very little) ear physiology and (a lot of) common sense. This may seem odd, as it is not a rigorous scientific approach, but since the human hearing system is mostly unknown it makes some sense: we cannot pretend to model it to any great depth, so a bit of common sense must be used. Among the conclusions of that post was that a system should be considered effectively low-latency when the latency is below 2 ms, while a latency of around 7-10 ms should be easy to tolerate in most situations. To get there I used the example of playing an electric/acoustic guitar in a room through an amplifier, while still being able to hear the direct sound. However, the requirement I arrived at is much harsher than the commonly accepted 20 ms. So where does that 20 ms come from?
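To make those budgets concrete, here is a quick sketch of how a latency figure maps to an audio buffer size. The 48 kHz sample rate is just a common example value, not something fixed by the argument:

```python
# Convert a latency budget in milliseconds into the largest whole audio
# buffer (in samples) that fits inside it. 48 kHz is an example rate.

def max_buffer_samples(latency_ms: float, sample_rate_hz: int = 48000) -> int:
    """Largest buffer, in samples, that stays within the latency budget."""
    return int(latency_ms * sample_rate_hz / 1000)

for budget_ms in (2, 10, 20):
    print(f"{budget_ms:>2} ms at 48 kHz -> {max_buffer_samples(budget_ms)} samples")
```

So a 2 ms budget leaves room for at most a 96-sample buffer at 48 kHz, while 20 ms allows 960 samples, an order of magnitude more headroom for the audio system.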
Anyone familiar with room acoustics knows about the ‘Haas effect’. It concerns the perception of echoes in a room. Any sound source in a room, such as a loudspeaker or our voice, emits a primary wave. This wave is reflected at each wall, and the reflections make up the total sound field. At each “bounce” the wave loses a little energy at the walls, because of friction and local thermodynamic phenomena, so sooner or later the bounces stop: the wave is gone. If the source emits a steady primary wave, then, as soon as it is shut off, the sound in the room takes some time to die out, because of the ongoing reflections. This is known as reverberation, and it is a major concern in room acoustics, being deeply linked to perceptual attributes such as speech and music clarity.
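As a toy illustration of why the sound takes time to die out, we can count how many bounces a wave survives before its energy becomes negligible. The absorption fraction and the cutoff below are arbitrary example values, not measurements:

```python
# Toy reverberation model: each wall bounce removes a fixed fraction of
# the wave's energy. Both numbers below are illustrative, not measured.
absorption = 0.2    # fraction of energy lost per bounce (example value)
cutoff = 1e-6       # energy level at which we call the wave "gone"

energy = 1.0
bounces = 0
while energy > cutoff:
    energy *= (1.0 - absorption)  # a little energy is lost at each wall
    bounces += 1

print(f"wave effectively gone after {bounces} bounces")
```

Even with a fifth of the energy absorbed at every bounce, it takes dozens of reflections for the wave to fade, which is why the room keeps ringing after the source is shut off.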
The ‘Haas effect’, roughly speaking, describes how reflections disturb the perception of the primary sound. The effect as a whole is hard to summarize, but one of its conclusions is that echoes arriving within 20 ms of the primary sound are very unlikely to disturb its perception, not even when the energy of the reflection is 10 times the energy of the primary sound (which would not happen in a real room, but can be simulated with loudspeakers). A reflection is more or less a delayed copy of the direct signal, filtered by the frequency-dependent reflectivity of the walls. So we would expect, at first sight, to be able to treat latency the same way, since latency also produces a delayed copy of our primary signal. In many real situations that is indeed the case.
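The “delayed copy” idea can be sketched in a few lines. This is a deliberate simplification: it replaces the frequency-dependent wall filtering with a single gain factor, which is all we need to picture what a reflection (or a latency) adds to the direct signal:

```python
# Minimal echo model: output = direct signal + one delayed, attenuated copy.
# A real reflection is also filtered by the walls; the single `gain`
# factor here is a deliberate simplification of that filtering.

def add_reflection(signal, delay_samples, gain):
    """Mix a delayed, scaled copy of `signal` into the original."""
    out = list(signal) + [0.0] * delay_samples
    for i, sample in enumerate(signal):
        out[i + delay_samples] += gain * sample
    return out

direct = [1.0, 0.5, 0.25]
print(add_reflection(direct, 2, 0.5))
# -> [1.0, 0.5, 0.75, 0.25, 0.125]
```

Whether the delayed copy comes from a wall or from an audio pipeline, the signal arriving at the ear has this same shape, which is why one might expect the 20 ms tolerance to carry over to latency.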
Coming back to our electrified acoustic guitar plus amplifier example: since we hear both the primary signal from the instrument and the delayed signal from the speaker, we could quite possibly move the loudspeaker away until its sound takes 20 ms to reach our ears and still be comfortable playing. That is because the hearing system integrates the stimuli arriving at it: it can recognize a copy of a signal and use it to refine the perception of the primary sound, the one we associate with the source, in order to gather more information, such as its location and spaciousness. In other words, our ear merges close repetitions of a signal to extract information about a source, with very clever signal processing. Why, then, be so much harsher with the requirements for low-latency systems?
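As a sanity check on how far away that loudspeaker would be: sound travels at roughly 343 m/s in air at room temperature, so a 20 ms acoustic delay corresponds to about 7 m of distance:

```python
# Distance sound covers in a given time, at ~343 m/s (air at ~20 degrees C).
SPEED_OF_SOUND = 343.0  # m/s, approximate

def distance_for_delay(delay_ms: float) -> float:
    """Distance in metres corresponding to an acoustic delay in milliseconds."""
    return SPEED_OF_SOUND * delay_ms / 1000.0

print(f"20 ms of acoustic delay ~ {distance_for_delay(20):.1f} m")  # ~6.9 m
```

By the same arithmetic, the 2 ms budget from the previous post corresponds to a speaker less than a metre away, which matches the everyday setup of an amp sitting next to the player.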
Well, imagine now that you can’t hear the direct sound from your guitar: you are playing an electric guitar, and the sound from the loudspeaker is loud enough to completely mask the direct sound of the strings. The ear is no longer integrating a direct sound and its copy. Instead, the brain develops an expectation of when the sound should be heard as a consequence of our actions, while the ear supplies the perception of that consequence. This is a completely different job for the brain, and latency can quite likely become annoying at values well below 20 ms. This is a common situation for any musician who plays an electric instrument and is monitoring through the audio system, perhaps even through headphones. Moreover, it should be noted that the level of concentration of a trained musician actively producing the sound is much higher than that of a casual listener (the kind of subject used in the Haas effect experiments). Concentration and training could lower the threshold as well.
So, there are reasons to tighten the requirements for a low-latency audio system.
For a look at a quantitative study, check Part 3.