For new Linux users, audio can be a shock. Many newcomers who ask for support on forums have a perfectly working system, but are led to think otherwise by the not-really-straightforward way audio on Linux works. This post is an attempt to summarize and clarify how audio works on a standard modern Linux installation.
So, let’s dig in!
First of all, let’s clarify that this is not an exhaustive description, but more of a rough outline of what Linux audio looks like. There is much more information on each of these topics (often entire wikis), so, if more detail is needed, use the internet to retrieve it (there are links to sources throughout the text and at the bottom). Once the Linux Audio Anatomy is clearer, searching for answers (and understanding them) should be much easier.
Linux is different
People coming from Windows or Mac are used to having all audio management done in a few places. Usually, it is enough to install the drivers for the audio device, configure some parameters system-wide through a system control panel, and maybe configure the device through the software supplied by the vendor and/or through DAWs or other music software.
On Linux it is somewhat similar, but with a few more layers, and it is more counter-intuitive. We will have to deal with drivers, APIs and perhaps a couple of sound servers. Everything will become clear as we go on.
The device drivers
As already mentioned, on Windows and Mac we are used to installing the drivers that come with the device. We do the same under Linux, but in a somewhat different way. Usually, in fact, we don’t install drivers developed by the manufacturers (these often don’t even exist for Linux). To understand why, we have to take a step back and talk about kernels.
What is the kernel?
Every operating system is a bit like an onion: there are multiple software layers. The innermost is the kernel. Very roughly speaking, the kernel is the most fundamental program running on a computer. It takes care of talking with the memory, the CPU and the hardware. It runs in its own space in memory, while applications run in a separate space, so that they do not interfere with vital OS functions.
Linux is different from Mac and Windows in that it uses a monolithic kernel. Without going into details (which I don’t know that well myself), this means that device drivers must be part of the kernel.
That is: a monolithic kernel is a huge piece of software (compared to other kinds of kernels) comprising every fundamental operating system functionality. Talking with hardware is considered a fundamental functionality, so under the monolithic Linux kernel all the software needed to talk with the hardware is part of the kernel itself.
Back to the drivers…
Now, how can the Linux kernel work on different computers? Every computer has different hardware! Shouldn’t that mean we need a different kernel for each computer?
Actually, the kernel is made modular by, indeed, modules, which make it possible to add and remove functionality on demand. It happens that most device drivers are built as modules for the Linux kernel. So, on each machine we can load and unload kernel modules to add or remove support for hardware peripherals! Modern Linux distributions automatically load the required modules for supported hardware, usually through udev.
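As a quick sketch of how this looks in practice, the snippet below checks whether a module is loaded by reading /proc/modules (the same data lsmod prints). The module name is just an example; modprobe itself needs root, so it is only shown in comments:

```shell
#!/bin/sh
# Check whether a given kernel module is currently loaded by
# looking it up in /proc/modules (the same data lsmod prints).
module_loaded() {
    grep -q "^$1 " /proc/modules 2>/dev/null
}

# Typical usage (module name is an example; root is needed for
# modprobe itself):
#   sudo modprobe snd_hda_intel      # load a module
#   sudo modprobe -r snd_hda_intel   # unload it again
if module_loaded snd_hda_intel; then
    echo "snd_hda_intel is loaded"
else
    echo "snd_hda_intel is not loaded"
fi
```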
Note: kernel drivers and kernel modules are not the same thing. A module is a piece of software (included in the kernel) that can be loaded and unloaded into the kernel, while a driver is a piece of software (included in the kernel) that implements support for hardware devices (it drives them). Most kernel drivers are built as kernel modules for convenience, so that they can be loaded and unloaded on demand.
Using lspci -vnn we can see all the components of our PC that sit on the PCI bus, together with the kernel modules and drivers handling them. For example, I can see information about my on-board audio as follows:
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller [8086:160c] (rev 09)
        Subsystem: Intel Corporation Broadwell-U Audio Controller [8086:160c]
        Flags: bus master, fast devsel, latency 0, IRQ 52
        Memory at b2214000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: &lt;access denied&gt;
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
where we can see that the device is governed by snd_hda_intel. In general, it is possible to list all currently loaded modules with lsmod. For removable devices it is perhaps easier to plug the device in and look at dmesg. For example, when I plug in my Scarlett 2i4 for the first time after boot:
[24564.149904] usb 2-2: new high-speed USB device number 4 using xhci_hcd
[24565.291620] usbcore: registered new interface driver snd-usb-audio
meaning that to deal with the device my system uses the xhci_hcd module (the USB host controller driver) and the snd-usb-audio driver.
So, isn’t this a pain? With so many modules and drivers, how is it possible for applications to work the same way regardless of the hardware below? And how can I find a driver for my device?
Drivers and APIs
Audio device manufacturers do not usually release kernel drivers for Linux, so there are several projects creating drivers, usually implemented as modules. These projects develop software frameworks able to deal with many different devices. The most important nowadays are ALSA and FFADO.
We have already seen two drivers starting with the snd prefix. All drivers with that prefix are modules and come with ALSA. Not only does ALSA supply device drivers for a lot of USB and PCI sound cards (many of which are listed in the matrix), it also supplies APIs and tools.
Roughly speaking, around the plethora of (modularized) kernel drivers, the ALSA developers built a set of higher level software tools that make it possible for developers to deal with sound devices in a unified way. To write an application that works with whatever card happens to be supported by ALSA, it is enough to write it against the ALSA API: ALSA will take care of dealing with the different sound cards on the system.
It is worth clearing up how ALSA assigns names to devices, as this is often a point of confusion. Included with ALSA there are many command line tools. Among the most useful are alsamixer, aplay, arecord and amidi. They are, respectively, a command line mixer (handling all the channels of each device), a command line audio player, a command line audio recorder and a command line tool to send/receive MIDI events.
With alsamixer we can navigate among devices and channels and make sure all the levels for each channel/device are set as we want them. It is also possible to print sound card information.
The other commands accept the -l option to list the devices able to play audio, capture audio and handle MIDI messages, respectively. For example:
[crocoduck@arch ~]$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: HDMI [HDA Intel HDMI], device 3: HDMI 0 [HDMI 0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: HDMI [HDA Intel HDMI], device 7: HDMI 1 [HDMI 1]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: HDMI [HDA Intel HDMI], device 8: HDMI 2 [HDMI 2]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: PCH [HDA Intel PCH], device 0: ALC269VB Analog [ALC269VB Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: USB [Scarlett 2i4 USB], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
[crocoduck@arch ~]$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 1: PCH [HDA Intel PCH], device 0: ALC269VB Analog [ALC269VB Analog]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 2: USB [Scarlett 2i4 USB], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
[crocoduck@arch ~]$ amidi -l
Dir Device    Name
IO  hw:2,0,0  Scarlett 2i4 USB MIDI 1
As you can see, the output of arecord -l (or aplay -l) is of the form
card X: -Some Name-, device Y: -Some Info- -Other Info-
The numbers X and Y are important, as ALSA names the devices hw:X,Y, where X is the card number and Y the device number.
By this convention, my Scarlett 2i4 is called hw:2,0 by ALSA. This can be useful when configuring JACK, for example (see later). Sub-devices, if present, are just sound endpoints (like sets of channels or similar).
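A small sketch of this naming convention in action: the snippet below turns "aplay -l" style lines into hw:X,Y names. The input here is a fixed sample taken from the listing above; on a real system you would pipe the actual aplay -l output in instead:

```shell
#!/bin/sh
# Turn "aplay -l" style lines into ALSA hw:X,Y device names.
# The sample is copied from the listing above; on a real system,
# pipe "aplay -l" in instead of using a fixed string.
sample='card 1: PCH [HDA Intel PCH], device 0: ALC269VB Analog [ALC269VB Analog]
card 2: USB [Scarlett 2i4 USB], device 0: USB Audio [USB Audio]'

echo "$sample" | sed -n 's/^card \([0-9]*\):.*device \([0-9]*\):.*/hw:\1,\2/p'
# Prints:
#   hw:1,0
#   hw:2,0
```

A name like hw:2,0 can then be passed to other ALSA tools, for example aplay -D hw:2,0 some_file.wav.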
ALSA is now part of the Linux kernel (so it does not need to be installed explicitly in most distributions), but in the past OSS was used instead. ALSA replaced it many years ago, although OSS is still widely used on many other Unix-like operating systems.
The FFADO project provides drivers for FireWire audio devices. It uses the underlying IEEE 1394 implementation in the Linux kernel (see here). As such, the firewire-core and firewire-ohci kernel modules need to be loaded for FFADO to be able to drive the audio devices. JACK2 (see below) is built against the FFADO framework in most modern distribution packages, so usually only the FFADO libraries need to be installed for a fully functional FireWire audio setup.
This page aims to track support status for FireWire audio devices. It should be noted, however, that before committing to FireWire audio, the state of support for the onboard FireWire chipset should be investigated first. lspci -vnn can supply information about the FireWire controllers on the PCI bus of your machine. Ricoh chipsets are known to be dodgy on Linux, while the word of wisdom is that Texas Instruments (TI) chipsets offer the best Linux compatibility. Make sure you know which chipset your computer's FireWire port uses and check for possible issues online before buying an interface.
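A minimal sketch of such a check, based only on the vendor name appearing in an lspci line (the sample line below is illustrative, not output from a real machine; on your system you would grep the real lspci output, e.g. lspci -nn | grep -i -e firewire -e 1394):

```shell
#!/bin/sh
# Rough vendor hint for a FireWire controller, based on the text
# of an "lspci" line. This is only a heuristic sketch: the vendor
# name is not a guarantee of good or bad support.
fw_vendor_hint() {
    case "$1" in
        *"Texas Instruments"*) echo "TI: generally well supported" ;;
        *Ricoh*)               echo "Ricoh: known to be problematic" ;;
        *)                     echo "unknown vendor: research before buying" ;;
    esac
}

# Illustrative sample line, not from a real machine:
fw_vendor_hint "0c:00.0 FireWire (IEEE 1394): Texas Instruments XIO2213A/B"
# Prints: TI: generally well supported
```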
FFADO comes with many utilities, but I am not very familiar with them. I own an FCA202 that I use through a TI chipset. It works like a charm out of the box, so I have had no issues pushing me towards a deeper understanding of the FFADO world.
We have seen how audio devices are handled under Linux, and that the difficulty of addressing many different kinds of devices is solved by the use of APIs. However, there are higher level functionalities that developers might want when writing audio software, which can be hard to implement even with the APIs provided by the audio frameworks. Nowadays, given the many new functionalities of ALSA, many computers could run perfectly well with ALSA alone (all my laptops, for example). In the past, however, there were many more limitations, especially in the old OSS days.
This is the reason why a class of higher level software was developed: Sound Servers.
Sound Servers do not provide kernel modules or drivers, but instead run as daemon programs in user space.
This means that when the user logs into a session, he or she launches the Sound Server (or it is configured to start automatically). Once launched, the Sound Server runs in the background (it is a daemon) and receives calls from audio software written to interact with it through its API (yep, yet another API).
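Since a Sound Server is just a user-space daemon, you can check whether one is running by process name. A minimal sketch (assuming pgrep is available, as it is on most distributions):

```shell
#!/bin/sh
# Check whether a sound server daemon is currently running,
# by exact process name (works for pulseaudio and jackd).
server_running() {
    pgrep -x "$1" > /dev/null 2>&1
}

for srv in pulseaudio jackd; do
    if server_running "$srv"; then
        echo "$srv is running"
    else
        echo "$srv is not running"
    fi
done
```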
The advantage is that many more (often higher level) functionalities are provided beyond those of the drivers and of the APIs that frameworks like ALSA wrap around them.
Also, many Sound Servers are written with a specific application in mind and make the development of particular classes of audio software easier.
The disadvantage is that many Sound Servers flourished, making Linux audio a sort of API jungle from the developer's point of view. Also, different Sound Servers might conflict with each other (this was much more common in the past).
The most important Sound Servers in the Linux world are PulseAudio and JACK.
PulseAudio has become the de-facto Sound Server for desktop applications. Many user-friendly Linux distros, especially those based on Ubuntu, have it installed out of the box. In this case the sound from desktop applications, like system sounds, movie players or browsers, is usually routed through PulseAudio (though many desktop applications can also talk to ALSA directly).
PulseAudio is usually configured to start through systemd (or other automatic means) in most modern Linux distributions, and it has many graphical configuration front-ends. In fact, most of the sound device managers in DE system control panels are actually PulseAudio front-ends of some sort (although many of them can fall back to ALSA if PulseAudio is not installed).
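PulseAudio can also be driven from the command line with pactl. As a sketch, the snippet below extracts sink (output device) names from "pactl list short sinks" style output; the sample is illustrative of the tab-separated format, and on a system running PulseAudio you would pipe the real command in:

```shell
#!/bin/sh
# Extract sink names from "pactl list short sinks" style output.
# The sample below is illustrative, not from a real machine.
sample="$(printf '0\talsa_output.pci-0000_00_03.0.hdmi-stereo\tmodule-alsa-card.c\ts16le 2ch 44100Hz\tSUSPENDED\n1\talsa_output.usb-Focusrite_Scarlett_2i4.analog-stereo\tmodule-alsa-card.c\ts32le 2ch 48000Hz\tRUNNING')"

# The sink name is the second tab-separated field.
echo "$sample" | cut -f2
# Prints:
#   alsa_output.pci-0000_00_03.0.hdmi-stereo
#   alsa_output.usb-Focusrite_Scarlett_2i4.analog-stereo
```

A sink name can then be used, for example, to change the default output: pactl set-default-sink &lt;name&gt;.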
When PulseAudio was first adopted it caused major rants in the Linux community, as it broke audio for various reasons in many situations. It is still possible to find many pro audio tutorials online recommending to kill PulseAudio every time some serious audio work needs to be done. I am not a fan of PulseAudio myself and prefer not to install it at all, mainly because my computers do fine on ALSA alone and I don't like to include components I don't need in my systems. However, nowadays PulseAudio causes far fewer troubles than, say, 6 years ago, and things usually run smoothly. Beware of tutorials recommending to kill PulseAudio: they might be outdated, and the way they advise killing it might no longer work if you are running systemd. Also, killing PulseAudio might not be needed at all. Consider it a troubleshooting step, a means to reduce complexity in your system.
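If you do need to stop it on a systemd setup, keep in mind that PulseAudio is usually socket activated, so stopping only the service makes it respawn. The sketch below is a deliberately cautious dry run: it prints the commands instead of running them unless --do-it is passed:

```shell
#!/bin/sh
# Dry-run sketch: print the commands that would stop a socket
# activated PulseAudio under systemd (stopping only the service
# lets socket activation respawn it). Pass --do-it to run them.
stop_pulse() {
    for cmd in \
        "systemctl --user stop pulseaudio.socket" \
        "systemctl --user stop pulseaudio.service"
    do
        if [ "$1" = "--do-it" ]; then
            $cmd
        else
            echo "would run: $cmd"
        fi
    done
}

stop_pulse    # dry run: just prints the commands
```

An alternative is pasuspender, which suspends PulseAudio only while another command runs, e.g. pasuspender -- jackd -d alsa -d hw:2,0.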
JACK, instead, is the de-facto standard for professional audio. It can make use of pretty much every audio driver and framework out there (ALSA and FFADO especially, but also many others, including some that run under different OSes), and it can also integrate with PulseAudio.
JACK is built with low-latency audio in mind and supplies a cable-like connection interface for audio applications, allowing users to arrange audio software as they would arrange pedals on a pedal-board.
It is usually best to install JACK2 with dbus support (especially if PulseAudio integration is desired).
This guide is perhaps a good starting point for beginners. The most common mistake beginners make when launching JACK is to skip the configuration of the Input and Output Devices. In qjackctl this is done in Setup > Advanced: choose the Input Device and Output Device from the drop-down menus. If you don't recognize any of the names, go back to the ALSA naming convention section above and select the appropriate hw:X,Y item. If your device did not show up when entering the commands above, bad news: apparently it is not supported by ALSA (but there could be a workaround somewhere). Another common mistake is not selecting the appropriate driver. Go to Setup > Driver if you need to use something other than ALSA; selecting FFADO will make your FireWire device work.
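The other important settings in qjackctl are Frames/Period, Periods/Buffer and Sample Rate, because together they determine the buffering latency JACK adds: latency = frames_per_period * periods / sample_rate. A quick worked example (the specific figures are just an example setup):

```shell
#!/bin/sh
# JACK buffering latency:
#   latency (s) = frames_per_period * periods / sample_rate
# For example, 256 frames/period, 2 periods, at 48000 Hz:
awk 'BEGIN { printf "%.2f ms\n", 256 * 2 / 48000 * 1000 }'
# Prints: 10.67 ms
```

Halving the frames per period halves this latency, at the cost of more CPU load and a higher risk of xruns.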
The Linux Audio Anatomy looks like this (low level to high level):
- Kernel with loadable modules. Drivers for audio devices are usually implemented as modules and are supplied by ALSA (or OSS in a few cases) and FFADO. These audio device frameworks also make it possible to develop audio applications through unified APIs.
- One or more Sound Servers (optional). Most usually PulseAudio for desktop applications and JACK for pro audio applications. These are not part of the kernel and the user is in charge of their execution, either manually or by automating their startup.
- Audio applications. Depending on the developer's choices, an audio application might talk directly to the drivers through the ALSA (or other framework) APIs, to a particular Sound Server API only, or through many APIs. There are, in fact, applications that can work directly with ALSA, PulseAudio or JACK (Audacity, for example): just select the API you want the application to use within its configuration window.
But why is Linux so counter-intuitive?
Well, for people coming from commercial operating systems this is confusing.
Interestingly, though, commercial operating systems do not work much differently under the hood.
Mac OS has perhaps the simplest and cleanest audio stack, CoreAudio. Still, depending on the Mac OS version, it is possible to observe processes apparently working as sound servers (see what is reported here, for example), similarly to PulseAudio or JACK.
On Windows the audio stack is as fragmented as on Linux, the most important audio APIs being ASIO, DirectSound and WASAPI (have a look at this for more info).
This goes to show that the audio stacks of commercial operating systems are not actually much simpler than Linux's. This is to be expected: streaming audio is not a simple task for a multipurpose OS. What makes the difference is mainly how the stack is configured. While commercial OSes usually offer nice GUI tools that unify the configuration, on Linux we often need to configure many different parts of the system individually (as an example, see how many checks realtimeconfigquickscan has to perform in order to assess whether a Linux OS is properly configured for audio).
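One such check concerns the realtime priority limit granted to the user (reported by "ulimit -r" in bash, and typically raised for the audio group via /etc/security/limits.d, e.g. "@audio - rtprio 95"). The sketch below classifies such a limit value; the threshold of 10 is an arbitrary example, not a value from any official checklist:

```shell
#!/bin/sh
# Classify an rtprio (realtime priority) limit value, as reported
# by "ulimit -r" in bash. The threshold of 10 is an arbitrary
# example; audio oriented distros often grant much higher values.
check_rtprio() {
    case "$1" in
        unlimited)   echo "ok" ;;
        ''|*[!0-9]*) echo "unknown" ;;
        *) if [ "$1" -ge 10 ]; then
               echo "ok"
           else
               echo "too low for realtime audio"
           fi ;;
    esac
}

check_rtprio 95   # typical value granted to the audio group
check_rtprio 0    # default on many desktop distros
```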
This happens because Linux is not developed like the commercial operating systems, i.e. entirely by a single team of developers. Its kernel is developed (by many different contributors) as a single entity, but the other components come from many different projects. A complete Linux installation is a puzzle of many parts from different sources: kernel + services + DE + applications… This contributes to the plethora of different APIs, and not only in audio. Also, Linux tends to put the user in charge of understanding and configuring the system, which means that each piece of software is released with all the means to achieve deep and finely tuned configuration. By contrast, on commercial OSes configuration is usually performed by GUI programs that wrap around the many different layers of the stack. This might seem simpler, but it has the disadvantage of not really allowing to finely tune the system should that be needed for atypical applications. For this (and many other reasons) Linux is perhaps the best platform for acoustics and audio research, as well as for algorithmic composition and audio/music related research in general. It is, in fact, the platform chosen by CCRMA.
However, even if more complex to configure, Linux is able to achieve performance in line with every other desktop operating system out there. The myth that Macs are better for audio is complete nonsense: I have witnessed many more buffer overruns on my company MacBook Air than on my Entroware Apollo (OK, the latter is more powerful… still…). The truth is that in 2016 any computer with any OS can do audio like a champ (actually, this started being true from the mid 90s).
For more info see:
Note: as discussed in Latency: Myths And Facts. Part 1. and Latency: Myths and Facts. Part 2. Why echo perception is different?, latency is not the most important thing in an audio setup. Still, those articles show that latency and stability are comparable across all desktop operating systems if they are properly configured. Since fanboys usually love to compare latency figures… here we go. For a look at an academic result about latency perception, see Latency: Myths and Facts. Part 3: A look at a quantitative study.
So, is the configuration complexity of Linux worth it? Or, in other words, which operating system should one choose for audio? Just look at the software. Do you like GarageBand? Get a Mac. Do you like Ardour? Get Linux (or a Mac). Do you like the Libre and/or Open Source philosophy and really wish to make music on Linux? Then give it a try! I personally love Linux audio software much more than any other programs out there, and I am glad to be a happy Linux user. I have found that taking the time to learn pays back.
If you are a new Linux user and you like what Linux is and what it has to offer, don't be scared by the complexity: dig into it! You will find that once you have got your head around your system, you will be able to get good, reliable performance out of it!*
*well, unless your hardware does not really like Linux…