Nick's blog Nothing to see here?

The Android audio stack from a music player's perspective

2025-08-09
Nick

Happy 2025! Long time no see - it’s really been way too long since the last blog post. I lost three drafted posts I was working on… and then kind of lost all motivation. So I’ve been working on Gramophone instead, a music player app for Android. And one day, a user asked whether Gramophone can play Hi-Res audio (192 kHz, 32-bit FLAC files) without quality loss on a USB DAC when enabling the “float output” toggle. That question led me into a pretty deep rabbit hole.

Disclaimer: you WILL get a headache reading this blog post, as it’s a recycled draft. I lost motivation to rewrite it to make it readable, but the information collected here is still worth sharing. This post was not proofread. Good luck!

First of all, if you notice any inaccuracies, please point them out in the comments! This involved a lot of reading code, and I won’t put it past myself to have confused something. Additionally, please note that any observations about ExoPlayer apply to 1.8.0, observations about AOSP code apply to Android 16 as of publishing date, and observations of dumpsys are from a Pixel 7a running an Android 14 QPR3 custom ROM without related changes, to my best knowledge. Older Android versions may have significant differences, such as not supporting anything above 16-bit bit depth. The same applies to newer Android versions, of course.

And remember: manufacturers sometimes significantly customize the Android code! MediaTek, Qualcomm, Samsung and friends may have some further differences!

About ExoPlayer’s sample format restriction

So wait, let’s take a step back. What does the “float output” toggle in Gramophone even do? Gramophone uses a library by Google called ExoPlayer for media playback. When enabling the float output toggle, setEnableAudioFloatOutput(true) is called on the DefaultRenderersFactory provided by ExoPlayer. The Javadoc says:

Sets whether floating point audio should be output when possible.

Enabling floating point output disables audio processing, but may allow for higher quality audio output.

The default value is false.

Hmm, it “may allow for higher quality audio output”. That’s relative to the default mode, but it doesn’t guarantee anything about actually having no quality loss. So we need to find out what this method does to answer the question. First of all, why isn’t this the default if it may allow for higher quality audio output? The answer is in the same sentence: it disables audio processing. What audio processing do they refer to? For a long time, I was under the impression this disables AudioEffect and hence third-party equalizer apps. This is, however, not true: AudioEffect fully supports high-resolution audio up to 192 kHz with float32 as of Android 10. Point to note: Android does not support 384 kHz or 768 kHz in the effects framework as of today, but supports up to 192 kHz with int16 or float32 effects (int24 and int32 appear unsupported). The Javadoc refers to ExoPlayer’s built-in audio processors, a few of which are configurable:

    /**
     * Creates a new default chain of audio processors, with the user-defined {@code
     * audioProcessors} applied before silence skipping and speed adjustment processors.
     */
    public DefaultAudioProcessorChain(AudioProcessor... audioProcessors) {
      this(audioProcessors, new SilenceSkippingAudioProcessor(), new SonicAudioProcessor());
    }

and a few of which are hardcoded:

    toInt16PcmAudioProcessor = new ToInt16PcmAudioProcessor();
    toFloatPcmAudioProcessor = new ToFloatPcmAudioProcessor();
    availableAudioProcessors =
        ImmutableList.of(trimmingAudioProcessor, channelMappingAudioProcessor);

Let’s start with the SilenceSkippingAudioProcessor - its name is fairly self-explanatory, and it is in fact disabled by default, though it can be enabled by calling another method up the chain. The ChannelMappingAudioProcessor is responsible for switching around the channel order in case a Vorbis or Opus file with more than two channels is played back, because those formats use a channel order different from the canonical Android one. The TrimmingAudioProcessor removes samples at the start and end of the audio based on encoderDelay and encoderPadding in Format, which are used for gapless playback. Meanwhile, the SonicAudioProcessor is not just named after Sonic the Hedgehog cough, it’s actually doing multiple things at once:

ExoPlayer uses SonicAudioProcessor for speed and pitch changes by default (though it also supports using AudioTrack parameters for that instead, which is what’s used in float output mode, among others). However, SonicAudioProcessor only supports the int16 PCM encoding. If you play a 24-bit or 32-bit audio file, ToInt16PcmAudioProcessor is responsible for converting the audio to 16 bit before it is given to SonicAudioProcessor. Point to note: ExoPlayer, with float output disabled, converts all audio samples to int16. So this is a quality loss as defined by the user who had asked the question.

But SonicAudioProcessor is not active in float mode, because float mode disables the DefaultAudioProcessorChain and the ToInt16PcmAudioProcessor, and uses ToFloatPcmAudioProcessor instead, which can convert 24-bit and 32-bit int audio to float32. Point to note: in float output mode, ExoPlayer increases the bit depth of int24/int32 files to float32, which is a lossless conversion for int24, but a lossy one that may cause 8 bits of quantization error for int32 input (float32 only has a 24-bit mantissa). I do want to note here that these quantization errors are probably not harmful to any real-life audio playback use case ever. Do note the omission of 16 bit: ToFloatPcmAudioProcessor cannot convert 16-bit audio to float32. Point to note: ExoPlayer’s float output codepath is disabled when playing 16-bit audio files.

However, other than ToInt16PcmAudioProcessor and ToFloatPcmAudioProcessor, no modifications are made to the audio that decrease quality in any way on ExoPlayer’s side. Point to note: other than the above caveats, sound quality is not reduced by any of the processing ExoPlayer does. ExoPlayer delivers samples at the original sample rate to the underlying AudioTrack (although there’s a hard upper limit of 192 kHz), with int16 or float32 encoding respectively. (As a side note, native int24 and int32 output support for ExoPlayer is planned from Google’s side, and I also sent two pull requests - although it will have to require Android 12. Without this, AudioTrack will only be operated in int16 or float32 mode by ExoPlayer.) This means the actual potential increase in playback quality that “float output” achieves comes from configuring the underlying AudioTrack in float32 PCM mode, instead of using int16 and discarding some of the data if the file has a higher bit depth.
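
For illustration, flipping Gramophone’s toggle boils down to one builder call; here’s a minimal Kotlin sketch against the media3 1.x API (the wiring is my illustration, not Gramophone’s actual code):

    import android.content.Context
    import androidx.media3.exoplayer.DefaultRenderersFactory
    import androidx.media3.exoplayer.ExoPlayer

    fun buildPlayer(context: Context, floatOutput: Boolean): ExoPlayer {
        // Enabling float output bypasses the DefaultAudioProcessorChain (and thus
        // ToInt16PcmAudioProcessor) and configures the AudioTrack for float32 PCM.
        val renderersFactory = DefaultRenderersFactory(context)
            .setEnableAudioFloatOutput(floatOutput)
        return ExoPlayer.Builder(context, renderersFactory).build()
    }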

The audio data that is being processed and sent to the AudioTrack is obtained through decoding with MediaCodec, which is guaranteed to support at least 192 kHz with 24-bit audio for WAV and FLAC, again as of Android 10. MediaCodec, too, can only operate in int16 or float32 mode before Android S. Manual testing on U confirms int24, int32 and float32 to be supported at 192 kHz. However, that does not tell the entire story: although it’s not trivial to test this, MediaCodec in fact supports 384 kHz audio as well! I hacked up a quick and very stupid patch (please don’t look at it, I’m only posting it so others can reproduce my results) that resamples everything to 192 kHz with really poor linear resampling that pretends everything is 16-bit mono. As stupid as that may be, it is enough to audibly confirm whether the frames are extracted or whether we only get silence or noise. With this patch, we can play back int16/int24/int32/float32 WAV files and int16/int24/int32 FLAC files (reminder: the reference FLAC encoder as of today does not support float32) at 384 kHz. However, for full official support for 384 kHz from MediaCodec’s side, Google would have to at least edit AudioCapabilities.cpp twice and make the equivalent changes in MediaCodecInfo.java as well, and more changes may be needed. Point to note: MediaCodec does not officially support anything above 192 kHz as of today, but in practice supports 384 kHz with int16/int24/int32/float32 for WAV and FLAC - excluding float32 for FLAC, of course, due to missing encoder support.
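
As an aside, asking a decoder for float32 PCM output happens through MediaFormat (honored since Android 7.0 where the codec supports it). A minimal sketch, assuming track 0 of the file is the audio track:

    import android.media.AudioFormat
    import android.media.MediaCodec
    import android.media.MediaExtractor
    import android.media.MediaFormat

    fun createFloatDecoder(path: String): MediaCodec {
        val extractor = MediaExtractor().apply { setDataSource(path) }
        val format = extractor.getTrackFormat(0) // assumes track 0 is the audio track
        // Without this key, decoders default to int16 output.
        format.setInteger(MediaFormat.KEY_PCM_ENCODING, AudioFormat.ENCODING_PCM_FLOAT)
        val codec = MediaCodec.createDecoderByType(format.getString(MediaFormat.KEY_MIME)!!)
        codec.configure(format, null, null, 0)
        return codec
    }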

To reiterate all points to note so far:

  • Android does not support 384 kHz or 768 kHz in the effects framework as of today, but supports up to 192 kHz with int16 or float32 effects (int24 and int32 appear unsupported; what exactly happens when an int24 stream meets an attached effect remains an open question)
  • ExoPlayer, with disabled float output mode, converts all audio samples to int16, causing up to 16 bits of data to be discarded
  • In float output mode, ExoPlayer increases the bit depth of int24/int32 files to float32, which is lossless for int24 but may cause 8 bits of quantization errors for int32 (but they likely do not matter for playback use case, nonetheless, native int24 / int32 output is planned)
  • ExoPlayer’s float output codepath is disabled when playing int16 audio files
  • other than above caveats, sound quality is not reduced by any of the processing ExoPlayer does
  • MediaCodec does not officially support anything above 192 kHz as of today, but in practice supports 384 kHz with int16/int24/int32/float32 for WAV and FLAC

TL;DR: MediaCodec decodes to PCM data (up to 384 kHz as int16/int24/int32/float32), ExoPlayer in float output mode converts it to float32 (or to int16 in normal output mode), otherwise without quality degradation, and gives it to the AudioTrack (which can take up to 192 kHz).

The life of an AudioTrack

To go back to answering the user’s question: we have nice 192 kHz float32 audio coming from ExoPlayer going into our AudioTrack now, but does that audio reach our fancy $1,000,000 USB DAC without quality loss?

To understand this, we need to understand what an AudioTrack does and how it works. From now on, it’s going to get a bit confusing, so let me take a step back. I am writing an Android application. Android applications are usually written in Java or Kotlin. Android provides official Java APIs, such as android.media.AudioTrack. Android also provides official C APIs, such as AAudio. But both are usually different ways to reach the same goal. To over-generalize a bit, every Java API either does its magic by calling a system service using Binder RPC (Android’s way to talk to other apps or the system services) or by calling C code which eventually calls Binder RPC as well. A C API calls other C code, which, unsurprisingly, ends up using Binder RPC too. But there are significant differences, because the traditional Java APIs are usually very old. The Java and C APIs are developed by different teams, and sometimes one has small features the other lacks (such as AAudioStream_getHardwareSampleRate, which has no equivalent in the alternative APIs).

AudioTrack has existed since Android 1.5 (API level 3), which is quite a while ago. Meanwhile, the alternative API OpenSL ES (which has in the meantime been deprecated again) exists since Android 2.3, and the other alternative API AAudio is new in Android 8.0. Those APIs are quite different to use in practice, but their inner workings naturally evolved from the AudioTrack model. Remember that all of these APIs do their magic by eventually calling a system service through Binder RPC - just because they introduced a new API doesn’t mean they had to redesign the entire Binder-side interface. And if the Binder interface is the same, some of the lowest-level C code can be the same as well.

What I was referring to as AudioTrack is the Java class android.media.AudioTrack, hereafter referred to as AudioTrack.java. AudioTrack.java’s magic is completely implemented by the C++ file android_media_AudioTrack.cpp. This file is a very thin wrapper around the C++ class android::AudioTrack from libaudioclient (before Android 8.0: libmedia), which I’ll call AudioTrack.cpp from now on. Do you know who else uses AudioTrack.cpp? OpenSL ES - for playing PCM buffers - and AAudio! That’s right, all of the Android audio APIs are wrappers around AudioTrack.cpp. Okay, to be fair, almost: there is one exception - AAudio’s MMAP mode, which was added in Android 8.1 to achieve extremely low latency, is NOT a wrapper around AudioTrack.cpp. Okay, now we know almost everyone uses AudioTrack.cpp for audio output, but remember how I said the APIs sometimes have little differences in functionality? (Another example being offload support: AudioTrack.java has had it since Android 10, but AAudio only since Android 16.) There’s something I haven’t told you yet: some system components use AudioTrack.cpp directly for advanced use cases, and for that reason it actually supports more features than all the APIs based on it combined.
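
To keep the Java side concrete before we descend into the native stack: this is roughly the kind of track ExoPlayer’s DefaultAudioSink ends up creating in float output mode (a simplified sketch; the real code also handles offload, fallbacks and buffer sizing differently):

    import android.media.AudioAttributes
    import android.media.AudioFormat
    import android.media.AudioTrack

    fun buildFloatTrack(sampleRate: Int): AudioTrack {
        val format = AudioFormat.Builder()
            .setEncoding(AudioFormat.ENCODING_PCM_FLOAT)
            .setSampleRate(sampleRate) // e.g. 192000; AudioTrack.java caps out at 192 kHz
            .setChannelMask(AudioFormat.CHANNEL_OUT_STEREO)
            .build()
        return AudioTrack.Builder()
            .setAudioAttributes(AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
                .build())
            .setAudioFormat(format)
            // Multiplier is arbitrary here; pick a buffer size suited to your latency needs.
            .setBufferSizeInBytes(AudioTrack.getMinBufferSize(
                sampleRate, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_FLOAT) * 4)
            .setTransferMode(AudioTrack.MODE_STREAM)
            .build()
    }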

Now that this is clarified, back to our AudioTrack.java that gets fed the 192 kHz float32 audio. When DefaultAudioSink from ExoPlayer creates an AudioTrack.java, native code is called and an AudioTrack.cpp is created and set up. The AudioTrack.cpp then (in createTrack_l) talks to AudioFlinger (the system service responsible for coordinating audio streams and mixing audio) using Binder RPC and asks for a new IAudioTrack (note the I in the beginning). An IAudioTrack is the backing Binder interface for the AudioTrack.cpp functionality provided by the AudioFlinger system service. AudioFlinger then goes to AudioPolicyManager (the service responsible for deciding when to use Bluetooth, USB or speaker for audio) to ask for an IO handle based on the provided attributes, flags, format, et al. It decides the output device by first consulting dynamic policy mixes (those are used for screen recording and Miracast) and then the classic policy engine, which runs through some strategies to finally choose a device.

Now that we have a device selected, we need to somehow convert this device to an actual output - which raises the question of what an actual output even is. Android represents outputs as mixPorts, which are opened by AudioFlinger. Any data written into a mix port goes directly to the device-specific audio HAL. There’s commonly a “deep buffer” mix port used for music playback. One mix port can be used for multiple devices, i.e. the deep buffer mix port in this example can be used both for speaker and wired headphone playback. To differentiate and make sure the audio is output where we want it to be, the possible output devices are tracked as so-called devicePorts. Each active devicePort then gets assigned a mixPort when in use, and the HAL is told about that through so-called “audio patches”. An interesting property of both is that they can have audio profiles (combinations of sample rate / format / channel mask); however, the profiles of a devicePort actually do not matter at all in the entire stack and are purely informative. They are nonetheless exported to apps as part of AudioDeviceInfo, which can be misleading for developers. (For example: on my device, the Bluetooth audio HAL’s created devicePort has only one profile, supporting only AUDIO_FORMAT_PCM_16_BIT, even though the default input for the LDAC encoder is actually AUDIO_FORMAT_PCM_32_BIT - and one can observe that being the actual value used in logs.)

Anyway, the AudioPolicyManager has selected a device, which corresponds to a devicePort, which has one more important property: an optional allowlist of encoded formats. This is used for restricting A2DP offload to specific formats; for example, there can be one offload devicePort only supporting SBC and AAC, and another software A2DP devicePort supporting all encoded formats including LDAC. After selecting the first compatible devicePort for the device (based on the allowed encoded formats), it decides on a mixPort routed to the selected devicePort, and this mixPort (output stream to the HAL) is then used to output audio. This mixPort decision is done in a multi-step process:

The priority is as follows:
1: the output supporting haptic playback when requesting haptic playback
2: the output with the highest number of requested functional flags
    with tiebreak preferring the minimum number of extra functional flags
    [functional flags: AUDIO_OUTPUT_FLAG_VOIP_RX, AUDIO_OUTPUT_FLAG_INCALL_MUSIC,
            AUDIO_OUTPUT_FLAG_TTS, AUDIO_OUTPUT_FLAG_DIRECT_PCM (do not confuse with plain direct flag),
            AUDIO_OUTPUT_FLAG_ULTRASOUND, and AUDIO_OUTPUT_FLAG_SPATIALIZER (ref 1)]
3: the output supporting the exact channel mask
4: the output with a higher channel count than requested
5: the output with the highest sampling rate if the requested sample rate is
    greater than default sampling rate [currently 48 kHz (ref 2)]
6: the output with the highest number of requested performance flags
    [performance flags: AUDIO_OUTPUT_FLAG_FAST, AUDIO_OUTPUT_FLAG_DEEP_BUFFER,
            AUDIO_OUTPUT_FLAG_RAW, AUDIO_OUTPUT_FLAG_SYNC (ref 3)]
7: the output with the bit depth the closest to the requested one
8: the primary output
9: the first output in the list

(ref 1, ref 2, ref 3)
In practice, when requesting a perfectly standard 44.1 kHz float32 stereo AudioTrack (not a deep buffer one - that would be standard for music, but I’m talking about a plain output here) on any phone where the primary output is 48 kHz float32 stereo, it will get assigned the primary 48 kHz output (because 44.1 kHz is not larger than 48 kHz, the sample rate difference will be ignored, and all other parameters match).
(Bonus chatter: the flag AUDIO_OUTPUT_FLAG_DIRECT_PCM mentioned above sounds confusingly similar to AUDIO_OUTPUT_FLAG_DIRECT, but the PCM variant does not actually cause any special behavior inside the audio stack. It is just an opportunity for OEMs to create more output types that apps could actually get from the audio policy manager, if the flag were public API… The most common usage of the PCM one is having both AUDIO_OUTPUT_FLAG_DIRECT and AUDIO_OUTPUT_FLAG_DIRECT_PCM in the format bit mask for one mixPort.)
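
Since those informative-only devicePort profiles are exactly what apps see through AudioDeviceInfo, here is a small Kotlin sketch to inspect what your device advertises (with the caveat from above that these values may not reflect what the stack actually uses):

    import android.content.Context
    import android.media.AudioManager

    fun dumpOutputDevices(context: Context) {
        val am = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
        for (dev in am.getDevices(AudioManager.GET_DEVICES_OUTPUTS)) {
            // Each array is populated from the devicePort's audio profiles;
            // an empty array means that dimension is unspecified.
            println("${dev.productName} (type=${dev.type}): " +
                    "rates=${dev.sampleRates.joinToString()} " +
                    "encodings=${dev.encodings.joinToString()} " +
                    "channelCounts=${dev.channelCounts.joinToString()}")
        }
    }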

This whole flow requires that all relevant outputs are either direct or already open. AudioPolicyManager opens every mixPort (in this code confusingly called an output profile) that can be routed to any attached device - except MMAP outputs without dynamic audio profiles (dynamic refers to APM having the capability to choose the sample rate / format / channel count at runtime) - when the HAL registers and when devices are connected, and all of them except direct outputs are kept open indefinitely. If we have audio profile information (this time, this refers to a format / sample rate / channel mask combination), AudioPolicyManager picks a profile for the mixPort based on the best format and, if mixed, the highest channel count and highest sample rate (with an upper limit of 192 kHz; we can ignore the direct codepath here, as it does not restrict us from specifying high channel counts - this automatically opened mixPort will be closed again soon, and the app’s preferred settings take priority over this automatic format decision). If there are no known audio profiles, or only dynamic ones, unspecified values will be passed and the HAL can choose whatever it likes best (emulator HAL for illustration purposes).

Okay, whew, that was a lot of information to digest! We have now acquired an IO handle, and the construction of our AudioTrack continues on AudioFlinger’s side - it creates an AfTrack, gives the thread corresponding to the IO handle to the track (each IO handle corresponds to one thread on AudioFlinger’s side), and creates a wrapper which provides the IAudioTrack API using the backing AfTrack, which is given back to AudioTrack.cpp. When we start our track and write data to our AudioTrack.java, it will end up inside the AfTrack, which gives it further down the chain to its thread - either an MmapThread (used for AAudio’s MMAP mode, not supported by AudioTrack.cpp) or one of the PlaybackThreads: BitPerfectThread, SpatializerThread, OffloadThread, DirectOutputThread and MixerThread. We’ll talk about AAudio’s MMAP mode in detail later, so let’s focus on the PlaybackThreads. The AfTrack is assigned to the PlaybackThread, which then produces audio while there are active tracks. There are important differences in how the PlaybackThreads operate:

  • MixerThread is one thread that can mix multiple tracks, resample per-track, apply volume, up/down-mix, convert audio formats and apply AudioEffects before giving the audio data to the HAL
  • DirectOutputThread is really simple: one and only one track can play on one output, no track mixing, no resampling, no format conversion, no up/down-mixing; the track must perfectly match what the audio HAL wants, but in return we get zero interference from AudioFlinger
  • SpatializerThread is a normal MixerThread with the addition of auto-creating and force-attaching a virtualization or down-mixing effect (depending on the situation) to play multi-channel content virtualized to binaural stereo
  • BitPerfectThread is a normal MixerThread with the addition of bit-perfect playback (either only the bit-perfect track is active, in which case it’s passed through bit-perfect, or it’s muted and only other tracks are being mixed, e.g. an alarm or notification), but it requires audio HAL support and is only supported for USB audio
  • OffloadThread is a DirectOutputThread but for compressed audio formats such as MP3 or FLAC (since Android 16, it also supports PCM); decoding and potential resampling/mixing is done on some low-power DSP chip in firmware, and you’ll never know if this is the best fidelity you can get (bypassing the entire Android audio stack’s processing, after all) or really poor quality because of poorly implemented DSP firmware

Thus, an ideal order of preference when considering fidelity in a music player app would be: BitPerfectThread > DirectOutputThread > OffloadThread > MixerThread > SpatializerThread.

For our purposes of high fidelity playback, a BitPerfectThread sounds like something we’d like to use. However, this feature relies on the USB audio AIDL HAL to support it. The only commercially available device that does so at this moment is the Pixel 9, and of course, this only applies to USB outputs. If we want to use a BitPerfectThread, we have to set the bit-perfect flag in our preferred mixer attributes, which we can do by following these guidelines - all public API, so usable without problems. Easy-to-use support in ExoPlayer is planned from Google’s side as well. If the user happens to have a device where this is supported and this is enabled in our app (which means ExoPlayer will support a fully bit-perfect audio chain as well), we can conclusively answer the question with “no fidelity decrease”.
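
A minimal sketch of what requesting bit-perfect mixer behavior looks like with the public Android 14 (API 34) APIs; device discovery and error handling are left out:

    import android.media.AudioAttributes
    import android.media.AudioDeviceInfo
    import android.media.AudioManager
    import android.media.AudioMixerAttributes

    // Returns true if the preference was accepted; whether playback is actually
    // bit-perfect still depends on the HAL and on the track matching the config.
    fun requestBitPerfect(am: AudioManager, usbDevice: AudioDeviceInfo): Boolean {
        val attrs = AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_MEDIA)
            .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
            .build()
        val bitPerfect = am.getSupportedMixerAttributes(usbDevice).firstOrNull {
            it.mixerBehavior == AudioMixerAttributes.MIXER_BEHAVIOR_BIT_PERFECT
        } ?: return false // HAL offers no bit-perfect config for this device
        return am.setPreferredMixerAttributes(attrs, usbDevice, bitPerfect)
    }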

A SpatializerThread is not helpful for our goal of high fidelity playback, but if we wanted to use it, there are guidelines for both the public platform media APIs and ExoPlayer. In the same way we can enable it, we can also disable it and hence prevent any fidelity decrease caused by it (but remember, some users like the spatializer effect, so don’t just forcibly block it).
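
Opting a stream out of spatialization is possible with the platform API since Android 12L (API 32); a sketch (expose this as a user setting rather than hardcoding it):

    import android.media.AudioAttributes

    // Asks the framework to never spatialize content played with these attributes,
    // which should keep the track off a SpatializerThread.
    val nonSpatializedAttributes: AudioAttributes = AudioAttributes.Builder()
        .setUsage(AudioAttributes.USAGE_MEDIA)
        .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
        .setSpatializationBehavior(AudioAttributes.SPATIALIZATION_BEHAVIOR_NEVER)
        .build()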

The problem with OffloadThread very much is that we don’t know whether quality is going to be amazing or terrible - but it’s more power efficient for sure. However, compatibility with weird media files decreases when enabling offload, hence I would personally recommend a default-disabled toggle, using the easy-to-use ExoPlayer APIs that are available for implementing this. If the user happens to have a device where this is supported, we can conclusively answer the question with “no fidelity decrease (other than whatever the DSP firmware is up to)” (reminder: the int / float output setting or any conversions do not matter at all for offload, because in offload mode, ExoPlayer outputs the unprocessed compressed media file). An additional nuance in Android 16, however, is PCM offload, which is currently mostly undocumented. The intention seems to be that the decoding is done on the AP in advance, and a large buffer of decoded audio is written to the DSP, which handles sample rate conversion et al. and plays back the PCM data. This works similarly to what a direct thread does on Qualcomm devices.
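
The ExoPlayer side of such a toggle could look like this - a sketch using the media3 1.1+ names, with gapless support required so offload doesn’t break gapless albums:

    import androidx.media3.common.TrackSelectionParameters.AudioOffloadPreferences
    import androidx.media3.exoplayer.ExoPlayer

    // Opt into offloaded playback; ExoPlayer falls back to normal playback when
    // the device/codec combination does not support it.
    fun setOffloadEnabled(player: ExoPlayer, enabled: Boolean) {
        val prefs = AudioOffloadPreferences.Builder()
            .setAudioOffloadMode(
                if (enabled) AudioOffloadPreferences.AUDIO_OFFLOAD_MODE_ENABLED
                else AudioOffloadPreferences.AUDIO_OFFLOAD_MODE_DISABLED)
            .setIsGaplessSupportRequired(true) // don't offload if gapless would break
            .build()
        player.trackSelectionParameters = player.trackSelectionParameters
            .buildUpon()
            .setAudioOffloadPreferences(prefs)
            .build()
    }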

The DirectOutputThread is almost as ideal as BitPerfectThread, but it has some quirks too. Not every device has a mixPort with AUDIO_OUTPUT_FLAG_DIRECT set, so it is not supported everywhere. Even where it is, none of the public Android audio APIs support explicitly requesting a direct audio track. None of them! This is some of the internal-system-only AudioTrack.cpp functionality I talked about earlier. You need to write a custom, prone-to-break wrapper based on dlsym to access the AudioTrack.cpp functions, which is quite tedious to implement. Nonetheless, it is possible, as proven by none other than Poweramp’s “Hi-Res Output” (though do not be misled, Poweramp’s “Hi-Res Output” is a lot more than just direct audio tracks, including a wide variety of OEM-specific workarounds and some other modes that do not seem to do much at all on my device - so just enabling it doesn’t mean you’re using a direct audio track; that is best confirmed by dumpsys media.audio_flinger). Additionally, the audio HAL may still employ software mixing or even resampling when a direct thread is used - direct threads only guarantee the audio goes directly to the HAL (we could try to use devicePort audio profiles as a way to detect this, but because they are wrong more often than not, I do not recommend doing so). Nonetheless, the audio HAL has specified it can support direct playback at this sample rate, format and channel mask, so the chance of getting high fidelity output is higher than if we don’t use a direct track.

But on unlucky devices or outputs - and those are the large majority - where we can’t get anything but a MixerThread, we don’t know enough to conclusively answer whether fidelity is being decreased. The key factors for determining whether the audio playback is high fidelity are:

  • are there other apps playing? if so, the different tracks will be mixed.
  • is the volume set to 100%? if not, volume scaling will be done.
  • is the sample rate compatible with the thread? if not, resampling will be done.
  • is the audio format compatible with the thread? if not, conversion will be done.
  • does the channel count match the thread? if not, up/down-mixing will be done.
  • is there any effect attached? if so, the effect may modify the audio signal or cause it to get converted (if the effect does not support the format/sample rate/channel count).

Even in the ideal case, we are not guaranteed to get bit-perfect audio. While resampling, format conversion and up/down-mixing are completely skipped if not needed, volume and track mixing is a combined step that is always done, even if there is only one active track at 100% volume. This shouldn’t lead to any quality decrease as defined by the user - or, in proper terms, fidelity decrease - because 0 + 345 * 100% = 345: the calculations will produce the same result as the input. Then there is the problem of audio effects. Audio effects are intended to modify the audio, so I won’t count effects that are applied on purpose as fidelity decrease, but some effects only support int16 and cause everything to be down-converted to int16 - and sometimes, other apps apply audio effects without our knowledge. Other effects only support float32 and cause everything to be up-converted, which may cause 1 bit or 9 bits of quantization error (again, probably doesn’t matter for any real-life playback use case). Now, to avoid any fidelity (“quality”) loss, we need to make sure that the sample rate, audio format and channel count are compatible with the MixerThread’s settings programmatically, and avoid any effects being applied. (I’ll exclude making sure the volume is set to 100% and that no other app is playing audio, because both of those are the user’s responsibility - if the user even cares - and it doesn’t make any sense to try and enforce these in a music player, beyond claiming audio focus.)

Observing mixPorts in their natural habitat

Actually getting the sample rate, audio format and channel count from an I/O handle is not possible using public API, except if you are using AAudio: it has a set of getHardware* functions. However, with some tricks we can access the output I/O handle, which gives us access to the HAL format and HAL sample rate. On more recent builds of Android 14, there are C++ functions for achieving the same in less ugly ways, and we gain a way to query the HAL channel count. These values can change for a given mixPort in some cases on manufacturer-customized systems (MTK Hi-Fi causes this, for example) - even though AOSP actually specifies they never change - which is why they should be re-queried in an OnRoutingChangedListener. That also automatically catches the cases where the audio I/O handle of the AudioTrack.cpp changes because it requests a new IAudioTrack (this happens when switching between HALs, for example between the primary HAL for speaker and the Bluetooth HAL for software A2DP output). I’ve implemented this in Gramophone and have some experimental UI showing this data. But it’s all about how to interpret it!
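
The routing-listener part of this is plain public API (API 24+); only the HAL query behind it needs the tricks described above. A sketch, where queryHalConfig stands in for whatever native machinery you use:

    import android.media.AudioRouting
    import android.media.AudioTrack
    import android.os.Handler
    import android.os.Looper

    fun watchRouting(track: AudioTrack, queryHalConfig: (AudioTrack) -> Unit) {
        track.addOnRoutingChangedListener(AudioRouting.OnRoutingChangedListener { router ->
            // Fires when the track moves to another device/IO handle; re-query the
            // HAL-side sample rate / format / channel count here.
            println("now routed to: ${router.routedDevice?.productName}")
            queryHalConfig(track)
        }, Handler(Looper.getMainLooper()))
    }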

In the previous section I talked about how the different threads affect audio output. But how can we detect which thread we are using right now? Now that is a hard problem! AudioTrack.cpp has this information as mAfTrackFlags, but this was only added in Android 14 QPR2, and it’s a private field only used in a private function with no public getter! Because C++ classes have no stable ABI, we cannot directly access the variable’s offset, and because the getter is private (and even if it weren’t, it’s defined in the .h file), there’s no symbol for it we could dlsym(). So we can’t use this - what other options do we have? We could reimplement AudioTrack.cpp in app code using the same Binder APIs, but that’s too many critical and prone-to-change APIs - that approach would be sure to cause problems, and it would still require Android 14 QPR2. But is there no other way? Actually, there is! We know that each I/O handle corresponds to one thread, and each thread corresponds to one mixPort. Android actually keeps that information together (the code comment of that method is wrong) in the AudioMixPort class, so we can convert from IO handle to mix port ID and get the mix port’s name and profiles for free. Sure, it’s hidden API, but has that ever stopped me?

After using the hidden API listAudioPorts to get access to some AudioPort objects, filtering them down to the appropriate AudioMixPort with the correct thread ID and successfully getting the mixPort’s name as defined in the audio policy, I noticed the AudioMixPort does not contain the output flags I need to determine which thread I am on. Aw. So I looked at the implementation of listAudioPorts, which uses the native counterpart of the method. That in turn unceremoniously calls into AudioPolicyManager to fulfill the request, which then gets the output mix port’s SwAudioOutputDescriptor object and converts it to an audio_port_v7. Sadly, the flags are copied over in this conversion only since Android T, and there is no other place where the mixPort information enters our app process. So if this is our only option, let’s use it. At first, I attempted calling listAudioPorts with a local copy of the JNI wrapper. But this didn’t work out, because audio_port_v7 is a very unstable structure and fields are frequently added in the middle. The AudioSystem code copies into an audio_port_v7[], which we can’t properly use without knowing the size of an audio_port_v7. I was about to start working on calling the Binder method manually, while being sort of unhappy about it because the parcelable will probably change as often as audio_port_v7, when I noticed a method called getAudioPort, which just fills in a pre-allocated audio_port_v7 based on just the ID (though this function only works since Android Pie). This gave me a very stupid idea, which turned out to work just fine. Let’s look at the audio_port_v7 in detail:

struct audio_port_v7 {
    audio_port_handle_t      id;                 /* port unique ID */
    audio_port_role_t        role;               /* sink or source */
    audio_port_type_t        type;               /* device, mix ... */
    char                     name[AUDIO_PORT_MAX_NAME_LEN];
    unsigned int             num_audio_profiles; /* number of audio profiles in the following
                                                    array */
    struct audio_profile     audio_profiles[AUDIO_PORT_MAX_AUDIO_PROFILES];
    unsigned int             num_extra_audio_descriptors; /* number of extra audio descriptors in
                                                             the following array */
    struct audio_extra_audio_descriptor
            extra_audio_descriptors[AUDIO_PORT_MAX_EXTRA_AUDIO_DESCRIPTORS];
    unsigned int             num_gains;          /* number of gains in following array */
    struct audio_gain        gains[AUDIO_PORT_MAX_GAINS];
    struct audio_port_config active_config;      /* current audio port configuration */
    union {
        struct audio_port_device_ext  device;
        struct audio_port_mix_ext     mix;
        struct audio_port_session_ext session;
    } ext;
};

The id is the very first field, and this hasn’t changed in years (considering the old audio_port struct, it’s been this way since the very beginning of the media.h header). So, preparing input for getAudioPort is trivial: allocate a zeroed buffer and store the id at the beginning. Now we want to access the “flags” member of “active_config”. The struct audio_port_config has only changed once since its inception in the media.h header (which was 2015), and that was to add the flags member. But that happened two versions before T, so we can reasonably assume it’s going to stay pretty stable. Sadly, accessing the struct from either head or tail turned out to be hard, because the fields in the middle and the “ext” union change sizes at times. I opted for doing a memory scan for the last mention of the sample rate in the audio_port_config struct and going to “flags” from there. This means I rely on the “ext” union never containing the sample rate, and on audio_port_config never gaining new fields between sample rate and flags - which seemed the most likely outcome to me. Is this a stupid solution? Yes. Does this break many best practices? Yes. Will this break in future Android versions? Yes. Will I do it anyway? Yes. As a bonus, we get the HAL channel count for Android 9-13 by doing the same memory scan. And on Android 6-8, we can get the channel count as well by falling back to listAudioPorts, hardcoding all known sizes of audio_port (it didn’t have the _v7 suffix back then) - which is viable because it only changed once across that timespan - and then accessing audio_port.active_config.channel_mask.
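
The scan itself is the easy part; obtaining the filled-in struct (dlsym’ing getAudioPort out of libaudioclient via JNI) is the part not shown here. A Kotlin sketch of the idea, where rateToFlagsOffset is the fixed distance from sample_rate to flags inside audio_port_config (channel_mask, format and the gain config sit in between; derive the value from the platform headers):

    import java.nio.ByteBuffer
    import java.nio.ByteOrder

    // Scan the raw audio_port_v7 bytes for the last occurrence of the (already
    // known) HAL sample rate; active_config sits near the end of the struct, so
    // the last hit is its sample_rate field. Then read flags at a fixed offset.
    fun scanForFlags(port: ByteArray, sampleRate: Int, rateToFlagsOffset: Int): Int? {
        val buf = ByteBuffer.wrap(port).order(ByteOrder.LITTLE_ENDIAN)
        var last = -1
        var i = 0
        while (i <= port.size - 4) { // struct fields are 4-byte aligned
            if (buf.getInt(i) == sampleRate) last = i
            i += 4
        }
        if (last < 0 || last + rateToFlagsOffset + 4 > port.size) return null
        return buf.getInt(last + rateToFlagsOffset)
    }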

Ok, that’s great - we can take a look at the name and flags now. Is that enough to figure out the exact output thread type? Looking at the thread creation code, we can say it almost is. The exception is that using only flags to detect this may cause false negatives for direct threads that are created because the chosen format or channel mask is not supported by the mixer. Unfortunately, these details are implementation details in AudioFlinger, and we can’t replicate the checks in our app process. But this only impacts mixPorts where the HAL (or, before AIDL, audio_policy_configuration.xml / audio_policy.conf) did NOT specifically specify the direct flag, so these outputs were never intended to be direct at all. We can only tell this is happening by looking at the discrepancy between the audio_port_v7 flags, mAfTrackFlags and mFlags (the latter can actually be obtained through the dump() method, even though the getter is defined in the .h file and hence gets no symbol). This nuance is important, because:

  • no matter if we did or did not request a direct track, if format / sample rate / channels happen to match a non-direct mixPort and are not supported by the mixer (-> thread writing to mix port can take no non-direct tracks at all), we end up with a non-direct (mFlags) AudioTrack.cpp backed by a direct (mAfTrackFlags) AfTrack which writes to a non-direct mixPort (audio_port_v7.active_config.flags)
  • if we did not request direct track but format / sample rate / channels happen to match a direct mixPort and are not supported by the mixer, we end up with a non-direct (mFlags) AudioTrack.cpp backed by a direct (mAfTrackFlags) AfTrack which writes to a direct mixPort (audio_port_v7.active_config.flags)
  • if we did request a direct track and format / sample rate / channels happen to match a direct mixPort, we end up with a direct (mFlags) AudioTrack.cpp backed by a direct (mAfTrackFlags) AfTrack which writes to a direct mixPort (audio_port_v7.active_config.flags)
  • if we did request a direct track but format / sample rate / channels did not match any direct mixPort, we end up with a non-direct (mFlags) AudioTrack.cpp backed by a non-direct (mAfTrackFlags) AfTrack which writes to a non-direct mixPort (audio_port_v7.active_config.flags) even though we asked for a direct one

The only way to accurately tell the AudioFlinger thread type is on Android 14 QPR2 or later, by looking at mAfTrackFlags, and the only way to access that is another memory scan. But if our goal is only to detect whether a direct mixPort is used, we can do that by looking at the mix port flags; those automatically imply it is a direct AfTrack and hence a DirectOutputThread as well. We can do this since Android 13. If we simplify further to detecting a direct mixPort only when we requested a direct AudioTrack in the first place, we just need to take a look at mFlags. Getting mFlags can be done using dump() since Android 9, and by hardcoding the offset on earlier versions.

To avoid hardcoding the offset before Android 9, we could try to use the mixPort’s name - but we’d have to match it to the flags in audio_policy.conf or audio_policy_configuration.xml. Lots of manufacturers customized these to the point where I am not confident in listing all the possible file paths, nor in knowing which file to choose in case there are multiple (a quite common occurrence during the XML migration in Android 8: a device shipping with both audio_policy.conf and audio_policy_configuration.xml). Manufacturers move the file around, rename it, or sometimes even have a different one for Hi-Fi. Programmatically finding out which one is used is impossible - and the one that is not used will contain data that does not actually apply to this audio HAL. Poweramp does parse these files by hardcoding all locations known to them, but they can do this because the whole “Hi-Res Output” is behind a sort-of whitelist approach that checks for known signs, so they can be somewhat confident the audio policy configuration file they picked is the correct one. If I run the exact same Poweramp build on my proper ROM and on an AOSP GSI, it will offer me Hi-Res for the phone speakers on the ROM (which identifies as Pixel) but say that’s not supported on the GSI (which identifies as AOSP on arm64). All Pixels with Android 11+ are whitelisted for the “Direct HD” Hi-Fi output variant (which basically seems to be direct audio tracks) - but sadly, the disadvantage of a whitelist approach shows here: my specific Pixel does not have any direct outputs at all, but Poweramp doesn’t know that, because the Pixels are blanket-whitelisted (without checking audio_policy.conf).

Now we have the HAL channel count, sample rate, format, and a boolean that tells us whether we know for sure our audio is routed to a direct mixPort. If we know for sure we’re routed to a direct mixPort, we don’t need to interpret the other values at all - they are guaranteed to be the same as what we configured our AudioTrack.java with - but a sanity check is never a bad idea. Additionally, knowing it’s a direct mixPort means knowing it’s a DirectOutputThread - so our question is answered with “no fidelity decrease (other than float32 conversion quantization errors caused by ExoPlayer and whatever the audio HAL is up to)” (reminder: the ExoPlayer team said they plan to support int24 and int32 as input formats for the AudioTrack, which would get rid of the possible float32 conversion quantization errors). Bit-perfect (no fidelity decrease) and spatializer (fidelity decrease) threads can be detected using the mixPort flags available since Android 13 - which isn’t a problem, because bit-perfect is supported since Android 14, and the spatializer since Android 13. Offload can be detected using mFlags if we requested it, hence since Android 9, and means “no fidelity decrease (other than whatever the audio HAL / DSP firmware is up to)”. This only leaves the MixerThread. Let’s take a look at our checklist again:

  • is the sample rate compatible with the thread? if not, resampling will be done.
  • is the audio format compatible with the thread? if not, conversion will be done.
  • does the channel count match with the thread? if not, up/down-mixing will be done.
  • is there any effect attached? if so, the effect may modify the audio signal or cause it to get converted (if the effect does not support the format/sample rate/channel count).

We have sample rate and audio format on Android 5 or later; channel count (before Android 14, we need to convert the channel mask to a channel count) on Android 6 or later. Checking whether those match what we feed into the audio track is trivial. What remains are the effects. Those are sadly a more complicated story. There are three ways other apps can add an effect to our audio track:

  1. We broadcast our audio session ID to equalizer apps
  2. Audio session 0 (global mix) effects, deprecated for many years but still supported
  3. The OEM gave this app special permissions or modified AudioFlinger to attach the effect everywhere

We can simply not broadcast our own audio session ID to equalizer apps to get rid of case one. That is something we actively have to do - ExoPlayer doesn’t do it for us, for example - so it’s easy not to do. But don’t make that the default: well-behaved third-party equalizer apps rely on your music player’s cooperation, and most users want working third-party equalizer apps, hence broadcasting the session ID should be the default. About case three, I’ll keep it short: we can’t do much about it. There’s no way to detect or prevent this that I am aware of (with permissions a normal app can obtain, anyway) - if you know one, tell me in the comments! Users need to turn off any such effects in their device’s Settings app. Some examples include Dolby (Samsung’s AudioFlinger has various integrations with Dolby effects) and MediaTek’s BesLoudness (and while BesLoudness is not an effect in the sense of AudioEffect, it modifies the sound quite a lot).
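
For completeness, the cooperative broadcast in case one is plain public API. A sketch of announcing and closing a session (the constants are real; the wrapping function is mine):

    import android.content.Context
    import android.content.Intent
    import android.media.audiofx.AudioEffect

    // Announce our session to third-party equalizers; send the CLOSE variant when
    // playback ends so they can release their effects again.
    fun broadcastAudioSession(context: Context, sessionId: Int, open: Boolean) {
        val intent = Intent(
            if (open) AudioEffect.ACTION_OPEN_AUDIO_EFFECT_CONTROL_SESSION
            else AudioEffect.ACTION_CLOSE_AUDIO_EFFECT_CONTROL_SESSION)
        intent.putExtra(AudioEffect.EXTRA_AUDIO_SESSION, sessionId)
        intent.putExtra(AudioEffect.EXTRA_PACKAGE_NAME, context.packageName)
        intent.putExtra(AudioEffect.EXTRA_CONTENT_TYPE, AudioEffect.CONTENT_TYPE_MUSIC)
        context.sendBroadcast(intent)
    }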

To understand the global mix effects, we have to first understand the different audio sessions involved in this process:

    // Effect chain for session AUDIO_SESSION_DEVICE is inserted at end of effect
    // chains list in order to be processed last as it contains output device effects.
    // Effect chain for session AUDIO_SESSION_OUTPUT_STAGE is inserted just before to apply post
    // processing effects specific to an output stream before effects applied to all streams
    // routed to a given device.
    // Effect chain for session AUDIO_SESSION_OUTPUT_MIX is inserted before
    // session AUDIO_SESSION_OUTPUT_STAGE to be processed
    // after track specific effects and before output stage.
    // It is therefore mandatory that AUDIO_SESSION_OUTPUT_MIX == 0 and
    // that AUDIO_SESSION_OUTPUT_STAGE < AUDIO_SESSION_OUTPUT_MIX.
    // Effect chain for other sessions are inserted at beginning of effect
    // chains list to be processed before output mix effects. Relative order between other
    // sessions is not important.

Well-behaved music players create a new session (in the comment: “other sessions”) and broadcast it to equalizer apps, which will then attach effects to that session. Both AUDIO_SESSION_DEVICE and AUDIO_SESSION_OUTPUT_STAGE are reserved for the system, so they fall into our third case above, but it’s useful to know they exist. Global mix effects can only be created if they attach to a mixer thread without a fast mixer, or if they are hardware accelerated (-> change audio output in hardware, not with software processing). We can see there is a block for RAW outputs disallowing effects, but sadly it is part of the block that checks for the fast flag, which doesn’t seem quite intended - but that’s what it is: the raw flag in practice makes no difference (except potentially using a different mixPort). Software effects are unlikely to ever be supported on a thread with the fast output flag or a fast mixer, because this has been omitted on purpose, probably to reduce latency. In practice, this means that if we are playing on a thread with the fast flag set, we know there won’t be any global effects; otherwise we can’t know, and just have to hope the user knows which apps add AudioEffects to the global mix.

Global mix effects are attached to the most preferred mixPort patched to the music stream’s routed output device, in this order of preference: active offload output, active spatializer output, active deep buffer output, active primary output, active other output, inactive offload output, inactive spatializer output, inactive deep buffer output, inactive primary output, inactive other output. Up to and including Android 7.1, a simpler algorithm ignoring output activity was used: offload > deep buffer > other output. That means if a music player was using a non-deep buffer output before Android 8, effects wouldn’t have worked properly (I think that’s one of the reasons Audio Compatibility Patch removes the deep buffer output - effects will attach to a non-deep buffer output and all apps will be forced to use that same output).

In conclusion:

  • we know if resampling, up/down-mixing or format conversion is happening
  • if we are on a fast mixPort, we know if effects are attached (excluding evil OEM effects)
  • we can use this information to tell the user whether their playback is expected to be relatively high fidelity or not
  • everything else, as usual, is up to the HAL (which may do further mixing!)

Poking the audio stack causes it to explode

But what if the sample rate doesn’t match? What if the format doesn’t match? Can we do anything about that? I, as an unsuspecting user, just plugged in my DAC, and it shows me it’s configured for 96 kHz / 32-bit audio. However, my audio file is only 48 kHz - Gramophone tells me it’s being resampled, but why? My DAC supports 48 kHz, my mixPort supports 48 kHz, so why isn’t it just using that?! The answer is in the text above: every non-direct mixPort will be opened at the highest format, then the highest sample rate combo (and otherwise whatever the HAL decides to use) - and then stay that way forever. Okay, I lied. Remember preferred mixer attributes? The system always gives us an output thread perfectly matching the preferred mixer attributes, and once we’re done, it gets rid of that output again. The output selection queries the last set preferred mixer attributes. Sadly, preferred mixer attributes can only be set for USB devices. Nonetheless, this is a golden ticket for us music player devs: since Android 14, Google added a way for normal music-playing apps to reconfigure USB DACs however they want! And remember, preferred mixer attributes are not only used for bit-perfect audio: they work without audio HAL modification.

Well, they would work without audio HAL modification - if the audio HAL ever supported anything except the highest settings properly. Spoiler: on a Pixel 7a running Android 14 QPR3, this ends in disaster. The audio HAL never attempts to reconfigure the DAC to a lower rate again after moving to a higher one. 96 kHz, the highest rate my HAL declares supported for float32, is set by default. Hence, trying to set the DAC to 88.2 kHz results in a slight speedup (because the audio data is 88.2 kHz but the DAC thinks it’s getting 96 kHz), while 44.1 kHz just deteriorates into chipmunk. Additionally, I had issues with my settings randomly unapplying because the audio policy manager crashed the audio process. At this point, I fetched another device where I know the audio HAL isn’t as stupid. Let me explain: in the days before preferred mixer attributes were a thing, MediaTek assigned some engineer to develop a Hi-Fi audio solution. This resulted in a toggle in Settings (although any app can replicate it using the getParameters("hifi_dac") and setParameters("hifi_dac=0/1") public APIs, as sketched below) that does two things:

  • off: blocks max sample rate as 48 kHz for most usages, enables PMIC DAC for analog output (if your device even has two DACs)
  • on: changes mixPort and AF thread sample rate to last media sample rate forcibly (this means on MTK stock ROM, AF thread sample rate can change even if it wouldn’t in AOSP), enables Hi-Fi SoC-external DAC for analog output (if your device even has two DACs)
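
The sketch mentioned above - AudioManager’s parameter API is public, while the "hifi_dac" key is a MediaTek vendor extension, so expect a no-op on anything else:

    import android.media.AudioManager

    fun setMtkHifi(am: AudioManager, enabled: Boolean) {
        // Vendor parameter; silently ignored by non-MediaTek HALs.
        am.setParameters("hifi_dac=" + if (enabled) "1" else "0")
    }

    fun isMtkHifiOn(am: AudioManager): Boolean =
        // Returns e.g. "hifi_dac=1" on MTK devices, or an empty string elsewhere.
        am.getParameters("hifi_dac").trim() == "hifi_dac=1"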

They ported this Hi-Fi feature to Android 14. Surprisingly, there are almost no conflicts with preferred mixer attributes. I say almost, because there unfortunately is a minor problem: when stress testing, I was able to freeze the entire system by making audioserver enter a busy loop, with no chance of recovery except rebooting. And this problem is reproducible. In the grand scheme of things it probably doesn’t matter, since we get an output with matching sample rate and the highest available audio output format; but with MTK Hi-Fi we have no influence over, for example, channels, which we do have when using preferred mixer attributes. Nonetheless, I don’t recommend actually enabling the preferred mixer attribute APIs on Android 14 unless you personally verify their stability on a device. I’m hoping this is more fleshed out in Android 16 (yes, I know Android 15 and 16 are released, but I didn’t get around to testing them yet).

Now that the situation is looking hopeful for USB, what about Bluetooth? The software A2DP Bluetooth audio HAL completely ignores the passed-in config and instead asks the Bluetooth stack for the current output configuration, then opens the output at those settings. This means we can forget even trying to change the sample rate and friends from the audio stack side: the lower layer completely ignores it. However, in Developer Options, there are options where users can ask the Bluetooth stack to change this, and the software A2DP HAL will follow suit instantly. This is in contrast to the A2DP offload codepath, which goes through my primary HAL (the same one that’s responsible for the USB preferred mixer attribute bugs) and which only has 48 kHz mixPorts even when playing on a 44.1 kHz device… I assume the DSP is resampling before encoding. Before Android 15 QPR1, apps actually had a way to detect the Bluetooth codec of a connected device, including parameters such as sample rate, which means on those versions we can show it to the user and detect whether the offload path is resampling. Sadly, that got closed, even though it might get promoted to public API later (sorry for quoting you :P) - though, with a companion device manager association, we can query the codec status again since Android 16. So this means that before Android 15 QPR1, we can detect mismatches and tell the user to go to developer settings and change the codec settings. Before Android 14, we can even do that ourselves using setCodecConfigPreference, which used to not enforce any permission either (and today can only be used with a companion device manager association). Or, if we create a CDM association, we can cover every version except 15 QPR1/QPR2.

Oh well. What about analog outputs? Internal DACs? With most OEMs, the reality is that these can still only be used to their full potential with direct audio tracks (and we still don’t know what the audio HAL is doing with those, but let’s be hopeful: if a company bothers to add a proper DAC, they’ll write proper software) - or with MTK’s Hi-Fi. These direct audio tracks work a little bit differently on every device though. The basic problem outline is: just because you request a direct track, you don’t know if you actually get a direct track. We only know that when looking at mFlags. Now imagine this: the audio HAL for the internal DAC supports up to 192 kHz at int32, and we play a 96 kHz float32 file with the direct flag set. Sadly, this will make us end up at the int16 48 kHz primary output, because no direct output matches our request. Only in Android 10 did Google add a method to query the direct playback support state, which allows us to check all combinations up front (see the sketch after this list). On older versions, we basically need to brute-force our way through all the different format, sample rate and channel mask combinations: create a track, check if it’s really direct, and if not, get rid of it again. Other than that, there are some different Hi-Fi strategies I was able to find in Poweramp or with some research in old open-source CAF versions:

  • detecting if MTK Hi-Fi is enabled, and relying on that to do the legwork (making sure to release old tracks before creating new ones)
  • creating an offload track set to PCM, for some Qualcomm devices
  • setting AUDIO_OUTPUT_FLAG_DIRECT_PCM if system property audio.offload.pcm.24bit.enable is true for some other Qualcomm devices
  • creating normal direct tracks, used for example for Hi-Fi output to LG’s Quad DAC
  • using AudioTrack.cpp without javaland wrapper allows int24 and int32 audio even on Android 5-11, instead of Android 12+
  • using AudioTrack.cpp without javaland wrapper allows sample rates higher than 96 kHz even on Android 5, instead of Android 6+
  • on some Samsung devices, you can create tracks with sample rates higher than 192 kHz, and there may be other Samsung-specific changes
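
The Android 10+ query mentioned before the list, as a sketch - probing one combination via public API instead of creating a throwaway track (the method was deprecated in API 33 in favor of AudioManager.getDirectPlaybackSupport):

    import android.media.AudioAttributes
    import android.media.AudioFormat
    import android.media.AudioTrack

    // Returns true if the policy says a direct track with this config would be
    // possible right now (for the current routing) - no track creation needed.
    fun supportsDirect(sampleRate: Int, encoding: Int, channelMask: Int): Boolean {
        val format = AudioFormat.Builder()
            .setEncoding(encoding) // e.g. AudioFormat.ENCODING_PCM_FLOAT
            .setSampleRate(sampleRate)
            .setChannelMask(channelMask)
            .build()
        val attrs = AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_MEDIA)
            .setContentType(AudioAttributes.CONTENT_TYPE_MUSIC)
            .build()
        @Suppress("DEPRECATION")
        return AudioTrack.isDirectPlaybackSupported(format, attrs)
    }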

(Bonus chatter: Poweramp deprecated the Hi-Res output in Android 15, pointing to the AAudio output mode instead. The Hi-Res output started failing because the no-argument AudioTrack.cpp constructor was removed in Android 15 (though there is another one that could be used). However, AAudio does not support requesting direct tracks, which is why I’m quite sad about this deprecation.)

Now, if you’re interested in 384 kHz and 768 kHz, achieving that within the limits of AOSP is going to be quite hard: the default USB audio HAL doesn’t support anything above 192 kHz, tracks of such sample rates are blocked in AudioTrack.java, and they can’t be mixed by AudioFlinger either - only direct tracks are supported. So the only way to do this is by pushing MediaCodec to 384 kHz + a direct audio track + a very cooperative USB/analog output audio HAL, or, for 768 kHz, by using the nuclear option: an ffmpeg-based decoder + a direct USB driver :D

The nuclear option of a direct USB driver of course provides various benefits even if you don’t use absurdly high sample rates, for example: changing any hardware control, using any sample rate, channel settings or format, and getting guaranteed bit-perfect playback. And such a driver can work on almost every modern device except Huawei/Honor or Unisoc-based devices, so there’s much wider support than Android 14’s preferred mixer attributes. Of course, there are disadvantages, such as potential conflicts with other players or loss of alarms and notifications. Writing such a driver, however, is so much effort that I only know of a few apps having done so: HiBy Music, FiiO Music, USB Audio Player Pro, Neutron Player and Onkyo HF Player. Notably, only paid apps, or apps where the DAC vendor paid for the development. It’s probably about as annoying as writing code to interface with direct audio tracks - which only Poweramp, USB Audio Player Pro and Neutron Player do, to my knowledge.

Excursion 1: What about MMAP mode?

Now we have covered a lot of audio stack components, but I have completely omitted talking about MMAP mode. We know there’s an MmapThread, but we also know that AudioTrack.cpp and even the whole AfTrack system do not support MMAP modes. So how does it actually work? The basic idea is that instead of copying buffers around countless times, apps write directly to the buffer provided by the ALSA kernel driver - which requires the HAL to support a mixPort with the flags AUDIO_OUTPUT_FLAG_MMAP_NOIRQ and AUDIO_OUTPUT_FLAG_DIRECT set. That is opened by the openMmapStream method, which is called by the MMAP endpoint class, which is used by the AAudioServiceStreamBase, which I’d compare to the IAfTrack. That is created in openStream in AAudioService, which is directly called by clients from the libaaudio API. If you create an exclusive AAudio stream that is allowed to be MMAP mode by the client library and HAL, and no one else is currently using it, you get a stream where you talk directly to the audio HAL in a format supported by the mixPort. Because exclusive mode is only supported by MMAP, we actually have a way to find out whether we really got an exclusive stream without any private API, which is quite nice. We can distinguish between legacy mode (backed by AudioTrack.cpp, so all of the above caveats apply) and MMAP shared mode using some private API. But what does MMAP shared mode do?

    // MMAP gracefully handles lack of an exclusive track resource by mixing
    // above the audio framework. For AAudio to know that the limit is reached,
    // return an error.

AAudio does not use the AudioFlinger mixer codebase at all for shared mode. MMAP shared mode is instead implemented by oboeservice simply opening an exclusive stream and mixing all active clients together (pure track mixing - no resampling, up/down-mixing, format conversion, effects and so on). There are a few requirements for being a shared client: for example, only float32 is supported, and the sample rate and channel count must match the shared endpoint. However, the client library does resampling, up-mixing and format conversion to match the device sample rate / format / channel count (defined as whatever the backing AAudioServiceEndpoint uses - so either the actual mixPort sample rate for an exclusive stream, or the requested sample rate for a shared stream). The oboeservice opens the backing exclusive stream for a shared stream using the format and channel mask that the first client provides, but no fixed sample rate, and it sticks around until it gets closed for inactivity.
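Putting the detection trick mentioned above into code: here is a minimal sketch using only the public NDK AAudio C API (link with -laaudio, available since API 26). It requests an exclusive stream and then checks which sharing mode was actually granted - since exclusive streams are only ever MMAP-backed, a granted exclusive stream implies MMAP. Note that AAudio silently falls back to shared mode, so checking the result is mandatory:

    #include <aaudio/AAudio.h>
    #include <stdbool.h>

    // Probe whether an exclusive (and therefore MMAP-backed) output stream
    // is available right now. Returns false on any failure or fallback.
    static bool got_exclusive_stream(void) {
        AAudioStreamBuilder *builder = NULL;
        AAudioStream *stream = NULL;
        bool exclusive = false;

        if (AAudio_createStreamBuilder(&builder) != AAUDIO_OK)
            return false;

        AAudioStreamBuilder_setSharingMode(builder, AAUDIO_SHARING_MODE_EXCLUSIVE);
        // MMAP is generally only wired up for the low-latency path.
        AAudioStreamBuilder_setPerformanceMode(builder,
                AAUDIO_PERFORMANCE_MODE_LOW_LATENCY);

        if (AAudioStreamBuilder_openStream(builder, &stream) == AAUDIO_OK) {
            // AAudio falls back to shared mode silently, so ask what we got.
            exclusive = AAudioStream_getSharingMode(stream)
                    == AAUDIO_SHARING_MODE_EXCLUSIVE;
            AAudioStream_close(stream);
        }
        AAudioStreamBuilder_delete(builder);
        return exclusive;
    }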

All that aside, the primary problem with MMAP is that it is designed for low latency. MMAP mode is currently only enabled for low-latency use, and most audio HALs optimize for low latency instead of quality or power usage when using it. For the same reason, I didn’t see a single device where MMAP supported anything other than a 48 kHz output sample rate, so I stopped investigating at this point - AAudio MMAP is largely uninteresting for a music player, and I don’t see any real reason to use AAudio legacy over AudioTrack.java at the moment (although it is a fine API; its only disadvantages are making hidden API hacks harder while supporting fewer devices).

Excursion 2: Using the effect framework to your advantage

While I was looking at Poweramp to understand how its “Hi-Res Output” works, I inevitably stumbled across Poweramp’s DVC. But what even is DVC? Their official help center describes it as:

 Direct Volume Control - or DVC - is an option that provides a cleaner and more uninterrupted audio path from Poweramp to your output device, removing some of the interim stages that the hardware manufacturer may have incorporated into their design. Because the hardware volume is controlled more directly by Poweramp, there is generally an improvement in dynamic range and more headroom for equalization to take place so you can safely boost the bass or treble further - both of which may also result in an increase in possible volume levels.
[...]
 Note: If you are using the main Poweramp app and the Poweramp Equalizer app at the same time (which is not generally recommended anyway) then you must only enable DVC in one app or the other, not both. 
 Note: for Hi-Res, DVC support depends on firmware and firmware updates and may not work for some. In this case, either disable DVC via No DVC option in PA Settings -> Audio -> Output -> Hi-Res output -> your device or use standard definition output (AudioTrack, OpenSL, or AAudio).

The statement about it providing a cleaner and more uninterrupted audio path piqued my curiosity, because the help text also says that if you use both Poweramp the music player and Poweramp the standalone equalizer, you must only enable DVC in one of them. Wait, what? The standalone equalizer app can apply an effect that provides a cleaner and more uninterrupted audio path?! The standalone equalizer app can by definition only use the sort-of limited AudioEffect APIs, so how is it doing that? A thread in their forum gives a few more details: a user was asking whether hi-res or DVC is preferable for fidelity (one could say audio quality):

Have a Sony Xperia 1 IV and for whatever reason using DVC with Hi-Res output causes it to downsample to 16bit/48khz. So my options are to disable DVC and continue using Hi-Res or to enable DVC and switch to AAudio since the downsampling isn't present on it. Which case will give me the greatest fidelity?

The answer from the developer is quite enlightening:

DVC for the higher range of equalization/tone adjustments (esp. basses - you can get unbelievable level of basses with the DVC) - using most stable, well supported standard definition 48khz/16bit or may be 48khz/24bit. Non-DVC for high resolution devices with high accuracy playback for those who needs it.

DVC + (very, e.g. 192khz+) high-res is a gimmick and not really needed at all - you want high accuracy as close to source material as possible but you also tweak sound massively?

So DVC allows us to have more dynamic range for the DSP engine, but for playing non-DSP-ed sound, disabling it is recommended. At that point, I had to try it myself and took a sneak peek at adb shell dumpsys media.audio_flinger (for simplicity, I tested with 48 kHz float32 through my phone’s speaker, because DVC works there as well):

  1 Effect Chains
    1 effects for session 2369
        In buffer    Out buffer                           Active tracks:
        0x702e86f000   0x702e871000 -> 0xb400006e0a09fc80   1
        Effect ID 395:
                Session State Registered Internal Enabled Suspended:
                02369   003   y          n        y       n
                Descriptor:
                - UUID: 119341a0-8469-11df-81f9-0002a5d5c51b
                - TYPE: 09e8ede0-ddde-11db-b4f6-0002a5d5c51b
                - apiVersion: 00000000
                - flags: 00000050 (conn. mode: insert, insert pref: last, volume mgmt: implements control, input mode: not set, output mode: not set)
                - name: Volume
                - implementor: NXP Software Ltd.
                1 Clients:
                          Pid Priority Ctrl Locked client server
                        10295     1337  yes    yes      0      0
                Status Engine:
                000    0xb400006dea09ce30
                - data: float
                - Input configuration:
                        Buffer     Frames  Smp rate Channels Format
                        0x702e86f000 00960   48000    00000003      5 (AUDIO_FORMAT_PCM_FLOAT)
                - Output configuration:
                        Buffer     Frames  Smp rate Channels Format
                        0x702e871000 00960   48000    00000003      5 (AUDIO_FORMAT_PCM_FLOAT)
                - HAL buffers:
                        In(0x702e86f000) InConversion(nullptr) Out(0x702e871000 -> 0xb400006e0a09fc80) OutConversion(nullptr)

The above effect is only present if DVC is enabled. It’s a “Volume” effect - and if you look at the most common failure pattern when DVC doesn’t work (everything gets quieter), you can come up with a pretty reasonable hypothesis about what it actually does:

In order to gain more dynamic range for the DSP, Poweramp uses a high bit depth for output (more dynamic range available in general), then sets a negative gain in the DSP engine (making everything quieter). Exceptionally loud sound (strong bass, for example) can now fit into the upper part of the dynamic range that the music was pushed out of. To get normal-sounding audio again, Poweramp then applies a Volume effect that raises the gain back up - on the AudioFlinger side. This means loud sounds will not be clipped: instead of being clipped in the DSP, peaks survive into AudioFlinger’s float pipeline, where the gain trick has freed up the headroom they need before the volume is applied. And if DVC “not working” makes everything quieter, that is simply because the current output does not support AudioEffects - the negative pre-gain is applied, but the compensating Volume effect never runs.

Now, of course, this is only a hypothesis, so take it with a grain of salt - I did not look at decompiled Poweramp code to verify it. However, if accurate, that’s a really clever way of getting more headroom for DSP effects! The toy sketch below illustrates the idea.
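To make the numbers concrete: all gains below are made up and this is in no way Poweramp’s actual code; it merely demonstrates how a negative pre-gain in the DSP plus a compensating gain in the float domain could avoid clipping an EQ boost:

    #include <math.h>
    #include <stdio.h>

    static float db_to_gain(float db) { return powf(10.0f, db / 20.0f); }

    int main(void) {
        float sample = 0.9f;                // loud input sample, near full scale
        float boost = db_to_gain(12.0f);    // worst-case EQ boost (+12 dB)
        float volume = db_to_gain(-20.0f);  // listening volume applied later

        // Without the trick: the DSP output has to fit into [-1, 1]
        // (think of an int16 track), so the boosted peak is hard-clipped.
        float clipped = fminf(sample * boost, 1.0f) * volume;

        // With the trick: -12 dB pre-gain keeps the DSP stage in range, and
        // the "Volume" effect restores +12 dB inside AudioFlinger's float
        // pipeline, where the peak survives until the volume stage.
        float dsp_out = sample * db_to_gain(-12.0f) * boost;  // ~0.9, no clipping
        float restored = dsp_out * db_to_gain(12.0f) * volume;

        printf("clipped peak: %f, with headroom trick: %f\n", clipped, restored);
        return 0;
    }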

Future topics to explore

While this post addressed a lot of details, a few more topics would still be interesting to explore, and I may do so in a follow-up:

  • When should which strategy be used to achieve working direct output across many Android versions?
  • How to make PCM offload, USB bit perfect API, direct output and friends work with ExoPlayer?
  • How / where does AudioEffect cause format conversions?
  • Which AudioEffects are okay to use in what circumstances?
  • How did Samsung modify the audio framework in order to support global effects and different sample rates?

Conclusion

The Android audio stack is really complicated. This blog post started as personal notes so I wouldn’t forget my research, but at some point I noticed it’d be nice to share it with others and clear up some myths about Android audio along the way (“Android resamples everything to 48 kHz” is too simplistic / not accurate anymore; it depends entirely on what the audio HAL wants to do). I’m aware this post got very technical, but in short, what users need to know is: music players can, when the devs employ a wide array of tricks, tell you a reasonably accurate approximation of what is happening to your audio (and when the devs don’t do that, you’ll have to assume the worst). The other points to remember are:

  • you never know what the Audio HAL does (with plenty of cases such as BesLoudness where it works against the goal of high fidelity)
  • forced global effects added through OEM modifications can’t be detected by apps
  • if you want high-fidelity audio playback, you should disable AudioEffect session broadcasting, because some effects still haven’t moved on from int16 (note: DVC’s “Volume” effect uses float32, so that’s not what I mean here)
  • your music player should probably try its best to use direct playback threads; normal AudioTrack.java, AAudio or OpenSL ES don’t result in the best output

Thanks for reading, and don’t hesitate to write a comment in case of questions!

