Feb 2, 2012

Inference Pattern

Steve Jobs gave you the iPod, but he listened to vinyl at home.  #SecondLife

I've been listening to a lot of binaural audio lately, as a result of a recent research paper of mine that briefly touched on the subject in the context of the future of virtual worlds. Over time, this got me wondering about the perceptual quality of analog (LP) versus digital output, and what exactly it is that we hear differently.


As part of that research, I came across "Holophonic" recording and the theoretical (if a little quirky) explanation from Hugo Zuccarelli, with its unsubstantiated claims about the holographic nature of sound and how the brain interprets it.

Usually I would consider this an open-and-shut case, but something about the whole situation has been nagging me.

You see, binaural audio essentially works through the HRTF, or Head-Related Transfer Function: the subtle difference in arrival time between the two ears, combined with the way the precise shape of the head and ears molds the incoming sound waves, gives the mind subtle and consciously imperceptible cues for spatial positioning. This is often referred to as the Cetera Algorithm.
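To put a rough number on "the subtle difference in arrival time", here is a minimal sketch using Woodworth's spherical-head approximation, a textbook simplification rather than a full HRTF: the interaural time difference is (a/c)(θ + sin θ) for a distant source at lateral angle θ. The head radius of 8.75 cm and the speed of sound of 343 m/s are assumed typical values, not measurements, and the model ignores the shape of the outer ear entirely.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius (8.75 cm)
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air at roughly 20 °C


def woodworth_itd(lateral_deg: float) -> float:
    """Interaural time difference in seconds for a distant source.

    lateral_deg is the angle away from straight ahead, 0 to 90 degrees.
    Woodworth's spherical-head model: ITD = (a / c) * (theta + sin(theta)).
    """
    theta = math.radians(lateral_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


for angle in (0, 30, 60, 90):
    print(f"{angle:>2} deg -> {woodworth_itd(angle) * 1e6:6.1f} microseconds")
```

Even the largest value, for a source directly to one side, comes out at only about 0.66 ms, which helps explain why these cues are processed without our being aware of them.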

This got me thinking about what people who listen to high-fidelity analog audio have in common: they commonly describe it as "better", but can never quite say exactly how it is better.

The untrained audiophile will likely never be able to hear the differences and, more to the point, even the most well-trained audio engineers today are unlikely to identify them accurately.

I believe the answer lies in the actual difference between high-fidelity analog and digital recording: frequency response is the key to this mystery, together with some key insights about binaural audio and a cursory look at what Hugo Zuccarelli calls "Holophonic" audio.

The idea behind Holophonic audio is Hugo's insistence that high-quality audio carries some sort of reference information that is interpreted in the mind alone. While this seems silly up front, it is essentially how the Cetera Algorithm works: the subtle difference in arrival time is calculated in our minds on a subconscious level to give us the perception of spatialized audio. This is why you know that a bird chirping in a tree in the forest is about 100 feet in front of you, roughly 200 feet up in the air, and about 50 feet to your left.
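As a worked example of the bird in the tree, the sketch below converts that position into azimuth, elevation and distance, then estimates its arrival-time difference with Woodworth's spherical-head approximation. The axis convention (x forward, y to the listener's left, z up, in feet) and the head constants are assumptions chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius (8.75 cm)
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air


def locate(x_ft: float, y_ft: float, z_ft: float):
    """Return (azimuth_deg, elevation_deg, distance_ft, itd_s) for a source.

    Assumed axes: x points forward, y to the listener's left, z up.
    """
    distance = math.sqrt(x_ft**2 + y_ft**2 + z_ft**2)
    azimuth = math.degrees(math.atan2(y_ft, x_ft))  # positive = to the left
    elevation = math.degrees(math.atan2(z_ft, math.hypot(x_ft, y_ft)))
    # Lateral angle: how far the source sits toward one ear (0 = median plane).
    lateral = math.asin(abs(y_ft) / distance)
    itd = (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (lateral + math.sin(lateral))
    return azimuth, elevation, distance, itd


# The bird: 100 ft ahead, 50 ft to the left, 200 ft up.
az, el, dist, itd = locate(100, 50, 200)
print(f"azimuth {az:.1f} deg left, elevation {el:.1f} deg, "
      f"{dist:.0f} ft away, ITD {itd * 1e6:.0f} microseconds")
```

The whole position is encoded in a timing difference of roughly a tenth of a millisecond, plus the spectral shaping of the ears that this simple model leaves out.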

But when we record binaural audio, frequency response likely plays a much larger role than we originally realized. This splits the listening experience into two categories: where today we think of audio only in terms of conscious perception, my theory distinguishes between subconscious perception and conscious perception.

The CD standard settled on a sampling rate of 44.1 kHz, and later high-resolution digital formats raised this to 96 kHz, even though audio engineers will say that the average human ear cannot hear above about 20 kHz. The 44.1 kHz figure, still the rate you will commonly see when you open a WAV file, is not a coincidence: a digital recording can only capture frequencies up to half its sampling rate (the Nyquist limit), so 44.1 kHz captures up to about 22 kHz, a little more than the stated average hearing limit, to give it some leeway. In other words (as stated in the video), the audio actually captured extends only to roughly half the nominal rate.
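The Nyquist limit, and what happens to a tone above it, can be sketched in a few lines. The 30 kHz test tone is an arbitrary ultrasonic frequency chosen for illustration.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled signal can represent: half the sampling rate."""
    return sample_rate_hz / 2


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone actually appears after sampling.

    Tones above the Nyquist limit fold back ("alias") into the 0..fs/2 band.
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


for fs in (44_100, 96_000):
    print(f"{fs} Hz sampling -> captures up to {nyquist_hz(fs):.0f} Hz; "
          f"a 30 kHz tone shows up at {alias_hz(30_000, fs):.0f} Hz")
```

In practice, recorders place an anti-aliasing low-pass filter in front of the converter, so content above the Nyquist limit is removed rather than folded back. Either way, it does not survive into the recording.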

When listening to binaural audio, we can often say that the positional cues are accurate to a degree, but the cues for "in front" and "behind" the head are frequently hard to make out. The same is true of cues for "above" and "below", especially in combination with front and behind. Looking over the information for the binaural tracks, I noticed a common thread among the ones with better positional cues (where front, back, above, and below were much better represented): the difference seemed to lie in the recording quality itself, and in the frequency response.
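One reason front/behind cues are so fragile is that arrival time alone cannot tell them apart. In the simple spherical-head model sketched here (same assumed head radius and speed of sound as a typical textbook example), a source 30 degrees to the right in front and its mirror image 30 degrees to the right behind produce exactly the same interaural time difference. The brain has to fall back on the subtler spectral filtering of the outer ear, which this model deliberately leaves out.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius (8.75 cm)
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air


def itd_for_azimuth(azimuth_deg: float) -> float:
    """ITD (seconds) for a source in the horizontal plane.

    azimuth_deg: 0 = straight ahead, 90 = right, 180 = straight behind.
    Only the lateral component matters, so front/back mirror images collapse.
    """
    lateral = math.asin(math.sin(math.radians(azimuth_deg)))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (lateral + math.sin(lateral))


front, back = itd_for_azimuth(30), itd_for_azimuth(150)
print(f"front-right: {front * 1e6:.1f} us, back-right: {back * 1e6:.1f} us")
print("indistinguishable by timing alone:", math.isclose(front, back))
```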

So I looked into it further and found the video below, which shows a definite difference between analog and digital and where each one's frequency range drops off. Analog hi-fidelity goes right up to 120 kHz of frequency response, well out of the range of hearing, while original CDs drop off sharply just above 20 kHz (a 44.1 kHz sampling rate cannot represent anything above 22.05 kHz), and more recent 96 kHz digital recordings stop around 48 kHz before the sharp drop-off.
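The drop-off is easy to demonstrate numerically. The sketch below samples an ideal 30 kHz tone (an arbitrary ultrasonic test frequency) at the CD rate and at 96 kHz, then finds the strongest frequency with a naive discrete Fourier transform. No anti-aliasing filter is modeled, so at 44.1 kHz the tone is not captured as 30 kHz at all but folds down into the audible band.

```python
import cmath
import math


def dominant_frequency(tone_hz: float, sample_rate: int, n_samples: int) -> float:
    """Sample a pure sine tone and return the frequency of the strongest DFT bin."""
    samples = [math.sin(2 * math.pi * tone_hz * n / sample_rate)
               for n in range(n_samples)]
    best_bin, best_mag = 0, -1.0
    # Only bins up to n/2 are meaningful for a real-valued signal.
    for k in range(n_samples // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * n / n_samples)
                  for n, s in enumerate(samples))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n_samples


# 10 ms windows give 100 Hz bins at both rates.
print("44.1 kHz:", dominant_frequency(30_000, 44_100, 441), "Hz")
print("96 kHz:  ", dominant_frequency(30_000, 96_000, 960), "Hz")
```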

I suppose the digital future wasn’t as great as we were told…

What this leads me to believe is that the difference lies in subconscious audio cues, preserved or lost depending on how much of the frequency range the recording captures. Everything above roughly 20 kHz would be subconscious information. Much as we infer positional cues from the subtle difference in arrival time at each ear, positional audio quality and fidelity hinge on superior capture and playback of the upper frequency range, which carries the consciously imperceptible cues the mind uses to reconstruct a full audio experience.

In short, it's all in our heads, but in a very good way. Hugo Zuccarelli might have been spot on with his assumptions about "Holophonics", but for all the wrong reasons. MP3 destroys this upper range of subconscious audio fidelity because of its compression model, so we are actually losing quite a lot of fidelity by chopping off the upper frequency range. Even with 96 kHz recordings, which capture content up to about 48 kHz, we're still stripping away subtle cues that help our minds reconstruct an audioscape, though admittedly not nearly as much as with our early "superior" digital CDs, which cut off just above 20 kHz.

I believe this is the ultimate secret behind the claims of "Holophonic" audio, which for all intents and purposes is recorded the same way as a standard binaural track, but with what I would argue is a very important difference that Hugo Zuccarelli is unlikely to state publicly:

Hugo Zuccarelli has designed ultra-low-distortion microphones that are quite possibly unparalleled in the recording industry, as well as ultra-low-distortion loudspeakers. The frequency response of his binaural recordings may extend past even 120 kHz, which would likely capture ultra-high-fidelity binaural spatial audio that preserves far more subconscious cues than today's standard HD microphones and recordings.

Hence the difference between binaural and "holophonic". Binaural, then, is the low-fi equivalent of 3D audio, while Holophonic would rely on ultra-low-distortion custom microphones and ultra-high frequency-response capture, reaching well above and beyond the conscious range of hearing, to record an amazing amount of subconscious audio clarity that the mind interprets at a resolution superior to current digital means. It would also mean that to truly appreciate this process and playback, a standard pair of headphones won't cut it, nor would even high-end headphones costing up to $1,000.

For instance, audiophiles who pride themselves on high-end headphones and balk at things like Bose or Skullcandy are in no better position themselves: even if their headphones can reach 120 kHz, the audio formats and the equipment the headphones are hooked up to usually destroy that subconscious fidelity before it reaches their ears (and, for that matter, their expensive hipster headphones).

Even your computer's "HD Audio" card is likely well below the top-end fidelity of 120 kHz, so plugging in expensive headphones gains you no real benefit beyond not degrading the audio any further than it already has been before it reaches your ears. The audio quality you hear is only as good as the process by which it is interpreted and delivered, so every component between the source and your ears matters.

It's only as good as the lowest common denominator in the chain.
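The "lowest common denominator" idea reduces to taking a minimum: the highest frequency that survives a chain of components is the smallest upper limit among them. The stage names and figures below are hypothetical, chosen only to show that an expensive final stage cannot restore what an earlier stage has already cut.

```python
def chain_limit(stages):
    """Return the (name, upper_limit_hz) stage that caps the whole playback chain."""
    return min(stages, key=lambda stage: stage[1])


# Hypothetical chain; the numbers are illustrative, not measured specs.
playback_chain = [
    ("source file (44.1 kHz WAV)", 44_100 / 2),  # Nyquist limit of the file
    ("sound card output (96 kHz)", 96_000 / 2),  # Nyquist limit of the DAC
    ("headphones (claimed spec)", 40_000),       # hypothetical rated response
]
name, limit = chain_limit(playback_chain)
print(f"Bottleneck: {name}; nothing above {limit:.0f} Hz gets through")
```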

Personally, I have a pair of Bose headphones, and I know full well that they distort the bass response more than they should. That is why I use them mainly for MP3 music (where it doesn't actually matter) but not when I need clarity, as with binaural audio. I know that unless I'm using hardware capable of 120 kHz frequency output, and the audio file itself is recorded with ultra-high frequency response and without compression, no pair of headphones will make it sound as clear and amazing as it should.

At least... not yet.

I think Neil Young and the late Steve Jobs were onto something here... 

In the context of virtual worlds and augmented reality, if we expect to construct more compelling and immersive environments, we're going to have to step up our game in the audio department. For audio quality in general, this means the trade-off between quality and file size matters a lot more than we previously thought.
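To make the quality versus file size trade-off concrete: uncompressed PCM audio costs sample rate × bit depth × channels bits per second. The sketch compares CD audio, a 96 kHz/24-bit recording, and a 320 kbps MP3 (formats chosen for illustration) in megabytes per minute.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def megabytes_per_minute(bitrate_bps: float) -> float:
    """Storage for one minute of audio, in megabytes (10**6 bytes)."""
    return bitrate_bps * 60 / 8 / 1_000_000


formats = {
    "CD (44.1 kHz, 16-bit, stereo)": pcm_bitrate_bps(44_100, 16, 2),
    "Hi-res (96 kHz, 24-bit, stereo)": pcm_bitrate_bps(96_000, 24, 2),
    "MP3 at 320 kbps": 320_000,
}
for label, bps in formats.items():
    print(f"{label:34} {bps / 1000:7.1f} kbps  {megabytes_per_minute(bps):6.2f} MB/min")
```

A 96 kHz/24-bit stereo stream takes about 3.3 times the space of CD audio and about 14 times that of a 320 kbps MP3.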


  1. I have several Zuccarelli recordings the system is a perfect match for virtual worlds but ppl dont really give a s**t about sound quality these days.
    Great article

  2. @stiofainx - Oh, trust me... I definitely give a damn about audio quality. In regard to binaural output for real-time environments, you should take a look at Ghost Audio :)

    @mighquis - Thank you! Glad you enjoyed the article :)