Not so long ago if you wanted to make decent sounding recordings you’d have needed to go to a commercial recording studio and/or spend a fortune on equipment. Now for a very small outlay on some basic recording gear, you can set up a first-class home recording studio in your bedroom, garage or study. However, it is very easy to end up in that familiar situation of having all the gear and no idea!
In this post you are going to get a comprehensive intro to everything you actually need to know about digital audio recording. This is not a highly technical post. Instead, these are the terms associated with digital audio that you should understand, or become familiar with, so that your home studio recording will be successful.
However, the world of digital audio recording is a vast one, so there is quite a lot of information here. Hopefully you will find it useful. But here is a quick overview of the key things you absolutely must know if you are new to digital audio.
Our 3 Top Tips For Successful Digital Audio Recording
- Record, edit and mix at the highest resolution your equipment will allow. Keep everything at that level right up until you create your final mix. At that point, export your final master at the resolution required for distribution
- Watch out for clipping! Do not push your recording levels to the max when recording digital audio. Leave some headroom or you will get clipping (which sounds terrible). Recording at 24-bit will allow for extra headroom if your equipment will allow
- Beware when processing audio files that you don’t introduce clipping when you apply effects etc
And now, if you want more in-depth detail on digital audio, grab a coffee, and read on …
Digital Vs Analog Recording
Sound recording has a rich history, beginning with Thomas Edison’s Phonograph and Emil Berliner’s Gramophone Records back in the 19th Century. Recording began as a mechanical process and evolved via vacuum tubes and magnetic tape. By the mid 1950’s Les Paul had pioneered the concept of overdubbing using multitrack techniques, and the first multitrack tape recorders were placed on the market in 1960.
If you wanted to record at home in the 1970’s you would almost certainly have used tape. Either reel-to-reel or cassette.
All these recording processes are based around the concept of ‘analog’ recording. The term ‘analog’ refers to the fact that the waveform encoded on tape, or inscribed onto vinyl is a close analogy to the original waveform picked up by a microphone. It can then be played back over speakers or headphones via an electromechanical transducer. The original waveforms are recreated and that’s what you hear as the speakers vibrate.
Digital Sound for the General Public
Digital sound first reached the general public in 1982 by means of the compact disc (CD format). However digital audio for musicians was still very expensive and mostly only available in commercial studios and universities. Only in the late 1980’s did lower cost, good quality converters become available for personal computers. This development heralded a new era in computer music.
We have now reached the point where most music is distributed digitally (aside from the resurgence of vinyl!) and for just a small outlay you can set up a home recording studio in your bedroom that would have been the envy of many a commercial operation a couple of decades ago.
Sampling: The Core Concept of Digital Audio Recording
The core concept in digital audio recording is sampling. This is the process of converting continuous analog signals (eg those coming from a microphone) into discrete time-sampled signals. Once the signal has been converted, what you get is an audio file (or a stream of numbers if live-streaming) that represents these sound waves in the digital domain. These can be streamed to, or stored on, any digital device (like a computer, iPad, digital handheld recorder). This is analog to digital conversion (ADC). Once on your computer, digital audio files can be processed in almost unlimited ways via DSP (digital signal processing).
You then listen back to the digital signals in a reverse of the process when the files are converted back and played through monitors or headphones. This is digital to analog conversion (DAC). The DAC reconstructs the original signal.
The processes of ADC and DAC involve filters to smooth out the results during recording and playback. The quality of the filters is another important difference between less expensive and super professional digital audio recording kit. Here is a non-technical overview and summary of the whole process.
The lowpass antialiasing filter ensures there are no frequencies too high to be correctly sampled during the ADC process. The smoothing filter interpolates the wave form between the discrete samples during DAC. (Read more about aliasing below …)
A Demonstration of the Principle of Sampling
Here is a little more detail of what is going on. If you were to look at an analog audio signal on an oscilloscope the screen might look something like the image below. This is a continuous representation of the amplitude of the sound over a period of time. (The amplitude being the strength of the sound wave over a time period – or the air pressure).
The above is a simple ‘time-domain’ representation of a sound waveform.
To transform this into a digital audio signal, you measure the amplitude at fixed time intervals. In other words, you take a series of ‘samples’. A fixed number of snapshots per second to represent the analog signal as a series of digital samples.
Thus, the smooth analog signal has been converted into a series of time-sampled signals. The resulting graph looks broadly similar to the analog signal.
Bear in mind the filtering that goes on ‘behind the scenes’ to smooth out the conversion process.
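The idea of taking snapshots at fixed time intervals can be sketched in a few lines of Python. This is only an illustration: the function name sample_sine and the 440 Hz tone sampled at 8 kHz are assumptions chosen for the example, and a real converter also applies the filtering described above.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Measure a sine wave's amplitude at fixed time intervals."""
    num_samples = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# 5 ms of a 440 Hz tone captured at 8 kHz gives 40 snapshots
samples = sample_sine(440, 8000, 0.005)
print(len(samples))
print([round(s, 3) for s in samples[:4]])
```

Doubling the sample rate in the call doubles the number of snapshots taken over the same 5 ms.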
Sample Rate and Bit Depth
Now that you understand the basic principle of digital audio, you can start to make sense of all the concepts and jargon associated with digital audio recording. The kind of stuff you see in the tech specs of audio interfaces, and other studio gear. Let’s start with sample rate and bit depth. You may have heard of ‘CD quality’ recording, which is 16-bit, 44.1 kHz. But what does this actually mean?
The sample rate is the number of snapshots per second, or in other words the number of samples taken per second, per channel. So if the sample rate is 44.1 kHz then this means 44,100 samples per second.
If you look at the graph (figure 3) above you can see that the graph looks quite crude. Over the time period shown only about 40 samples have been taken. Twice the number of samples (80 snapshots) would have produced a smoother graph. Four times the number of samples (160 snapshots) would have been even better.
In other words, the more snapshots or samples you take, the better, because the closer you will get to a good representation of the analog signal.
Bear in mind, that if you record a stereo signal, then you need to double the amount of samples taken. For a 1 second file sampled at 44.1 kHz you would need to store 88,200 samples. Therefore, stereo files are twice the size of equivalent mono files.
So when you look at any digital audio recording equipment, you will always see the sample rate of the device. Older and cheaper devices record at ‘CD quality’ of 44.1 kHz, newer and more expensive interfaces etc boast much higher sample rates. Common rates are: 48 kHz; 88.2 kHz; 96 kHz; and 192 kHz. Scroll down below for a comparison table of some popular digital recording devices and interfaces.
You CANNOT record anything at a sample rate of 96 kHz with an audio interface that will only record at 48 kHz. The more samples per second the better … although the more space the files you create will take up. However, if you record and edit at the highest quality, then you will get a much better quality result in the end.
Another thing to consider, as well as the space, is the processing power required. So bigger is not necessarily best for the average home recording studio.
Why did 44.1 kHz become the default industry standard? In the early days of digital audio, the size of audio files was a real consideration. Computers then did not boast of terabytes of storage! Storage was still quoted in megabytes, not gigabytes. So the CD standard of 44.1 kHz was a good compromise between quality and storage space required (and more on this magic number later …)
Again looking at figure 2 and figure 3 above, the graphs are attempting to plot the shape of the analog signal. The vertical axis is labelled as having a range between 1 and -1. This represents the maximum and minimum amplitude of the waveform. Now the more options you have available to represent each sample the better.
16-bit audio (as in the file shown below) means that 16 bits are available to represent the amplitude. A total of 65,536 possibilities. But if you record at 24-bit, then there are 16,777,216 possible values. So you immediately get much more precision.
Do you need to understand this in depth? No, as long as you understand that 24-bit is better than 16-bit because it can more accurately represent the waveform. BUT again, there is a trade-off. More bits = more storage space required. So your 24-bit audio files for the same length recording will be bigger than 16-bit files.
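To get a feel for the trade-off between resolution and storage, here is a rough sketch that counts the values available at a given bit depth and estimates the size of uncompressed audio. The function names are made up for this example, and the estimate ignores file headers and any metadata.

```python
def quantisation_levels(bit_depth):
    """Number of distinct amplitude values available per sample."""
    return 2 ** bit_depth

def pcm_size_bytes(sample_rate_hz, bit_depth, channels, duration_s):
    """Approximate size of raw, uncompressed audio (no file header)."""
    bytes_per_sample = bit_depth // 8
    frames = int(round(sample_rate_hz * duration_s))
    return frames * channels * bytes_per_sample

print(quantisation_levels(16))             # 65536
print(quantisation_levels(24))             # 16777216
print(pcm_size_bytes(44100, 16, 2, 60))    # 10584000 bytes, one minute of CD-quality stereo
print(pcm_size_bytes(96000, 24, 2, 60))    # 34560000 bytes, one minute at 24-bit, 96 kHz
```

So one minute of stereo at 24-bit, 96 kHz takes a little over three times the space of the same minute at CD quality.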
It is very doubtful whether the human ear can detect the difference between sounds recorded at 16-bit and those recorded at 24-bit. However, there is one very important advantage of recording at 24-bit. It makes life easier for you. This is because the extra range of values used for 24-bit recording make it much easier to avoid clipping (which is discussed in more detail below).
You can only record at 24-bit if your recording device supports it. However, these days storage on a computer is not so much of an issue as it used to be. So as a rule of thumb, always record and edit at the highest resolution possible on your audio interface and only reduce down at the end when you come to your final mix and bounce. And take advantage of the extra headroom you get from 24-bit if it is available to you. It has the following benefits:
- Greater Dynamic Range
- Higher Signal/Noise Ratio
- More Headroom
So you have just discovered that the number of bits per sample is important because it affects the dynamic range of a digital sound system. In general, the dynamic range is the difference between the loudest and softest sounds that the system can produce and is measured in units of decibels.
The decibel is the unit of measurement for relationships of voltage levels, intensity, or power of audio systems. The image below shows the decibel scale and some estimated acoustic power levels relative to 0 dB.
In recording music, it is important to capture the widest possible dynamic range to reproduce the full expressive power of the music. In a digital system you can use the following simple formula:
maximum dynamic range in decibels = number of bits * 6.02
So if we record at 8 bits per sample, the upper limit on the dynamic range is approximately 48 dB, which is worse than the dynamic range of analog tape recorders. Record at 16 bits per sample and the dynamic range increases to a maximum of 96 dB – a significant improvement.
A 24-bit converter offers a potential dynamic range of approximately 144 dB. Much better still.
And since quantisation noise is related to the number of bits, even softer passages will sound cleaner. (More on quantisation below …)
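If you want to check the numbers yourself, the theoretical figure can be computed as the ratio between full scale and a single quantisation step, expressed in decibels. The sketch below assumes that definition (20 x log10 of 2 to the power of the bit depth), which works out at roughly 6.02 dB per bit. Figures quoted elsewhere can differ by a decibel or two depending on the constant used, and real converters never reach the theoretical maximum.

```python
import math

def dynamic_range_db(bit_depth):
    """Theoretical ratio between full scale and one quantisation step, in dB."""
    return 20 * math.log10(2 ** bit_depth)

for bits in (8, 16, 24):
    print(bits, round(dynamic_range_db(bits), 1))
```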
The upshot of all this is always record and edit your audio at the highest bit depth your equipment will allow. This will take up more disk space and processing power, but most modern computers can handle this. At some point you will have to create a final mix, and then bounce it down to an audio file suitable for sharing which will almost certainly have a lower bit depth. But only do this once, at the end of your recording and mixing process. (For more read about dither below …)
Examples of Audio Interfaces and Digital Recording Devices: Sample Rates and Bit Depths
Below is a comparison table of some of the most popular audio interfaces, handheld portable digital recorders and USB microphones. Compare the maximum recording resolution (ie sample rate and max bit depth) of each device.
Historically, with analogue recording, it was often necessary to set input levels as high as possible, to the point that the signal was just below (or sometimes exceeding, for effect) the maximum level that the recording medium (e.g. tape) could handle. This was done to offset the fact that tape has a fairly low signal to noise ratio, so quiet recordings can lose some detail as it gets buried in ‘tape hiss’. Furthermore, the “distortion” resulting from pushing the levels can add to the desired recording result, especially when recording rock guitars.
However, if you push your recording levels too high in a digital audio system, you will get a result that sounds terrible! This noise is known as clipping. It looks like this in an audio editor:
With digital recording (particularly at 24-bit) trying to record as ‘hot’ as possible is not particularly necessary and it can make things more difficult to balance later on. Digital clipping is almost always undesirable and if multiple tracks are recorded very loud close to the maximum digital level (0dBFS) then the sum of these tracks will likely exceed this, causing unwanted distortion.
So beware, do not push the levels of individual tracks. And when you come to the final mix, adjust the levels of each track so that the final file does not contain clipping either.
What does clipping sound like? It sounds like very bad static. Not nice! And almost NEVER wanted.
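Here is a small sketch of why summing hot tracks causes trouble. It treats samples as floating-point values where 1.0 stands for the maximum digital level, which is an assumption of this example (as are the function names). Two tracks that each peak at a safe 0.8 add up to 1.6, and everything beyond 1.0 gets flattened.

```python
import math

def mix(tracks):
    """Sum several equal-length tracks sample by sample."""
    return [sum(frame) for frame in zip(*tracks)]

def hard_clip(samples, limit=1.0):
    """Flatten anything beyond the maximum level, as a full-scale output does."""
    return [max(-limit, min(limit, s)) for s in samples]

def count_clipped(samples, limit=1.0):
    """Count the samples that exceed the maximum level."""
    return sum(1 for s in samples if abs(s) > limit)

# Two identical tracks, each peaking at 0.8 of full scale
track_a = [0.8 * math.sin(2 * math.pi * n / 32) for n in range(32)]
track_b = list(track_a)

summed = mix([track_a, track_b])
print(count_clipped(summed))            # more than zero: the mix is too hot
print(max(hard_clip(summed)))           # 1.0, the tops of the wave are flattened

# Pulling the mix down by half restores the headroom
safe_mix = [0.5 * s for s in summed]
print(count_clipped(safe_mix))          # 0
```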
The process of sampling is not quite as straightforward as it might seem. Sampling can play tricks with sound. If too few samples are taken per cycle of the audio signal then the samples may be interpreted as representing a wave other than that originally sampled. This is one way of understanding the phenomenon known as aliasing. An ‘alias’ is an unwanted representation of the original signal that arises when the sampled signal is reconstructed incorrectly during the digital to analog conversion. The diagrams below will help to demonstrate aliasing (also known as a foldover effect).
The thick black dots represent samples, the dotted line in (3) shows the audio signal as reconstructed by the DAC. Every cycle of the sine waveform shown in (1) is sampled eight times (as shown in 2). The signal is reconstructed correctly by the DAC process (3). The reconstructed signal will sound the same as the original.
In (5) above, only ten samples are taken from the eleven cycles of the audio signal (4). When the DAC tries to reconstruct the signal, as shown by the dashed lines in (6) then a sine waveform results but the frequency has been completely changed due to the foldover effect (aliasing). The reconstructed audio signal will not sound the same as the original.
You can generalise from the explanation of aliasing above to say that as long as there are at least two samples per period of the original waveform, then the resynthesized waveform will have the same frequency. Fewer than two samples per period, and the frequency (and maybe the actual sound) of the original signal is lost.
You will often hear of the Nyquist-Shannon sampling theorem, which is commonly stated as follows:
In order to be able to reconstruct a signal, the sampling frequency must be at least twice the frequency of the signal being sampled.
This might just sound like a lot of number theorising, but it is actually very important! The maximum frequency you can record at a sample rate of 44.1 kHz (ie CD quality) is 22.05 kHz. The full spectrum of human hearing is between 20 Hz (lowest) and 20 kHz (highest). So CD quality does just cover the spectrum of human hearing.
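The foldover effect can be worked out with simple arithmetic. The sketch below assumes a pure sine tone and no anti-aliasing filter, and the function names are invented for this example. It gives the highest frequency a sample rate can capture and the frequency that a too-high tone would appear to have after reconstruction.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be captured at this sample rate."""
    return sample_rate_hz / 2

def alias_frequency_hz(signal_hz, sample_rate_hz):
    """Apparent frequency of a sampled sine tone after reconstruction."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

print(nyquist_hz(44100))                  # 22050.0
print(alias_frequency_hz(10000, 44100))   # 10000, below the limit so unchanged
print(alias_frequency_hz(30000, 44100))   # 14100, folded back into the audible range
print(alias_frequency_hz(11, 10))         # 1, eleven cycles sampled ten times
```

The last line mirrors the situation described above: ten samples taken from eleven cycles come back as a single slow cycle.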
However, many people hear information (referred to as “air”) in the region around the 20 kHz “limit” of human hearing. Many analog systems can reproduce frequencies beyond 25 kHz. Scientific experiments confirm the effects of sounds above 22 kHz from both physiological and subjective viewpoints.
This is why many people swear that vinyl sounds better than any digital file!
Furthermore, the lack of “frequency headroom” can cause problems with high-frequency partials. From an artistic standpoint, high-sampling rate recordings are preferable. But there is the practical problem of storage. Higher sampling rates = additional storage. Plus, even if you do record and edit at higher sampling rates, at some point the mix will eventually be played back at CD quality.
The overview of digital audio recording and playback above shows that anti-aliasing filters are required to filter out any frequencies which are too high to be recorded at the given sample rate.
With almost every sample in digital audio, the actual value of the signal lies somewhere in-between two possible values (depending on the bit depth). The converter’s solution is to simply round it off or ‘quantise’ it to the nearest value. So again, recording at a higher bit-depth will reduce errors when recording and mixing.
However, with mastering, the sample rate and bit depth of the final track are often reduced to the final digital format (CD quality of 16-bit and 44.1 kHz or, for an even greater reduction, MP3, which is a compressed format). When the 24-bit audio files are saved as 16-bit, some information gets deleted and ‘re-quantised’, which results in a grainy static sound. This distortion can be prevented by adding low-level random noise (dither) to the signal.
Do not worry, your DAW software will have algorithms incorporated to perform the dithering. The dithering process adds random noise to the lowest 8 bits of the 24-bit signal before they are truncated to 16 bits. Then noise shaping is used to reduce the resulting hiss.
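To see what dither actually buys you, here is a simplified sketch that reduces signed 24-bit sample values to 16-bit. It assumes triangular (TPDF) noise, which is one common choice, and it leaves out noise shaping entirely. Your DAW's algorithm will almost certainly be more sophisticated, and the function names here are hypothetical.

```python
import random

def truncate_to_16_bit(sample_24):
    """Simply drop the lowest 8 bits of a signed 24-bit sample."""
    return sample_24 >> 8

def dither_to_16_bit(sample_24, rng=random):
    """Add low-level random noise, then round to the nearest 16-bit value."""
    # Two uniform values added together give roughly triangular noise
    # spanning about one 16-bit step either side (assumed TPDF dither).
    noise = rng.randint(-128, 127) + rng.randint(-128, 127)
    value = (sample_24 + noise + 128) >> 8
    return max(-32768, min(32767, value))

# A very quiet 24-bit value, smaller than half of one 16-bit step
quiet = 100
rng = random.Random(0)
dithered = [dither_to_16_bit(quiet, rng) for _ in range(20000)]

print(truncate_to_16_bit(quiet))       # 0, the detail is lost completely
print(sum(dithered) / len(dithered))   # averages close to 100 / 256
```

Without dither the quiet detail vanishes. With dither it survives on average, at the cost of a little added noise.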
What you do need to know, is that you should not switch between 24-bit and 16-bit more than once, as this will lead to more distortion and a loss of quality!
If you can, record, edit and mix at the highest bit-depth available. Make sure all your recordings are at the same bit-depth. Once you are happy with your final mix, then dither to 16-bits as the very last step in your mastering process.
Do not re-dither material that has already been dithered. When doing cross-fade between two files, make sure each is non-dithered.
In other words, keep everything consistent.
One problem you may encounter in your digital recording studio is time-delay or latency. This can accumulate in the signal chain. With all the calculations that have to occur in digital audio, it can take anywhere from a few milliseconds to a few DOZEN milliseconds for the audio signal to be played back via headphones or monitors.
- with a 0-11 ms delay it is so short you probably won’t notice
- with an 11-22 ms delay you will hear an annoying slapback effect which can take a lot of getting used to
- with a 22 ms+ delay, it will become almost impossible to play or sing in time with a track
To try and keep latency at a minimum experiment with the following:
- Deactivate all unnecessary plugins while you are recording
- Adjust your DAW buffer settings to find the shortest time your computer can handle without crashing or freezing up
- If low-latency drivers are available for your audio interface, install them rather than relying on generic plug and play drivers, especially in Windows
- For Windows/PC users: If your recording device does not have low latency drivers, (ASIO drivers), then experiment with ASIO4ALL universal drivers
- Use the ‘direct monitor’ feature on your audio interface so you can directly hear your audio signal, rather than the processed signal.
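When you experiment with buffer settings, it helps to know how a buffer size translates into milliseconds. The sketch below only covers the delay contributed by a single buffer. Real round-trip latency also includes the converters, the drivers and any plugins, so treat these numbers as a lower bound. The function name is an assumption made for this example.

```python
def buffer_latency_ms(buffer_size_samples, sample_rate_hz):
    """Delay contributed by one audio buffer, in milliseconds."""
    return 1000.0 * buffer_size_samples / sample_rate_hz

for size in (64, 128, 256, 512, 1024):
    print(size, round(buffer_latency_ms(size, 44100), 1))
```

At 44.1 kHz a 512-sample buffer works out at about 11.6 ms and a 1024-sample buffer at about 23.2 ms, which lines up with the points in the list above where the delay starts to become audible and then unworkable.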
Digital Audio Recording vs MIDI Recording
The string of numbers generated by the Analog to Digital Converter are not related to MIDI data. Your DAW (Digital Audio Workstation) recording software will almost certainly be capable of recording both audio and MIDI tracks. (MIDI is the Musical Instrument Digital Interface specification – a widely used protocol for control of digital music systems.)
It is very common for newcomers to digital audio recording to muddle up audio and MIDI recording. This whole post is concerned with audio recording. Your audio tracks represent the actual sound you can hear. MIDI tracks contain control information, not the sound of anything. MIDI tracks are used to control synths, virtual instruments, digital pianos etc. The video below explains the difference between MIDI and Audio if you are still a bit confused.
Digital Audio File Formats
When you record you will save your audio signal to an audio file. So it is worth understanding the different audio file formats that you will come across.
WAV files are uncompressed audio, also known as lossless audio files. This preserves all the original characteristics of the digital audio signal. A WAV file comprises a header and then all the individual samples. WAV files are recognized by all quality audio editing software. They are most commonly used when working on a Windows PC.
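If you are curious about what "a header and then all the individual samples" looks like in practice, Python's standard wave module can write and read simple WAV files. This is a minimal sketch: the helper names and the 440 Hz test tone are assumptions for illustration, and it only handles mono 16-bit audio.

```python
import io
import math
import struct
import wave

def write_sine_wav(target, freq_hz=440.0, duration_s=0.1,
                   sample_rate_hz=44100, amplitude=0.5):
    """Write a mono 16-bit sine tone to a file path or file-like object."""
    num_frames = int(round(sample_rate_hz * duration_s))
    frames = b"".join(
        struct.pack("<h", int(amplitude * 32767 *
                              math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)))
        for n in range(num_frames)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)               # mono
        wav.setsampwidth(2)               # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate_hz)
        wav.writeframes(frames)           # the samples follow the header
    return num_frames

def read_wav_header(source):
    """Return the basic facts stored in a WAV file's header."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }

buffer = io.BytesIO()                     # in-memory stand-in for a file on disk
write_sine_wav(buffer)
buffer.seek(0)
print(read_wav_header(buffer))
```

Passing a filename such as "tone.wav" instead of the in-memory buffer writes a file you can open in any audio editor.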
AIFF (or AIF)
The Audio Interchange File Format (AIFF) is the Mac OS equivalent of the WAV file. The two are of equivalent standard, and AIFF is the audio format of choice for audio engineers working on a Mac. However, you can almost always use AIFF and WAV on both Mac and PC.
MP3s are digital audio files that are encoded using a lossy compression format. When a WAV or AIF file is compressed to MP3, those parts of the recorded audio that the human ear has difficulty hearing, or that make little difference to the overall sound of the original are discarded in the compression process.
The great advantage of the MP3 format is its drastically reduced file size. This makes it suitable for easy distribution over the internet, and allows small devices such as iPods to store many more files than would otherwise be possible.
There are other proprietary compressed file formats used by iTunes and other music platforms.
Windows Media Audio (WMA) is another lossy compression format, which was developed by Microsoft. It was introduced as a competitor to the MP3 file, though it has never been as widely used.
The OGG Vorbis format is an audio codec (encoder/decoder) that is open source and free. It uses lossy compression and is generally considered to deliver a superior audio quality to MP3 at similar file sizes. However, files rendered in this format cannot always be played on the most popular portable audio devices.
Just for completeness, MIDI data is often stored in MIDI files (MID). As you learned earlier, MIDI data and Audio recordings are not the same thing …. so MIDI files are comparatively very small. This is because they do not contain any information about how tracks sound. They only contain control information to determine how MIDI instruments should be played.
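To make the size difference concrete, here is a rough sketch comparing the control data for one note with one second of audio. It builds the raw 3-byte note-on and note-off messages only. A real MIDI file also stores timing information and a header, which this example leaves out, and the helper names are invented for illustration.

```python
def note_on(note, velocity, channel=0):
    """Build a 3-byte MIDI note-on message (control data, not sound)."""
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(note, channel=0):
    """Build a 3-byte MIDI note-off message."""
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])

# Middle C played and released: six bytes of instructions
midi_bytes = len(note_on(60, 100)) + len(note_off(60))

# One second of 16-bit stereo audio at 44.1 kHz
audio_bytes = 44100 * 2 * 2

print(midi_bytes)    # 6
print(audio_bytes)   # 176400
```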
In summary, the most common audio file formats are WAV, AIF and MP3.
You should also be aware that all DAW software like Reaper, Pro Tools, Ableton, Cubase etc, and even Audacity, will also have their own file formats for storing entire projects. Within those projects there may be multiple audio files containing separate tracks.
So Now You Are a Digital Audio Guru …
Well, this is quite a lengthy post! It attempts to cover everything you need to know about digital audio so that you can have a basic understanding of what you are doing as you make your audio recordings in your favourite DAW or recording software.
However, the world of digital audio is a big and complicated subject, so you may wish to learn more, as your experience develops. So where should you go next?
If you’d like some more information on Digital Audio, then you could move onto Wikipedia.
The other option is to check out some of the excellent courses on audio engineering that are available on the Udemy online course platform. Another option is to think about a subscription to LinkedIn Learning. For a modest monthly fee you can access all their courses on audio production.
Finally, if you want a really excellent book that offers a complete step-by-step approach to professional audio recording then my personal favourite is Practical Recording Techniques.
Elsewhere on this site, you might like our complete guide to home recording studio setup, which covers all the essential equipment you need for your home studio.