3 Digital conversion of analogue sound
3.1 The advantages of digital audio
There is one primary reason why digital recordings appeal to sound archivists. Once digital encoding has been achieved, they can in principle last for ever without degradation, because digital recordings can be copied indefinitely without suffering any further loss of quality. This assumes: (1) the media are always copied in the digital domain before the errors accumulate too far, (2) the errors (which can be measured) always lie within the limits of the error-correction system (which can also be measured), and (3) after error-correction has been achieved, the basic digits representing the sound are not altered in any way. When a digital recording goes through such a process successfully, it is said to be “cloned.”
Both in theory and in practice, no power-bandwidth product (Section 2.3) is lost when cloning takes place - there can be no further loss in quality. However, it also means the initial analogue-to-digital conversion must be done well, otherwise faults will be propagated forever. In fact, the Compact Digital Disc (CD) has two “layers” of error-correction, and (according to audio folk-culture) the format was designed to be rugged enough to allow a hole one-sixteenth of an inch diameter (1.5mm) to be drilled through the disc without audible side-effects.
For all these reasons, the word “digital” began to acquire mystical qualities for the general public, many of whom evidently believe that anything “digital” must be superior! I am afraid much of this chapter will refute that idea. It will also be necessary to understand what happens when a recording is copied “digitally” without actually being “cloned”.
The power-bandwidth products of most of today’s linear pulse-code modulation media exceed those of most of today's analogue media, so it seems logical to copy all analogue recordings onto digital carriers anyway, even if the digital coding is slightly imperfect. But we must understand the weaknesses of today's systems if we are to avoid them (thus craftsmanship is still involved!), and we should ideally provide test-signals to document the conversion for future generations.
If you are a digital engineer, you will say that digital pulse-code modulation is a form of “lossless compression”, because we don’t have to double the ratio between the powers of the background-noise and of the overload-point in order to double the power-bandwidth product. In principle, we could just add one extra digital bit to each sample, as we shall see in the next section. But I am getting ahead of myself; I mention this because there is sometimes a total lack of understanding between digital and analogue engineers about such fundamental issues. I shall therefore start by describing these fundamental issues very thoroughly, and I must apologise to readers on one side of the fence or the other for apparently stating the obvious (or the incomprehensible).
A digital recording format seems pretty idiot-proof; the data normally consists of ones and zeros, with no room for ambiguity. But this simply isn’t the case. All digital carriers store the digits as analogue information. The data may be represented by the size of a pit, or the strength of a magnetic domain, or a blob of dye. All these are quantified using analogue measurements, and error-correction is specifically intended to get around this difficulty (so you don’t have to be measuring the size of a pit or the strength of a tiny magnet).
Unfortunately, such misunderstandings even bedevil the process of choosing an adequate medium for storing the digitised sound. There are at least two areas where these misunderstandings happen. First we must ask whether the tests offered are based on analogue or digital measurements (“carrier-to-noise ratio” or “bit error-rate”, for example). And secondly, has the digital reproducer been optimised for reproducing the analogue features, or is it a self-adjusting “universal” machine (and if so, how do we judge that)?
Finally, the word “format” even has two meanings, both of which should be specified. The digits have their own “format” (number of bits per sample, sampling-frequency, and other parameters we shall meet later in this chapter); and the actual digits may be recorded on either analogue or digital carrier formats (such as Umatic videocassettes, versions of Compact Digital discs, etc.). The result tends to be total confusion when “digital” and “analogue” operators try communicating!
3.2 Technical restrictions of digital audio - the “power” element
I shall now look at the principles of digital sound recording with the eyes of an analogue engineer, to check how digital recording can conserve the power-bandwidth product. I will just remind you that the “power” dimension defines the interval between the faintest sound which can be recorded, and the onset of overloading.
All digital systems overload abruptly, unless they are protected by preceding analogue circuitry. Fortunately analogue sound recordings generally have a smaller “power” component to their power-bandwidth products, so there is no need to overload a digital medium when you play an analogue recording. Today the principal exception concerns desktop computers, whose analogue-to-digital converters are placed in an “electrically noisy” environment. Fortunately, very low-tech analogue noise-tests will reveal this situation if it occurs; and to make the system “idiot-proof” for non-audio experts, consumer-quality cards often contain an automatic volume control as well (chapter 10).
Unfortunately, we now have two philosophies for archivists, and you may need to work out a policy on this matter. One is to transfer the analogue recording at “constant-gain,” so future users get a representation of the signal-strength of the original, which might conceivably help future methods of curing analogue distortion (section 4.15). The other starts from the fact that all encoders working on the system called Pulse Code Modulation (or PCM) have defects at the low end of the dynamic range. Since we are copying (as opposed to making a “live” recording), we can predict very precisely what the signal volume will be before we digitise it, set it to reach the maximum undistorted volume, and so drown the quiet side-effects.
These low-level effects are occupying the attention of many talented engineers, with inevitable hocus-pocus from pundits. I shall spend the next few paragraphs outlining the problem. If an ideal solution ever emerges, you will be able to sort the wheat from the chaff and adopt it yourself. Meanwhile, it seems to me that any future methods of reducing overload distortion will have to “learn” - in other words, adapt themselves to the actual distortions present - rather than using predetermined “recipes”.
All PCM recordings will give a “granular” sound quality if they are not “dithered.” This is because wanted signals whose volume is similar to the least significant bit will be “chopped up” by the lack of resolution at this level. The result is called “quantisation distortion.” One solution is always to have some background noise.
Hiss may be provided from a special analogue hiss-generator preceding the analogue-to-digital converter. (Usually this is there whether you want it or not!) Alternatively it may be generated by a random-number algorithm in the digital domain. Such “dither” will completely eliminate this distortion, at the cost of a very faint steady hiss being added. The current debate is about methods of reducing, or perhaps making deliberate use of, this additional hiss.
Today the normal practice is to add “triangular probability distribution” noise, which is preferable to the “rectangular probability distribution” of earlier days, because you don’t hear the hiss vary with signal volume (an effect called “modulation noise”). Regrettably, many “sound cards” for computers still use rectangular probability distribution noise, illustrating the gulf which can exist between digital engineers and analogue ones! It also illustrates why you must be aware of basic principles on both sides of the fence.
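For readers who like to experiment, here is a minimal sketch in the Python language (assuming the NumPy library, and a 1kHz tone only two quantising steps high standing in for a quiet wanted signal) which demonstrates both the quantisation distortion and its cure by triangular probability distribution dither:

    import numpy as np

    fs = 44100
    t = np.arange(fs) / fs                          # one second of sample instants
    signal = 2.0 * np.sin(2 * np.pi * 1000 * t)     # 1kHz tone, only 2 quantising steps high

    raw = np.round(signal)                          # undithered quantisation

    # TPDF dither: the sum of two independent rectangular noises, each half a step wide
    tpdf = np.random.uniform(-0.5, 0.5, fs) + np.random.uniform(-0.5, 0.5, fs)
    dithered = np.round(signal + tpdf)

    def level_db(x, freq):
        spectrum = np.abs(np.fft.rfft(x)) / len(x)  # bin spacing is 1Hz here
        return 20 * np.log10(spectrum[freq] + 1e-12)

    # the third harmonic of the tone: obvious without dither, absent with it
    print("3rd harmonic, undithered:", round(level_db(raw, 3000), 1), "dB")
    print("3rd harmonic, dithered:  ", round(level_db(dithered, 3000), 1), "dB")

Running it shows a strong third harmonic in the undithered version, which disappears (leaving only a slightly raised, steady noise floor) once the dither is added.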
Even with triangular probability distribution noise, this strategy has been re-examined in the past few years. There are now several processes which claim to make the hiss less audible to human listeners while simultaneously avoiding quantisation distortion and modulation noise. These processes have different starting-points. For example, some are intended for studio recordings with very low background noise (at the “twenty-bit level”), so that they will sound better when reduced to sixteen-bit Compact Discs. Such hiss might subjectively be fifteen decibels quieter, yet have an un-natural quality; so other processes aim for a more “benign” hiss. A process suggested by Philips adds something which sounds like hiss, but which actually comprises pseudo-random digits carrying useful information. Another process (called “auto-dither”) adds pseudo-random digits which can be subtracted upon replay, thereby making such dither totally inaudible, although it will still be reproduced on an ordinary machine. Personally I advocate good old triangular probability distribution noise, for the somewhat esoteric reason that it is always possible to say where the “original sound” stopped and the “destination medium” starts.
All this is largely irrelevant to archivists transferring analogue recordings to digital, except that you should not forget “non-human” applications such as wildlife recording. There is also a risk of unsuspected cumulative build-ups over several generations of digital processing, and unexpected side-effects (or actual loss of information) if some types of processing are carried out upon the results.
If you put a recording through a digital process which drops the volume of some or all of the recording, so that the remaining hiss (whether “dither” or “natural”) is less than the least-significant bit, it will have to be “re-dithered.” We should also perform re-dithering if the resulting samples involve fractions of a bit, rather than integers; I am told that a gain change of a fraction of a decibel causes endless side-effects! One version of the widely-used process called the Fast Fourier Transform splits the frequency-range into 2048 slices, so the noise energy may be reduced by a factor of 2048 (or more) in some slices. If this falls below the least-significant bit, quantisation distortion will begin to affect the wanted signal.
In my personal view, the best way of avoiding these troubles is to use 24-bit samples during the generation of any archive and objective copies which need digital signal processing. The results can then be reduced to 16 bits for storage, and the side-effects tailored at that stage. Practical 24-bit encoders do not yet exist, because no analogue circuitry has a sufficiently low background noise to make the bottom bits meaningful; but this is exactly why the 24-bit domain solves the difficulty. Provided digital processing takes place in the 24-bit domain, the side-effects will be at least 48 decibels lower than with 16-bit encoding, and quantisation distortion will hardly ever come into the picture at all.
On the other hand, if 16-bit processing is used, the operator must move his sound from one process to the next without re-dithering it unnecessarily (to avoid building up the noise), but must add the dither whenever it is needed to kill the distortion. This means intelligent judgement throughout. The operator must ask himself “Did the last stage result in part or all of the wanted signal falling below the least-significant bit?”; and at the end of the processes he must ask himself “Has the final stage resulted in part or all of the signal falling below the least-significant bit?” If the answer to either of these questions is Yes, then the operator must carry out a new re-dithering stage. This is an argument for ensuring the same operator sees the job through from beginning to end.
3.3 Technical limitations of digital audio: the “bandwidth” element
In this section I shall discuss how the frequency range may be corrupted by digital encoding. The first point is that the coding system known as “PCM” (almost universally used today) requires “anti-aliasing filters.” These are deliberately introduced to reduce the frequency range, contrary to the ideals mentioned in section 2.4.
This is because of a mathematical theorem known as “Shannon’s Sampling Theorem.” Shannon showed that samples of a time-varying signal of any type will correctly represent that signal, provided the signal is frequency-restricted before the samples are taken, and the amplitude is then sampled at a rate of at least twice the cut-off frequency. This applies whether you are taking readings of the water-level in a river once an hour, or encoding high-definition television pictures which contain components up to thirty million cycles per second.
To describe this concept in words, if the “float” in the river has “parasitic oscillations” due to the waves, and bobs up and down at 1Hz, you will need to make measurements of its level at least twice a second to reproduce all its movements faithfully without errors. If you try to sample at (say) only once a minute, the bobbing-actions will cause “noise” added to the wanted signal (the longer-term level of the river), reducing the precision of the measurements (by reducing the power-bandwidth product).
Any method of sampling an analogue signal will misbehave if it contains any frequencies higher than half the sampling frequency. If, for instance, the sampling-frequency is 44.1kHz (the standard for audio Compact Discs), and an analogue signal with a frequency of 22.06kHz gets presented to the analogue-to-digital converter, the resulting digits will contain this frequency “folded back” - in this case, a spurious frequency of 22.04kHz. Such spurious sounds can never be distinguished from the wanted signal afterwards. This is called “aliasing.”
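The fold-back is easy to demonstrate numerically. The following minimal sketch (in the Python language, assuming the NumPy library) samples just such a tone with no anti-aliasing filter in the way:

    import numpy as np

    fs = 44100
    n = np.arange(fs)                               # one second of sample instants
    samples = np.sin(2 * np.pi * 22060 * n / fs)    # a 22.06kHz tone, sampled with no filter

    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1 / fs)
    print("Strongest component:", freqs[np.argmax(spectrum)], "Hz")

The strongest component reported is 22,040Hz, not the 22,060Hz which was actually presented; and nothing in the digits themselves reveals that anything is wrong.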
Unfortunately, aliasing may also occur when you conduct “non-linear” digital signal-processing upon the results. This has implications for the processes you should use before you put the signal through the analogue-to-digital conversion stage. On the other hand, quite a few processes are easier to carry out in the digital domain. These processes must be designed so as not to introduce significant aliasing, otherwise supposedly-superior methods may come under an unquantifiable cloud - although most can be shown up by simple low-tech analogue tests!
The problem is Shannon’s sampling theorem again. For example, a digital process may recognise an analogue “click” because it has a leading edge and a trailing edge, both of which are faster than any naturally-occurring transient sound. But Shannon’s sampling theorem says the frequency range must not exceed half the sampling-frequency; this bears no simple relationship to the slew-rate, which is what the computer will recognise in this example. Therefore the elimination of the click will result in aliased artefacts mixed up with the wanted sound, unless some very computation-intensive processes are used to bandwidth-limit these artefacts.
Because high-fidelity digital audio was first tried in the mid-1970s, when it was very difficult to store the samples fast enough, the anti-aliasing filters were designed to work just above the upper limit of human hearing. There simply wasn’t any spare capacity to provide any “elbow-room,” unlike measuring the water-level in a river. The perfect filters required by Shannon’s theorem do not exist, and in practice you can often hear the result on semi-pro or amateur digital machines if you try recording test-tones around 20-25 kHz. Sometimes the analogue filters will behave differently on the two channels, so stereo images will be affected. And even if “perfect filters” are approached by careful engineering, another mathematical result called “the Gibbs effect” may distort the resulting waveshapes.
An analogue “square-wave” signal will acquire “ripples” along its top and bottom edges, looking exactly like a high-frequency resonance. If you are an analogue engineer you will criticise this effect, because analogue engineers are trained to eliminate resonances in their microphones, their loudspeakers, and their electrical circuitry; but this phenomenon is actually an artefact of the mathematics of steeply bandwidth-limiting a frequency before you digitise it. Such factors cause distress to golden-eared analogue engineers, and have generated much argument against digital recording. “Professional-standard” de-clicking devices employ “oversampling” to overcome the Gibbs effect on clicks; but this cannot work with declicking software on personal computers, for example.
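The effect itself is easy to reproduce. The following sketch (Python, NumPy assumed) builds a 500Hz square wave from only those harmonics below a brick-wall cut-off at 20kHz, exactly as an ideal anti-aliasing filter would leave it:

    import numpy as np

    fs, f0, cutoff = 44100, 500, 20000
    t = np.arange(fs) / fs

    square = np.zeros_like(t)
    k = 1
    while k * f0 < cutoff:                  # sum only the odd harmonics below the cut-off
        square += np.sin(2 * np.pi * k * f0 * t) / k
        k += 2
    square *= 4 / np.pi                     # an ideal square wave would sit at exactly +/-1

    print("Overshoot above the flat top:", round(square.max() - 1.0, 3))

The tops and bottoms acquire ripples, and the overshoot next to each transition settles at roughly nine percent of the total step however many harmonics are included - the filter is “perfect,” yet the waveshape is not.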
The Gibbs effect can be reduced by reasonably gentle filters coming into effect at about 18kHz, so that only material above the limit of hearing for an adult human listener is thrown away. But we might be throwing away information of relevance in other applications, for instance analysis by electronic measuring instruments, or playback to wildlife, or to young children (whose hearing can sometimes reach 25kHz). So we must be sure to use linear PCM at 44.1kHz only when the subject matter is intended solely for adult human listeners. This isn’t a condemnation of digital recording as such, of course. It is only a reminder to use the correct tool for any job.
You can see why the cult hi-fi fraternity sometimes avoids digital recordings like the plague! Fortunately, this need not apply to you. Ordinary listeners cannot compare quality “before” and “after”; but you can (and should), so you needn’t be involved in the debate at all. If there is a likelihood of mechanical or non-human applications, then a different medium might be preferable; otherwise you should ensure that archive copies are made by state-of-the-art converters checked in the laboratory and double-checked by ear.
I could spend some time discussing the promise of other proposed digital encoding systems, such as non-linear encoding or delta-sigma modulation, which have advantages and disadvantages; but I shall not do so until such technology becomes readily available to the archivist. One version of delta-sigma modulation has in fact just become available; Sony/Philips are using it for their “Super Audio CD” (SACD). The idea is to have “one-bit” samples taken at very many times the highest wanted frequency. Such “one-bit samples” record whether the signal is going “up” or “down” at the time the sample is taken. There is no need for an anti-aliasing filter, because Shannon’s theorem does not apply. However, the process results in large quantities of quantisation noise above the upper limit of human hearing. At present, delta-modulation combines the advantage of no anti-aliasing filter with the disadvantage that there is practically no signal-processing technology which can make use of the bitstream. If delta modulation “takes off”, signal processes will eventually become available, together with the technology for storing the required extra bits. (SACD needs slightly more bits than a PCM 24-bit recording sampled at 96kHz). But for the moment, we are stuck with PCM, and I shall assume PCM for the remainder of this manual.
3.4 Operational techniques for digital encoding
I hesitate to make the next point again, but I do so knowing many operators don’t work this way. Whatever the medium, whether it be digital or analogue, the transfer operator must be able to compare source and replay upon a changeover switch. Although this is normal for analogue tape recording, where machines with three heads are used to play back a sound as soon as it is recorded, such facilities seem to be very rare in digital environments. Some machines do not even give “E-E monitoring”, where the encoder and decoder electronics are connected back-to-back for monitoring purposes. So I think it is absolutely vital for the operator to listen when his object is to copy the wanted sound without alteration. Only he is in a position to judge when the new version is faithful to the original, and he must be prepared to sign his name to witness this. Please remember my philosophy, which is that all equipment must give satisfactory measurements; but the ear must be the final arbiter.
Even if E-E monitoring passes this test, it does not prove the copy is perfect. There could be dropouts on tape, or track-jumps on CD-R discs. Fortunately, once digital conversion has been done, automatic devices may be used to check the medium for errors; humans are not required.
I now mention a novel difficulty: documenting the performance of the anti-aliasing filter and the subsequent analogue-to-digital converter. Theoretically, everything can be documented by a simple “impulse response” test. The impulse-response of a digital-to-analogue converter is easy to measure, because all you need is a single sample with all its bits at “1”. This is easy to generate, and many test CDs carry such signals. But there isn’t an “international standard” for documenting the performance of analogue-to-digital converters. One is desperately needed, because such converters may have frequency-responses down to a fraction of 1Hz, which means that successive impulses may have to be separated by many seconds; and the impulse must be chosen with a combination of duration and amplitude which suits the “sample-and-hold” circuit without overloading the anti-aliasing filter.
At the British Library Sound Archive, we use a Thurlby Thandar Instruments type TGP110 analogue Pulse Generator in its basic form (not “calibrated” to give highly precise waveforms). We have standardised on impulses exactly 1 microsecond long. The resulting digitised shape can be displayed on a digital audio editor.
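Once such an impulse has been digitised, a few lines of code will turn its shape into a frequency-response curve documenting the anti-aliasing filter and converter together. Here is a sketch in the Python language (NumPy assumed); the array “captured” is merely a placeholder for the run of samples surrounding one recorded impulse, so that the sketch runs on its own:

    import numpy as np

    fs = 44100
    # "captured" stands for the samples surrounding one digitised test impulse;
    # a perfect unit impulse is used here only so the sketch is self-contained.
    captured = np.zeros(4096)
    captured[2048] = 1.0

    spectrum = np.abs(np.fft.rfft(captured))
    freqs = np.fft.rfftfreq(len(captured), d=1 / fs)
    response_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    for f, r in zip(freqs[::256], response_db[::256]):
        print(f"{f:8.0f} Hz  {r:+6.1f} dB")

With a real captured impulse in place of the placeholder, the printout shows the combined roll-off of the filter and converter, which can then be filed with the transfer documentation.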
3.5 Difficulties of “cloning” digital recordings
For the next three sections, you will think I am biased against “digital sound”; but the fact that many digital formats have higher power-bandwidth products does not mean we should be unaware of their problems. I shall now point out the defects of the assumption that digital recordings can be cloned. My earlier assertion that linear PCM digital recordings can be copied without degradation had three hidden assumptions, which seem increasingly unlikely with the passage of time. They are that the sampling frequency, the pre-emphasis, and the bit-resolution remain constant.
If (say) it is desired to copy a digital PCM audio recording to a new PCM format with a higher sampling frequency, then either the sound must be converted from digital to analogue and back again, or it must be recalculated digitally. When the new sampling-frequency and the old share an arithmetical common factor, a regular minority of the old samples falls on exactly the same instants as new ones, and these can be carried over unchanged; but all the others must be a weighted average of the samples either side, and this implies that the new bitstream must include “decimals.” For truthful representation, it cannot be a stream of integers; rounding-errors (and therefore quantisation-distortion) are bound to occur.
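To make the point concrete, here is a minimal sketch in the Python language (NumPy assumed) which resamples one second of an integer 44.1kHz stream to 48kHz by simple linear interpolation - a cruder method than a real sample-rate converter would use, but enough to show where the fractions come from:

    import numpy as np

    fs_old, fs_new = 44100, 48000
    n_old = np.arange(fs_old)
    old = np.round(20000 * np.sin(2 * np.pi * 1000 * n_old / fs_old))   # integer samples

    t_new = np.arange(fs_new) / fs_new               # instants of the new samples
    exact = np.interp(t_new, n_old / fs_old, old)    # fractional-valued interpolation
    stored = np.round(exact)                         # forced back to integers

    coincide = np.isclose(exact, stored)
    print("Samples needing rounding:", np.count_nonzero(~coincide), "of", fs_new)
    print("Worst rounding error:", round(np.abs(exact - stored).max(), 3), "of one step")

Almost every new sample needs rounding, by up to half a quantising step; this is the error which must be re-dithered, and which accumulates if the exercise is repeated generation after generation.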
The subjective effect is difficult to describe, and with practical present-day systems it is usually inaudible when done just once (even when the results are rounded back to integers). But now we have a quite different danger, because once people realise digital conversion is possible (however imperfect), they will ask for it again and again, and errors will accumulate through the generations unless there is strict control by documentation. Therefore it is necessary to outguess posterity, and choose a sampling-frequency which will not become obsolete.
For the older audio media, the writer’s present view is that 44.1kHz will last, because of the large number of compact disc players, and new audio media (like the DCC and MiniDisc) use 44.1kHz as well. I also consider that for the media which need urgent conservation copying (wax cylinders, acetate-based recording tape, and “acetate” discs, none of which have very intense high frequencies), this system results in imperceptible losses, and the gains outweigh these. For picture media I advocate 48kHz, because that is the rate used by digital video formats.
But there will inevitably be losses when digital sound is moved from one domain to the other, and this will get worse if sampling-frequencies proliferate. Equipment is becoming available which works at 96kHz, precisely double the 48kHz used for television sound. Recordings made at 48kHz can then be copied to make the “even-numbered” samples, while the “odd-numbered” samples become the averages of the samples either side. The options for better anti-aliasing filters, applications such as wildlife recording, preservation of transients (such as analogue disc clicks), transparent digital signal-processing, and so on, remain open. Yet even this option requires that we document what has happened - future workers cannot be expected to guess it.
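Incidentally, the arithmetic of that doubling is trivial, as this little sketch (Python, NumPy assumed) shows; a real converter would interpolate more cleverly, but the principle that the original samples survive untouched is the same:

    import numpy as np

    x48 = np.array([0.0, 10.0, 14.0, 10.0, 0.0])   # a few consecutive 48kHz samples
    x96 = np.empty(2 * len(x48) - 1)
    x96[0::2] = x48                                 # even-numbered samples: carried over exactly
    x96[1::2] = (x48[:-1] + x48[1:]) / 2            # odd-numbered samples: averages of neighbours
    print(x96)                                      # 0, 5, 10, 12, 14, 12, 10, 5, 0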
Converting to a lower sampling frequency means that the recording must be subjected to a new anti-aliasing filter. Although this filtering can be done in the digital domain to reduce the effects of practical analogue filters and two converters, it means throwing away some of the information of course.
The next problem is pre-emphasis. This means amplifying some of the wanted high frequencies before they are encoded, and applying the inverse process on playback. It renders the sound less liable to quantisation distortion, because any “natural” hiss is encoded about twelve decibels stronger. At present there is only one standard pre-emphasis characteristic for digital audio recording (section 7.3), so it can only be either ON or OFF. A flag is set in the digital data-stream of standardised interconnections, so the presence of pre-emphasis may be recognised on playback. There is also a much more powerful pre-emphasis system (the “CCITT” curve) used in telecommunications. It is optimised for 8-bit audio work, and 8 bits were once often used by personal computers for economical sound recording. Hopefully my readers won’t be called upon to work with CCITT pre-emphasis, because digital sound-card designers apparently haven’t learnt that 8-bit encoders could then give the dynamic range of professional analogue tape; but you ought to know the possibility exists! If the recording gets copied to change its pre-emphasis status, whether through a digital process or an analogue link, some of the power-bandwidth product will be lost each time. Worse still, some digital signal devices (particularly hard-disk editors) strip off the pre-emphasis flag, and it is possible that digital recordings - or, worse again, parts of digital recordings - may be reproduced incorrectly after this.
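For the curious, here is a minimal sketch in the Python language (NumPy assumed) of the shape of the standard digital-audio pre-emphasis curve, taking as an assumption the 50-microsecond and 15-microsecond time constants usually quoted for it; the playback de-emphasis is simply the mirror image:

    import numpy as np

    t1, t2 = 50e-6, 15e-6                  # the two time constants of the shelf (assumed)
    for f in (100, 1000, 3183, 10000, 20000):
        w = 2 * np.pi * f
        gain = np.sqrt((1 + (w * t1) ** 2) / (1 + (w * t2) ** 2))
        print(f"{f:6d}Hz  {20 * np.log10(gain):+5.1f}dB")

The boost is negligible at low frequencies and rises steadily towards the top of the audio band, which is why any “natural” hiss sits further above the least-significant bit when pre-emphasis is in use.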
I advise the reader to make a definite policy on the use of pre-emphasis and stick to it. The “pros” are that with the vast majority of sounds meant for human listeners, the power-bandwidth product of the medium is used more efficiently; the “cons” are that this doesn’t apply to most animal sounds, and digital metering and processing (Chapter 3) are sometimes more difficult. Yet even this option requires that we document what has happened - future workers cannot be expected to guess it.
Converting a linear PCM recording to a greater number of bits (such as going from 14-bit to 16-bit) does not, in theory, involve any losses. In fact, if it happens at the same time as a sample-rate conversion, the new medium can be made to carry two bits of the decimal part of the interpolation mentioned earlier, thereby reducing the side-effects. Meanwhile the most significant bits retain their status, and the peak volume will be the same as for the original recording. So if it ever becomes normal to copy from (say) 16 bits to 20 bits in the digital domain, it will be easier to change the sampling frequency as well, because the roundoff errors will have less effect, by a factor of 16 in this example. Thus, to summarise, satisfactory sampling-frequency conversion will always be difficult; but it will become easier with higher bit-resolutions. Yet even this option requires that we document what has happened - future workers cannot be expected to guess it.
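The mechanics of the bit-widening itself are nothing more than a shift, as this Python sketch (NumPy assumed) of a 14-bit to 16-bit conversion shows:

    import numpy as np

    fourteen_bit = np.array([-8192, -1, 0, 1, 8191], dtype=np.int16)   # full 14-bit range
    sixteen_bit = fourteen_bit * 4        # multiplying by four shifts every value up two bits
    print(sixteen_bit)                    # the values become -32768, -4, 0, 4, 32764

The peak level is unchanged, no information is lost, and the two new low-order bits are free to carry the fractional detail of any interpolation done at the same time.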
All these difficulties are inherent - they cannot be solved with “better technology” - and this underlines the fact that analogue-to-digital conversions and digital signal processing must be done to the highest possible standards. Since the AES Interface for digital audio allows for 24-bit samples, it seems sensible to plan for this number of bits, even though the best current converters can just about reach the 22-bit level.
It is nearly always better to do straight digital standards conversions in the digital domain when you must, and a device such as the Digital Audio Research “DASS” Unit may be very helpful. This offers several useful processes. Besides changing the pre-emphasis status and the bit-resolution, it can alter the copy-protect bits, reverse both relative and absolute phases, change the volume, and remove DC offsets. The unit automatically looks after the process of “re-dithering” when necessary, and offers two different ways of doing sampling-rate conversion. The first is advocated when the two rates are not very different, but accurate synchronisation is essential. This makes use of a buffer memory of 1024 samples. When this is either empty or full, it does a benign digital crossfade to “catch up,” but between these moments the data-stream remains uncorrupted. The other method is used for widely-differing sampling-rates which would overwhelm the buffer. This does the operation described at the start of this section, causing slight degradation throughout the whole of the recording. Yet even this option requires that we document what has happened - future workers cannot be expected to guess it.
Finally I must remind you that digital recordings are not necessarily above criticism. I can think of many compact discs with quite conspicuous analogue errors on them. Some even have the code “DDD” (suggesting only digital processes have been used during their manufacture). It seems some companies use analogue noise reduction systems (Chapter 8) to “stretch” the performance of 16-bit recording media, and they do not understand the “old technology.” Later chapters will teach you how to get accurate sound from analogue media; but please keep your ears open, and be prepared for the same faults on digital media!
3.6 Digital data compression
The undoubted advantages of linear PCM as a way of storing audio waveforms are being endangered by various types of digital data compression. The idea is to store digital sound at lower cost, or to transmit it using less of one of our limited natural resources (the electromagnetic spectrum).
Algorithms for digital compression are of two kinds, “lossless” and “lossy.” The lossless ones give you back the same digits after decompression, so they do not affect the sound. There are several processes; one of the first (Compusonics) reduced the data to only about four-fifths of the original amount, but the compression rate was fixed in the sense that the same number of bits was recorded in a particular time. If we allow the recording medium to vary its data-rate depending on the subject matter, the data may be reduced to anything from two-thirds of the original in the worst cases to one-quarter in the best. My personal view is that these aren’t worth bothering with, unless you’re consistently in the situation where the durations of your recordings are fractionally longer than the capacity of your storage media.
For audio, some lossless methods can actually make matters worse. Applause is notorious for being difficult to compress; if you must use such compression, test it on a recording of continuous applause. You may even find the size of the file increases.
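The reason is easy to demonstrate. The following Python sketch uses the general-purpose lossless packer in the standard zlib library purely as a stand-in for a lossless audio coder (a real one would use linear prediction, but the lesson is the same): a pure tone packs down well, while wideband noise - the nearest thing to applause that a few lines of code can make - hardly shrinks at all.

    import zlib
    import numpy as np

    fs = 44100
    n = np.arange(fs)
    tone = (10000 * np.sin(2 * np.pi * 440 * n / fs)).astype(np.int16)
    noise = np.clip(np.random.normal(0, 8000, fs), -32767, 32767).astype(np.int16)

    for name, data in (("tone ", tone), ("noise", noise)):
        raw = data.tobytes()
        packed = zlib.compress(raw, 9)
        print(name, ":", round(len(packed) / len(raw), 2), "of the original size")

Once the container overheads of a practical format are added, noise-like material can indeed end up occupying more space than it did before.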
But the real trouble comes from lossy systems, which can achieve compression factors from twofold to at least twentyfold. They all rely upon psychoacoustics to permit the digital data stream to be reduced. Two such digital sound recording formats were the Digital Compact Cassette (DCC) and the MiniDisc, each achieving about one-fifth the original number of bits; but in practice, quoted costs were certainly not one-fifth! While they make acceptable noises on studio-quality recordings, it is very suspicious that no “back-catalogue” is offered. The unpredictable nature of background noise always gives problems, and that is precisely what we find ourselves trying to encode with analogue sources. Applause can also degenerate into a noisy “mush”. The real reason for their introduction was not an engineering one. Because newer digital systems were designed so they could not clone manufactured CDs, the professional recording industry was less likely to object to their potential for copyright abuse (a consideration we shall meet in section 3.8 below).
Other examples of digital audio compression methods are being used for other applications. To get digital audio between the perforation-holes of 35mm optical film, cinema surround-sound was originally coded digitally into a soundtrack with lossy compression. Initial reports suggested it sometimes strained the technology beyond its breaking-point. While ordinary stereo didn’t sound too bad, the extra information for the rear-channel loudspeakers caused strange results to appear. An ethical point arises here, which is that the sound-mixers adapted their mixing technique to suit the compression-system. Therefore the sound was changed to suit the medium. (In this case, no original sound existed in the first place, so there wasn’t any need to conserve it.)
A number of compression techniques are used for landline and satellite communication, and here the tradeoffs are financial - it costs money to buy the power-bandwidth product of such media. Broadcasters use digital compression a lot – NICAM stereo and DAB have it - but this is more understandable, because there is a limited amount of electromagnetic spectrum which we must all share, especially for consistent reception in cars. At least we can assume that wildlife creatures or analytical machinery won’t be listening to the radio, visiting cinemas, or driving cars.
The “advantages” of lossy digital compression have six counterarguments. (1) The GDR Archive found that the cost of storage is less than five percent of the total costs of running an archive, so the savings are not great; (2) digital storage (and transmission) are set to get cheaper, not more expensive; (3) even though a system may sound transparent now, there is no guarantee that side-effects will not become audible when new applications are developed; (4) once people think digital recordings can be “cloned”, they will put lossy compression systems one after the other and think they are preserving the original sound, whereas cascading several lossy compression-systems magnifies all the disadvantages of each; (5) data compression systems will themselves evolve, so capital costs will be involved; (6) even digital storage media with brief shelf-lives seem set to outlast current compression-systems.
There can be no “perfect” lossy compression system for audio. In section 1.2 I described how individual human babies learn how to hear, and how a physiological defect might be compensated by a psychological change. Compression-systems are always tested by people with “normal” hearing (or sight, in the case of video compression). Such research may therefore be inherently invalid for people with defective hearing (or sight). Under British law at least, the result might be regarded as discriminating against certain members of the public. Although there has not yet been any legal action on this front, I must point out the possibility to my readers.
With all lossy systems, unless cloning with error-correction is provided, the sound will degrade further each time it is copied. I consider a sound archive should have nothing to do with such media unless the ability to clone the stuff with error-correction is made available, and the destination-media and the hardware also continue to be available. The degradations will then stay the same and won’t accumulate. (The DCC will do this, but won’t allow the accompanying text, documentation, and start-idents to be transferred; and you have to buy a special cloning machine for MiniDisc, which erases the existence of edits).
Because there is no “watermark” to document what has happened en route, digital television is already giving endless problems to archivists. Since compression is vital for getting news-stories home quickly - and several versions may be cascaded depending on the bitrates available en route - there is no way of knowing which version is “nearest the original”. So even this option requires that we document what has happened if we can - future workers cannot be expected to guess it.
Ideally, hardware should be made available to decode the compressed bit-stream with no loss of power-bandwidth product under audible or laboratory test-conditions. This isn’t always possible with the present state-of-the-art; but even when it is not, we can at least preserve the sound in the manner the misguided producers intended, and the advantages of uncompressed PCM recording won’t come under a shadow. When neither of these strategies is possible, the archive will be forced to convert the original back to analogue whenever a listener requires, and this will mean considerable investment in equipment and perpetual maintenance costs. All this underlines my feeling that a sound archive should have nothing to do with lossy data compression.
My mention of the Compusonics system reminds me that it isn’t just a matter of hardware. There is a thin dividing line between “hardware” and “software.” I do not mean to libel Messrs. Compusonics by the following remark, but it is a point I must make. Software can be copyrighted, which reduces the chance of a process being usable in future.
3.7 A severe warning
I shall make this point more clearly by an actual example in another field. I started writing the text of this manual about ten years ago, and as I have continued to add to it and amend it, I have been forced to keep the same word-processing software in my computer. Unfortunately, I have had three computers during that time, and the word-processing software was licensed for use on only one machine. I bought a second copy with my second machine, but by the time I got my third machine the product was no longer available. Rather than risk prosecution by copying the software onto my new machine, I am now forced to run it from one of the original floppies in the floppy disk drive, working on the text held on the hard drive. This adds unnecessary operational hassles, and only works at all because the first computer and the last happen to have had the same (copyright) operating system (which was sheer luck).
For a computer-user not used to my way of thinking, the immediate question is “What’s wrong with buying a more up-to-date word-processor?” My answer is threefold. (1) There is nothing wrong with my present system; (2) I can export my writings in a way which avoids having to re-type the stuff for another word-processor, whereas the other way round simply doesn’t work; (3) I cannot see why I should pay someone to re-invent the wheel. (I have better things to spend my time and money on)! And once I enter these treacherous waters, I shall have to continue shelling out money for the next fifty years or more.
I must now explain that last remark, drawing from my own experience in Britain, and asking readers to apply the lessons of what I say to the legal situation wherever they work. In Britain, copyright in computer software lasts fifty years after its first publication (with new versions constantly pushing this date further into the future). Furthermore, under British law, you do not “buy” software, you “license” it. So the normal provisions of the Consumer Protection Act (that it must be “fit for the intended purpose” - i.e. it must work) simply do not apply. Meanwhile, different manufacturers are free to impose their ideas about what constitutes “copying” to make the software practicable. (At present, this isn’t defined in British law, except to say that software may always legally be copied into RAM - memory in which the information vanishes when you switch off the power). Finally, the 1988 Copyright Act allows “moral” rights, which prevent anyone modifying anything “in a derogatory manner”. This right cannot be assigned to another person or organisation, it stays with the author; presumably in extreme cases it could mean the licensee may not even modify it.
It is easy to see the difficulties that sound archivists might face in forty-nine years’ time, when the hardware has radically changed. (Think of the difficulties I’ve had in only ten years with text!) I therefore think it is essential to plan sound archival strategy so that no software is involved. Alternatively, the software might be “public-domain” or “home-grown,” and ideally one should have access to the “source code” (written in an internationally-standardised language such as FORTRAN or “C”), which may subsequently be re-compiled for different microprocessors. I consider that even temporary processes used in sound restoration, such as audio editors or digital noise reduction systems (Chapter 3), should follow the same principles, otherwise the archivist is bound to be painted into a corner sooner or later. If hardware evolves at its present rate, copyright law may halt legal playback of many digital recording formats or implementation of many digital signal processes. Under British law, once the “software” is permanently stored in a device known as an “EPROM” chip, it becomes “hardware”, and the problems of copyright software evaporate. But this just makes matters more difficult if the EPROM should fail.
I apologise to readers for being forced to point out further “facts of life,” but I have never seen the next few ideas written down anywhere. It is your duty to understand all the “Dangers of Digital”, so I will warn you about more dangers of copyright software. The obvious one, which is that hardware manufacturers will blame the software writers if something goes wrong (and vice versa), seems almost self-evident; but it still needs to be mentioned.
A second-order danger is that publishers of software often make deliberate attempts to trap users into “brand loyalty”. Thus, I can think of many word-processing programs reissued with “upgrades” (real or imagined), sometimes for use with a new “operating system”. But such programs usually function with at least one step of “downward compatibility”, so users are not tempted to cut their losses and switch to different software. This has been the situation since at least 1977 (with the languages FORTRAN66 and FORTRAN77); but for some reason no-one seems to have recognised the problem. I regret having to make a political point here; but as both the computer-magazine and book industries are utterly dependent upon not mentioning it, the point has never been raised among people who matter (archivists!).
This disease has spread to digital recording media as well, with many backup media (controlled by software) having only one level of downwards compatibility, if that. As a simple example, I shall cite the “three-and-a-half inch floppy disk”, which exists in two forms, the “normal” one (under MS-DOS this can hold 720 kilobytes), and the “high-density” version (which can hold 1.44 megabytes). In the “high density” version the digits are packed closer together, requiring a high-density magnetic layer. The hardware should be able to tell which is which by means of a “feeler hole”, exactly like analogue audiocassettes (Chapter 6). But, to cut costs, modern floppy disk drives lack any way of detecting the hole, so cannot read 720k disks. The problem is an analogue one, and we shall see precisely the same problem when we talk about analogue magnetic tape. Both greater amplification and matched playback-heads must coexist to read older formats properly.
Even worse, the downwards-compatibility situation has spread to the operating system (the software which makes a computer work at all). For example, “Windows NT” (which was much-touted as a 32-bit operating system, although no engineer would see any advantage in that) can handle 16-bit applications, but not 8-bit ones. A large organisation has this pistol held to its head with greater pressure, since if the operating system must be changed, every computer must also be changed - overnight - or data cannot be exchanged on digital media (or if they can, with meaningless error-messages or warnings of viruses). All this arises because pieces of digital jigsaw do not fit together.
To a sound archivist, the second-order strategy is only acceptable so long as the sound recordings do not change their format. If you think that an updated operating-system is certain to be better for digital storage, then I must remind you that you will be storing successive layers of problems for future generations. Use only a format which is internationally-standardised and widely-used (such as “Red Book” compact disc), and do not allow yourself to be seduced by potential “upgrades.”
A “third-order” problem is the well-known one of “vapourware.” This is where the software company deliberately courts brand-loyalty by telling its users an upgrade is imminent. This has four unfavourable features which don’t apply to “hardware.” First, no particular delivery-time is promised - it may be two or three years away; second, the new version will need to be re-tested by users; third, operators will have to re-learn how to use it; and fourth, people will then almost inevitably use the new version more intensely, pushing at the barriers until it too “falls over.” (These won’t be the responsibility of the software company, of course; and usually extra cash is involved as well). Even if the software is buried in an EPROM chip (as opposed to a removable medium which can be changed easily), this means that sound archivists must document the “version number” for any archive copies, while the original analogue recording must be preserved indefinitely in case a better version becomes available.
And there are even “fourth-order” problems. The handbook often gets separated from the software, so you cannot do anything practical even when the software survives. Also, many software packages are (deliberately or accidentally) badly-written, so you find yourself “trapped in a loop” or something similar, and must ring a so-called “help-line” at massive cost to your telephone bill. Even this wouldn’t matter if only the software could be guaranteed for fifty years into the future...
Without wishing to decry the efforts of legitimate copyright owners, I must remind you that many forms of hardware come with associated software. For example, every digital “sound card” I know has copyright software to make it work. So I must advise readers to have nothing to do with sound cards, unless they are used solely as an intermediate stage in the generation of internationally-standardised digital copies played by means of hardware alone.
3.8 Digital watermarking and copy protection
As I write this, digital audio is also becoming corrupted by “watermarking”. The idea is to alter a sound recording so that its source may be identified, whether broadcast, sent over the Internet, or whatever. Such treatment must be rugged enough to survive various analogue or digital distortions. Many manufacturers have developed inaudible techniques for adding a watermark, although they all corrupt “the original sound” in the process. At the time of writing, it looks as though a system called “MusiCode” (from Aris Technologies) will become dominant (Ref. 1). This one changes successive peaks in music to carry an extra encoded message. As the decoding software will have the same problems as I described in the previous section, it won’t be a complete answer to the archivist’s prayer for a recording containing its own identification; but for once I can see some sort of advantage accompanying this “distortion.” On the other hand, it means the archivist will have to purchase (sorry, “license”) a copy of the appropriate software to make any practical use of this information. And of course, professional sound archivists will be forced to license both it and all the other watermarking systems, in order to identify unmodified versions of the same sound.
Wealthy commercial record-companies will use the code to identify their products. But such identification will not reveal the copyright owner, for example when a product is licensed for overseas sales, or when the artists own the copyright in their own music. This point is developed on another page of the same Reference (Ref. 2), where a rival watermarking system is accompanied by an infrastructure of monitoring-stations listening to the airwaves for recordings carrying its watermarks, and automatically sending reports to a centralised agency which will notify the copyright owners.
I am writing this paragraph in 1999: I confidently predict that numerous “watermarking” systems will be invented, allegedly tested by professional audio listeners using the rigorous A-B comparison methods I described in section 3.4, and then discarded.
All this is quite apart from a machine-readable identification to replace a spoken announcement (on the lines I mentioned in section 2.9). Yet even here, there are currently five “standards” fighting it out in the marketplace, and as far as I can see these all depend upon Roman characters for the “metadata”. I will say no more.
Another difficulty is “copy protection.” In a very belated attempt to restrict the digital cloning of commercial products, digital media are beginning to carry extra “copy protection bits.” The 1981 “Red Book” standard for compact discs allowed these from Day One; but sales people considered it a technicality not worthy of their attention! But people in the real world - namely, music composers and performers - soon realised there was an enormous potential for piracy; and now we have a great deal of shutting of stable doors.
The Compact Disc itself can carry a “copy protect” flag, and in many countries a royalty is now payable on retail sales of recordable CDs specifically to compensate music and record publishers. (Such discs may be marketed with a distinctly misleading phrase like “For music”, which ought to read “For in-copyright published music, and copyright records, only.” Then, when it doesn’t record anything else, there could be action under the Trade Descriptions Act.) So a professional archive may be obliged to pay the “music” royalty (or purchase “professional” CD-Rs) to ensure the copy-protect flags are not raised. Meanwhile, blank CD-ROM discs for computers (which normally permit a third layer of error-correction) can also be used for audio, the third layer simply being ignored by CD players (so the result becomes “Red Book” standard with two layers). The copy-protect bit is not (at the time of writing) raised; so now another company has entered the field to corrupt this third layer of error-correction and prevent copying on a CD-ROM drive. (Ref. 3)
Most other media (including digital audio tape and MiniDisc) add a “copy-protect” bit when recorded on an “amateur” machine, without the idea of copyright being explained at any point - so the snag only becomes apparent when you are asked to clone the result. The “amateur” digital connection (S/PDIF) carries start-flags as well as copy-protect flags, while the “professional” digital connection (AES3) carries neither. So if you need to clone digital recordings including their track-flags, it may be virtually impossible with many combinations of media.
A cure is easy to see - it just means purchasing the correct equipment. Add a digital delay-line with a capacity of ten seconds or so using AES connectors. Use a source-machine which does a digital count-down of the time before each track ends. Then the cloning can run as a background task to some other job, and meanwhile the operator can press the track-increment button manually with two types of advance-warning - visual and aural.
3.9 The use of general-purpose computers
Desktop computers are getting cheaper and more powerful all the time, so digital sound restoration techniques will become more accessible to the archivist. So far these processes are utterly dependent upon classical linear PCM coding; this is another argument against compressed digital recordings.
Unfortunately, desktop computers are not always welcomed by the audio fraternity, because most of them have whirring disk-drives and continuous cooling fans. Analogue sound-restoration frequently means listening to the faintest parts of recordings, and just one desktop computer in the same room can completely scupper this process. Finally, analogue operators curse and swear at the inconvenient controls and non-intuitive software. The senses of touch and instant responses are very important to an analogue operator.
It is vital to plan a way around the noise difficulty. Kits are available which allow the “system box” to be out of the room, while the keyboard screen and mouse remain on the desktop. Alternatively, we might do trial sections on noise-excluding headphones, and leave the computer to crunch through long recordings during the night-time. In section 1.4 I expressed the view that the restoration operator should not be twiddling knobs subjectively. A computer running “out of real-time” forces the operator to plan his processing logically, and actually prevents subjective intervention.
This brings us to the fact that desktop computers are only just beginning to cope with real-time digital signal processing (DSP), although this sometimes implies dedicated accelerator-boards or special “bus architectures” (both of which imply special software). On the other hand, the desktop PC is an ideal tool for solving a rare technical problem. Sound archivists do not have much cash, and there aren’t enough of them to provide a user-base for the designers of special hardware or the writers of special software. But once we can get a digitised recording into a PC and out again, it is relatively cheap to develop a tailor-made solution to a rare problem, which may be needed only once or twice in a decade.
Even so, archivists often have specialist requirements which are needed more often than that. This writer considers an acceptable compromise is to purchase a special board which will write digital audio into a DOS file on the hard disk of a PC (I am told Turtle Beach Electronics makes such a board, although it is only 16-bit capable, and requires its own software). Purpose-written software can then be loaded from floppy disk to perform the signal processing.
3.10 Processes better handled in the analogue domain
The present state-of-the-art means that all digital recordings will be subject to difficulties if we want to alter their speeds. To be pedantic, the difficulties occur when we want to change the speed of a digital recording, rather than its playback into the analogue domain. In principle a digital recording can be varispeeded while converting it back into analogue simply by running it at a different sampling-frequency, and there are a few compact disc players, R-DAT machines, and multitrack digital audio workstations which permit a small amount of such adjustment. But vari-speeding a digital recording can only be done on expensive specialist equipment, often by a constant fixed percentage, not adjustable while you are actually listening to it. Furthermore, the process results in fractional rounding-errors, as we saw earlier.
So it is vital to make every effort to get the playing-speed of an analogue medium right before converting it to digital. The subject is dealt with in Chapter 4; but I mention it now because it clearly forms an important part of the overall strategy. Discographical or musical experts may be needed during the copying session to select the appropriate playing-speed; it should not be done after digitisation.
With other processes (notably noise-reduction, Chapter 3, and equalisation, Chapters 5, 6 and 11) it may be necessary to do a close study of the relationship between the transfer and processing stages. The analogue transfer stage cannot always be considered independently of the digital processing stage(s), because correct processing may be impossible in one of the domains. For readers who need to know the gory details: most digital processes cannot handle the relative phases introduced by analogue equipment (section 2.11), and they become impotent as zeroes or infinities are approached (especially at very low or very high frequencies).
To take this thought further, how do we judge that a digital process is accurate? The usual analogue tests for frequency-response, noise, and distortion should always be done to show up unsuspected problems if they exist. But measuring a digital noise reduction process is difficult, because no-one has yet published details of how well a process restores the original sound. It may be necessary to set up an elementary “before-and-after” experiment - taking a high-fidelity recording, deliberately adding some noise, then seeing if the process removes it while restoring the original high-fidelity sound. The results can be judged by ear, but it is even better to compare the waveforms (e.g. on a digital audio editor). But what tolerances should we aim for? This writer’s current belief is that errors must always be drowned by the natural background noise of the original - I do not insist on bit-perfect reproduction - but this is sometimes difficult to establish. We can, however, take the cleaned-up version, add more background-noise, and iterate the process. Eventually we shall have a clear idea of the limitations of the digital algorithm, and can work within them.
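The skeleton of such an experiment is easy to set up, as in the following Python sketch (NumPy assumed; the function “denoise” is merely a hypothetical placeholder for whatever process is under test):

    import numpy as np

    def denoise(x):
        # Placeholder for the process under test; a real device or algorithm
        # would be substituted here.
        return x

    def rms_db(x):
        return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

    fs = 44100
    clean = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)   # stands in for a clean recording
    added_hiss = np.random.normal(0, 0.01, fs)             # the noise we deliberately add
    processed = denoise(clean + added_hiss)

    residual = processed - clean                            # how far we are from the original
    print("Added hiss level    :", round(rms_db(added_hiss), 1), "dB")
    print("Residual error level:", round(rms_db(residual), 1), "dB")

On the criterion above, the process passes if the residual error stays safely below the natural background noise of the original recording.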
Draft standards for the measurement of digital equipment are now being published, so one hopes the digital audio engineering fraternity will be able to agree common methodology for formal engineering tests. But the above try-it-and-see process has always been the way in which operators assess things. These less-formal methods must not be ignored - especially at the leading edge of technology!
One other can-of-worms is beginning to be talked about as I write - the use of “neural methods.” This is a buzzword based on the assumption that someone’s expertise can be transferred to a digital process, or that an algorithm can adapt itself until the best results are achieved. You are of course free to disagree, but I consider such techniques are only applicable to the production of service-copies which rely on redundancies in the human hearing process. For the objective restoration of the power-bandwidth product, there can be no room for “learning.” The software can only be acceptable if it always optimises the power-bandwidth product, and that is something which can (and should) be the subject of formal measurements. Ideally, any such assumptions or experiences must be documented with the final recording anyway; so when neural techniques do not even allow formal documentation, the objective character of the result cannot be sustained.
3.11 Digital recording media
Although I don’t regard it of direct relevance to the theme of my book, I must warn innocent readers that conservation problems are not necessarily solved by converting analogue sounds to digital media. Some digital formats are more transitory than analogue formats, because they have shorter shelf-lives and less resistance to repeated use. One paper assessed the shelf life of unplayed R-DAT metal-particle tapes as 23 years, and you certainly don’t want to be cloning your entire collection at 22-year intervals! Your digitisation process must therefore include careful choice of a destination medium for the encoded sounds. I could give you my current favourite, but the technology is moving so rapidly that my idea is certain to be proved wrong. (It could even be better to stick everything on R-DAT now, and delay a firm decision for 22 years).
It is also vital to store digitised sounds on media which allow individual tracks and items to be found quickly. Here we must outguess posterity, preferably without using copyright software.
I shall ask you to remember the hassles of copy-protection (section 3.8 above), and next I will state a few principles you might consider when making your decision. They are based on practical experience rather than (alleged) scientific research.
- (1) It is always much easier to reproduce widely-used media than specialist media. It is still quite cheap to install machinery for reproducing Edison cylinders, because over a million machines were sold by Edison, and enough hardware and software survives for the experience to survive as well.
- (2) The blank destination-media should not just have some “proof” of longevity (there are literally thousands of ways of destroying a sound recording, and nobody can test them all)! Instead, the physical principles should be understood, and then there should be no self-evident failure-mechanism which remains unexplained. (For example, an optical disc which might fade).
- (3) The media should be purchased from the people who actually made them. This is (a) so you know for certain what you’ve got, and (b) there is no division of responsibility when it fails.
- (4) Ideally the media (not their packaging) should have indelible batch-numbers, which should be incorporated in the cataloguing information. Then when an example fails, other records from the same batch can be isolated and an emergency recovery-programme begun.
- (5) On the principle “never put all your eggs into one basket,” the digital copy should be cloned onto another medium meeting principles (2) to (4), but made by a different supplier (using different chemicals if possible), and stored in a quite different place.
So, after a long time examining operational strategy, we are now free to examine the technicalities behind retrieving analogue signals from old media.
REFERENCES
- 1: anon, “SDMI chooses MusiCode from Aris to control Internet copying” (news item), London: One To One (magazine), Issue 110 (September 1999), page 10.
- 2: ibid, pages 73-74 and 77.
- 3: Barry Fox, “technology” (article), London: Hi-Fi News & Record Review (magazine), Vol. 44 No. 10 (October 1999), page 27.