Digital Audio Watermarking

The previous section of this report demonstrated and explained various perceptual audio phenomena of the human auditory system.  These phenomena reveal some of the limitations of the human auditory system, which can be exploited through technologies such as perceptual encoding (MP3) and digital watermarking.

Digital watermarking is a technology which allows a secret message to be hidden in a computer file without being detected by the user.  The watermark is not apparent to the user and does not affect the use of the original file in any way.  Watermark information is predominantly used to identify the creator of a digital file, e.g. a picture, a song, or text.

Digital audio watermarking involves the concealment of data within a discrete audio file.  Applications for this technology are numerous.  Intellectual property protection is currently the main driving force behind research in this area.  To combat online music piracy, a digital watermark could be added to all recordings prior to release, signifying not only the author of the work, but also the user who has purchased a legitimate copy.  Newer operating systems equipped with digital rights management (DRM) software will extract the watermark from audio files prior to playing them on the system.  The DRM software will ensure that the user has paid for the song by comparing the watermark to the existing purchased licences on the system.

Other, non-rights-related uses for watermarking technology include embedding auxiliary information related to a particular song, such as lyrics, album information, or a small web page.  Watermarking could be used in voice conferencing systems to indicate to others which party is currently speaking.  A video application of this technology would consist of embedding subtitles or closed captioning information as a watermark.


DC Watermarking Scheme

This section details the implementation of a digital audio watermarking scheme, which can be used to hide auxiliary information within a sound file.  Although this watermarking scheme is for instructional use as a tool for perceptual audio education, it provides an overview of techniques which are common to all digital audio watermarking schemes.

The DC watermarking scheme hides watermark data in lower frequency components of the audio signal, which are below the perceptual threshold of the human auditory system.


Watermark Insertion

The process of inserting a digital watermark into an audio file can be divided into four main processes (see Figure 7).  An original audio file in wave format is fed into the system, where it is subsequently framed, analyzed, and processed to attach the inaudible watermark to the output signal.


Figure 7.  Watermark Insertion Process.



The audio file is partitioned into frames which are 90 milliseconds in duration.  This frame size is chosen so that the embedded watermark does not introduce any audible distortion into the file.

With a 90 ms frame size and one watermark bit per frame, our bit rate for watermarked data is equal to 1 / 0.09 = 11.1 bits per second.
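The framing step can be sketched in Python as follows (the report's own processing was done in Matlab).  The function name frame_signal is hypothetical, and dropping a trailing partial frame is an assumption; the report does not say how an incomplete last frame is handled.

```python
def frame_signal(samples, sample_rate=44100, frame_ms=90):
    """Split a sequence of samples into consecutive, non-overlapping frames."""
    # 90 ms at 44,100 samples per second gives 3969 samples per frame.
    frame_len = round(sample_rate * frame_ms / 1000)
    # Assumption: a trailing partial frame is simply dropped.
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


one_second = [0.0] * 44100          # one second of silence as a stand-in signal
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))  # 11 3969
```

One second of audio yields 11 complete frames, which matches the data rate of roughly 11.1 bits per second quoted above.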


Spectral Analysis

Subsequent to the framing of the unprocessed audio signal, we perform spectral analysis on the signal, consisting of a fast Fourier transform (FFT), which allows us to calculate the low frequency components of each frame, as well as the overall frame power.  The FFT processing is accomplished in Matlab, using the following equation:

(N denotes the last frame in the audio file)

With a standard 16-bit CD-quality audio file having a sampling rate Fs = 44,100 samples per second, a 90 ms frame consists of 3969 samples.  If we perform an FFT on a frame of this size with N = 3969, we end up with a frequency resolution as follows:

$$\Delta f = \frac{F_s}{N} = \frac{44100}{3969} \approx 11.1 \text{ Hz}$$

From the FFT, we are now able to determine the low frequency (DC) component of the frame, F(1), as well as the frame spectral power.  To calculate the frame power, here denoted P, we use the sum of the squared amplitude spectrum:

$$P = \sum_{k=1}^{N} |F(k)|^2$$
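As a rough illustration in Python, the DC bin and the frame power could be computed as below.  The names dft and analyse_frame are hypothetical, and a naive DFT is used so that the sketch needs no external libraries; it is only practical for short demonstration frames, not for full 3969-sample frames.

```python
import cmath


def dft(frame):
    """Naive O(N^2) discrete Fourier transform (unnormalised, like Matlab's fft)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def analyse_frame(frame):
    """Return the DC bin and the sum of the squared amplitude spectrum."""
    spectrum = dft(frame)
    dc = spectrum[0].real                       # F(1) in Matlab's 1-based indexing
    power = sum(abs(c) ** 2 for c in spectrum)  # sum of squared magnitudes
    return dc, power


frame = [0.5, 1.5, 0.5, 1.5]   # toy frame with a DC level of 1.0
dc, power = analyse_frame(frame)
print(round(dc, 6), round(power, 6))  # 4.0 20.0
```

The DC bin equals the plain sum of the samples, so for the DC component alone no full transform is strictly required.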



Figure 8 below shows an example of the above spectral analysis completed on the first eight frames of an audio file.  The spectrum plot is restricted to frequencies from 0 to 4000 Hz for visibility.


Figure 8.  Sample spectrum from first 8 signal frames.


DC Removal

From the above spectral analysis of each frame, we have calculated the low frequency (DC) component F(1), which can now be removed by subtraction from each frame using the following formula:
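A minimal Python sketch of this step follows.  It assumes that F(1) is the unnormalised DC bin, as returned by an FFT such as Matlab's fft, so that the time-domain DC level is F(1) / N; the name remove_dc is hypothetical.

```python
def remove_dc(frame):
    """Subtract the frame's DC level (its mean) from every sample."""
    dc_bin = sum(frame)            # equals F(1) for an unnormalised FFT
    offset = dc_bin / len(frame)   # time-domain DC level, F(1) / N
    return [x - offset for x in frame]


frame = [0.5, 1.5, 0.5, 1.5]
centred = remove_dc(frame)
print(centred)  # [-0.5, 0.5, -0.5, 0.5]
```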


Watermark Signal Addition

From the spectral analysis completed previously, we calculated the spectral power for each frame, which is now utilised for embedding the watermark signal data.  The power in each frame determines the amplitude of the watermark which can be added to the low frequency spectrum.  The magnitude of the watermark is added according to the formula:

Where Ks is the scaling factor, which ensures the watermark is embedded below the audibility threshold, and w(n) represents the watermark signal data, which is binary, having a value of 1 or -1.

The signal f(n) has now been watermarked by the above process, and is ready for storage, testing, and watermark extraction.
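One way the insertion step might look in Python is sketched below.  The exact embedding formula is an assumption: here the watermark is a constant offset equal to Ks times the RMS level of the DC-free frame, multiplied by the bit value w = 1 or -1.  The name embed_bit and the default value of ks are illustrative only.

```python
import math


def embed_bit(frame, bit, ks=0.01):
    """Embed one watermark bit (+1 or -1) as a small DC offset in a frame.

    Assumption: the offset is ks times the RMS level of the DC-free frame,
    so louder frames carry a proportionally larger watermark.
    """
    n = len(frame)
    mean = sum(frame) / n
    centred = [x - mean for x in frame]                 # DC removal
    rms = math.sqrt(sum(x * x for x in centred) / n)    # level derived from frame power
    offset = ks * rms * bit                             # signed watermark amplitude
    return [x + offset for x in centred]


frame = [0.5, -0.5, 0.5, -0.5]
marked = embed_bit(frame, +1, ks=0.1)
print(round(sum(marked) / len(marked), 6))  # 0.05
```

A silent frame has zero RMS and therefore carries no watermark in this sketch, which mirrors the requirement that a frame must contain enough power to hide data.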

Watermark Extraction

The process of extracting the digital watermark from the audio file is similar to the technique for inserting the watermark.  The computer processing requirements for extraction are slightly lower.  A marked audio file in wave format is fed into the system, where it is subsequently framed, analysed, and processed to recover the embedded data which exists as a digital watermark.


Figure 9.  Watermark Extraction Process.



As with the insertion process, the audio file is partitioned into frames which are 90 milliseconds in duration.  With a 90 ms frame size, we expect an extracted watermark data rate equal to 11.1 bits per second.


Spectral Analysis

Subsequent to the framing of the watermarked audio signal, we perform spectral analysis on the signal, consisting of a fast Fourier transform (FFT), which again allows us to calculate the low frequency components of each frame, as well as the overall frame power.  The FFT processing is accomplished in Matlab, using the previous FFT equation.

As before, with 16 bit CD quality audio, our frames consist of 3969 samples.


Watermark Signal Extraction

From the spectral analysis completed previously, we calculated the spectral power for each frame, which allows us to examine the low frequency power in each frame and subsequently extract the watermark, according to the following formula:

(N denotes the last frame in the audio file)

The extracted watermark signal, w(n), should be an exact replica of the original watermark, provided the original audio file has enough power per frame to embed information below the audible threshold and above the quantization floor.
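Under the assumption that each bit was embedded as a small positive or negative DC offset, extraction can be sketched in Python as a sign test on each frame's DC level.  The function name extract_bits and the decision rule are assumptions, not the report's own formula.

```python
def extract_bits(frames):
    """Recover one bit per frame from the sign of the frame's DC level."""
    bits = []
    for frame in frames:
        dc = sum(frame)                    # proportional to the DC bin F(1)
        bits.append(1 if dc >= 0 else -1)  # assumed rule: sign of the DC level
    return bits


marked_frames = [
    [0.55, -0.45, 0.55, -0.45],   # small positive offset, bit +1
    [0.45, -0.55, 0.45, -0.55],   # small negative offset, bit -1
]
print(extract_bits(marked_frames))  # [1, -1]
```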



This DC watermarking scheme has some major limitations with regard to robustness and data density.  The robustness of the scheme can be increased somewhat with longer audio files, by inserting the watermark signal multiple times, which will aid in extraction, and also in error correction if the signal is manipulated.
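The repetition idea can be sketched as a bit-wise majority vote over several extracted copies of the same watermark.  This is a hypothetical illustration (the name majority_vote and the use of an odd number of copies are assumptions), not part of the implemented scheme.

```python
def majority_vote(copies):
    """Combine repeated extractions of the same watermark, bit by bit.

    Each copy is a list of +1 / -1 values; an odd number of copies avoids ties.
    """
    length = len(copies[0])
    return [
        1 if sum(copy[i] for copy in copies) >= 0 else -1
        for i in range(length)
    ]


copies = [
    [1, -1, 1, 1],
    [1, -1, -1, 1],   # third bit corrupted in this copy
    [1, -1, 1, 1],
]
print(majority_vote(copies))  # [1, -1, 1, 1]
```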

In order to attain higher hidden data density in the watermarked signal, more advanced techniques must be used, such as spread spectrum, phase encoding, or echo hiding.  The most versatile and reliable watermarking scheme, with the highest data rate, would combine all of the above, allowing the software to capitalise on the strengths of each technique when processing the unmarked audio.

Other Watermarking Techniques

As mentioned previously in this report, numerous other watermarking techniques are still the subject of research.  These offer increased robustness or higher data rates than the method implemented in this report.


Phase Encoding

This watermarking technique exploits the human auditory system’s lack of sensitivity to absolute phase changes by encoding the watermark data in an artificial phase signal.

Phase encoding works by breaking the audio signal into frames, and performing spectral analysis on each frame.  Once the spectrum has been computed, the magnitude and phase of consecutive frames are compared, and an artificial phase signal is created to transmit data (see Figure 10).  The artificial phase is combined with the phase from each frame, and the new phase frames are combined to form the watermarked signal.


Figure 10.  Phase Coding Scheme.  Figure courtesy of Bender, et al.


The modified phase frames can also be smoothed to limit the amount of distortion present in the marked signal (Figure 11), but minimizing distortion correspondingly constrains the data rate of the watermark.


Figure 11.  Phase Smoothing.  Figure courtesy of Bender, et al.


The phase encoding watermarking technique can offer higher data rates than the previous method, ranging from 8 to 32 bits per second.  This technique becomes more effective in the presence of noise.


Spread Spectrum Watermarking

This watermarking technique relies on direct sequence spread spectrum (DSSS) to spread the watermarked signal over the entire audible frequency spectrum such that it approximates white noise, at a power level low enough to be inaudible.  A pseudorandom sequence (chip) is used to modulate a carrier wave which creates the spread signal watermark code (Figure 12).  This code is attenuated to a level roughly equal to 0.5% of the dynamic range of the original audio file, before being mixed with the original.


Figure 12.  Spread Spectrum.  Figure courtesy of Bender, et al.


The data rate from this technique is much lower than previous methods, and averages around 4 bits per second.  The low data rate is compensated for by the robustness of this algorithm due to high noise immunity.
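The spreading and correlation idea can be illustrated with a simplified baseband sketch in Python.  Several things here are assumptions rather than the published method: no carrier wave is used, the same pseudorandom chip block is reused for every bit, the function names are hypothetical, and the host tone is deliberately quiet (amplitude 0.05) so that plain correlation over 4410 chips is enough.  A louder host would need more chips per bit to be detected reliably.

```python
import math
import random


def make_chips(n_chips, seed=1):
    """Pseudorandom +1 / -1 chip sequence shared by embedder and detector."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n_chips)]


def ss_embed(host, bits, chips, level=0.005):
    """Add level * bit * chip to the host, one chip block per watermark bit."""
    n = len(chips)
    out = list(host)
    for b, bit in enumerate(bits):
        for i in range(n):
            out[b * n + i] += level * bit * chips[i]
    return out


def ss_detect(marked, n_bits, chips):
    """Correlate each block with the chip sequence and take the sign."""
    n = len(chips)
    bits = []
    for b in range(n_bits):
        corr = sum(marked[b * n + i] * chips[i] for i in range(n))
        bits.append(1 if corr >= 0 else -1)
    return bits


chips = make_chips(4410)                      # 0.1 s per bit at 44.1 kHz
bits = [1, -1, -1, 1]
host = [0.05 * math.sin(2 * math.pi * 440 * t / 44100)
        for t in range(len(chips) * len(bits))]
marked = ss_embed(host, bits, chips)          # watermark at 0.5% of full scale
recovered = ss_detect(marked, len(bits), chips)
print(recovered)
```

The correlation sum grows in proportion to the number of chips, while the contribution of the uncorrelated host grows only with its square root, which is why long chip sequences trade data rate for robustness.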


Echo Watermarking

The echo data hiding technique relies on distorting an audio signal in a way which is perceptually dismissed by the human auditory system as environmental distortion.

The original audio signal is copied into two segments (kernels), one which leads the original signal in time, and one which lags.  Each kernel represents either a zero or a one bit for watermark data transmission.  The bit stream of watermark data is used to mix the two kernels together (Figure 13).  The signals are mixed with gradually sloped transitions to reduce distortion.
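A heavily simplified Python sketch of echo embedding is given below.  It assumes that each bit selects one of two echo delays, both lagging the original (a simplification of the kernel description above), and it switches abruptly between bits instead of using sloped transitions.  The function name, the delays of 2 and 4 samples, and the decay of 0.5 are toy values chosen for readability, not values from the literature.

```python
def echo_embed(signal, bits, samples_per_bit, delay_zero=2, delay_one=4, decay=0.5):
    """Add a single echo whose delay depends on the current watermark bit (0 or 1)."""
    out = []
    for n, x in enumerate(signal):
        # Pick the bit that governs this sample; the last bit covers any remainder.
        bit_index = min(n // samples_per_bit, len(bits) - 1)
        delay = delay_one if bits[bit_index] == 1 else delay_zero
        echo = signal[n - delay] if n >= delay else 0.0
        out.append(x + decay * echo)
    return out


# Two impulses, one per bit period, make the echoes easy to see.
impulses = [1.0, 0, 0, 0, 0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0, 0]
marked = echo_embed(impulses, bits=[0, 1], samples_per_bit=8)
print(marked[2], marked[12])  # 0.5 0.5
```

The first impulse is followed by an echo two samples later (bit 0) and the second by an echo four samples later (bit 1); a detector would estimate which delay is present in each bit period.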


Figure 13.  Echo Watermarking.  Figure courtesy of Bender, et al.


Future Applications

Digital audio watermark technology is an active research area in industry and at the post-graduate level.  The force currently driving development is intellectual property protection, via copy-prevention and detection systems.  The digital watermark has great potential to be used as part of an overall system for managing IP rights, and can be used not only to signify the author of a particular audio file, but also to catalog the path a particular file takes if it is distributed in an unauthorized manner.  With the rise of online music download sites such as Apple’s iTunes, there is increased pressure to implement a comprehensive content management system.  Companies like Verance have already released watermarking tools for the commercial market, and others will be following suit.  This is a research area which will continue to grow over subsequent years.

An increase in high bandwidth internet connections has also prompted the motion picture industry to take note of possible revenue losses due to unauthorized movie distribution via the internet.  Microsoft is currently developing new watermark technologies and is in the process of testing future operating systems equipped with DRM for all media types.

The success of IP protection schemes over past years, including most current watermarking technology, has been limited.  Virtually all the technologies introduced to date have been cracked, or have proven unsuitable due to the degradation and distortion they introduce; for example, past watermarking schemes from Verance have proven audible when demonstrated for record companies.  Watermarking digital audio becomes particularly difficult when the file is subject to lossy compression schemes such as MP3 or WMA.  These perceptual compression schemes and current watermarking schemes take advantage of many of the same limitations of the human auditory system, creating a problem whereby watermark information can be distorted and lost.  There is only a narrow margin in which watermark information survives perceptual encoders yet still cannot be perceived by listeners.