Speaker Verification System
Feature Extraction
Figure 3: Mel-Cepstrum Block Diagram
Each block is multiplied by a Hamming window to minimize spectral distortion and the discontinuities at the block edges. The Mel-Cepstral Transform begins by applying the Fast Fourier Transform to each windowed block, which produces the spectral coefficients of that block. The Mel Frequency Transform is then applied to each spectral block to convert the frequency axis to the mel scale. The mel scale is a roughly logarithmic scale that approximates the way the human ear perceives sound. A filter bank of 29 filters captures frequency bands representative of the mel scale. See the figure below for the filterbank plot. The output powers of the filters are then passed through a discrete cosine transform to obtain the Mel-Frequency coefficients. Each block yields 12 coefficients.
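The steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this system: the function names, the 8000 Hz sample rate, the 256-sample block length, the common 2595 * log10(1 + f / 700) mel formula, the triangular filter shape, the logarithm taken before the DCT, and the choice to skip the zeroth DCT coefficient are all assumptions that the text does not specify. Only the Hamming window, the 29 filters, and the 12 coefficients per block come from the description.

```python
import cmath
import math

def hz_to_mel(f):
    # Common mel formula (assumed; the text does not give one).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def hamming(n):
    # Hamming window of length n.
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def power_spectrum(block):
    # Naive DFT (stands in for the FFT); keeps bins 0..N/2.
    n = len(block)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin positions
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_block(block, sample_rate, num_filters=29, num_coeffs=12):
    n = len(block)
    windowed = [s * w for s, w in zip(block, hamming(n))]
    power = power_spectrum(windowed)
    bank = mel_filterbank(num_filters, n, sample_rate)
    energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
    # Log compression is the conventional step here (an assumption).
    log_e = [math.log(e + 1e-10) for e in energies]
    # DCT-II over the filter outputs; coefficient 0 is skipped (an assumption).
    return [
        sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / num_filters)
            for m in range(num_filters))
        for c in range(1, num_coeffs + 1)
    ]

if __name__ == "__main__":
    # Hypothetical input: one 256-sample block of a 440 Hz tone at 8000 Hz.
    block = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
    print(len(mfcc_block(block, 8000)))
```

Calling `mfcc_block` on every block of an utterance would give one 12-element feature vector per block. The naive DFT keeps the sketch free of external dependencies; a real system would use an FFT routine for speed.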