Speaker Verification System

Speaker Verification Submenu

- high level description
- preprocessing
- feature extraction
- enrollment
- threshold creation
- verification

High Level Description

The speaker verification process can be broken up into a number of modules, as shown in Fig. 1 below. A one-second voice signal sampled at 16 kHz first undergoes preprocessing to minimize analysis errors. Preprocessing includes silence removal and high-pass filtering. The processed voice signal then goes into the feature extraction module, which identifies the voice characteristics. The next step depends on the specific task: enrollment, threshold creation, or verification.
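The preprocessing stage can be illustrated with a minimal sketch. The text does not say which silence detector or filter the system uses, so everything here is an assumption: silence is detected with a simple per-frame energy test, and the high-pass filter is a first-order difference filter. The function names, the 10 ms frame length, the energy threshold, and the coefficient `alpha` are all hypothetical and chosen only for illustration.

```python
def remove_silence(signal, frame_len=160, energy_threshold=1e-4):
    """Drop frames whose mean energy falls below energy_threshold.

    Assumes samples are floats scaled to [-1, 1]. A frame_len of 160
    corresponds to 10 ms at 16 kHz (an illustrative choice).
    """
    kept = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= energy_threshold:
            kept.extend(frame)
    return kept


def high_pass(signal, alpha=0.95):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def preprocess(signal):
    """Silence removal followed by high-pass filtering (assumed order)."""
    return high_pass(remove_silence(signal))
```

For a one-second recording, `signal` would hold 16000 samples. The output is usually shorter than the input because silent frames are discarded.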

In enrollment, the feature vectors (from the feature extraction module) are compressed using vector quantisation. The compressed vectors are referred to as the voice print, which is stored on the computer's hard drive.
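As a rough illustration of how vector quantisation compresses feature vectors into a voice print, the sketch below builds a small codebook with plain k-means clustering. The text does not state which codebook training algorithm or codebook size the system uses, so k-means, the function names, and the parameters are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Index of the codeword closest to vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def train_codebook(vectors, size, iterations=20):
    """Build a codebook of `size` codewords with k-means (assumed method).

    Requires size <= len(vectors). Initial codewords are evenly spaced
    training vectors, which keeps the result deterministic.
    """
    step = len(vectors) / size
    codebook = [list(vectors[int(i * step)]) for i in range(size)]
    for _ in range(iterations):
        # Assign every feature vector to its nearest codeword.
        clusters = [[] for _ in range(size)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        # Move each codeword to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook
```

The returned codebook plays the role of the voice print: many feature vectors are summarised by a handful of representative codewords, which is what makes the stored model compact.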

In threshold creation, the feature vectors are compared with the existing voice print to create the threshold used for verification.

In verification, the feature vectors are compared with the voice print in the same way as in threshold creation, but the resulting difference is then compared to the threshold to arrive at a pass or fail verdict. A pass means that the speaker was verified to be who they claim to be.
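The comparison steps in threshold creation and verification can be sketched as follows. The text does not define the difference measure or how the threshold is derived from it, so this example makes three assumptions: the difference is the average squared distance from each feature vector to its nearest codeword in the voice print, the threshold is the mean plus a margin times the standard deviation of differences measured on the enrolled speaker's own utterances, and a lower difference means a better match. All function names and the `margin` parameter are hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    total = sum(min(squared_distance(v, c) for c in codebook)
                for v in vectors)
    return total / len(vectors)


def make_threshold(distortions, margin=2.0):
    """Assumed rule: mean + margin * standard deviation of the distortions."""
    mean = sum(distortions) / len(distortions)
    variance = sum((d - mean) ** 2 for d in distortions) / len(distortions)
    return mean + margin * math.sqrt(variance)


def verify(vectors, codebook, threshold):
    """Pass (True) when the distortion does not exceed the threshold."""
    return average_distortion(vectors, codebook) <= threshold
```

In use, `make_threshold` would be fed the distortions of several utterances from the enrolled speaker, and `verify` would then be called on the feature vectors of each new claim. A larger `margin` accepts more genuine attempts at the cost of admitting more impostors.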

Figure 1: High Level System Diagram