Speaker Verification System


Threshold Generation Using Multiple Verification Sessions
Currently, the threshold is set from the average distortion measured in a single verification session. If that session has unusually high variance, the resulting threshold may be too loose or too tight, and later verification sessions may show a high rate of false acceptances or false rejections.

To account for this, several verification sessions should be held. The average distortion from each session should then be averaged across sessions, and that overall average scaled to give the threshold.
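The averaging step can be sketched as follows. This is a minimal illustration, not the system's actual code: the function name `threshold_from_sessions`, the `scale` parameter, and the sample values are assumptions made for the example.

```python
from statistics import mean

def threshold_from_sessions(session_distortions, scale=1.5):
    """Average the per-session average distortions, then scale the result.

    session_distortions: one average distortion value per verification session.
    scale: hypothetical scaling factor; the real value would be tuned.
    """
    if not session_distortions:
        raise ValueError("at least one verification session is required")
    return scale * mean(session_distortions)

# Illustrative values only: average distortion from three sessions.
sessions = [2.0, 3.0, 4.0]
threshold = threshold_from_sessions(sessions, scale=1.5)
print(threshold)
```

With more sessions, one unusually noisy session has less influence on the final threshold than it would if it were the only session used.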

Weighting the Code Vectors
If the codebooks of several users are compared, certain code words turn out to lie relatively close to each other across users. Accuracy could be improved by assigning a weight to each code word in each codebook, so that code words common to many users are weighted less in the average distortion calculation.
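One possible weighting scheme is sketched below. It is only an assumption about how such weights might be chosen: each code word is weighted by its Euclidean distance to the nearest code word of any other user, so code words shared across users receive small weights. The function names and sample vectors are hypothetical.

```python
import math

def codeword_weights(codebook, other_codebooks):
    """Weight each code word by its distance to the nearest code word
    belonging to any other user, normalized so the weights sum to 1."""
    others = [cw for cb in other_codebooks for cw in cb]
    if not others:
        return [1.0 / len(codebook)] * len(codebook)
    raw = [min(math.dist(cw, o) for o in others) for cw in codebook]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(codebook)] * len(codebook)
    return [r / total for r in raw]

def weighted_average_distortion(frames, codebook, weights):
    """Match each frame to its nearest code word and average the
    distortions, weighting each by the matched code word's weight."""
    num = den = 0.0
    for frame in frames:
        dists = [math.dist(frame, cw) for cw in codebook]
        k = dists.index(min(dists))
        num += weights[k] * dists[k]
        den += weights[k]
    if den == 0.0:
        raise ValueError("no frames, or all matched code words have zero weight")
    return num / den

# Illustrative two-dimensional vectors only.
user_codebook = [(0.0, 0.0), (10.0, 0.0)]
other_codebooks = [[(0.0, 1.0), (5.0, 5.0)]]
weights = codeword_weights(user_codebook, other_codebooks)

frames = [(0.0, 0.5), (10.0, 1.0)]
distortion = weighted_average_distortion(frames, user_codebook, weights)
```

Here the first code word sits close to another user's code word, so it receives the smaller weight and contributes less to the average distortion.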

Codebook Adaptation Over Time
One idea that was discussed was to let each user's codebook adapt over successful verification sessions, updating the thresholds and the codebook each time the user passes the verification test. However, it was debated whether repeated updates could gradually convert a given user's codebook to match that of a different user, which would defeat the purpose of the system.
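A simple form of such an update is sketched below, assuming each code word is moved a small fraction of the way toward the mean of the frames matched to it after a successful session. The function name `adapt_codebook` and the `rate` parameter are hypothetical, and this is one of several ways adaptation could be done.

```python
import math

def adapt_codebook(codebook, frames, rate=0.1):
    """Return a new codebook in which each code word moves a fraction
    `rate` toward the centroid of the frames nearest to it.
    Code words with no matched frames are left unchanged."""
    assigned = [[] for _ in codebook]
    for frame in frames:
        dists = [math.dist(frame, cw) for cw in codebook]
        assigned[dists.index(min(dists))].append(frame)

    updated = []
    for cw, group in zip(codebook, assigned):
        if not group:
            updated.append(tuple(cw))
            continue
        centroid = [sum(coord) / len(group) for coord in zip(*group)]
        updated.append(tuple(c + rate * (m - c) for c, m in zip(cw, centroid)))
    return updated

# Illustrative values only: both frames are nearest to the first code word.
codebook = [(0.0, 0.0), (10.0, 10.0)]
frames = [(1.0, 1.0), (3.0, 1.0)]
new_codebook = adapt_codebook(codebook, frames, rate=0.5)
```

A small `rate` limits how far a single session can move the codebook, but it does not by itself rule out the gradual drift toward another user's codebook described above.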

Signal Normalization
In the mel-frequency transform, the output of the filterbanks depends on the power of the signal. This implies that the same speech spoken loudly produces different features than when spoken quietly. Normalizing the recorded signal before feature extraction reduces this effect.
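As an illustration, the sketch below scales a recording to a fixed root-mean-square (RMS) level before any features are computed. RMS normalization is an assumption here; peak normalization would be another option. The function name and the `target_rms` value are hypothetical.

```python
import math

def normalize_rms(samples, target_rms=0.1):
    """Scale the samples so that their RMS level equals target_rms.
    Empty or all-zero (silent) input is returned unchanged."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)
    gain = target_rms / rms
    return [s * gain for s in samples]

# Illustrative values only: the same waveform recorded quietly and loudly.
quiet = [0.01, -0.02, 0.03, -0.01]
loud = [10.0 * s for s in quiet]

quiet_norm = normalize_rms(quiet)
loud_norm = normalize_rms(loud)
```

After normalization the quiet and loud versions are the same up to rounding, so the filterbank outputs no longer depend on how loudly the phrase was spoken.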