This is the primary experiment, designed to yield similarity judgments among all voices in the database, separately for each gender. The resulting similarity matrix will be used to determine the number and characteristic features of individual voice types through hierarchical clustering and subsequent perceptual testing.
- Existing Database: 150 Voices (75 Male and 75 Female, each representing 3 age categories: young, middle-aged, and old)
- All recordings are of read speech
- Samples with any significant regional dialect, as well as samples from speakers of advanced age, will be omitted
- Total Listeners: 100 (50 Male and 50 Female; 10 people per data set: stratified sampling)
- Listeners will rate the similarity of all possible pairs of voices within a gender on a 1-7 scale
- 50 voices per gender for a total of 2,450 trials
- Each listener will hear a stratified sample of the 2,450 pairings, specifically 490 trials per person for 1-1.5 hours of total testing per person
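The trial counts in this design can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that 2,450 counts the ordered pairs within one gender (50 x 49), which is one reading that matches 490 trials per listener and five data sets per gender. The alternative reading, 1,225 unordered pairs per gender summed over both genders, also gives 2,450. The helper `split_into_sets` is hypothetical and performs a plain random partition, not the stratification scheme itself.

```python
from itertools import combinations, permutations
import random

VOICES_PER_GENDER = 50     # from the design above
TRIALS_PER_LISTENER = 490  # from the design above

voices = list(range(VOICES_PER_GENDER))
unordered_pairs = list(combinations(voices, 2))  # A-B counted once
ordered_pairs = list(permutations(voices, 2))    # A-B and B-A counted separately

print(len(unordered_pairs))  # 1225
print(len(ordered_pairs))    # 2450


def split_into_sets(pairs, set_size, seed=0):
    """Hypothetical helper: shuffle the pairs and cut them into equal-size sets."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + set_size] for i in range(0, len(shuffled), set_size)]


# Assumption: ordered pairs within one gender are divided among listeners.
data_sets = split_into_sets(ordered_pairs, TRIALS_PER_LISTENER)
print(len(data_sets))  # 5
```

Under this reading, five data sets per gender with 10 listeners each gives 50 listeners per gender set and 100 listeners overall.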
The similarity matrix from Experiment 1 will be submitted to Hierarchical Clustering Scheme (HCS) analysis, which groups the voices hierarchically into n voice types (from as few as two to as many as half the voice sample size). The resulting HCS levels not only group all of the voices by similarity but also provide the framework for the judgments rendered by expert listeners in Experiment 2. Figure 1 illustrates a hypothetical HCS output that groups voices into types whose population increases at higher levels of analysis.
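To make the idea of hierarchical levels concrete, here is a minimal sketch of agglomerative clustering on a similarity matrix. It is not the analysis software for this study: the linkage rule (complete linkage, where two clusters are only as similar as their least similar pair of voices) is an assumption, since the linkage rule is not specified here, and the 4 x 4 matrix is made-up toy data on the 1-7 scale.

```python
def hierarchical_levels(similarity):
    """Return the partition of items at each level of the hierarchy.

    similarity: symmetric matrix (list of lists); higher means more alike.
    Linkage is complete linkage (an assumption, see the text above).
    """
    clusters = [[i] for i in range(len(similarity))]
    levels = [[c[:] for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster-to-cluster similarity = lowest pairwise similarity.
                link = min(similarity[i][j]
                           for i in clusters[a] for j in clusters[b])
                if best is None or link > best[0]:
                    best = (link, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        levels.append([c[:] for c in clusters])
    return levels


# Toy similarity matrix for four hypothetical voices (made-up ratings).
toy = [
    [7, 6, 2, 1],
    [6, 7, 3, 2],
    [2, 3, 7, 5],
    [1, 2, 5, 7],
]

levels = hierarchical_levels(toy)
for partition in levels:
    print(len(partition), "clusters:", partition)
```

Each successive level has one fewer cluster, so higher levels contain fewer, larger voice types.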
Experiment 2: Category Number and Labels
This experiment is performed after the primary experiment and uses trained/expert participants to determine the number of voice types, given the HCS results from Experiment 1. Expert participants listen to all of the voices in the sample and are free to create as many groups as they like; the grouping is entirely under the listener's control. The task requires concentration and sustained vigilance. For this reason, trained and dedicated "expert" participants from the Speech Perception Lab at UF will be recruited.
The purpose of Experiment 2b is to take the clusters of individual voices arranged hierarchically in Experiment 2a and 1) determine the total number of clusters that will constitute the inventory of voice types and 2) coin the best descriptive label for each voice type (this is an open-set task). These determinations will be made by ten expert participants from the Speech Perception Lab at UF, who will listen to samples of all of the voices within a gender set, arranged visually in the HCS hierarchy. The modal number of clusters across all ten participants will be used.
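As a small illustration of the modal rule, the sketch below computes the most frequent cluster count across ten listeners. The counts are made-up numbers, not data, and the handling of ties is left open because no tie-break rule is stated here.

```python
from statistics import multimode

# Hypothetical cluster counts chosen by ten expert listeners (made-up numbers).
cluster_counts = [6, 7, 6, 8, 6, 7, 5, 6, 7, 6]

modes = multimode(cluster_counts)  # every value tied for most frequent

if len(modes) == 1:
    chosen = modes[0]
else:
    chosen = None  # tie: no tie-break rule is specified, so leave undecided

print(modes, chosen)
```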
This experiment also uses trained/expert participants to determine the labels for the voice types. In this case, the participants will be provided with a limited but familiar set of adjectives to describe the quality, type, and distinguishing features of the voices in a given set. The instruction will be along these lines: "Here are some voice type labels, apply the best 50 labels to your categories that you can, creating the best match, though you may not be totally happy with the results." Adjectives and vocabulary items to describe human voices and vocal pleasantness will be obtained in part from the literature on voice quality, from interviews with members of the acting and singing departments at UF and with voice casting agents (where the results of these experiments will eventually find utility), and from the adjectives generated in Experiment 2b. Examples might include: grumbly, twangy, raspy, nasally, etc.