Francesco CANGEMI, HA Kieu-Phuong, Christian WEITZ & Martine GRICE

University of Cologne


Speaker-specific use of intonational cues for sentence modality and affect contrasts in Standard Vietnamese


In addition to a complex system of six lexical tones, Standard Vietnamese makes use of a large number of sentence-final particles to express postlexical meaning (e.g. sentence modality contrasts). Recent studies (Đỗ et al. 1998, Nguyễn & Boulakia 1999, Michaud & Vũ 2004, Vũ et al. 2006) show that postlexical meaning can also be expressed intonationally in Vietnamese, through the interplay of pitch, duration, intensity and voice quality cues. Further studies (Hạ & Grice 2010, Hạ 2012) show, on the basis of a corpus of spontaneous interactions, that lexical tones can be obscured by intonation in certain discourse functions (e.g. backchannels and repair initiations). Brunelle et al. (2012) show, on the basis of a corpus of read speech, that speakers encode contrasts in sentence modality (question vs. statement) and affect (neutral vs. emphatic) by using continuous phonetic cues (overall intensity, pitch or duration), discrete phonological options (intonational tones), or a combination of both. However, they also document a large amount of speaker-specific variation. This calls into question the existence of a straightforward phonologised mapping between phonetic cues and postlexical meaning, especially if the latter can be expressed through other grammatical devices. If so, speaker-specific variation could be expected to be greater in the encoding of contrasts which also rely on other cues. According to this “redundancy-specificity” hypothesis, the encoding of sentence modality contrasts (which are also conveyed by final particles) is expected to be more prone to speaker-specific variation than the encoding of discourse functions.
In this talk, we provide an infrastructure for the exploration and quantification of speaker-specific variability, through an in-depth analysis of the Brunelle et al. (2012) corpus. The dataset features 72 sentences ending with không (lexically high-level, meaning ‘empty/only’ or ‘yes/no question particle’), recorded by 16 speakers of Standard Vietnamese in four different communicative functions (statement, question, annoyed statement and insisting request) resulting from the combination of sentence modality (question vs. statement) and affect (neutral vs. emphatic). The sentences consist of four syllables and are tonally as well as segmentally identical. We first analyse the dataset by profiling each speaker with respect to whether duration and/or f0 are reliably used to encode the modality and/or affect contrasts (top-down analysis). The results are then used to predict the outcome of an unsupervised clustering algorithm, run on data from the four meaning categories (bottom-up analysis). The solutions for each individual speaker (in terms of the optimal number of clusters and of the categorization errors made by the algorithm) are then interpreted with respect to the profiling yielded by the top-down analysis. The results available so far show that this method provides a detailed account of speaker-specific variation in a dataset, and that it could prove instrumental in the exploration of the redundancy-specificity hypothesis.
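
The bottom-up step can be illustrated with a minimal sketch for a single speaker. The abstract does not name the clustering algorithm or the criterion used to choose the number of clusters, so everything below is an assumption made for illustration: k-means with a deterministic initialisation, the mean silhouette width as selection criterion, majority-category counting for the categorization errors, and invented, already normalised (duration, f0) values for a hypothetical speaker who marks modality with f0 but does not mark affect.

```python
from collections import Counter
from math import dist

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette width; higher values mean better-separated clusters."""
    total = 0.0
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            continue  # singleton cluster or single cluster: contributes 0
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        total += (b - a) / max(a, b)
    return total / len(points)

def best_k(points, k_range=range(2, 5)):
    """Number of clusters with the highest mean silhouette width."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_range}
    return max(scores, key=scores.get)

def categorization_errors(labels, categories):
    """Tokens that fall outside the majority category of their cluster."""
    errors = 0
    for cluster in set(labels):
        cats = [c for lab, c in zip(labels, categories) if lab == cluster]
        errors += len(cats) - Counter(cats).most_common(1)[0][1]
    return errors

# Hypothetical speaker (invented values): (normalised duration, normalised f0).
tokens = [
    (0.0, -1.0), (0.1, -0.9), (-0.1, -1.1),    # statement
    (0.05, -1.05), (-0.05, -0.95), (0.0, -0.9),  # annoyed statement
    (0.0, 1.0), (0.1, 1.1), (-0.1, 0.9),       # question
    (0.05, 0.95), (-0.05, 1.05), (0.0, 1.1),   # insisting request
]
four_way = (["statement"] * 3 + ["annoyed statement"] * 3
            + ["question"] * 3 + ["insisting request"] * 3)
modality = ["statement"] * 6 + ["question"] * 6

k = best_k(tokens)
labels = kmeans(tokens, k)
errors_four_way = categorization_errors(labels, four_way)
errors_modality = categorization_errors(labels, modality)
```

On this toy data the selected solution has two clusters, which align with modality and conflate the affect categories. This is the kind of outcome a top-down profile of "f0 used for modality only" would lead one to predict for such a speaker.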