MRI of the Vocal Tract and Articulators’ Automatic Delineation

Speaker: Karyna Isaieva

Date and place: November 5, 2020 at 10:30, VISIO-CONFERENCE

Abstract:

MRI is a very popular technology that enables fully non-invasive and non-ionizing investigation of the vocal tract. It has multiple applications, including studies of healthy speech as well as medical applications (pathological speech, swallowing, etc.). We acquired a database of 10 healthy subjects (5 men/5 women) with two different approaches: static 3D imaging of the vocal tract held in positions corresponding to different French phonemes, and real-time 2D imaging of the mid-sagittal plane of the vocal tract during speech. Real-time MRI is a relatively novel technology with very high temporal resolution (20 ms in our case). However, this gain in temporal resolution comes at the cost of image quality: the images become quite smoothed. Moreover, due to the finite slice thickness, contours can be superposed, and fast motion creates specific artifacts. It is therefore important to find a good segmentation approach to enable further exploitation of the articulators’ contours. In contrast to other similar works, we decided to segment not the articulators themselves but their boundaries, since most of the articulators in MR images do not form closed contours.
For this, we chose the U-Net convolutional neural network, with 1-pixel-wide contours as ground-truth segmentation maps. The predicted images are probability maps and require post-processing. We used Dijkstra’s shortest-path search to extract the contours from the probability maps. The approach was tested on 2 subjects with tongue segmentation only, and the results are promising [1].
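This post-processing step can be sketched as follows. The sketch below is a minimal illustration, not the authors’ implementation: it assumes the network output is a 2-D probability map, that pixel traversal cost is taken as one minus the probability (so the cheapest path follows the contour ridge), and that the two contour endpoints are already known (endpoint detection is not described in the abstract).

```python
import heapq
import numpy as np

def extract_contour(prob_map, start, end):
    """Dijkstra's shortest-path search over a contour probability map.

    Cells with high predicted probability get low traversal cost, so the
    cheapest 8-connected path from `start` to `end` traces the contour.
    `start` and `end` are (row, col) tuples; how they are chosen in the
    real pipeline is an assumption here.
    """
    h, w = prob_map.shape
    cost = 1.0 - prob_map              # high probability -> cheap to cross
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = cost[start]
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist[r, c]:             # stale heap entry
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w:
                    nd = d + cost[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    # Walk the predecessor chain back from `end` to recover the contour
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

On a toy 5×5 map whose middle row has probability 0.9 and the rest 0, the returned path runs straight along the middle row, since any detour crosses near-unit-cost pixels.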
The current work consists in applying these results to the acquired database of 10 subjects. To choose a small representative dataset from the whole bulk of images (10×16×2200 images), we applied k-means clustering (n = 100) to the images of each subject. We then took the images closest to the centroids, so that our dataset consists of 100 images for each of the 10 subjects. We also applied cropping and some data augmentation (rotations, scaling), which improved the results of inter-subject prediction for the tongue. Further work will consist in delineating the other articulators and in using other learning and/or post-processing algorithms. The results will be applied to the creation of coarticulation models and to acoustic simulations. Medical aspects that can be extracted from the segmented images will also be investigated at the IADI laboratory.
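The centroid-based selection can be illustrated with a small self-contained sketch. It is an assumption-laden illustration, not the actual pipeline: frames are flattened to raw pixel vectors (the abstract does not specify the feature representation), and a minimal k-means is coded inline so the example has no dependencies beyond NumPy.

```python
import numpy as np

def select_representatives(images, n_clusters, n_iter=20, seed=0):
    """Cluster a subject's frames with k-means and return, for each
    cluster, the index of the frame closest to its centroid.

    `images` is an (N, H, W) array; flattening pixels into feature
    vectors is an assumed representation for this sketch.
    """
    rng = np.random.default_rng(seed)
    X = images.reshape(len(images), -1).astype(np.float64)
    # Initialize centroids from distinct randomly chosen frames
    centroids = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign every frame to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids (skip a cluster if it happens to empty)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = X[labels == k].mean(axis=0)
    # One representative per non-empty cluster: the member nearest its centroid
    picked = [int(np.where(labels == k)[0][d[labels == k, k].argmin()])
              for k in range(n_clusters) if np.any(labels == k)]
    return sorted(picked)
```

In the work described above this would be run per subject with n_clusters = 100 over the roughly 2200 real-time frames, yielding the 100-image subsets to be annotated.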
[1] Isaieva, K. et al. Automatic Tongue Delineation from MRI Images with a Convolutional Neural Network Approach. Appl. Artif. Intell. 00 (2020) 1–9.