Throat movement during speech offers a promising pathway for information transmission, but the coupled dynamics of the vocal cords and surrounding muscles produce complex signals across the throat, which makes it difficult to decide where sensors should be placed for signal acquisition. The team of Academician Diao Dongfeng at Shenzhen University has addressed this problem in a paper titled "Throat map of speech recognition achieved by flexible ultra sensitive carbon array sensors with deep learning," published in the journal Carbon. The study proposes a "throat map" that combines flexible carbon array sensors with deep-learning-based signal processing to determine the sensor placement coordinates needed for high-precision speech recognition. To fabricate the sensor, a graphene nanocrystalline carbon film is deposited as the sensing material by electron cyclotron resonance (ECR), transferred onto a polydimethylsiloxane (PDMS) substrate, and integrated with a stretchable circuit. The sensor packs 16 sensing units into a 2 × 2 cm² area and achieves an ultra-high sensitivity coefficient (gauge factor) above 1000 and a frequency response limit of 10,000 Hz.
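The sensitivity figure can be made concrete with a little arithmetic. Assuming the "sensitivity coefficient" is the conventional gauge factor of a resistive strain sensor, GF = (ΔR/R0)/ε, a gauge factor of 1000 means that a strain of only 0.1% doubles the resistance (ΔR/R0 = 1). Likewise, by the Nyquist criterion, capturing signal content up to 10,000 Hz requires sampling faster than 20,000 samples per second. The Python sketch below uses hypothetical helper names and is illustrative only; it does not describe the authors' readout electronics.

```python
def gauge_factor(r0_ohm: float, r_ohm: float, strain: float) -> float:
    """Gauge factor GF = (delta R / R0) / strain for a resistive strain sensor."""
    if r0_ohm <= 0:
        raise ValueError("baseline resistance R0 must be positive")
    if strain == 0:
        raise ValueError("strain must be non-zero")
    return ((r_ohm - r0_ohm) / r0_ohm) / strain


def relative_change(gf: float, strain: float) -> float:
    """Predicted relative resistance change delta R / R0 = GF * strain."""
    return gf * strain


def nyquist_rate_hz(max_freq_hz: float) -> float:
    """Twice the highest frequency of interest; the sampling rate must exceed this."""
    return 2.0 * max_freq_hz


if __name__ == "__main__":
    # Hypothetical reading: 0.1 % strain doubles a 1 kOhm baseline to 2 kOhm.
    print(gauge_factor(1000.0, 2000.0, 0.001))
    print(relative_change(1000.0, 0.001))
    print(nyquist_rate_hz(10_000.0))
```

The gauge factor is dimensionless because strain is itself a ratio of lengths, so the same arithmetic applies whatever resistance range the sensing units actually have.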
For signal processing, the team proposes, for the first time, a signal position classification (SPC) method based on a convolutional neural network (CNN). The method represents position information in a coordinate system centered on the Adam's apple and builds a throat map that identifies the locations producing the most distinctive and consistent signals. Guided by the throat map, the selected sensing units enabled deep learning models to classify 14 phonemes with an accuracy above 96%. The throat map can therefore serve as a guide for sensor layout in speech recognition and throat-signal-based language recognition applications.
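One way to picture how position information can be laid out around the Adam's apple is sketched below in Python. It assumes, hypothetically, that the 16 units form a 4 × 4 grid with a uniform 5 mm pitch centered on the Adam's apple, and that each position's score is the recall of a position classifier (how often signals recorded at a unit are assigned back to that unit), taken as a proxy for how distinctive and consistent its signal is. These assumptions and all function names are illustrative, not the authors' SPC implementation.

```python
from typing import Dict, List, Tuple

GRID = 4        # assumed 4 x 4 layout of the 16 sensing units
PITCH_MM = 5.0  # assumed uniform pitch: 20 mm side / 4 units


def unit_coordinates(grid: int = GRID, pitch_mm: float = PITCH_MM) -> Dict[int, Tuple[float, float]]:
    """Map each unit index (row-major) to (x, y) in mm, with the Adam's apple at (0, 0).

    x grows to the right and y grows upward, so row 0 is the top row.
    """
    offset = (grid - 1) / 2.0
    coords: Dict[int, Tuple[float, float]] = {}
    for idx in range(grid * grid):
        row, col = divmod(idx, grid)
        coords[idx] = ((col - offset) * pitch_mm, (offset - row) * pitch_mm)
    return coords


def per_position_recall(confusion: List[List[int]]) -> List[float]:
    """Recall of each true position in a position-classification confusion matrix.

    confusion[i][j] counts signals recorded at unit i that the classifier assigned to unit j.
    """
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls


def throat_map(scores: List[float], grid: int = GRID) -> List[List[float]]:
    """Arrange per-unit scores into grid rows (row 0 = top) for display as a heat map."""
    if len(scores) != grid * grid:
        raise ValueError(f"expected {grid * grid} scores, got {len(scores)}")
    return [scores[r * grid:(r + 1) * grid] for r in range(grid)]
```

For example, `unit_coordinates()[0]` places unit 0 at (-7.5, 7.5) mm, the top-left corner, and `throat_map(per_position_recall(cm))` turns a hypothetical 16 × 16 confusion matrix `cm` into a 4 × 4 grid that can be plotted as a heat map.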
The study proposes a method for constructing a "throat map" that improves phoneme recognition from throat motion signals. The method rests on two key technical advances: first, a practical fabrication route for flexible, ultra-sensitive carbon array sensors that capture subtle throat motion signals at multiple points within the narrow throat area; second, a deep-learning-based SPC method that builds the throat map by accumulating position weights. The carbon array sensor has 16 sensing units in a 2 × 2 cm² area, a high sensitivity coefficient (>1000), and a high-frequency response limit of up to 10,000 Hz. The throat map provides the coordinates for arranging sensing units in the throat area for phoneme recognition: using only 5 sensing units, the method recognized 14 phonemes with an accuracy above 96%. Overall, the throat map construction method offers an efficient solution for laying out multi-sensor units in motion recognition.
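The accumulation-and-selection step can be sketched as follows. This is a hypothetical illustration, not the published SPC procedure: it assumes each phoneme yields one non-negative weight per sensing unit (for instance, a per-position classification score), sums those weights across phonemes, and keeps the k highest-scoring units, with k = 5 matching the number of units reported above. The function names and toy numbers are made up for illustration.

```python
from typing import Dict, List


def accumulate_weights(per_phoneme_weights: Dict[str, List[float]]) -> List[float]:
    """Sum each unit's weight over all phonemes (hypothetical accumulation rule)."""
    if not per_phoneme_weights:
        raise ValueError("need weights for at least one phoneme")
    n_units = len(next(iter(per_phoneme_weights.values())))
    totals = [0.0] * n_units
    for phoneme, weights in per_phoneme_weights.items():
        if len(weights) != n_units:
            raise ValueError(
                f"phoneme {phoneme!r} has {len(weights)} weights, expected {n_units}"
            )
        for unit, weight in enumerate(weights):
            totals[unit] += weight
    return totals


def select_top_units(totals: List[float], k: int = 5) -> List[int]:
    """Indices of the k units with the largest accumulated weight; ties go to the lower index."""
    if not 0 < k <= len(totals):
        raise ValueError("k must be between 1 and the number of units")
    return sorted(range(len(totals)), key=lambda unit: (-totals[unit], unit))[:k]


if __name__ == "__main__":
    # Toy weights for 3 phonemes x 6 units (made-up numbers, not measured data).
    weights = {
        "a": [0.9, 0.2, 0.4, 0.8, 0.1, 0.3],
        "i": [0.7, 0.3, 0.6, 0.9, 0.2, 0.1],
        "u": [0.8, 0.1, 0.5, 0.6, 0.4, 0.2],
    }
    totals = accumulate_weights(weights)
    print(select_top_units(totals, k=3))
```

Summing treats every phoneme equally; a weighted sum, or taking each unit's minimum weight across phonemes, would instead favor units that perform well for all phonemes, which may matter if some phonemes are poorly separated.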
