Voice singing by function fitting

WANG Yibu; LI Jianwen

doi:10.3969/j.issn.1000-5641.202022009

Issue 1

Jan. 2021

WANG Yibu, LI Jianwen. Voice singing by function fitting[J]. Journal of East China Normal University (Natural Sciences), 2021, (1): 152-164. doi: 10.3969/j.issn.1000-5641.202022009

School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi’an 710021, China

Received Date: 2020-06-10
Publish Date: 2021-01-27

Abstract

Intonation is the tone of speech, formed by variations in pitch and emphasis; it is one of the main ways in which human speech conveys emotion. By adjusting intonation parameters to change the duration and pitch of certain words in an utterance, controlled intonation can mimic the effect of singing; this approach, in turn, can help address the lack of research on singing voice synthesis. The cepstrum method is used to extract the pitch frequency, the LPC (linear predictive coding) method is used to estimate the formants, and a high-order polynomial is fitted to the pitch of the voice; the fitted function is then adjusted in real time to produce the tones required for singing. Starting from two basic speech parameters, pitch frequency and formants, and drawing on the mathematical nature of pronunciation, this paper uses an intuitive mathematical method to synthesize the effect of singing; with this method, the overall recognition rate between the original voice and the synthetic voice reaches 87.6%. The synthesis results show that adjusting the parameters of speech synthesis gives greater control over voice singing.
Keywords:
- intonation
- tone
- voice singing
- cepstrum
- pitch frequency
- LPC (linear predictive coding) method
- formant
- fitting function
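
The pitch-related steps named in the abstract (cepstrum pitch extraction, polynomial fitting of the pitch contour, and adjustment of the fitted function) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, frame length, hop size, search range of 60 to 500 Hz, polynomial degree, and the semitone-scaling rule used to "adjust" the fit are all assumptions made here for illustration, and LPC formant estimation is left out.

```python
import numpy as np


def cepstrum_pitch(frame, fs, fmin=60.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame from its real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)  # avoid log(0)
    cepstrum = np.fft.irfft(log_mag)
    # Search only quefrencies that correspond to plausible pitch periods
    # (assumed range: fmin..fmax).
    qmin, qmax = int(fs / fmax), int(fs / fmin)
    period = qmin + int(np.argmax(cepstrum[qmin:qmax]))
    return fs / period


def pitch_contour(signal, fs, frame_len=1024, hop=256):
    """Return frame-centre times (s) and per-frame pitch estimates (Hz)."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    times = np.array([(s + frame_len / 2) / fs for s in starts])
    pitches = np.array(
        [cepstrum_pitch(signal[s:s + frame_len], fs) for s in starts]
    )
    return times, pitches


def fit_contour(times, pitches, degree=5):
    """Least-squares polynomial fit of the pitch contour."""
    return np.polynomial.Polynomial.fit(times, pitches, degree)


def shift_semitones(poly, semitones):
    """Hypothetical adjustment: scale the fitted contour by a musical interval."""
    return poly * 2.0 ** (semitones / 12.0)
```

In use, `pitch_contour` would be run on the voiced part of a recording, `fit_contour` would smooth the result into a single function of time, and the adjusted function returned by `shift_semitones` could be sampled to give target pitch values for resynthesis. The sketch assumes every frame is voiced; real speech would need a voiced/unvoiced decision first.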
