
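Shanghai Jiao Tong University Doctoral Dissertation

Robust Speech Processing Based on Auditory Model and Statistical Learning

Author: Zhang Wenjun
Degree: Doctorate
Major: Control Theory and Control Engineering
Supervisor: Xie Jianying
Date: 2003-09-10

Abstract

Before speech technology can be deployed widely in diverse environments, it still faces many challenges, for example, how to make speech processing robust to environmental noise and channel distortion. Research on robust speech processing is the foundation of, and the key to, speech research including speech recognition, speech synthesis, language identification and speaker recognition; it is also an important part of building speech corpora. At present, the recognition accuracy of speech processing systems and the naturalness of synthesized speech are still unsatisfactory. The root cause is that natural speech has not been studied deeply enough for its regularities to be accurately summarized, described and modeled. Progress in speech processing therefore depends on collecting, organizing and releasing corpora of speech data recorded in real-world environments.

The main aim of this dissertation is to study robust speech processing techniques, to improve the robustness of speech segmentation in noisy environments, and on that basis to implement tools that assist speech corpus construction. First, based on models of human auditory perception, the dissertation studies time-frequency analysis of speech signals, constructs a non-uniform perfect-reconstruction filter bank consistent with the auditory perception model, develops a sub-band speech denoising algorithm based on maximum-likelihood estimation, and implements a robust, adaptively smoothed sub-band speech endpoint detection algorithm based on MDL (minimum description length). Second, it discusses the shortcomings of speech segmentation based on hidden Markov models, points out the influence of prosodic factors on segmentation, and proposes a Bayesian framework for robust speech segmentation. Finally, it describes the main ideas of annotation graphs, proposes an XML-based speech annotation architecture, and implements a prototype of the corpus-construction tool using XML (Extensible Markup Language), Visual Basic and SQL. With this tool, an isolated-digit speech corpus, a connected-digit-string speech corpus and a special-purpose speech corpus for speaker recognition were annotated.

The main contributions of this dissertation are:
• Starting from the human auditory perception model and the time-domain conditions for perfect-reconstruction filter banks, a non-uniform perfect-reconstruction filter bank consistent with the auditory perception model is realized using the Bark transform and all-pass systems.
• Following the basic principle of wavelet denoising, an adaptive mechanism is introduced on top of maximum-likelihood spectral estimation to adjust the threshold of each sub-band, yielding a sub-band speech denoising method suited to slowly varying non-stationary noise. Probability densities are computed using the idea of probability-density computation in an orthogonal basis.
• Following the heuristic idea of edge focusing, the low-energy regions of the speech are first located with a double-threshold method; an MDL-based adaptive smoothing algorithm then determines the edges in each sub-band; finally, a fuzzy decision model combines the results of the different sub-bands, achieving robust sub-band speech endpoint detection.
• Using Bayesian decision theory, the influence of prosodic factors on speech segmentation and the shortcomings of HMM-based segmentation are analyzed, a Bayesian framework for robust speech segmentation is proposed, and a segmentation model for use within that framework is implemented.
• An XML-based speech annotation architecture, built on the theoretical framework of annotation graphs, is used to build a prototype corpus-construction tool, with which an isolated-digit speech corpus, a connected-digit-string speech corpus and a special-purpose speaker-recognition corpus were produced.

Extensive simulations and experiments were carried out, and the improved algorithms were compared with the original ones; the results show that the proposed algorithms are worthwhile.

Keywords: sub-band speech; minimum description length (MDL); auditory model; Bayesian method; speech segmentation; annotation graph; wavelet denoising; endpoint detection

Robust Speech Processing Based on Auditory Model and Statistical Learning

Abstract

Before speech technology can be applied widely in adverse environments, it still faces a variety of challenges, for example, how to make it robust to noise and channel distortion. Research on robust speech processing plays a key role in speech recognition, speech synthesis and speaker recognition, and it is also an important basis for producing speech corpora.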
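The double-threshold step of the endpoint-detection contribution described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes full-band short-time energy (the dissertation works per sub-band), and the function names `frame_energy` and `double_threshold_segments`, the frame sizes and the threshold values are all hypothetical. A frame whose energy exceeds a high threshold seeds a candidate speech region, which is then grown in both directions while the energy stays above a lower threshold; the frames left outside every region are the low-energy areas that the MDL-based adaptive smoothing and fuzzy fusion (not shown) would go on to refine.

```python
import math
from typing import List, Tuple


def frame_energy(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Short-time energy (sum of squared samples) of each full frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def double_threshold_segments(
    energies: List[float], high: float, low: float
) -> List[Tuple[int, int]]:
    """Return inclusive (first_frame, last_frame) pairs of candidate speech.

    Assumes low < high. A frame whose energy exceeds `high` seeds a region;
    the region is then grown left and right while neighbouring frames stay
    above `low`. Frames outside every region form the low-energy areas.
    """
    segments: List[Tuple[int, int]] = []
    n = len(energies)
    i = 0
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            end = i
            while end + 1 < n and energies[end + 1] > low:
                end += 1
            segments.append((start, end))
            i = end + 1  # resume scanning after this region
        else:
            i += 1
    return segments


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz (hypothetical demo value)
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(1600)]
    signal = [0.0] * 800 + tone + [0.0] * 800  # silence, tone burst, silence
    energies = frame_energy(signal, frame_len=200, hop=100)
    print(double_threshold_segments(energies, high=50.0, low=5.0))
```

Two thresholds are used rather than one because a single threshold trades missed weak onsets and offsets (for example, fricatives attached to a vowel) against false triggers on noise: seeding only on the high threshold rejects isolated noise, while growing on the low threshold recovers the weak edges. In this sketch the thresholds are fixed by hand; according to the abstract, the dissertation then refines the boundaries adaptively in each sub-band.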
The underlying reason why speech applications do not yet perform satisfactorily is that natural speech has not been studied thoroughly enough for its regularities to be induced, described and simulated precisely. The development of speech technology therefore builds on the collection, organization and release of a variety of speech corpora recorded in real-life environments. Our study investigates robust speech processing in order to make speech segmentation robust in adverse environments, and implements annotation-graph-based tools that assist speech corpus production. In this dissertation, we first study the time-frequency analysis of speech based on auditory models, construct non-uniform perfect-reconstruction (PR) filter banks based on auditory models, realize sub-band speech de-noising based on maximum likelihood (ML), and accomplish fuzzy sub-band speech endpoint detection based on adaptive smoothing using MDL (minimum description length). Second, we discuss speech segmentation based on hidden Markov models (HMMs), study the effect of prosody on this problem, and put forward a Bayesian framework for robust speech segmentation. Third, we describe the principle of annotation graphs, put forward an XML-based speech labeling architecture, and implement a prototype of the tools for assisting speech corpus production using XML, Visual Basic and SQL. Finally, we label several speech corpora in practice, for example an isolated-digit speech corpus.

The contributions and innovations of this dissertation include:
• Based on the time-domain conditions for PR filter banks and an analysis of the human auditory model, we realize non-uniform PR filter banks based on the auditory model using the Bark transformation.
• Using the principle of wavelet de-noising, we introduce an ML-based adaptive mechanism to tune the threshold of each sub-band, and obtain a sub-band speech de-noising method suited to slowly varying non-stationary noise.
• Based on the heuristic idea of "edge focus", we first locate the low-energy areas using a "double-threshold" method, and then apply MDL-based adaptive smoothing in each sub-band to locate the actual endpoints. To combine the results of the different sub-bands, we use a fuzzy decision model, achieving robust sub-band speech endpoint detection.
• We analyze the effect of prosody and the limitations of HMM-based speech segmentation, build a Bayesian framework for robust speech segmentation using Bayesian decision theory, and realize the segmentation model used in Bayesian speech segmentation.
• After constructing an XML-based speech labeling architecture on the principle of annotation graphs, we build a prototype of the tools for assisting speech corpus production and use it to label several speech corpora, for example an isolated-digit speech corpus.

Extensive simulations and experiments show that these algorithms are effective.

Keywords: Sub-band speech, MDL (minimum description length), Auditory Model, Bayesian Decision, Speech Segmentation, Wavelet De-noising, Endpoint Detection, Annotation Graph

Shanghai Jiao Tong University Dissertation Originality Statement

I solemnly declare that the dissertation submitted is the result of research conducted independently by me under the guidance of my supervisor. Except for content whose citation is explicitly indicated in the text, this dissertation contains no works or results that have already been published or written by any other individual or group.
