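The ITNLP Lab will hold an academic seminar this Wednesday (October 11) at 8:30 a.m. in Room 618 of the New Technology Building. PhD student Xiao Jinghui (肖镜辉) will give the talk.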
Posted by: test  Posted on: 2006-10-10
Title: A Similarity-Based Smoothing Algorithm for Chinese Language Modeling and its Application on Pinyin-to-Character Conversion
Abstract: Data sparseness is a common and inherent problem of statistical language models; it greatly damages model performance and limits their applications. Current smoothing methods, however, are too simple to exploit linguistic knowledge, which holds back further performance improvement. Using word semantic information, this paper introduces a similarity-based smoothing algorithm for Chinese language modeling that combines word similarity calculation with the back-off smoothing method, and presents an iterative method to optimize the algorithm's parameters. Furthermore, the similarity-based smoothing algorithm is extended from low-order language models to higher-order models. Experiments on a Pinyin-to-Character conversion system show that the method significantly improves language model performance and effectively reduces the conversion error rate.
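
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea of combining word similarity with back-off smoothing in a bigram model. It is not the speaker's algorithm. Everything in it is an assumption made for illustration: the function names `train` and `prob`, the absolute discount value, the interpolation weight `lam`, and the hand-written similarity table `similar` (the paper derives similarity from word semantic information and tunes its parameters iteratively, neither of which is reproduced here).

```python
from collections import Counter, defaultdict


def train(sentences):
    """Collect unigram and bigram counts from tokenized sentences."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sent in sentences:
        unigrams.update(sent)
        for h, w in zip(sent, sent[1:]):
            bigrams[h][w] += 1
    return unigrams, bigrams


def prob(w, h, unigrams, bigrams, similar, discount=0.5, lam=0.7):
    """P(w | h) with absolute discounting for seen bigrams.

    The discounted mass is shared among unseen words in proportion to a mix
    of a similarity-based estimate and the unigram distribution.
    `similar` maps a history word to {neighbour: similarity score}; it is a
    hypothetical, hand-supplied table (assumption).
    """
    total = sum(unigrams.values())
    seen = bigrams.get(h, {})
    c_h = sum(seen.values())
    neighbours = similar.get(h, {})
    norm = sum(neighbours.values())

    def q(x):
        # Unnormalized back-off score for word x after history h.
        p_uni = unigrams[x] / total
        p_sim = 0.0
        if norm > 0:
            for h2, score in neighbours.items():
                follow = bigrams.get(h2, {})
                c_h2 = sum(follow.values())
                if c_h2:
                    p_sim += score * follow.get(x, 0) / c_h2
            p_sim /= norm
        return lam * p_sim + (1 - lam) * p_uni

    if c_h == 0:
        # History never seen: back off entirely to the mixed estimate.
        return q(w) / sum(q(x) for x in unigrams)
    if seen.get(w, 0) > 0:
        # Seen bigram: discounted relative frequency.
        return (seen[w] - discount) / c_h
    # Unseen bigram: take a share of the mass freed by discounting.
    leftover = discount * len(seen) / c_h
    unseen_mass = sum(q(x) for x in unigrams if seen.get(x, 0) == 0)
    return leftover * q(w) / unseen_mass if unseen_mass > 0 else 0.0
```

With a toy corpus in which 咖啡 follows 喝 but never 饮, declaring 喝 a neighbour of 饮 shifts back-off mass toward 咖啡 after 饮, compared with backing off to unigram frequencies alone. This is the effect that a similarity-based smoothing method aims for.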