Abstract and Survey for Thursday Afternoon's Academic Talk

Posted by: test    Posted on: 2008-03-12

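To all ITNLP faculty and students:
     The abstract for tomorrow afternoon's academic talk is given below.
     Please arrive on time (2:00 p.m.).
     After the talk you may be asked to fill out an anonymous questionnaire about how effective the talk was. The main purpose of the survey is to assess the effectiveness of our academic talks and, based on the feedback, to adjust their content and format and improve their quality, so that the talks truly serve their role in learning and discussion, research exchange, skills training, and the presentation of results. The survey also gives speakers feedback for improving their presentation skills, techniques, and methods, so that both speakers and attendees benefit. Please fill it out carefully and truthfully.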

In this talk, I will first give a brief introduction to using machine learning methods to solve natural language processing problems, based on my personal experience. Then I will present part of the work I completed in Singapore. The abstract is as follows:

Name origin recognition is the task of identifying the source language from which a name originates. It is a necessary step for name transliteration/translation because names of different origins need different translation strategies. It is even more important when translating across languages with different orthographic symbols and sound inventories. Previous work used either rule-based or statistical methods and usually relied on a single knowledge source. In this work, we cast name origin recognition as a classification problem and propose using a Maximum Entropy model to solve it. Under the proposed framework, we investigate diverse features for name origin recognition: phonetic-rule-based features, n-gram statistics, and character position features. Experiments on a publicly available personal name dataset show that our approach effectively incorporates these diverse features and achieves an overall accuracy of 98.44% for names written in English and 98.10% for names written in Chinese, which is significantly and consistently better than the accuracy of previous methods.
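For readers new to the idea, the classification setup described in the abstract can be sketched roughly as follows. This is only a minimal illustration, not the implementation used in the work: the toy names and origin labels, the feature templates (character unigrams and bigrams plus first/last character), the function names, and the training settings are all assumptions made up for this example, and the phonetic-rule-based features are omitted. The Maximum Entropy model is written here as a plain multinomial logistic regression trained by stochastic gradient ascent.

```python
import math
from collections import defaultdict

# Hypothetical toy data for illustration only; not the dataset from the talk.
TOY_DATA = [
    ("smith", "English"), ("johnson", "English"), ("williams", "English"),
    ("tanaka", "Japanese"), ("suzuki", "Japanese"), ("yamamoto", "Japanese"),
    ("zhang", "Chinese"), ("wang", "Chinese"), ("chen", "Chinese"),
]


def extract_features(name, n_max=2):
    """Binary features: character n-grams plus first/last character (assumed templates)."""
    s = name.lower()
    feats = set()
    if not s:
        return feats
    for n in range(1, n_max + 1):
        for i in range(len(s) - n + 1):
            feats.add(f"ng={s[i:i + n]}")
    # Simple character position features.
    feats.add(f"first={s[0]}")
    feats.add(f"last={s[-1]}")
    return feats


def predict_proba(weights, labels, feats):
    """Softmax over per-label scores, i.e. the MaxEnt conditional distribution."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(v - top) for y, v in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train(data, epochs=300, lr=0.05, l2=0.01):
    """Fit weights by stochastic gradient ascent on the L2-regularized log-likelihood."""
    labels = sorted({y for _, y in data})
    weights = defaultdict(float)
    examples = [(extract_features(x), y) for x, y in data]
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient for a binary feature: observed count minus expected count.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += lr * (grad - l2 * weights[(f, y)])
    return weights, labels


def classify(weights, labels, name):
    """Return the most probable origin label for a name."""
    probs = predict_proba(weights, labels, extract_features(name))
    return max(probs, key=probs.get)


if __name__ == "__main__":
    w, labs = train(TOY_DATA)
    for name, _ in TOY_DATA:
        print(name, "->", classify(w, labs, name))
```

Because every feature is just a named indicator, further knowledge sources can be added as extra feature strings without changing the model, which is the property of the Maximum Entropy framework that the abstract relies on.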

Thanks!

Chengjie Sun, Ph.D. Candidate

ITNLP Lab

School of Computer Science and Technology

Harbin Institute of Technology, Harbin, China

Postal Code: 150001

Tel: 86-451-86413322-89(O)

Homepage: www.insun.hit.edu.cn/~cjsun

Email: cjsun@insun.hit.edu.cn