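The ITNLP Lab will hold an academic seminar this Wednesday (September 13) at 9:00 a.m. in Room 618 of the New Technology Building, where Ph.D. student Xu Yongdong (徐永东) will give a talk.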
Posted by: test  Published: 2006-09-12

Title: Discovering Topic Boundaries for Multi-document Automatic Summarization Based on Statistical Model

Abstract: In general, a document can be regarded as composed of coherent units called discourse segments. Discovering the segment boundaries is an important task for many natural language processing applications. In this paper, we propose a new method for identifying topic boundaries in Chinese text, based on the fusion of multiple features. Our approach first extracts multiple features of topic shift from the text. For each feature, we adopt a corresponding F-dotplotting model to calculate the boundary values between neighboring sentences. Subsequently, the useful features among these cues are automatically selected and combined to determine topic boundaries, using a statistical method based on logistic regression analysis. The experimental results show that the F-dotplotting method is more effective than the common dotplotting method, and that the multiple-feature fusion method based on the logistic regression model can effectively improve Chinese text topic segmentation performance.