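The ITNLP Lab will hold an academic seminar on December 25 in Room 618 of the New Technology Building, where PhD students Dong Qiwen, Wang Qiang, and Li Minghui will give talks to the lab's faculty and students
Posted by: test  Published: 2004-12-20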

1. Dong Qiwen (董启文)

Title: "A Dictionary-Based Method for Protein Secondary Structure Prediction"

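Abstract: A new method for protein secondary structure prediction is proposed. The method extracts species-specific protein secondary structure words, analogous to words in natural language, from amino acid sequences. These words form a protein secondary structure dictionary that describes the relationship between amino acid sequences and protein secondary structure. Predicting secondary structure then resembles joint word segmentation and part-of-speech tagging in natural language processing. The method treats the word sequence as a Markov chain and uses the Viterbi algorithm to search for the maximum probability of each word being labeled with a given secondary structure type; a word lattice represents the segmentation candidates, and a maximum entropy Markov model computes the secondary structure probability of each word. The predicted secondary structure is the sequence of structure types corresponding to the optimal segmentation. The method was tested on protein sequences from four species and compared with the PHD method. The experimental results show that its Q3 accuracy is 3.9% higher than that of PHD and its SOV accuracy is 4.6% higher. Combining it with locally similar sequences found by BLAST search further improves prediction accuracy. On 50 CASP5 target protein sequences, the method achieves a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A protein secondary structure prediction server based on this method is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.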
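The joint segmentation and labeling idea in the abstract can be illustrated with a small sketch. This is not the speaker's implementation: the dictionary entries, the probabilities, and the names `LEXICON`, `TRANS`, `UNKNOWN`, and `predict` are all hypothetical, and a plain bigram transition table stands in for the maximum entropy Markov model. The code builds the word lattice implicitly by trying every dictionary segment that ends at each position, then runs Viterbi over (position, structure type) states.

```python
import math

# Hypothetical toy dictionary: segment -> {structure type: P(segment | type)}.
# H = helix, E = strand, C = coil. All values are made up for illustration.
LEXICON = {
    "AEL": {"H": 0.6},
    "AE": {"H": 0.3, "C": 0.1},
    "L": {"H": 0.2, "C": 0.2},
    "VKV": {"E": 0.5},
    "V": {"E": 0.2, "C": 0.1},
    "K": {"C": 0.3, "E": 0.1},
    "G": {"C": 0.5},
}
# Hypothetical bigram transitions P(type | previous type); a simple stand-in
# for the maximum entropy Markov model mentioned in the abstract.
TRANS = {
    "<s>": {"H": 0.3, "E": 0.3, "C": 0.4},
    "H": {"H": 0.2, "E": 0.2, "C": 0.6},
    "E": {"H": 0.2, "E": 0.2, "C": 0.6},
    "C": {"H": 0.4, "E": 0.4, "C": 0.2},
}
UNKNOWN = 1e-4  # assumed probability for a residue missing from the dictionary
MAX_LEN = max(len(word) for word in LEXICON)


def predict(seq):
    """Return (segments, per-residue labels) for the best lattice path."""
    n = len(seq)
    # best[i][tag] = (log probability, (segment start, previous tag))
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            word = seq[start:end]
            emissions = LEXICON.get(word)
            if emissions is None:
                if end - start > 1:
                    continue  # multi-residue segments must be in the dictionary
                emissions = {"C": UNKNOWN}  # unknown single residue: assume coil
            for prev_tag, (prev_score, _) in best[start].items():
                for tag, p_emit in emissions.items():
                    score = (prev_score
                             + math.log(TRANS[prev_tag][tag])
                             + math.log(p_emit))
                    if tag not in best[end] or score > best[end][tag][0]:
                        best[end][tag] = (score, (start, prev_tag))
    # Backtrack from the best final state to recover the segmentation.
    tag = max(best[n], key=lambda t: best[n][t][0])
    pos, segments = n, []
    while pos > 0:
        _, (start, prev_tag) = best[pos][tag]
        segments.append((seq[start:pos], tag))
        pos, tag = start, prev_tag
    segments.reverse()
    labels = "".join(t * len(word) for word, t in segments)
    return segments, labels


segments, labels = predict("AELGVKV")
print(segments, labels)
```

Each segment passes its structure type to every residue it covers, so the label string always has the same length as the input sequence.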

2. Li Minghui (李明辉)

Title: Discovering the Relationship Between Protein Sequence Patterns and Protein Secondary Structure

Abstract: Understanding a protein's structure is necessary for understanding its function and the biological processes it takes part in. Although many protein structure prediction methods exist, none yet satisfies biologists. Pattern discovery plays an important role in bioinformatics, especially in protein structure and function prediction. This talk explores the relationship between protein sequence patterns and protein secondary structure. Solving this problem is important for understanding the rules that map an amino acid sequence to the structure and function of a protein.

3. Wang Qiang (王强)

Topic: Research on an Intelligent Information Retrieval System for the Tourism Domain

Introduction:

This report covers five aspects of NLP research for IR. First, it briefly explains intelligent information retrieval (intelligent IR), including the similarities and differences between intelligent IR and QA. Second, it presents our research on intelligent IR. Third, it introduces a practical intelligent IR system for the tourism domain, including its system architecture and processing steps. Fourth, it outlines further research on intelligent IR. Finally, it draws some conclusions.