A Semi-Supervised Learning Method and Its Application to Named Entity Recognition
2018/11/16
李彦鹏 (Li Yanpeng)
Information Retrieval Laboratory, Dalian University of Technology

Outline
  Biomedical named entity recognition
  Data sparseness
  Feature coupling generalization
  Experimental results
  Current & future work

Biomedical Named Entity Recognition (BioNER)
  Recognize named entities in biomedical texts, e.g., genes, proteins, and cells.
  An important preliminary step for advanced text mining tasks.
  For example:
    The TCF-1 alpha binding site was also required for TCR alpha enhancer activity in transcriptionally active extracts from Jurkat but not HeLa cells, confirming that TCF-1 alpha is a T-cell-specific transcription factor

Challenges of BioNER
  Huge vocabulary size, e.g., millions of gene/protein names.
  Long names, e.g., “ethanol repression autoregulation ( ERA ) / twelve-fold TA repeat ( TAB ) repressor element”.
  Ambiguous definition of entity boundaries.
  The same term can refer to different entity types in different contexts.

Current state-of-the-art
  Challenge evaluations:
  In these challenges:
    Dictionary look-up methods perform poorly (50%-70% F-score):
      Low coverage, e.g., long names, variants
      Large noise, e.g., common English terms, entities of other types
    Machine learning methods show great success.
    The framework “Lexical features + regularized linear model” is applied by all the top-performing systems.

Lexical-level features
  Example term: IL 2 gene
  Binary feature vector: 0 1 0 1 1 0 1 1
  Active features: W=IL, W=2, bigram=IL 2, norm=ILgene, suffix=*ene

Data sparseness in lexical features
  Large out-of-vocabulary (OOV) rate
    Terms not in the training corpus are not modeled well.
    Extremely low-frequency terms cannot provide sufficient information to train a good classifier.
  Regular expression features can alleviate the problem, but are far from enough:
    Surface information is not always indicative.
    Indicative patterns also lead to sparseness.

Overcome data sparseness
  Taxonomy-based methods: WordNet, UMLS
    Depend on the quality of these resources
  Subspace-based methods: LSA, KPCA, sparse coding
    Automatic methods H
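
The lexical-level features and the OOV problem described in the slides can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`lexical_features`, `to_binary_vector`, `oov_rate`), the digit-stripping normalization, and the three-character suffix are assumptions chosen only so that the term "IL 2 gene" produces the features listed on the slide.

```python
import re


def lexical_features(tokens):
    """Return string-valued lexical features for a tokenized term.

    Assumed feature templates (hypothetical, modeled on the slide):
    one W= feature per token, one bigram= feature per adjacent pair,
    a norm= feature with digits removed, and a 3-character suffix.
    """
    feats = set()
    for tok in tokens:
        feats.add(f"W={tok}")
    for left, right in zip(tokens, tokens[1:]):
        feats.add(f"bigram={left} {right}")
    # Assumed normalization: concatenate the tokens and drop all digits.
    feats.add("norm=" + re.sub(r"\d+", "", "".join(tokens)))
    # Assumed suffix template: last three characters of the final token.
    feats.add("suffix=*" + tokens[-1][-3:])
    return feats


def to_binary_vector(feats, feature_index):
    """Map a feature set onto a 0/1 vector using a fixed feature index.

    Features that are missing from the index (unseen in training)
    are silently dropped, which is where data sparseness hurts.
    """
    vec = [0] * len(feature_index)
    for feat in feats:
        position = feature_index.get(feat)
        if position is not None:
            vec[position] = 1
    return vec


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


if __name__ == "__main__":
    feats = lexical_features(["IL", "2", "gene"])
    print(sorted(feats))
    index = {"W=IL": 0, "W=IL-2": 1, "norm=ILgene": 2}
    print(to_binary_vector(feats, index))
    print(oov_rate(["IL", "2", "gene"], ["IL", "4", "gene", "TCF-1"]))
```

Any feature that is absent from `feature_index` contributes nothing to the vector, so a term made up mostly of unseen tokens gives a linear model almost no evidence. This is the sparseness that the taxonomy-based and subspace-based methods on the slide are meant to address.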