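Chinese Library Classification: TP391.3
U.D.C.: 681.37
School code: 10213
Security classification: Public

Dissertation for the Master's Degree in Engineering

RESEARCH ON AUTO-CONSTRUCTION TECHNOLOGY FOR UNIVERSITY TEACHER SOCIAL NETWORK
(高校教师社会网络的自动构建技术研究)

Candidate: Wang Changwei (王长伟)
Supervisor: Prof. Wang Xiaolong (王晓龙)
Academic Degree Applied for: Master of Engineering
Speciality: Computer Science and Technology
Affiliation: School of Computer Science and Technology
Date of Defence: June 2011
Degree-Conferring Institution: Harbin Institute of Technology (哈尔滨工业大学)

摘要 (Chinese abstract)

With the rapid development of the Internet, the volume of web content has grown explosively. This makes it possible for people to find the information they want online, but quickly locating the useful part within such a mass of information remains an urgent problem. Meanwhile, the rise of social networks has effectively promoted communication between people and, to some extent, changed how people obtain information. This project uses machine learning, data mining and related natural language processing techniques to automatically construct a social network of university teachers. The goal is both to offer Internet users teachers' personal and research information through a more direct, highly integrated, comprehensive, multi-angle display platform, and to build on it an academic exchange platform in which a large number of researchers take part. This thesis focuses on the following problems.

First, the thesis implements a block-partition-based model for extracting teachers' personal information. Personal information means name, university, professional title and so on, and is the basic component of a teacher's information. For teacher introduction pages on the web, the thesis first preprocesses each page, then divides it into non-contiguous information blocks, and labels the information items within each block using a conditional random field model. Word-level features already give good results for basic information and contact information. Extending the features from word level to block level effectively resolves the long-distance dependencies among education-related information items.

Second, publication records best reflect a teacher's research, so the thesis designs a framework for collecting a teacher's papers. The retrieved papers contain errors introduced by inexact name matches and by namesakes. Inexact matches can be removed with rules, so the thesis concentrates on author name disambiguation and proposes a disambiguation strategy based on hierarchical clustering.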

Clustering uses the basic information of each paper as features, with two termination conditions: one based on prior knowledge and one based on a similarity threshold.

Finally, based on teachers' basic information and research information, the thesis studies the construction of a university teacher social network and community detection on it. Teachers are related in many ways; here the network is built mainly on shared research directions, using two methods. The first uses a topic model to find the topic distribution over all of a teacher's papers, computes the relationship between every pair of teachers from these distributions to build the social network, and then applies the Markov clustering model for community detection. The second links teachers through the keyword sets of their papers; two complex-network clustering algorithms are applied to this network for community detection, and the two methods are analyzed in terms of community quality and time efficiency.

Keywords: information extraction; name disambiguation; social network; community detection

Abstract

With the rapid development of the Internet, the number of web pages has grown explosively. This makes it possible for people to obtain information from the web, but how to find the useful information quickly and effectively in this sea of information has become an urgent problem. On the other hand, the rise of social networking has effectively promoted communication among people and, to some extent, changed the way people access information. This work aims to use machine learning, data mining and other natural language processing technologies to automatically build a social network of university teachers. The aim is not only to provide Internet users with teachers' personal and research information through a more direct, highly integrated, all-round, multi-angle information platform, but also to create an academic exchange platform for researchers. This thesis focuses on the following issues.

First, this thesis implements a block segmentation model for teacher information extraction. A teacher's personal information comprises name, university, professional title and so on; these are the basic components of a teacher's information. We first preprocess teacher introduction web pages and then divide them into discrete information blocks. A conditional random field model is employed to label the information fields in each block. For basic information and contact information, word-level features already achieve good results. Expanding the features from word level to block level effectively solves the long-distance dependency problem among education-related information fields.
Secondly, as published papers best reflect a teacher's research, we design a framework to obtain the paper set of a teacher. The paper set contains errors caused by inexact name matches and by name ambiguity. The first type of error is easily removed with rules, so this thesis focuses on the author name disambiguation problem, using a method based on hierarchical clustering. Only basic paper information is used as features. The method uses two cluster termination conditions: one based on prior knowledge and one based on a similarity threshold.

Finally, based on teachers' personal and research information, we study the construction of the teacher social network and community detection. There are multiple relationships between teachers; here we build the network according to teachers' research areas. Two methods are employed. In the first, a topic model is used to find the topic distribution of each teacher's paper set, and the distance between two teachers is calculated from these distributions. The Markov clustering model is then applied to find communities. The other method uses the keyword collections of papers to establish links among teachers. Two complex network clustering algorithms are employed to detect the communities in this network. We then analyze the two methods in terms of community quality and time complexity.

Keywords: information extraction, name disambiguation, social network, community detection
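The hierarchical clustering step for author name disambiguation, with its two termination conditions, can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes that each paper is reduced to a set of tokens (for example co-author names, venue and title words), that similarity is Jaccard overlap with average linkage, and that "prior knowledge" means a known number of distinct authors. The names `jaccard`, `cluster_similarity`, `disambiguate`, `threshold` and `num_authors` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, features):
    """Average-link similarity: mean pairwise Jaccard between two clusters."""
    total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def disambiguate(features, threshold=None, num_authors=None):
    """Agglomeratively cluster papers that share one author name string.

    Stops when `num_authors` clusters remain (assumed prior-knowledge stop)
    or when the best available merge scores below `threshold`
    (similarity-threshold stop). Returns lists of paper indices.
    """
    if threshold is None and num_authors is None:
        raise ValueError("provide threshold and/or num_authors")
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > 1:
        if num_authors is not None and len(clusters) <= num_authors:
            break
        # Pick the most similar pair of clusters (a < b by construction).
        best_sim, a, b = max(
            (cluster_similarity(clusters[x], clusters[y], features), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        if threshold is not None and best_sim < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Illustrative data only: four papers sharing one ambiguous author name.
papers = [
    {"li wei", "data mining", "kdd"},
    {"li wei", "data mining", "icdm"},
    {"zhang min", "protein folding", "bioinformatics"},
    {"zhang min", "protein folding", "plos"},
]
groups = disambiguate(papers, threshold=0.3)
```

With a threshold the number of distinct authors does not need to be known in advance, whereas the count-based stop relies on outside information such as a faculty list; both stopping rules can also be combined in one call.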
