Crawling strategy of focused crawler based on niche genetic algorithm | |
Fan, Huilian (1); Zeng, Guangpu (1); Li, Xianli (2) | |
2009 | |
Proceedings title | 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009 |
Pages | 591-594 |
Conference name | 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009 |
Conference date | December 12, 2009 - December 14, 2009 |
Conference location | Chengdu, China |
Proceedings editor / Conference sponsor | IEEE Chengdu Section ; IEEE Computer Society Technical Committee on Scalable Computing ; National Natural Science Foundation of China ; University of Electronic Science and Technology of China |
Publisher | IEEE Computer Society |
Abstract | To improve the search efficiency of focused crawlers, we design a new crawling strategy based on the niche genetic algorithm. Rather than collecting and indexing all accessible hypertext documents in order to answer all possible ad-hoc queries, the new strategy combines the advantages of hyperlink-structure and web-content strategies. It treats each hyperlink as a genetic individual, evaluates individual fitness with a vector space model (VSM) built on topic keywords, imports new URLs to implement crossover and mutation, and regards URLs that share the same prefix as a niche. The niche genetic algorithm guides the crawl direction so that the crawler selectively seeks out pages likely to be most relevant to a predefined set of topics. Experiments show that the strategy achieves higher precision and recall in searching for topic pages than the other algorithms compared. © 2009 Crown Copyright. |
Keywords | Hypertext systems; Query processing; Vector spaces; Crossover and mutation; Focused crawler; Hypertext documents; Niche genetic algorithm; Precision and recall; Search efficiency; Topic relevancy; Vector space models |
DOI | 10.1109/DASC.2009.49 |
Indexed by | EI |
Language | English |
EI accession number | 20101512838066 |
Original document type | Conference article (CA) |
Document type | Conference paper |
Item identifier | https://ir.cqcet.edu.cn/handle/39TD4454/3314 |
Collection | 重庆电子科技职业大学 |
Author affiliations | 1. School of Mathematics and Computer Science, Yangtze Normal University, Fuling, Chongqing, China; 2. Electronics Information Department, Chongqing College of Electronic Engineering, Chongqing, China |
Recommended citation (GB/T 7714) | Fan, Huilian, Zeng, Guangpu, Li, Xianli. Crawling strategy of focused crawler based on niche genetic algorithm[C]//IEEE Chengdu Section, IEEE Computer Society Technical Committee on Scalable Computing, National Natural Science Foundation of China, University of Electronic Science and Technology of China: IEEE Computer Society, 2009: 591-594. |
Files in this item |
File name/size | Document type | Version type | Access type | License |
Fan-2009-Crawling St(266KB) | Conference paper | | Open access | CC BY-NC-SA |
Fan-2009-Crawling st(266KB) | Conference paper | | Open access | CC BY-NC-SA |
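
The abstract describes three ingredients: a VSM-based fitness for each hyperlink, niches formed by URLs sharing a prefix, and selection guided by those niches. The following Python sketch illustrates one possible reading of these ideas and is not the authors' implementation. The function names, the cosine-similarity form of the fitness, the choice of "scheme, host, and first path segment" as the prefix, and the per-niche selection rule are all assumptions, since the record does not specify them. Crossover and mutation through imported URLs are not shown.

```python
import math
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def vsm_fitness(page_text, topic_weights):
    """Cosine similarity between a page's term-frequency vector and a
    topic-keyword weight vector (an assumed form of the VSM fitness)."""
    tf = Counter(page_text.lower().split())
    dot = sum(tf[term] * weight for term, weight in topic_weights.items())
    norm_page = math.sqrt(sum(count * count for count in tf.values()))
    norm_topic = math.sqrt(sum(w * w for w in topic_weights.values()))
    if norm_page == 0 or norm_topic == 0:
        return 0.0
    return dot / (norm_page * norm_topic)


def niche_key(url):
    """Assumed niche definition: scheme, host, and first path segment."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return f"{parts.scheme}://{parts.netloc}/{first_segment}"


def select_per_niche(scored_urls, keep=1):
    """Keep the `keep` fittest URLs in each niche so that no single
    URL prefix dominates the next generation (hypothetical rule)."""
    niches = defaultdict(list)
    for url, fitness in scored_urls:
        niches[niche_key(url)].append((fitness, url))
    survivors = []
    for members in niches.values():
        members.sort(reverse=True)  # highest fitness first
        survivors.extend(url for _, url in members[:keep])
    return survivors
```

In this sketch, a crawler would score the text around each candidate link with `vsm_fitness`, pass the `(url, fitness)` pairs to `select_per_niche`, and fetch the survivors first. Limiting survivors per prefix is one simple way to preserve diversity across sites, which is the usual motivation for niching in genetic algorithms.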