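Chinese Core Journal of Comprehensive Science and Technology (Peking University Core Journal)

Source journal of the Chinese Science Citation Database (CSCD)

Indexed by Chemical Abstracts (CA), USA

Indexed by Mathematical Reviews (MR), USA

Indexed by Referativnyi Zhurnal (Abstract Journal), Russia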

Issue 5, Sep. 2010
JING Han-xing, CHEN Shao-hong, YU Kun. Automatic web data extraction based on tree alignment[J]. Journal of East China Normal University (Natural Sciences), 2010, (5): 96-102.

Automatic web data extraction based on tree alignment

More Information
  • Corresponding author: CHEN Shao-hong
  • Received Date: 2010-03-01
  • Rev Recd Date: 2010-06-01
  • Publish Date: 2010-09-25
  • This paper proposes a new tree alignment algorithm that determines the optimal matching structure of the input web pages, so that web data can be extracted automatically. Based on the alignment, the page trees are merged into a single union tree whose nodes record statistical information gathered from multiple web pages. The algorithm then detects repeating patterns on the union tree, and a wrapper built from the most probable content block and the repeating patterns extracts data from the web pages. Experimental results show that the proposed algorithm achieves high extraction accuracy with stable performance.
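The abstract does not spell out the authors' alignment algorithm, so the following is only an illustrative sketch under stated assumptions, not the paper's method. It swaps in a different, well-known technique, Yang's simple tree matching (STM), which scores the largest top-down matching between two ordered tag trees. That score is then used to align children when one page tree is merged into a union tree, and a `count` field records how many pages contain each node. The `Node` class, its `count` field, and the `merge` function are hypothetical names. Repeating-pattern detection and wrapper construction are not shown.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical DOM-like node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    count: int = 1  # in a union tree: number of input pages containing this node


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum top-down matching of two ordered trees (Yang's STM)."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching using the first i children of a and the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],
                dp[i][j - 1],
                dp[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return dp[m][n] + 1  # +1 for the matched roots


def merge(union: Node, tree: Node) -> None:
    """Merge `tree` into `union` in place; both roots must carry the same tag."""
    assert union.tag == tree.tag
    union.count += 1
    m, n = len(union.children), len(tree.children)
    # s[i][j]: STM score between the i-th union child and the j-th page child
    s = [[simple_tree_matching(x, y) for y in tree.children] for x in union.children]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + s[i - 1][j - 1])
    # Trace back from (m, n) to rebuild the merged child list in reverse order.
    merged: List[Node] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1][j - 1] > 0 and dp[i][j] == dp[i - 1][j - 1] + s[i - 1][j - 1]:
            merge(union.children[i - 1], tree.children[j - 1])  # aligned pair: recurse
            merged.append(union.children[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j]):
            merged.append(union.children[i - 1])  # union child absent from this page
            i -= 1
        else:
            merged.append(copy.deepcopy(tree.children[j - 1]))  # child seen only in this page
            j -= 1
    union.children = merged[::-1]


if __name__ == "__main__":
    page1 = Node("html", [Node("body", [Node("div", [Node("p"), Node("p")])])])
    page2 = Node("html", [Node("body", [Node("div", [Node("p")]), Node("span")])])
    print(simple_tree_matching(page1, page2))  # 4
    union = copy.deepcopy(page1)  # start the union tree from the first page
    merge(union, page2)
    body = union.children[0]
    print([(c.tag, c.count) for c in body.children])  # [('div', 2), ('span', 1)]
```

In this sketch, a node whose `count` equals the number of merged pages is common to every page (likely template), while low counts mark page-specific content; the paper's own statistics may differ.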

