LiU Electronic Press
Download:
File size:
432 kb
Format:
application/pdf
Author:
Arpteg, Anders (Linköping University, Department of Computer and Information Science, KPLAB - Knowledge Processing Lab) (Linköping University, The Institute of Technology)
Title:
Adaptive Semi-structured Information Extraction
Department:
Linköping University, Department of Computer and Information Science, KPLAB - Knowledge Processing Lab
Linköping University, The Institute of Technology
Publication type:
Licentiate thesis, monograph (Other academic)
Language:
English
Publisher: Institutionen för datavetenskap
Pages:
85
Series:
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971; 1000
Year of publ.:
2003
URI:
urn:nbn:se:liu:diva-5688
Permanent link:
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5688
ISBN:
91-7373-589-2
Subject category:
Computer Science
SVEP category:
Computer science
Keywords (en):
Information extraction, Artificial intelligence, Semi-structured data, Reinforcement learning, Knowledge management
Abstract (en):

Information extraction tools should be usable in more domains and for more tasks than they are today. One way to reach this goal is to construct user-driven information extraction systems that novice users can adapt to new domains and tasks. To accomplish this, the systems need to become more intelligent and able to learn to extract information without requiring expert skills or time-consuming work from the user.

The type of information extraction in focus for this thesis is semi-structural information extraction. The term semi-structural refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of both the link structure and the structural information within each such document, user-driven extraction systems with high performance can be built.

The extraction process consists of several steps in which different types of techniques are used, for example techniques that take advantage of structural, purely syntactic, linguistic, and semantic information. The first step that is in focus for this thesis is the navigation step, which takes advantage of the structural information. It is only one part of a complete extraction system, but an important one. Using reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of reinforcement learning techniques is that the extraction agent can learn efficiently from its own experience without intensive user interaction.
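
The abstract does not say which reinforcement learning algorithm the thesis uses, so the following is only a minimal sketch of the general idea, assuming tabular Q-learning. The toy site graph, the page names, the reward of 1.0 on reaching the target page, and all parameter values are hypothetical and chosen purely for illustration. The agent follows links from a start page and learns from its own trial and error which links lead to the page holding the information to extract.

```python
import random

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["home", "item"],
    "item": [],  # assumed target page holding the information to extract
}
TARGET = "item"


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a value for every (page, link) pair with tabular Q-learning."""
    rng = random.Random(seed)
    q = {(page, nxt): 0.0 for page, outs in LINKS.items() for nxt in outs}
    for _ in range(episodes):
        page = "home"
        for _ in range(20):  # cap the number of link clicks per episode
            if page == TARGET:
                break
            outs = LINKS[page]
            if rng.random() < epsilon:
                nxt = rng.choice(outs)  # explore a random link
            else:
                nxt = max(outs, key=lambda n: q[(page, n)])  # exploit
            reward = 1.0 if nxt == TARGET else 0.0
            # Best value obtainable from the next page (0.0 if it has no links).
            future = max((q[(nxt, n)] for n in LINKS[nxt]), default=0.0)
            q[(page, nxt)] += alpha * (reward + gamma * future - q[(page, nxt)])
            page = nxt
    return q


def greedy_path(q, start="home", max_steps=10):
    """Follow the highest-valued link from each page until the target is reached."""
    path = [start]
    while path[-1] != TARGET and len(path) <= max_steps:
        page = path[-1]
        path.append(max(LINKS[page], key=lambda n: q[(page, n)]))
    return path
```

Calling `greedy_path(train())` returns the learned navigation route through the toy graph. The only feedback in this sketch is the reward on reaching the target page, which mirrors the abstract's point that the agent learns from its own experience rather than from intensive user interaction.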

An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments showed that the training of the navigation step and the approach of the system were promising. However, additional components need to be included in the system before it becomes a fully fledged user-driven system.

Note:
Report code: LiU-Tek-Lic-2002:73.
Presentation:
2002-12-15, 00:00 (English)
Supervisor:
Sandewall, Erik (Linköping University, Department of Computer and Information Science, CASL - Cognitive Autonomous Systems Laboratory) (Linköping University, The Institute of Technology)
Kulesza, Wlodek
Available from:
2003-01-30
Created:
2003-01-30
Last updated:
2009-04-27
FILE INFORMATION
Type:
fulltext