Adaptive Semi-structured Information Extraction
2003 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

The number of domains and tasks in which information extraction tools can be used needs to be increased. One way to reach this goal is to build user-driven information extraction systems that novice users can adapt to new domains and tasks. To make this possible, the systems need to become more intelligent and able to learn to extract information without requiring expert skills or time-consuming work from the user.

The type of information extraction system in focus for this thesis is semi-structured information extraction. The term semi-structured refers to documents that contain not only natural language text but also additional structural information. The typical application is information extraction from World Wide Web hypertext documents. By making effective use of both the link structure between documents and the structural information within each document, user-driven extraction systems with high performance can be built.
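To make the idea of structural information concrete, here is a minimal sketch that is not taken from the thesis. It uses Python's standard html.parser module to collect every hyperlink in a page together with its anchor text and the path of enclosing HTML tags. The class LinkContextParser, the function extract_links and the sample page are hypothetical; the assumption is only that a navigation component could use such a tag path as extra evidence about a link, alongside its anchor text.

```python
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they are not kept on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkContextParser(HTMLParser):
    """Collect each hyperlink with its anchor text and enclosing tag path."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.links = []        # (href, anchor text, tag path) triples
        self._current = None   # [href, text, path] of the <a> being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            self._current = [href, "", "/".join(self.stack)]
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            href, text, path = self._current
            self.links.append((href, text.strip(), path))
            self._current = None
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data


def extract_links(html):
    """Return (href, anchor text, tag path) for every <a> element in html."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = ('<html><body><div class="nav"><a href="/staff">Staff</a></div>'
            '<p>See <a href="/pubs">publications</a>.</p></body></html>')
    for link in extract_links(page):
        print(link)
```

For the sample page, the two links are reported with the tag paths html/body/div and html/body/p respectively, so a link in a navigation block can be told apart from a link in running text even though both are plain anchors.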

The extraction process consists of several steps, each using different types of techniques, for example techniques that exploit structural, purely syntactic, linguistic, or semantic information. The first step addressed in this thesis is the navigation step, which exploits the structural information. Navigation is only one part of a complete extraction system, but it is an important one. Using reinforcement learning algorithms for the navigation step can make the adaptation of the system to new tasks and domains more user-driven. The advantage of reinforcement learning techniques is that the extraction agent can learn efficiently from its own experience without requiring intensive user interaction.
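The abstract does not say which reinforcement learning algorithm the thesis uses. As a hedged illustration only, the sketch below applies standard tabular Q-learning to navigation on a hypothetical toy website: states are pages, actions are outgoing links, and the agent receives a reward of 1 when it reaches a page holding the target information. The site graph SITE, the reward scheme, the function names and the parameter values (alpha, gamma, epsilon, episodes) are all assumptions, not details from the thesis.

```python
import random


def q_learning_navigation(graph, start, targets, episodes=500, alpha=0.5,
                          gamma=0.9, epsilon=0.2, max_steps=20, seed=0):
    """Tabular Q-learning where states are pages and actions are links.

    graph:   dict mapping each page to the list of pages it links to
    targets: set of pages holding the information to be extracted
    Returns a dict Q[(page, linked_page)] of learned action values.
    """
    rng = random.Random(seed)
    q = {(page, nxt): 0.0 for page, links in graph.items() for nxt in links}

    for _ in range(episodes):
        page = start
        for _ in range(max_steps):
            links = graph.get(page, [])
            if page in targets or not links:
                break  # episode ends at a target page or a dead end
            # Epsilon-greedy: explore a random link, otherwise exploit.
            if rng.random() < epsilon:
                nxt = rng.choice(links)
            else:
                nxt = max(links, key=lambda link: q[(page, link)])
            reward = 1.0 if nxt in targets else 0.0
            onward = graph.get(nxt, [])
            # Target pages are terminal, so they contribute no future value.
            if nxt in targets or not onward:
                best_next = 0.0
            else:
                best_next = max(q[(nxt, link)] for link in onward)
            q[(page, nxt)] += alpha * (reward + gamma * best_next - q[(page, nxt)])
            page = nxt
    return q


def greedy_path(graph, q, start, targets, max_steps=20):
    """Follow the highest-valued link from each page until a target is reached."""
    path = [start]
    page = start
    while page not in targets and graph.get(page) and len(path) <= max_steps:
        page = max(graph[page], key=lambda link: q[(page, link)])
        path.append(page)
    return path


# A hypothetical toy website; only "publications" holds the wanted information.
SITE = {
    "home": ["news", "about", "research"],
    "news": ["home"],
    "about": ["home", "staff"],
    "research": ["projects", "home"],
    "projects": ["publications"],
    "staff": [],
    "publications": [],
}

if __name__ == "__main__":
    q = q_learning_navigation(SITE, "home", {"publications"})
    print(greedy_path(SITE, q, "home", {"publications"}))
```

With these settings the learned greedy path is home, research, projects, publications: the discount factor gamma makes the shortest route to the target the most valuable. No user labels which links to follow; the agent finds the route by trial and error, which is the kind of learning from its own experience described above.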

An agent-oriented system was designed to evaluate the approach suggested in this thesis. Initial experiments indicated that both the training of the navigation step and the overall approach of the system are promising. However, additional components need to be included before the system becomes a fully fledged user-driven system.

Place, publisher, year, pages
Institutionen för datavetenskap (Department of Computer and Information Science), 2003. 85 p.
Series
Linköping Studies in Science and Technology. Thesis, ISSN 0280-7971 ; 1000
Keyword [en]
Information extraction, Artificial intelligence, Semi-structured data, Reinforcement learning, Knowledge management
National Category
Computer Science
Identifiers
urn:nbn:se:liu:diva-5688 (URN)
91-7373-589-2 (ISBN)
oai:DiVA.org:liu-5688 (OAI)
Presentation
2002-12-15, 00:00 (English)
Note
Report code: LiU-Tek-Lic-2002:73.
Available from: 2003-01-30. Created: 2003-01-30. Last updated: 2009-04-27.

Open Access in DiVA

Full text
FULLTEXT01.pdf (application/pdf, 432 kB)
Checksum (SHA-1)
e5702c06b15b52e7b4564f08fefaa8cf1e2c3fd4390956952340eec4d949469ef3da7d1f

Author
Arpteg, Anders
Organisation
KPLAB - Knowledge Processing Lab, The Institute of Technology
