The present thesis addresses machine learning in a domain of naturallanguage phrases that are names of universities. It describes two approaches to this problem and a software implementation that has made it possible to evaluate them and to compare them.
In general terms, the system's task is to learn to 'understand' the significance of the various components of a university name, such as the city or region where the university is located, the scienti c disciplines that are studied there, or the name of a famous person which may be part of the university name. A concrete test for whether the system has acquired this understanding is when it is able to compose a plausible university name given some components that should occur in the name.
In order to achieve this capability, our system learns the structure of available names of some universities in a given data set, i.e. it acquires a grammar for the microlanguage of university names. One of the challenges is that the system may encounter ambiguities due to multi meaning words. This problem is addressed using a small ontology that is created during the training phase.
Both domain knowledge and grammatical knowledge is represented using decision trees, which is an ecient method for concept learning. Besides for inductive inference, their role is to partition the data set into a hierarchical structure which is used for resolving ambiguities.
The present report also de nes some modi cations in the de nitions of parameters, for example a parameter for entropy, which enable the system to deal with cognitive uncertainties. Our method for automatic syntax acquisition, ADIOS, is an unsupervised learning method. This method is described and discussed here, including a report on the outcome of the tests using our data set.
The software that has been implemented and used in this project has been implemented in C.