What Constitutes a Unit of Analysis in Language?
Over the last decade, the study of multi-word units has become increasingly popular and now these
units seem to have reached a status where they cannot be ignored. This paper should be seen as a
recapitulation of the discussion around multi-word units, focusing especially on corpus evidence of
such units and on what are perceived to be relevant findings. An unsupervised method for extracting
multi-word units from corpora is presented and its findings are examined. However, rather than
evaluating the results, this paper raises the question of what constitutes a 'good'
multi-word unit. It does not claim to give any conclusive answers, but instead poses a few
relevant questions.