(University of Tsukuba, University of Tsukuba)
Keywords: text mining, patent search, compound noun, information retrieval, similarity calculation, morphological analysis
Compound nouns appear frequently in the claims of patent applications. We compared compound noun analysis with morphological analysis as methods for retrieving similar documents among patent applications. This paper focuses on Japanese-language claims written in the Jepson format. Our analysis indicated that the co-occurrence frequencies of morphemes and of compound nouns in claims differ significantly: compound nouns recur far less often than morphemes. Although this property proved useful for precision-oriented searches, the meanings of compound nouns had to be extended in order to retrieve a wider range of similar documents. We accomplished this by constructing a preliminary semantic dictionary. An important finding of the analysis was that the position of a compound noun within a claim affects its meaning and, in turn, the search results.
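The abstract does not state how compound nouns are extracted or which similarity measure is used, so the following is only a minimal sketch under stated assumptions. It assumes that claims arrive already split into (surface, part-of-speech) pairs by a morphological analyzer, that a compound noun is a maximal run of consecutive noun morphemes, and that similarity is cosine similarity over term counts. The function names, the `"NOUN"` tag, and the toy `SEMANTIC_DICT` are hypothetical. The sketch illustrates the trade-off described above: morphemes overlap readily, compound nouns are precise but rarely match, and a semantic dictionary that maps related compound nouns to a shared concept restores recall.

```python
from collections import Counter
from math import sqrt

# Hypothetical pre-tokenized input: (surface, part_of_speech) pairs, as a
# morphological analyzer might produce. The "NOUN" tag is an assumption.
Token = tuple[str, str]


def morpheme_terms(tokens: list[Token]) -> Counter:
    """Bag of individual noun morphemes."""
    return Counter(surface for surface, pos in tokens if pos == "NOUN")


def compound_noun_terms(tokens: list[Token]) -> Counter:
    """Bag of compound nouns, assumed here to be maximal runs of
    consecutive noun morphemes joined without spaces (Japanese style)."""
    terms: Counter = Counter()
    run: list[str] = []
    for surface, pos in tokens:
        if pos == "NOUN":
            run.append(surface)
        elif run:
            terms["".join(run)] += 1
            run = []
    if run:  # flush a noun run that ends the token list
        terms["".join(run)] += 1
    return terms


def expand(terms: Counter, semantic_dict: dict[str, str]) -> Counter:
    """Map each term to a concept label from a (hypothetical) semantic
    dictionary; terms missing from the dictionary are kept unchanged."""
    expanded: Counter = Counter()
    for term, count in terms.items():
        expanded[semantic_dict.get(term, term)] += count
    return expanded


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy claim fragments: "画像処理装置である" and "画像処理方法である".
claim_a = [("画像", "NOUN"), ("処理", "NOUN"), ("装置", "NOUN"), ("で", "PARTICLE"), ("ある", "VERB")]
claim_b = [("画像", "NOUN"), ("処理", "NOUN"), ("方法", "NOUN"), ("で", "PARTICLE"), ("ある", "VERB")]

# Hypothetical preliminary semantic dictionary: both compounds map to one concept.
SEMANTIC_DICT = {"画像処理装置": "画像処理", "画像処理方法": "画像処理"}

if __name__ == "__main__":
    print(round(cosine(morpheme_terms(claim_a), morpheme_terms(claim_b)), 2))  # 0.67
    print(cosine(compound_noun_terms(claim_a), compound_noun_terms(claim_b)))  # 0.0
    print(cosine(expand(compound_noun_terms(claim_a), SEMANTIC_DICT),
                 expand(compound_noun_terms(claim_b), SEMANTIC_DICT)))  # 1.0
```

In this toy case, morpheme matching gives partial similarity (two of three nouns shared), exact compound-noun matching gives none, and dictionary expansion makes the two claims match fully. A real system would also need to account for the position of each compound noun within the claim, which this sketch ignores.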