Chemical Compounds Knowledge Visualization with Natural Language Processing and Linked Data
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Abstract
Knowledge of chemical compounds is invaluable for developing new materials, new drugs, and so on, and databases of chemical compounds are therefore being created. For example, CAS, one of the largest such databases, contains information on over 100 million chemical compounds. However, the creation of such databases depends heavily on manual labor, since new chemical compounds are being produced at every moment. In addition, database creation mainly focuses on English text, so chemical compound information written in other languages is not sufficiently available. For example, although Japan has one of the largest chemical industries and a large amount of chemical compound information written in Japanese text documents, this information has not been well exploited so far. We propose a visualization system based on chemical compound extraction results obtained with Japanese Natural Language Processing and on structured databases represented as Linked Data (LD). Figure 1 shows an overview of our system. First, chemical compound names in text are recognized. Then, aliases of chemical compound names are identified. The extraction results and existing chemical compound databases are represented as LD. By combining these LD-based sources of chemical compound knowledge, our system provides different views of chemical compounds.
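The three steps named above (name recognition, alias identification, and representation as LD) can be illustrated with a minimal sketch. It is not the system described here: a simple dictionary lookup stands in for the Japanese name recognizer, and the alias dictionary, the `example.org` namespace, the `mentionedIn` property, and all function names are hypothetical choices made only for illustration. The output is serialized as N-Triples lines.

```python
# Illustrative sketch only. The dictionary, URIs, and helper names below are
# assumptions for demonstration, not the method used by the system.
from collections import defaultdict

# Hypothetical alias dictionary: Japanese surface form -> compound identifier.
ALIAS_TO_ID = {
    "エタノール": "ethanol",
    "エチルアルコール": "ethanol",
    "酢酸": "acetic_acid",
}

BASE = "http://example.org/compound/"                  # placeholder namespace
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"   # standard rdfs:label
MENTIONED_IN = "http://example.org/vocab/mentionedIn"  # hypothetical property


def recognize(text):
    """Step 1: find dictionary names in text, preferring the longest match."""
    names = sorted(ALIAS_TO_ID, key=len, reverse=True)
    hits, i = [], 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                hits.append((i, name))
                i += len(name)
                break
        else:
            i += 1  # no name starts here; advance one character
    return hits


def group_aliases(hits):
    """Step 2: group surface forms that refer to the same compound."""
    groups = defaultdict(set)
    for _, surface in hits:
        groups[ALIAS_TO_ID[surface]].add(surface)
    return dict(groups)


def to_ntriples(groups, doc_uri):
    """Step 3: represent the extraction results as N-Triples lines."""
    lines = []
    for compound_id in sorted(groups):
        subject = f"<{BASE}{compound_id}>"
        for surface in sorted(groups[compound_id]):
            lines.append(f'{subject} <{LABEL}> "{surface}"@ja .')
        lines.append(f"{subject} <{MENTIONED_IN}> <{doc_uri}> .")
    return lines


if __name__ == "__main__":
    text = "エタノール(エチルアルコール)と酢酸を混合した。"
    groups = group_aliases(recognize(text))
    for line in to_ntriples(groups, "http://example.org/doc/1"):
        print(line)
```

Because both surface forms of ethanol map to one subject URI, triples from the extracted text and from an existing database that uses the same URI can be merged directly, which is the property the LD-based combination relies on.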