A Semantic Annotation Pipeline towards the Generation of Knowledge Graphs in Tribology
Within the domain of tribology, enterprises and research institutions are constantly working on new concepts, materials, lubricants, or surface technologies for a wide range of applications. This is also reflected in the continuously growing number of publications, which in turn serve as guidance and benchmark for researchers and developers. Due to the lack of suited data and knowledge bases, knowledge acquisition and aggregation is still a manual process involving the time-consuming review of literature. Therefore, semantic annotation and natural language processing (NLP) techniques can decrease this manual effort by providing a semi-automatic support in knowledge acquisition. The generation of knowledge graphs as a structured information format from textual sources promises improved reuse and retrieval of information acquired from scientific literature. Motivated by this, the contribution introduces a novel semantic annotation pipeline for generating knowledge in the domain of tribology. The pipeline is built on Bidirectional Encoder Representations from Transformers (BERT)—a state-of-the-art language model—and involves classic NLP tasks like information extraction, named entity recognition and question answering. Within this contribution, the three modules of the pipeline for document extraction, annotation, and analysis are introduced. Based on a comparison with a manual annotation of publications on tribological model testing, satisfactory performance is verified.