Abstract:
Text segmentation is an important task in natural language processing (NLP). However, most existing works focus either on the global information of a document or on local text information, and cannot consider both at the same time. This paper proposes a BERT-based hierarchical adjacent coherence text segmentation model (HAC-BERT), which models global and local information separately and combines them through weighting, attending to both local and global information to achieve better performance. HAC-BERT was trained on a large-scale text segmentation corpus and tested on multiple text segmentation datasets from different domains. The experimental results show that the proposed model achieves good performance and domain adaptability.