Abstract:
Large-scale online service systems generate a large number of alerts whose correlations are complicated, which greatly increases the difficulty of fault diagnosis for operators. To address this problem, we propose an alert stream compression method based on representation learning. The method consists of two stages: an offline learning stage and an online compression stage. In the offline learning stage, the semantic information of the original alert data and the topology information between components are learned and represented as vectors through embedding techniques. In the online compression stage, a streaming clustering method associates the alert vectors produced by representation learning in real time. Experiments on a synthetic dataset and a real-world dataset show that the method meets the real-time and effectiveness requirements of alert stream compression.
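
To make the online compression stage concrete, the following is a minimal sketch of how streaming clustering over alert vectors might look. The abstract does not name the clustering algorithm, so this uses a simple threshold-based (leader-style) scheme as an assumption: each incoming vector joins the most similar existing cluster if the cosine similarity reaches a threshold, and otherwise starts a new cluster. The class name `StreamingClusterer`, the `threshold` value, and the toy two-dimensional vectors are all hypothetical and stand in for the embeddings learned offline.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class StreamingClusterer:
    """Hypothetical one-pass clusterer; not the method from the paper."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # running mean vector per cluster
        self.counts = []            # number of alerts per cluster

    def assign(self, vec):
        """Assign one alert vector to a cluster and return its index."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            # Update the running mean of the matched cluster.
            n = self.counts[best] + 1
            self.centroids[best] = [
                c + (x - c) / n for c, x in zip(self.centroids[best], vec)
            ]
            self.counts[best] = n
            return best
        # No cluster is similar enough: open a new one.
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1


# Toy alert vectors standing in for learned embeddings (made-up values).
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
clusterer = StreamingClusterer(threshold=0.8)
labels = [clusterer.assign(v) for v in stream]
print(labels)  # [0, 0, 1, 1]
```

In this sketch, four alerts are compressed into two groups in a single pass, with constant work per existing cluster for each incoming alert. A real system would also need to expire stale clusters, which is omitted here.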