Abstract:
Urban surveillance cameras, with their large number, high density, and fast transmission capabilities, offer new opportunities for rainfall monitoring at high spatiotemporal resolution. However, existing research has focused mainly on daytime rainfall, and the monitoring of heavy rainfall at night remains comparatively underdeveloped. In this study, we analyze the spatiotemporal features of nighttime rainfall videos and propose a deep learning model, Vit-Bi-LSTM, which combines Vision Transformer (ViT) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks to classify nighttime heavy rainfall levels automatically. We constructed a nighttime rainfall video dataset comprising 4,500 clips (37.5 hours in total) from 41 nighttime rainfall videos collected between 2022 and 2025, with a maximum rainfall intensity of 80 mm·h⁻¹. Experimental results show that jointly modeling spatial and temporal features significantly improves the accuracy of rainfall level classification, with the two-layer Bi-LSTM structure achieving an accuracy of 85.6% on the self-constructed dataset. In field observations of two nighttime heavy rainfall events, the model achieved an accuracy of 76.7%, demonstrating its effectiveness in real-world environments. The model showed high accuracy (80.7%–90.6%) for rainfall intensities below 20 mm·h⁻¹, but its accuracy decreased to 68.8%–75.0% for intensities above 40 mm·h⁻¹, so classification of the heaviest rainfall still needs further improvement.
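
To make the architecture concrete, the following is a minimal PyTorch sketch of the general idea: a ViT-style encoder turns each frame into a feature vector, and a two-layer Bi-LSTM models the resulting sequence before classification. It is an illustration under stated assumptions, not the implementation evaluated in this study. The class names `FrameEncoder` and `VitBiLstmSketch`, the number of rainfall levels (4), the image and patch sizes, the layer widths, and the mean pooling over time are all hypothetical choices that the abstract does not specify.

```python
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Tiny ViT-style encoder (assumed sizes): patches -> Transformer -> CLS feature."""

    def __init__(self, image_size=64, patch_size=16, dim=64, depth=2, heads=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution splits the frame into patches and embeds them.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frames):  # frames: (N, 3, H, W)
        tokens = self.patch_embed(frames).flatten(2).transpose(1, 2)  # (N, P, dim)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        return self.encoder(tokens)[:, 0]  # CLS token as the frame feature, (N, dim)


class VitBiLstmSketch(nn.Module):
    """Spatial features per frame, then a two-layer Bi-LSTM over time."""

    def __init__(self, num_levels=4, dim=64, hidden=32, lstm_layers=2):
        super().__init__()
        self.frame_encoder = FrameEncoder(dim=dim)
        self.bilstm = nn.LSTM(
            dim, hidden, num_layers=lstm_layers, batch_first=True, bidirectional=True
        )
        self.head = nn.Linear(2 * hidden, num_levels)  # 2x: forward + backward states

    def forward(self, clips):  # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        # Encode all frames in one pass, then restore the (batch, time) layout.
        feats = self.frame_encoder(clips.flatten(0, 1)).reshape(b, t, -1)
        seq, _ = self.bilstm(feats)  # (B, T, 2 * hidden)
        return self.head(seq.mean(dim=1))  # assumed pooling: average over time


model = VitBiLstmSketch()
dummy_clips = torch.randn(2, 8, 3, 64, 64)  # 2 clips of 8 RGB frames, random data
logits = model(dummy_clips)  # one score per assumed rainfall level
```

Here `logits` has one row per clip and one column per rainfall level; applying `logits.argmax(dim=1)` would give the predicted level index for each clip. Averaging the Bi-LSTM outputs over time is only one plausible way to summarize a clip; using the final hidden states is another.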