Abstract: |
AI-based histopathological image analysis has significantly advanced computer-aided diagnosis. Although labeled data improve model performance, manual annotation by pathologists is labor-intensive and time-consuming, and annotator variability and reliance on coarse slide-level labels often introduce label noise. To address these challenges, we propose BPAL (Beta Mixture Model and Penalized Regression for Active Learning), a novel active learning framework for histopathological whole-slide image analysis. BPAL aims to reduce expert annotation costs and mitigate the impact of noisy samples during training by automatically handling highly informative samples in each active learning iteration. Our approach integrates two noise detection modules into the active learning framework. Penalized Regression (PR), which supports parallel computation, improves the efficiency of noisy sample detection. A Beta Mixture Model (BMM) that incorporates prior knowledge from training losses complements PR by analyzing samples from multiple perspectives in the combined feature and label space. Together, these modules make fuller use of the information extracted from pathological image samples and provide a robust, thorough assessment of data quality. Building on them, we propose a heuristic sampling strategy that sorts the high-information samples identified by the modules into three types: typical samples with high confidence that can receive pseudo labels for training, difficult samples with complex features that require expert re-annotation, and mislabeled noisy samples. Iteratively expanding the training set in this way retains high-information samples while limiting the influence of noisy ones. Comparative evaluations show that our approach outperforms the compared methods on breast cancer and prostate cancer classification tasks. © 2025 Elsevier Ltd |
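The loss-based noise modelling and three-way sample split described above can be illustrated with a minimal sketch. This is not the BPAL implementation: it covers only a generic two-component Beta mixture fitted by EM to per-sample losses, and it omits the Penalized Regression module entirely. The function names (`fit_loss_bmm`, `categorise`), the thresholds, and the use of a classifier confidence score for pseudo-labelling are hypothetical assumptions chosen for illustration.

```python
import math

import numpy as np


def beta_logpdf(x, a, b):
    """Elementwise log-density of Beta(a, b) for x strictly inside (0, 1)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1.0) * np.log(x) + (b - 1.0) * np.log1p(-x) - log_norm


def weighted_beta_moments(x, w):
    """Weighted method-of-moments estimate of Beta shape parameters (a, b)."""
    w = w + 1e-12  # avoid a zero total weight
    mean = np.sum(w * x) / np.sum(w)
    var = max(np.sum(w * (x - mean) ** 2) / np.sum(w), 1e-8)
    common = mean * (1.0 - mean) / var - 1.0
    # Clamp so both shape parameters stay positive.
    return max(mean * common, 1e-2), max((1.0 - mean) * common, 1e-2)


def fit_loss_bmm(losses, n_iter=50, eps=1e-4):
    """Fit a two-component Beta mixture to per-sample losses with EM.

    Returns each sample's posterior probability of belonging to the
    high-loss component (treated here as "possibly mislabeled").
    """
    losses = np.asarray(losses, dtype=float)
    # Min-max normalise, then keep values strictly inside (0, 1).
    x = (losses - losses.min()) / (losses.max() - losses.min() + 1e-12)
    x = np.clip(x, eps, 1.0 - eps)
    # Soft initialisation: a larger loss gives more initial weight to the noisy component.
    r = x.copy()
    for _ in range(n_iter):
        # M-step: refit both Beta components and the mixing weight.
        a0, b0 = weighted_beta_moments(x, 1.0 - r)
        a1, b1 = weighted_beta_moments(x, r)
        pi = float(np.clip(r.mean(), 1e-6, 1.0 - 1e-6))
        # E-step: posterior responsibility of the high-loss component.
        log_p0 = math.log(1.0 - pi) + beta_logpdf(x, a0, b0)
        log_p1 = math.log(pi) + beta_logpdf(x, a1, b1)
        r = np.exp(log_p1 - np.logaddexp(log_p0, log_p1))
    return r


def categorise(p_noisy, confidence, clean_thr=0.2, noisy_thr=0.8, conf_thr=0.9):
    """Hypothetical three-way split: pseudo-label, expert re-annotation, or noisy."""
    cats = []
    for p, c in zip(p_noisy, confidence):
        if p >= noisy_thr:
            cats.append("noisy")  # likely mislabeled
        elif p <= clean_thr and c >= conf_thr:
            cats.append("pseudo_label")  # typical, high-confidence sample
        else:
            cats.append("expert")  # difficult sample sent for re-annotation
    return cats


# Usage on synthetic losses: 80 low-loss "clean" and 20 high-loss "noisy" samples.
rng = np.random.default_rng(0)
losses = np.concatenate([rng.beta(2, 20, 80), rng.beta(20, 3, 20)])
p_noisy = fit_loss_bmm(losses)
conf = np.full(losses.shape, 0.95)  # placeholder classifier confidences
labels = categorise(p_noisy, conf)
```

Splitting on the posterior, rather than on a raw loss cutoff, lets the ambiguous middle band fall through to expert review instead of being forced into either the clean or the noisy group.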