Enhancing Clinical Decision Support through Cost Sensitive CNN and Reliability Calibrated Pneumonia Classification

Authors

  • Danang Danang, Universitas Sains dan Teknologi Komputer
  • Toni Wijanarko Adi Putra, Universitas Sains dan Teknologi Komputer

DOI:

https://doi.org/10.57214/jusika.v9i1.1126

Keywords:

Chest X-Ray, Cost-Sensitive Learning, Pneumonia, Probability Calibration, Temperature Scaling

Abstract

Pneumonia detection from chest X-ray images is widely used in computer-aided diagnostic systems. However, effective clinical decision support requires not only accurate classification performance but also consideration of unequal error costs, since false negative predictions may lead to more severe consequences than false positives. In addition, prediction probabilities must be well calibrated to support threshold-based medical decisions such as triage and patient escalation. This research investigates asymmetric misclassification costs and probability calibration for binary classification (PNEUMONIA vs. NORMAL) using the Hugging Face dataset hf-vision/chest-xray-pneumonia. The proposed framework utilizes a ResNet-18 architecture integrated with cost-sensitive learning through weighted cross-entropy loss (FN:FP = 5:1), threshold optimization based on validation data to reduce expected cost, and post-hoc temperature scaling for improving probability calibration. Experimental results on the independent test set indicate that the cost-sensitive approach enhances specificity and decreases expected cost compared to the conventional cross-entropy baseline. Furthermore, temperature scaling improves the reliability of probabilistic predictions, as demonstrated by better negative log-likelihood and Brier score values. The study also explores selective prediction strategies to balance prediction coverage and risk reduction, complemented by Grad-CAM visualizations and structured failure-case analysis for qualitative assessment. Overall, the findings demonstrate that incorporating cost-aware decision thresholds and calibrated probability estimates can serve as lightweight yet effective enhancements for chest X-ray classification systems in clinical decision-support applications.
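
The decision-level steps named in the abstract (a 5:1 FN:FP cost, a validation-chosen threshold, temperature scaling, and the NLL and Brier metrics) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a single pneumonia logit per image, a simple grid search in place of whatever optimizer the study used, and hypothetical function names throughout. Only the 5:1 cost ratio is taken from the abstract.

```python
import math

C_FN, C_FP = 5.0, 1.0  # FN:FP cost ratio stated in the abstract


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def expected_cost(probs, labels, threshold):
    """Mean cost when predicting PNEUMONIA (1) whenever p >= threshold."""
    total = 0.0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 0:
            total += C_FN  # missed pneumonia
        elif y == 0 and pred == 1:
            total += C_FP  # false alarm
    return total / len(labels)


def best_threshold(probs, labels):
    """Pick the threshold on a 0.01 grid that minimizes validation cost."""
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: expected_cost(probs, labels, t))


def nll(logits, labels, T=1.0, pos_weight=1.0):
    """Mean binary cross-entropy of sigmoid(logit / T).

    pos_weight > 1 up-weights the positive-class term (assumed form of
    a cost-sensitive loss); pos_weight = 1 gives the plain NLL.
    """
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / T)
        total -= pos_weight * y * math.log(p + eps)
        total -= (1 - y) * math.log(1.0 - p + eps)
    return total / len(labels)


def fit_temperature(logits, labels, lo=0.05, hi=10.0, steps=200):
    """Log-spaced grid search for the T that minimizes validation NLL."""
    cands = [lo * (hi / lo) ** (i / (steps - 1)) for i in range(steps)]
    cands.append(1.0)  # guarantees the result is no worse than T = 1
    return min(cands, key=lambda T: nll(logits, labels, T))


def brier(probs, labels):
    """Mean squared difference between probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


if __name__ == "__main__":
    # Toy, overconfident validation logits (illustrative values only).
    val_logits = [4.0, 3.0, -3.5, 2.5, -4.0, 3.5, -2.0, 4.5]
    val_labels = [1, 1, 0, 0, 0, 1, 1, 0]
    T = fit_temperature(val_logits, val_labels)
    probs = [sigmoid(z / T) for z in val_logits]
    t = best_threshold(probs, val_labels)
    print("temperature:", round(T, 3), "threshold:", t)
    print("expected cost:", expected_cost(probs, val_labels, t))
    print("NLL:", nll(val_logits, val_labels, T), "Brier:", brier(probs, val_labels))
```

In a setup like this, both the temperature and the threshold would be fitted on validation data and then held fixed on the test set. Dividing a single logit by T changes confidence without changing the ranking of cases, so the threshold search is best run on the calibrated probabilities.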

Published

2025-06-30

How to Cite

Danang Danang, & Toni Wijanarko Adi Putra. (2025). Enhancing Clinical Decision Support through Cost Sensitive CNN and Reliability Calibrated Pneumonia Classification. Jurnal Sains Dan Kesehatan, 9(1), 55–78. https://doi.org/10.57214/jusika.v9i1.1126
