Al-Bahir Journal for Engineering and Pure Sciences


The lungs supply oxygen to every cell, filter inhaled air to keep out harmful substances, and support the body's defense mechanisms. They nevertheless remain susceptible to diseases such as infections, inflammation, and cancer. Meta-ensemble techniques are prominent machine learning methods for improving the predictive accuracy of classifiers. This work proposes a robust predictive model that uses a meta-ensemble method to identify individuals at high risk of lung cancer, so that early action can be taken to prevent long-term problems. The model is benchmarked on the Kaggle Machine Learning practitioners' Lung Cancer Dataset. Three ensemble models, Random Forest, Adaptive Boosting (AdaBoost), and Gradient Boosting, were used to build the proposed meta-ensemble models in two configurations. In the first, all three served as base classifiers and one of them was also adopted as the meta-classifier. In the second, two served as base classifiers and the third served as the meta-classifier. Graphical analysis of the dataset features was used to show which characteristics make people liable to develop lung cancer. The proposed model substantially improved prediction performance. The meta-ensemble models were implemented in Python and evaluated with 5-fold cross-validation, and they were validated using several established performance metrics. The experimental results showed that gradient boosting achieved a maximum accuracy of 100%, together with an area under the curve (AUC) and a precision of 100%. The proposed model was compared with recent machine learning methods and popular state-of-the-art (SOTA) deep learning techniques, and the results confirmed that it achieved the best accuracy for lung cancer risk prediction. These results can be used to enhance the performance of real-world patient risk prediction systems in the future.
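
The second configuration described above (two ensemble models as base classifiers, the third as meta-classifier) can be sketched with scikit-learn's `StackingClassifier`. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are a synthetic stand-in generated with `make_classification` rather than the Kaggle dataset, the choice of Gradient Boosting as the meta-classifier is one of several possible assignments, and all hyperparameters and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_validate

# Assumption: synthetic binary data stands in for the lung cancer dataset,
# which is not loaded here.
X, y = make_classification(
    n_samples=300, n_features=15, n_informative=8, random_state=0
)

# Two ensemble models act as base classifiers (hypothetical settings).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
    ("ada", AdaBoostClassifier(n_estimators=25, random_state=0)),
]

# The third ensemble model is trained on the base classifiers' out-of-fold
# predictions and acts as the meta-classifier.
meta_model = StackingClassifier(
    estimators=base_learners,
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)

# 5-fold cross-validation of the whole stacked model.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
metric_names = ("accuracy", "precision", "roc_auc")
results = cross_validate(meta_model, X, y, cv=folds, scoring=list(metric_names))

# Average each metric over the five folds.
mean_scores = {name: results[f"test_{name}"].mean() for name in metric_names}
print(mean_scores)
```

Swapping the roles (for example, Random Forest as `final_estimator`) or listing all three models in `estimators` gives the other configurations mentioned in the abstract. Scores on the synthetic data say nothing about the figures reported for the real dataset.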

