Heterogeneous Graph-Augmented Contrastive Learning for Extreme Multi-Class Fiqh Classification

Ali A. Jalil, University of AlKafeel

Abstract

Background/Introduction: Fine-grained text classification in Islamic Jurisprudence (Fiqh) is difficult because legal concepts are structurally interdependent and the data follow an extreme multi-class, long-tailed distribution: 5,979 samples spread over 667 classes, 52.2% of which have fewer than five samples. Traditional flat classifiers treat target classes as independent, orthogonal output neurons, which discards important relational semantics between them.
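The long-tail statistic above counts classes rather than samples. The Python sketch below shows one way to compute it from per-sample labels; the function name `tail_fraction`, the `min_samples` parameter, and the toy labels are illustrative assumptions, not the paper's code or data. Applied to the reported figures, 52.2% of 667 classes corresponds to 348 classes.

```python
from collections import Counter

def tail_fraction(labels, min_samples=5):
    """Share of classes with fewer than `min_samples` samples (hypothetical helper)."""
    counts = Counter(labels)  # class label -> number of samples
    rare = sum(1 for c in counts.values() if c < min_samples)
    return rare / len(counts)

# Hypothetical toy labels: class "a" has 6 samples, "b" has 2, "c" has 1.
toy_labels = ["a"] * 6 + ["b"] * 2 + ["c"]
print(f"{tail_fraction(toy_labels):.1%}")  # 2 of 3 classes are rare -> 66.7%
```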
Objectives: This paper seeks to remedy this extreme imbalance while preserving the domain's structural taxonomy by treating the structure of the label space itself as an object to be learned, so that low-resource classes can borrow statistical strength from their high-resource neighbors.
Models Used: We propose HeG-Fiqh, a novel dual-encoder, retrieval-based framework that pairs an AraBERT text encoder fine-tuned with a Supervised Contrastive (SupCon) objective with a Heterogeneous Graph Attention Network (HAN) label encoder. A custom Fiqh Knowledge Graph (HKG) is constructed dynamically from text centroids (semantic lateral edges) and historical domain taxonomies (hierarchical edges), using only training data so that no evaluation data leaks into the graph. Text embeddings and graph-encoded label embeddings are projected into, and aligned within, a shared latent metric space. At inference, classification is performed by Maximum Inner Product Search (MIPS) over the label embeddings.
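The retrieval-style pipeline can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, semantic lateral edges are assumed to link each class to its k most cosine-similar training centroids (both k and the cosine criterion are assumptions), hierarchical taxonomy edges are omitted, and normalized centroids stand in for the HAN-encoded label embeddings. MIPS is done here by exact brute-force argmax; a very large label set would typically use an approximate index instead.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so inner products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def class_centroids(train_emb, train_labels, num_classes):
    """Mean embedding per class, computed from training rows only (no test leakage)."""
    sums = np.zeros((num_classes, train_emb.shape[1]))
    np.add.at(sums, train_labels, train_emb)  # accumulate rows into their class slot
    counts = np.bincount(train_labels, minlength=num_classes)[:, None]
    return sums / np.maximum(counts, 1)  # guard against classes absent from the fold

def lateral_edges(centroids, k=2):
    """Hypothetical rule: link each class to its k most cosine-similar classes."""
    c = l2_normalize(centroids)
    sim = c @ c.T
    np.fill_diagonal(sim, -np.inf)  # never link a class to itself
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # most similar first
    return {(i, int(j)) for i in range(len(c)) for j in neighbours[i]}

def mips_classify(text_emb, label_emb):
    """Pick, for each text, the label with the maximum inner product."""
    scores = text_emb @ label_emb.T  # shape: (num_texts, num_labels)
    return scores.argmax(axis=1)

# Toy usage with random vectors standing in for projected AraBERT embeddings.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(12, 8))
train_labels = np.repeat(np.arange(4), 3)  # 4 classes, 3 samples each
centroids = class_centroids(train_emb, train_labels, num_classes=4)
edges = lateral_edges(centroids, k=2)
label_emb = l2_normalize(centroids)  # stand-in for HAN-encoded label embeddings
preds = mips_classify(l2_normalize(train_emb), label_emb)
```

Because the centroids are computed inside `class_centroids` from the training fold alone, rebuilding them per cross-validation fold is what keeps held-out samples out of the graph.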
Results: Extensive evaluation under a stratified 5-fold cross-validation protocol shows that HeG-Fiqh achieves a Top-1 accuracy of 93.42 ± 0.15% and a Macro-F1 of 90.32 ± 0.21%, outperforming flat AraBERT (Macro-F1 of 35.39 ± 0.40%) by 25.62 percentage points in accuracy and 54.93 percentage points in Macro-F1. Notably, on rare categories with four or fewer training examples, HeG-Fiqh raises Macro-F1 from 12.4% (flat AraBERT) to 87.6%. These improvements are highly statistically significant under McNemar's test.
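McNemar's test compares two classifiers on the same items using only the cases where exactly one of them is correct. The sketch below implements the continuity-corrected form commonly used for classifier comparison, with a chi-square (1 degree of freedom) p-value computed via `math.erfc`; the paired predictions are invented toy data, and how the paper aggregates predictions across folds is not stated here, so this is an illustration rather than the authors' procedure. The last two lines recompute the reported percentage-point gaps, assuming both gaps are differences of the reported means.

```python
import math

def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test for two classifiers scored on the same items."""
    rows = list(zip(y_true, pred_a, pred_b))
    n01 = sum(a != t and b == t for t, a, b in rows)  # A wrong, B right
    n10 = sum(a == t and b != t for t, a, b in rows)  # A right, B wrong
    if n01 + n10 == 0:
        return 0.0, 1.0  # the classifiers never differ in correctness
    stat = max(abs(n01 - n10) - 1, 0) ** 2 / (n01 + n10)
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square, 1 dof
    return stat, p_value

# Hypothetical paired predictions: B fixes 10 of A's errors and introduces none.
y_true = [1] * 30
pred_a = [1] * 18 + [0] * 12
pred_b = [1] * 28 + [0] * 2
stat, p = mcnemar(y_true, pred_a, pred_b)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")

# Percentage-point gaps from the reported means.
print(round(90.32 - 35.39, 2))  # Macro-F1 gap: 54.93
print(round(93.42 - 25.62, 2))  # implied flat AraBERT accuracy: 67.8
```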
Conclusion: Explicitly modeling the label space with heterogeneous graph attention networks lets low-resource classes borrow representational strength from related classes, offering a robust and scalable solution for extreme multi-class classification in structured, highly specialized knowledge domains.
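The borrowing mechanism attributed to heterogeneous graph attention can be illustrated with a small NumPy sketch of the two attention levels of a HAN: node-level attention within one edge type, then semantic-level attention across edge types. It is not the authors' code. The edge sets, dimensions, random weights standing in for learned parameters, and the use of tanh as the activation are all assumptions, and single edge types stand in for the metapaths of the original HAN formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def node_attention(h, adj, W, a, slope=0.2):
    """Node-level attention over one edge type. adj: (N, N) bool, self-loops included."""
    hp = h @ W  # project label features, shape (N, d_out)
    d = hp.shape[1]
    # e_ij = LeakyReLU(a^T [hp_i || hp_j]), split into target and neighbour terms
    e = (hp @ a[:d])[:, None] + (hp @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)
    e = np.where(adj, e, -np.inf)  # attend only to neighbours of this edge type
    alpha = softmax(e, axis=1)  # attention weights over each node's neighbours
    return np.tanh(alpha @ hp)  # neighbour-weighted mix: rare labels borrow features

def semantic_attention(z_list, W_s, b_s, q):
    """Semantic-level attention: learn how much each edge type contributes."""
    scores = np.array([np.mean(np.tanh(z @ W_s + b_s) @ q) for z in z_list])
    beta = softmax(scores)
    return sum(w * z for w, z in zip(beta, z_list)), beta

# Toy graph: 5 label nodes; random weights stand in for learned parameters.
rng = np.random.default_rng(0)
n, d_in, d_out = 5, 4, 3
h = rng.normal(size=(n, d_in))
lateral = np.eye(n, dtype=bool)
lateral[0, 1] = lateral[1, 0] = True  # one semantic lateral edge
hier = np.eye(n, dtype=bool)
hier[0, 2:] = hier[2:, 0] = True  # node 0 acts as a parent of nodes 2-4
W_lat = rng.normal(size=(d_in, d_out))
z_lat = node_attention(h, lateral, W_lat, rng.normal(size=2 * d_out))
z_hier = node_attention(h, hier, rng.normal(size=(d_in, d_out)), rng.normal(size=2 * d_out))
label_emb, beta = semantic_attention(
    [z_lat, z_hier], rng.normal(size=(d_out, d_out)), np.zeros(d_out), rng.normal(size=d_out)
)
```

A node with no neighbours of a given edge type (node 4 under `lateral`) attends only to itself, whereas a connected node's embedding is pulled toward its neighbours; this mixing is the sense in which a sparsely populated class can borrow representation from related classes.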

Recommended Citation

Jalil, Ali A. (2026) "Heterogeneous Graph-Augmented Contrastive Learning for Extreme Multi-Class Fiqh Classification," Al-Bahir: Vol. 9: Iss. 1, Article 10.
Available at: https://doi.org/10.55810/2313-0083.1141

