Arabic Dialect NLP: A Unified Taxonomic, Methodological, and Trend‑Driven Survey
DOI: https://doi.org/10.70715/jitcai.2026.v3.i2.051
Keywords: Arabic Dialect, Dialect applications, Natural Language Processing (NLP), Taxonomy
Abstract
Background: Natural language processing (NLP) of Arabic dialects faces significant challenges due to the language's complex nature, whereas Modern Standard Arabic enjoys widespread support in the field. The rapid spread of social media has also shifted research focus toward Arabic dialects, creating a pressing need for a critical review of this large body of research.
Objective: This study provides an application-oriented survey of the Arabic dialectal NLP landscape. Its primary goal is to map the relationships among foundational tasks while benchmarking the resources and methodologies that have defined the field.
Participants and Setting: The study analyzes a dataset of 400 research articles published between 2020 and 2025.
Methods: The survey uses a multi-taxonomic clustering approach. Research papers were categorized into eight functional clusters, and trends were analyzed by year, geographic focus, and algorithm type (traditional machine learning vs. deep learning vs. Transformers and LLMs).
Results: Sentiment analysis is the dominant application, accounting for about 32% of the literature, followed by resource building at 21% and identification and code-switching at 10%. Research output peaked in 2022-2025, marking a definitive shift from traditional machine learning models to Transformer-based architectures such as AraBERT and MARBERT. Regional coverage is broad, with a notable trend toward the identification and handling of code-switched text, which has emerged as a focus of current state-of-the-art work.
Conclusions: The survey demonstrates that dialect identification is no longer a standalone goal but a prerequisite for sentiment and translation systems. The field has progressed notably in many areas, such as sentiment analysis, but future work must prioritize under-resourced dialects, reproducible benchmarks, and cross-dialect transfer learning, and must combine these dialect-specific models with the zero-shot capabilities of generative LLMs.
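As a rough illustration of what the reported shares mean in absolute terms, the sketch below converts the percentages stated in the abstract into approximate article counts out of the 400 surveyed papers. The cluster labels and the helper name `approximate_counts` are assumptions made for this example. Because the abstract gives the shares as approximate figures, the resulting counts are estimates rather than the survey's exact tallies.

```python
# Figures taken from the abstract: 400 surveyed articles and the
# approximate percentage share of three of the eight clusters.
TOTAL_ARTICLES = 400
REPORTED_SHARES_PCT = {
    "Sentiment Analysis": 32,
    "Resource Building": 21,
    "Identification and Code-Switching": 10,
}


def approximate_counts(total, shares_pct):
    """Convert integer percentage shares into approximate article counts."""
    return {name: total * pct // 100 for name, pct in shares_pct.items()}


counts = approximate_counts(TOTAL_ARTICLES, REPORTED_SHARES_PCT)

# Articles left over for the clusters whose shares the abstract does not report.
remaining = TOTAL_ARTICLES - sum(counts.values())

for name, n in counts.items():
    print(f"{name}: about {n} articles")
print(f"Remaining clusters combined: about {remaining} articles")
```

Under these figures, the three reported clusters cover roughly 63% of the corpus, leaving a little over a third of the surveyed articles to the clusters not itemized in the abstract.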
[109] A. Charfi, M. Bessghaier, A. Atalla, R. Akasheh, S. Al-Emadi, and W. Zaghouani, "MARASTA: A Multi-dialectal Arabic Cross-domain Stance Corpus," pp. 11060-11069, May 2024. [Online]. Available: https://aclanthology.org/2024.lrec-main.964/. DOI: https://doi.org/10.63317/39rrei99qj8g
[110] W. M. S. Yafooz, "Enhancing Arabic Dialect Detection on Social Media: A Hybrid Model with an Attention Mechanism," Information, vol. 15, no. 6, p. 316doi: 10.3390/info15060316. DOI: https://doi.org/10.3390/info15060316
[111] H. Elgibreen et al., "An Incremental Approach to Corpus Design and Construction: Application to a Large Contemporary Saudi Corpus," IEEE Access, vol. 9, pp. 88405-88428, 2021, doi: 10.1109/ACCESS.2021.3089924. DOI: https://doi.org/10.1109/ACCESS.2021.3089924
[112] F. Alwajih, G. Bhatia, and M. Abdul-Mageed, Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic. 2024, pp. 320-336. DOI: https://doi.org/10.18653/v1/2024.arabicnlp-1.27
[113] E. Alqulaity, W. Yafooz, A. Alourani, and A. Jaradat, "Arabic Dialect Identification in Social Media: A Comparative Study of Deep Learning and Transformer Approaches," Intelligent Automation & Soft Computing, vol. 39, pp. 1-10, 01/01 2024, doi: 10.32604/iasc.2024.055470. DOI: https://doi.org/10.32604/iasc.2024.055470
[114] A. A. Alsuwaylimi, "Arabic dialect identification in social media: A hybrid model with transformer models and BiLSTM," Heliyon, vol. 10, no. 17, p. e36280, 2024/09/15/ 2024, doi: https://doi.org/10.1016/j.heliyon.2024.e36280. DOI: https://doi.org/10.1016/j.heliyon.2024.e36280
[115] F. Qarah and T. Alsanoosy, "Evaluation of Arabic Large Language Models on Moroccan Dialect," Engineering, Technology & Applied Science Research, vol. 15, no. 3, pp. 22478-22485, 06/04 2025, doi: 10.48084/etasr.10331. DOI: https://doi.org/10.48084/etasr.10331
[116] A. Alabdullah, L. Han, and C. Lin, Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation. 2025. DOI: https://doi.org/10.21203/rs.3.rs-7510599/v1
[117] H. Elsafty, T. Deußer, M. Pielka, C. Bauckhage, and R. Sifa, "ArDia: Improving Arabic Dialectal Language Classification Using a Novel Dataset," Proceedings of the International AAAI Conference on Web and Social Media, vol. 19, pp. 2413-2422, 06/07 2025, doi: 10.1609/icwsm.v19i1.35944. DOI: https://doi.org/10.1609/icwsm.v19i1.35944
[118] K. Almeman, "Automated Building of a Multidialectal Parallel Arabic Corpus Using Large Language Models," Data, vol. 10, no. 12, p. 208doi: 10.3390/data10120208. DOI: https://doi.org/10.3390/data10120208
[119] H. Zaidani, R. Koulali, A. Maizate, and M. Ouzzif, "Augmentation and Classification of Requests in Moroccan Dialect to Improve Quality of Public Service: A Comparative Study of Algorithms," Future Internet, vol. 17, no. 4, p. 176doi: 10.3390/fi17040176. DOI: https://doi.org/10.3390/fi17040176
[120] S. Zaid, A. H. Alharbi, and H. Samra, "Multi-Aspect Sentiment Classification of Arabic Tourism Reviews Using BERT and Classical Machine Learning," Data, vol. 10, no. 11, p. 168doi: 10.3390/data10110168. DOI: https://doi.org/10.3390/data10110168
Data Availability Statement
The data that support the findings of this study are available from the corresponding author, [S], upon reasonable request.
License
Copyright (c) 2026 Dr. Hani Iwidat, Dr. Mamoun Abu Helou (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.








