Named Entity Recognition for Hindi: Current Landscape and Emerging Trends

Shalini Sharma; Piyush P. Singh

doi:10.70715/jitcai.2025.v2.i2.021

Authors

Shalini Sharma, Jawaharlal Nehru University, New Delhi (Author). ORCID: https://orcid.org/0009-0007-0888-6618
Dr. Piyush Pratap Singh (Author, Funding Acquisition)

Keywords:

NLP, Named Entity Recognition, Artificial Intelligence, Machine Learning

Abstract

Named Entity Recognition (NER) plays a crucial role in Natural Language Processing (NLP) by automatically identifying and classifying entities such as names of people, places, organizations, dates, and numerical values within unstructured text. While NER has seen major advances in resource-rich languages like English, building robust NER systems for Indian languages, particularly Hindi, remains a significant challenge. Hindi presents linguistic complexities such as rich morphology, free word order, the absence of capitalization cues, and widespread code-mixed text, all of which complicate the task. Over the years, researchers have explored a wide range of approaches to these challenges, starting with rule-based and statistical models and progressing to deep learning and transformer-based techniques. Multilingual models like mBERT, IndicBERT, and MuRIL have shown promise in improving accuracy and generalizability for Hindi NER. This review offers an in-depth look at the current state of Hindi NER, including the available annotated datasets, computational models, and performance benchmarks. It highlights the gaps that persist, such as the scarcity of high-quality annotated data, difficulties in handling informal and domain-specific language, and limited adaptability across different text types. The paper also outlines future directions for research, emphasizing the need for low-resource learning strategies, domain adaptation, and better handling of noisy and code-mixed data. As Hindi continues to dominate communication in various digital spaces, advancing NER systems for this language is more relevant than ever.
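
To make the task described in the abstract concrete, the following is a minimal sketch of how token-level NER labels can be grouped into entities. It assumes the common BIO tagging convention (B- begins an entity, I- continues it, O is outside any entity), which the abstract does not itself specify. The function name `bio_to_spans`, the example sentence, and its hand-assigned tags are hypothetical illustrations and are not taken from the paper or from any dataset it reviews.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # "B-ORG" -> ("B", "ORG"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # Start a new entity. An I- tag whose type does not match the
            # open entity is treated as a new entity rather than an error.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I":
            # Continue the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O": close any open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-tagged illustration: "Jawaharlal Nehru University is in New Delhi."
tokens = ["जवाहरलाल", "नेहरू", "विश्वविद्यालय", "नई", "दिल्ली", "में", "है"]
tags = ["B-ORG", "I-ORG", "I-ORG", "B-LOC", "I-LOC", "O", "O"]
entities = bio_to_spans(tokens, tags)
```

Because Devanagari has no capitalization, the tags in this sketch carry all of the entity boundary information. In a real system they would come from a trained model rather than being written by hand.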

Author Biography

Dr. Piyush Pratap Singh

Professor in the School of Computer and Systems Sciences (SCSS) at Jawaharlal Nehru University, New Delhi.
