 

Al-Bahir Journal for Engineering and Pure Sciences

Abstract

With the increasing availability of textual information in many languages, delivered to homes and companies through Internet and intranet services, there is an urgent need for technologies and tools that can process this information and support phonetic representation and voice interaction. For example, voice-to-voice machine translation (MT) needs phonetic mapping and similarity across languages, especially for names and foreign words; this is one example of the importance of phonetic mapping and similarity. This article describes in detail the recent surge of interest and the advances in research on phonetic similarity (PS), phonetic representation, and phonetic mapping. PS and phonetic representation are fundamental elements of information technology, supporting applications such as search engines, speech recognition, and voice-to-voice MT systems. The importance of PS is demonstrated, the main characteristics of phonetic processing are highlighted, and the standardization aspects of converting text to a phonetic representation are clarified. The study surveys previous work published between 2000 and 2024. By using advanced AI algorithms and diverse linguistic datasets, new avenues can be explored for understanding how extinct languages evolved and changed over time, as well as their pronunciation mechanisms. This development is a breakthrough for language learning and cross-cultural communication as well as a technical advance. In addition, the linguistic resources and approaches used in the PS field are explained, the features of common tools are described, and standard evaluation metrics are illustrated. The article also reviews the current state of the art in PS research and in converting text to phonetic representation. Finally, we present our analysis and conclusions. Disseminating these findings is important for scholars working on phonetic similarity and its related methodologies.

References

[1] Chomsky N. Language and mind. Cambridge University Press; 2006.

[2] El-Imam YA. Phonetization of Arabic: rules and algorithms. Comput Speech Lang 2004;18:339–73.

[3] AMaH Mahdi, Sabah Asaad, editors. The current state of linked data-based recommender systems. 2021 2nd Information Technology to Enhance e-learning and Other Application (IT-ELA). IEEE; 2021.

[4] Ladefoged P, editor. The measurement of phonetic similarity. International Conference on Computational Linguistics COLING 1969; 1969. Preprint No 57.

[5] Weide R. The CMU pronunciation dictionary, release 0.6. Pittsburgh, PA: Carnegie Mellon University; 1998.

[6] Gayar NaS El, Yee Ching. Computational linguistics, speech and image processing for Arabic language. World Scientific; 2018.

[7] Zouhar V, Chang K, Cui C, Carlson N, Robinson N, Sachan M, et al. PWESuite: Phonetic word embeddings and tasks they facilitate. arXiv preprint arXiv:2304.02541 2023.

[8] Libovicky J, Fraser A. Neural string edit distance. arXiv preprint arXiv:2104.08388 2022.

[9] Halpern J, editor. CJKI Arabic romanization system. Abu Dhabi: the international Symposium on Arabic transliteration standard: challenges and solutions; 2009.

[10] Bakar JA, Omar K, Nasrudin MF, Murah MZ, Ahmad CW, editors. Implementation of Buckwalter transliteration to Malay corpora. 2013 13th international conference on intelligent systems design and applications; 2013.

[11] Toma S-A, Stan A, Pura M-L, Bârsan T, editors. MaRePhoR: an open access machine-readable phonetic dictionary for Romanian. 2017 international conference on speech technology and human-Computer Dialogue (SpeD); 2017.

[12] Out-of-the-box universal romanization tool uroman. In: Hermjakob U, May J, Knight K, editors. Proceedings of ACL 2018, system demonstrations; 2018.

[13] Epitran: Precision G2P for many languages. In: Mortensen DR, Dalmia S, Littell P, editors. Proceedings of the Eleventh international conference on language resources and evaluation (LREC 2018); 2018.

[14] Lee JL, Ashby LFE, Garza ME, Lee-Sikka Y, Miller S, Wong A, McCarthy AD, Gorman K, editors. Massively multilingual pronunciation modeling with WikiPron. International conference on language resources and evaluation; 2020.

[15] Chootrakool P, Wuttiwiwatchai C, Kosawat K. A large pronunciation dictionary for Thai speech processing. Proc of ASIALEX 2009.

[16] Panphon: A resource for mapping IPA segments to articulatory feature vectors. In: Mortensen DR, Littell P, Bharadwaj A, Goyal K, Dyer C, Levin L, editors. Proceedings of COLING 2016, the 26th international conference on computational linguistics. Technical Papers; 2016.

[17] Batsuren K, Bella G, Giunchiglia F, et al. CogNet: a large-scale cognate database. ACL 2019 the 57th Annual Meeting of the Association for computational linguistics: Proceedings of the Conference. Association for Computational Linguistics; 2019.

[18] Pratap V, Xu Q, Sriram A, Synnaeve G, Collobert R. MLS: A large-scale multilingual dataset for speech research. arXiv preprint arXiv:2012.03411 2020.

[19] Batsuren K, Bella G, Giunchiglia F. A large and evolving cognate database. Comput Humanit 2022:1–25.

[20] Mohammed ZR, Aliwy AH, editors. English-Arabic phonetic dataset construction. BIO Web of Conferences; 2024.

[21] Artetxe M, Labaka G, Lopez-Gazpio I, Agirre E. Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation. arXiv preprint arXiv:1809.02094 2018.

[22] El-Geish M. Learning joint acoustic-phonetic word embeddings. arXiv preprint arXiv:1908.00493 2019.

[23] Feng X, Wang L, editors. Application of Word2vec in phoneme recognition. Proceedings of the 2020 12th international conference on machine learning and computing; 2020.

[24] Kanojia D, Patel K, Bhattacharyya P, Kulkarni M, Haffari G. Utilizing wordnets for cognate detection among Indian languages. arXiv preprint arXiv:2112.15124 2021.

[25] Sharma R, Dhawan K, Pailla B. Phonetic word embeddings. arXiv preprint arXiv:2109.14796 2021.

[26] Altinok D. Towards Turkish ASR: Anatomy of a rule-based Turkish g2p. arXiv preprint arXiv:1601.03783 2016.

[27] Harrat S, Meftouh K, Abbas M, Smaïli K. Grapheme to phoneme conversion - an Arabic dialect case. In: The 4th international workshop on spoken language technologies for under-resourced languages (SLTU'14); 2014.

[28] Nehar A, Bellaouar S, Ziadi D, Omar KM. Arabic Personal Name Matching: Names Written using Latin Alphabet. J Comput Sci 2021;17:776–88.

[29] Bisani M, Ney H. Joint-sequence models for grapheme-to-phoneme conversion. Speech Commun 2008;50:434–51.

[30] Sindran F. Automatic phonetic transcription of standard Arabic with applications in the NLP domain [Ph.D. dissertation]; 2021.

[31] How to compare automatically two phonological strings: Application to intelligibility measurement in the case of atypical speech. In: Ghio A, Lalain M, Giusti L, Fredouille C, Woisard V, editors. 12th conference on language resources and evaluation (LREC 2020); 2020.

[32] Hixon B, Schneider E, Epstein SL, editors. Phonemic similarity metrics to compare pronunciation methods. INTERSPEECH; 2011.

[33] Jucksriporn C, Sornil O. Thai Phonetic Distance Using Phonetic Transcription. Int J Inf Process Manag 2014;5:133.

[34] Investigating Phoneme Similarity with Artificially Accented Speech. In: Masson M, Carson-Berndsen J, editors. Proceedings of the 20th SIGMORPHON workshop on computational research in phonetics, phonology, and Morphology; 2023.

[35] Kantor A, Hasegawa-Johnson M. HMM-based pronunciation dictionary generation. New tools and methods for very large scale phonetics research. University of Pennsylvania; 2011.

[36] Nahar K, Al-Muhtaseb H, Al-Khatib W, Elshafei M, Alghamdi M. Arabic Phonemes Transcription using Data Driven Approach. Int Arab J Inf Technol 2015;12.

(continued)

No. | Papers | Dataset | Language | Application/Task | Technique category
53 | [104] | google-transliteration | Multi Languages | Transliteration | LSTM
54 | [105] | CMU | Multi Languages | G2P | LSTM
55 | [106] | collect a corpus EANames with English and Arabic names | Arabic-English | Transliteration | LSTM
56 | [107] | Private dataset | Multi Languages | Cognate | Transformer
57 | [108] | Private dataset | Multi Languages | Cognate | Transformer

AL-BAHIR JOURNAL FOR ENGINEERING AND PURE SCIENCES 2024;5:140–159

[37] Construction of a Persian Letter-to-Sound Conversion System. In: Arab MM, Azimizadeh A, editors. Proceedings of the third workshop on computational approaches to Arabic-Script-based languages (CAASL3); 2009.

[38] Khan M. Selection of discriminative features for Arabic phoneme’s mispronunciation detection. Pakistan J Sci 2015; 67.

[39] Maqsood M, Habib H, Anwar S, Ghazanfar MA. A Comparative Study of Classifier Based Mispronunciation Detection System for Confusing. Nucleus 2017;2:114–20.

[40] Nazir F, Majeed MN, Ghazanfar MA, Maqsood M. An Arabic mispronunciation detection system based on the frequency of mistakes for Asian speakers. Mehran Univ Res J Eng Technol 2021;40:279–97.

[41] Johnson K. Aligning phonetic transcriptions with their citation forms. Acoust Res Lett Online 2004;5:19–24.

[42] Ibrahim AB, Seddiq YM, Meftah AH, Alghamdi M, Selouani S-A, Qamhan MA, et al. Optimizing Arabic speech distinctive phonetic features and phoneme recognition using genetic algorithm. IEEE Access 2020;8:200395–411.

[43] Bi-directional conversion between graphemes and phonemes using a joint n-gram model. In: Galescu L, Allen JF, editors. 4th ISCA Tutorial and research workshop (ITRW) on speech synthesis; 2001.

[44] Suyanto S, Sunyoto A, Ismail RN, Rachmawati E, Maharani W. Stemmer and phonotactic rules to improve n-gram tagger-based Indonesian phonemicization. J King Saud Univ - Comput Inf Sci 2022;34:3807–14.

[45] Lexicon-driven approach to the recognition of Arabic named entities. In: Halpern J, editor. Proceedings of the second international conference on Arabic language resources and tools; 2009.

[46] Mahmudi A, Veisi H. Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory. Comput Speech Lang 2021;70:101222.

[47] Bhagat R, Hovy EH, editors. Phonetic models for generating spelling Variants. IJCAI; 2007.

[48] Algabri M, Mathkour H, Alsulaiman M, Bencherif MA. Mispronunciation detection and diagnosis with articulatory-level feedback generation for non-native Arabic speech. Mathematics 2022;10:2727.

[49] Asif A, Mukhtar H, Alqadheeb F, Ahmad HF, Alhumam A. An approach for pronunciation classification of classical Arabic phonemes using deep learning. Appl Sci 2021;12:238.

[50] Nazir F, Majeed MN, Ghazanfar MA, Maqsood M. Mispronunciation detection using deep convolutional neural network features and transfer learning-based model for Arabic phonemes. IEEE Access 2019;7:52589–608.

[51] Ziafat N, Ahmad HF, Fatima I, Zia M, Alhumam A, Rajpoot K. Correct pronunciation detection of the Arabic alphabet using deep learning. Appl Sci 2021;11:2508.

[52] Barman PP, Boruah A. A RNN based Approach for next word prediction in Assamese Phonetic Transcription. Procedia Comput Sci 2018;143:117–23.

[53] Behbahani YM, Babaali B, Turdalyuly M. Persian sentences to phoneme sequences conversion based on recurrent neural networks. Open Computer Science 2016;6:219–25.

[54] Jakobi DN. Grapheme-to-Phoneme mapping in text [Master's thesis]. University of Zurich; 2022.

[55] Qamhan MA, Alotaibi YA, Seddiq YM, Meftah AH, Selouani SA. Sequence-to-sequence acoustic-to-phonetic conversion using spectrograms and deep learning. IEEE Access 2021;9:80209–20.

[56] Alashban AA, Alotaibi YA. A Deep Learning Approach for Identifying and Discriminating Spoken Arabic Among Other Languages. IEEE Access 2023;11:11613–28.

[57] Yao K, Zweig G. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. arXiv preprint arXiv:1506.00196 2015.

[58] Zia HB, Raza AA, Athar A. PronouncUR: An Urdu pronunciation lexicon generator. arXiv preprint arXiv:1801.00409 2018.

[59] Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval. In: Meng HM, Lo W-K, Chen B, Tang K, editors. IEEE workshop on automatic speech recognition and understanding, 2001 ASRU'01; 2001.

[60] Yousef AH. Cross-language personal name mapping. arXiv preprint arXiv:1405.6293 2014.

[61] Rao SMCaS. Rule-Based Phonetic Matching Approach for Hindi and Marathi. Int J Res Soc Sci 2011;1:26–41.

[62] Blum F, List J-M. Trimming phonetic alignments improves the inference of sound correspondence patterns from multilingual wordlists. arXiv preprint arXiv:2303.17932 2023.

[63] Collapsed consonant and vowel models: New approaches for English-Persian transliteration and back-transliteration. In: Karimi S, Scholer F, Turpin A, editors. Proceedings of the 45th annual Meeting of the Association of computational linguistics; 2007.

[64] Phoneme alignment using the information on phonological processes in continuous speech. In: Kocharov D, editor. Proceedings of the Tenth international conference on language resources and evaluation (LREC'16); 2016.

[65] Kondrak G, editor. A new algorithm for the alignment of phonetic sequences. 1st Meeting of the North American Chapter of the Association for Computational Linguistics; 2000.

[66] LexStat: Automatic detection of cognates in multilingual wordlists. In: List J-M, editor. Proceedings of the EACL 2012 joint workshop of LINGVIS & UNCLH; 2012.

[67] List J-M, Forkel R, Hill NW. A new framework for fast automated phonological reconstruction using trimmed alignments and sound correspondence patterns. arXiv preprint arXiv:2204.04619 2022.

[68] The SIGTYP 2022 shared task on the prediction of cognate reflexes. In: List J-M, Vylomova E, Forkel R, Hill N, Cotterell R, editors. Proceedings of the 4th workshop on research in computational linguistic Typology and multilingual NLP; 2022.

[69] Phonetic vector representations for sound sequence alignment. In: Sofroniev P, Coltekin C, editors. Proceedings of the Fifteenth workshop on computational research in phonetics, phonology, and Morphology; 2018.

[70] Ahmed T, Suffian M, Khan MY, Bogliolo A. Discovering lexical similarity using articulatory feature-based phonetic edit distance. IEEE Access 2021;10:1533–44.

[71] HaaKA Al-Dhlan. Auto-Extracting Method of Cognates Words in Arabic and English Languages. Int J Adv Stud Comput Sci Eng 2017:1–13.

[72] Dautriche I, Mahowald K, Gibson E, Christophe A, Piantadosi ST. Words cluster phonetically beyond phonotactic regularities. Cognition 2017;163:128–45.

[73] Droppo J, Acero A, editors. Context dependent phonetic string edit distance for automatic speech recognition. 2010 IEEE International Conference on Acoustics, Speech and Signal Processing; 2010.

[74] Eden SE. Measuring phonological distance between languages. 2018.

[75] Cognate production using character-based machine translation. In: Beinborn L, Zesch T, Gurevych I, editors. Proceedings of the sixth international joint conference on natural language processing; 2013.

[76] An expectation maximisation algorithm for automated cognate detection. In: MacSween R, Caines A, editors. Proceedings of the 24th Conference on computational natural language learning; 2020.

[77] Rama T, List J-M, editors. An automated framework for fast cognate detection and Bayesian phylogenetic inference in computational historical linguistics; 2019.

[78] Multiple word alignment with profile hidden Markov models. In: Bhargava A, Kondrak G, editors. Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, companion volume: student research workshop and doctoral consortium; 2009.

[79] Shao Y. Machine Transliteration of Names from Different Language Origins into Chinese. Uppsala University [Master’s Thesis]; 2014.

[80] Learning phonetic similarity for matching named entity translations and mining new translations. In: Lam W, Huang R, Cheung P-S, editors. Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval; 2004.

[81] Poetic sound similarity vectors using phonetic features. In: Parrish A, editor. Proceedings of the AAAI conference on artificial intelligence and interactive digital entertainment; 2017.

[82] Using support vector machines and state-of-the-art algorithms for phonetic alignment to identify cognates in multilingual wordlists. In: Jager G, List J-M, Sofroniev P, editors. Proceedings of the 15th conference of the European Chapter of the Association for computational linguistics: Volume 1, Long papers. Valencia, Spain: Association for Computational Linguistics; 2017.

[83] Mani I, Yeh A, Condon S. Learning to match names across languages. Multi-source, Multilingual Information Extraction and Summarization. 2013. p. 53–71.

[84] Cross linguistic name matching in English and Arabic. In: Freeman A, Condon S, Ackerman C, editors. Proceedings of the human language technology conference of the NAACL, main conference; 2006.

[85] Davis CI, editor. Tajik-Farsi Persian transliteration using statistical machine translation. LREC; 2012.

[86] Loots L, Niesler T, editors. Data-driven phonetic comparison and conversion between South African, British and American English pronunciations. INTERSPEECH; 2009.

[87] Knight K, Graehl J. Machine transliteration. Comput Ling 1998;24:599–612.

[88] Stalls BG, Knight K, editors. Translating names and technical terms in Arabic text. Computational Approaches to Semitic Languages; 1998.

[89] Fourrier C. Neural Approaches to Historical Word Reconstruction. 2022.

[90] Li P, MacWhinney B. PatPho: A phonological pattern generator for neural networks. Behav Res Methods Instrum Comput 2002;34:408–15.

[91] Li X, Dalmia S, Li J, Lee M, Littell P, Yao J, et al., editors. Universal phone recognition with a multilingual allophone system. ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP); 2020.

[92] Marjou X. OTEANN: Estimating the transparency of orthographies with an artificial neural network. arXiv preprint arXiv:1912.13321 2019.

[93] Goswami K, Rani P, Fransen T, McCrae JP. Weakly-supervised Deep Cognate Detection Framework for Low-Resourced Languages Using Morphological Knowledge of Closely-Related Languages. arXiv preprint arXiv:2311.05155 2023.

[94] Siamese convolutional networks for cognate identification. In: Rama T, editor. Proceedings of COLING 2016, the 26th international conference on computational linguistics. Technical Papers; 2016.

[95] Yolchuyeva S. Novel NLP Methods for Improved Text-To-Speech Synthesis [Ph.D. dissertation]; 2021.

[96] Cheng S, Ding Z, Yan S. English-to-Chinese transliteration with phonetic back-transliteration. arXiv preprint arXiv:2112.10321 2021.

[97] Predicting historical phonetic features using deep neural networks: A case study of the phonetic system of Proto-Indo-European. In: Hartmann F, editor. Proceedings of the 1st international workshop on computational approaches to historical language change; 2019.

[98] English-to-Chinese transliteration with phonetic auxiliary task. In: He Y, Cohen SB, Linguist A, editors. Proceedings of the 1st conference of the Asia-Pacific Chapter of the Association for computational linguistics and the 10th international joint conference on natural language processing. Suzhou, China: Association for Computational Linguistics; 2020.

[99] Huda MN, Hasan MM, Hassan F, Kotwal MR, Muhammad G, Rahman CM. Articulatory feature extraction for speech recognition using neural network. Int Rev Comp Software 2011;6:25–31.

[100] Younes J, Souissi E, Achour H, Ferchichi A. A sequence-to-sequence based approach for the double transliteration of Tunisian dialect. Procedia Comput Sci 2018;142:238–45.

[101] Yusuf B, Černocký J, Saraçlar M. End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. In: IEEE/ACM Transactions on Audio, Speech, and Language Processing; 2023.

[102] Datta A, Zhao G, Ramabhadran B, Weinstein E. LSTM Acoustic Models Learn to Align and Pronounce with Graphemes. arXiv preprint arXiv:2008.06121 2020.

[103] Mu D, Sun W, Xu G, Li W. Japanese Pronunciation Evaluation Based on DDNN. IEEE Access 2020;8:218644–57.

[104] Rosca M, Breuel T. Sequence-to-sequence neural network models for transliteration. arXiv preprint arXiv:1610.09565 2016.

[105] Sokolov A, Rohlin T, Rastrow A. Neural machine translation for multilingual grapheme-to-phoneme conversion. arXiv preprint arXiv:2006.14194 2020.

[106] Tian Y, Lou R, Pang X, Wang L, Jiang S, Song Y, editors. Improving English-Arabic transliteration with phonemic memories. Findings of the Association for computational linguistics: EMNLP 2022; 2022.

[107] Akavarapu V, Bhattacharya A. Automated Cognate Detection as a Supervised Link Prediction Task with Cognate Transformer. arXiv preprint arXiv:2402.02926 2024.

[108] Mahesh Akavarapu V, Bhattacharya A. Cognate Transformer for Automated Phonological Reconstruction and Cognate Reflex Prediction. arXiv e-prints 2023. arXiv– 2310.

[109] That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages. In: P Dehak, editor. 21st annual conference of the international speech communication Association. Shanghai: ISCA; 2020.

[110] Kanojia D. Investigations into distributional semantics for cognate detection and phylogenetics. 2021.

[111] Kondrak G, editor. Identification of cognates and recurrent sound correspondences in word lists. Traitement Automatique des Langues, vol. 50; 2009. Numéro 2: Langues anciennes [Ancient Languages].

[112] Farooq MU, Hain T, editors. Investigating the impact of cross-lingual acoustic-phonetic similarities on multilingual speech recognition. Interspeech 2022-23rd Annual Conference of the International Speech Communication Association; 2022.

[113] Farooq MU, Hain T. Learning cross-lingual mappings for data augmentation to improve low-resource speech recognition. arXiv preprint arXiv:2306.08577 2023.

[114] Translating English names to Arabic using phonotactic rules. In: Alshuwaier F, Areshey A, editors. Proceedings of the 25th Pacific Asia conference on language, information and computation; 2011.
