Retrieval-augmented generation analysis of user reviews in Saudi mobile banking apps: a comparative user experience study of iOS and Android
- Academic Editor
- Luca Ardito
- Subject Areas
- Human-Computer Interaction, Graphics, Social Computing, Software Engineering, Sentiment Analysis
- Keywords
- Human–computer interaction, User experience, Usability, Mobile banking, iOS and Android applications, Digital financial services, Saudi Arabia, Retrieval-augmented generation
- Copyright
- © 2026 Albesher and Alsanousi
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
- Cite this article
- 2026. Retrieval-augmented generation analysis of user reviews in Saudi mobile banking apps: a comparative user experience study of iOS and Android. PeerJ Computer Science 12:e3565 https://doi.org/10.7717/peerj-cs.3565
Abstract
Mobile banking in Saudi Arabia has experienced rapid growth driven by the government’s digital transformation efforts under Saudi Vision 2030, which has prompted the adoption of innovative technologies to improve service delivery across digital financial transformation. Analyzing user reviews is instrumental in gauging the success of mobile banking applications (apps). Utilizing sentiment analysis allows banking providers to convert customer feedback into actionable insights, which can help improve their services, attract new customers, and retain current customers. In this study, we conducted a comparative sentiment analysis of mobile banking apps in Saudi Arabia across iOS and Android platforms. We systematically analyzed 15,459 iOS and 230,734 Android user reviews in multiple languages from 10 leading Saudi banks. Utilizing a pre-trained sentiment model, we classified user sentiments into three categories: positive, negative, and neutral. To deepen the evaluation, Retrieval-Augmented Generation (RAG) with GPT-4o was applied to extract fine-grained user concerns, which were visualized through heatmaps of the most frequent usability issues. Our findings revealed clear differences in user satisfaction across mobile banking apps on iOS and Android. On Android, Alrajhi showed the strongest positive sentiment. Both Banque Saudi Fransi and Saudi Investment Bank received consistently negative reviews on iOS and Android. Arab National Bank received strong positive sentiment on iOS but weak positive sentiment on Android. The comparison between 2022 and 2023 showed mixed outcomes: Saudi National Bank improved on Android and Saudi Awwal Bank on iOS, but Alinma on iOS and Arab National Bank on Android worsened. Beyond sentiment, usability was examined through the ISO 9241-11 model, which highlighted persistent issues in effectiveness, efficiency, and satisfaction. 
The results uncovered recurring problems in instability (e.g., crashes), performance slowness (e.g., delayed responsiveness), User Interface (UI) and User Experience (UX) shortcomings (e.g., confusing layouts), authentication difficulties (e.g., login failures), feature limitations (e.g., missing functions), and insufficient customer support (e.g., unresponsive assistance). Saudi banks should continuously enhance their apps by systematically analyzing user feedback, identifying recurring usability issues, and fixing key emerging problems. Early resolution of critical issues in mobile banking apps, coupled with continuous refinement, enhances customer loyalty and sustains competitiveness within the rapidly evolving digital financial ecosystem.
Introduction
Mobile banking refers to the use of portable devices to carry out a variety of financial operations, allowing users to access banking services anytime and anywhere. Technological advances have encouraged the banking sector to provide customers with a convenient way to perform transactions (e.g., account management, fund transfers, bill payments, and balance inquiries) without visiting physical branches (Saeed & Donkoh, 2024). The rapid spread of smartphones, combined with mobile internet connectivity, has prompted a noticeable migration of consumers toward digital finance (Shaikh et al., 2023; Husainah et al., 2023). The outcome is higher customer satisfaction and broader financial inclusion for previously underserved communities (Benjamin, Amajuoyi & Adeusi, 2024; Ahmed et al., 2024; Guerra-Leal, Arredondo-Trapero & Vázquez-Parra, 2023; Islam et al., 2024).
Saudi mobile banking has experienced rapid growth, attributed to the government’s push for digital transformation. Saudi Vision 2030 has promoted economic diversification away from oil dependency. It encourages the development of high-quality banking services that attract local as well as foreign investment. The initiative emphasizes enhancing user satisfaction and adopting innovative technologies to improve service delivery and increase the sector’s competitiveness (Ishfaq, Al Hajieh & Alharthi, 2020). The adoption of mobile banking services in Saudi Arabia is supported by a young, tech-savvy population that is increasingly comfortable with digital transactions (Alotaibi & Aljaafari, 2024; Najafi, Amra & Najafi, 2024). Furthermore, regulatory directives from the Saudi Central Bank have produced a supportive context that enables financial institutions to enhance mobile-based solutions (Aysan, Ozturk & Selim, 2025). Mobile banking engagement is surging around the globe; in Saudi Arabia, however, it is uniquely shaped by cultural and economic factors, including a preference for convenience and security in financial transactions (Ghali, 2021; Alnemer, 2022).
Mobile applications serve as an essential gateway for financial institution clients. Many apps offer a broad range of tools such as account administration, transaction logs, fund transfers, and personalized insights for financial planning (Kala Kamdjoug et al., 2021). A focus on intuitive design strongly influences customer loyalty, given its link to user satisfaction (Shahid et al., 2022; Febrian, Simanjuntak & Hasanah, 2021; Kamboj, Sharma & Sarmah, 2022; Zhou et al., 2021). Competitive pressures in the banking arena prompt numerous providers to advance their mobile interfaces. These efforts prioritize expanded capabilities, refined interaction schemes that address evolving consumer requirements, and strategic differentiation in a crowded marketplace (Ubam, Hipiny & Ujir, 2021; Wu & Ho, 2022).
In mobile banking platforms, customer reviews are a foundational source of information that shapes user experience and contributes to an app’s overall success. Reviews disclose immediate impressions of operational efficiency and trustworthiness. Developers use this evidence to refine critical features, and financial institutions use it to evaluate customer satisfaction. Positive feedback correlates with stronger user acquisition and retention (Zhang et al., 2021; Torabi & Bélanger, 2021; Al-Abbadi et al., 2022), so online reviews exert a major influence on consumer preferences. Prospective users read many reviews, which enables well-informed judgments about the banking products on offer (Leem & Eum, 2021). Furthermore, customer reviews contribute to the iterative improvement of mobile banking apps. Reviews often highlight specific areas for enhancement. This feedback loop allows developers to address shortcomings promptly and to adapt the app to more reliably meet user expectations, leading to increased user satisfaction (Ogundipe, Odejide & Edunjobi, 2024; Desmal et al., 2023; Runsewe, Osundare & Folorunsho, 2024). Thus, integrating customer feedback into the app development process enhances usability and strengthens customer loyalty (Molinillo et al., 2022, 2020), since satisfied users are more likely to recommend the app to others.
Previous studies have extensively examined mobile banking apps across multiple dimensions. Several works have addressed customer experience and usability aspects (Kala Kamdjoug et al., 2021; Shahid et al., 2022), including accessibility and interface-level evaluation (Zhou et al., 2021; Ubam, Hipiny & Ujir, 2021), as well as post-adoption satisfaction and service quality (Lopes, Façanha & Viana, 2022; De Leon, Atienza & Susilo, 2020; Poromatikul et al., 2019; Majumdar & Pujari, 2022; Alismail & Albesher, 2023). Other studies have focused on adoption behavior and influencing factors such as perceived usefulness and performance expectations (Saprikis, Avlogiaris & Katarachia, 2022; Ivanova & Kim, 2022; Thusi & Maduku, 2020; Borowski-Beszta & Kiermas, 2019; Roy & Shaw, 2023), alongside psychological constructs like trust and risk (Ivanova & Kim, 2022; Thusi & Maduku, 2020; Khan, Rana & Hosen, 2022). Additional research investigated the impact of app interface attributes on user ratings (Pal Kapoor et al., 2020) and analyzed developer responses to user feedback (Alismail & Albesher, 2023). However, few studies have systematically examined user reviews to uncover usability breakdowns, spontaneous satisfaction trends, or sentiment differences across platforms.
Alhejji et al. (2022) examined the usability of Saudi mobile banking apps using ISO 9241 criteria, revealing major issues related to customer support. However, their approach relied on manual interpretation and limited usability dimensions. More recent studies have begun to leverage user reviews for analyzing mobile banking user experience, using techniques like sentiment analysis, topic modeling, and text mining. These include work on neo-banking platforms (Gupta & Srivastava, 2025), UX sentiment trends in Canadian banking apps (Amirkhalili & Wong, 2025), KH-coder-based thematic analysis in Indonesia (Sulistiyani, Handani & Nurchayati, 2024), and platform-specific perception shifts driven by customer reviews (Bateh & Klaus, 2023). Jamadar et al. (2024) further demonstrated how customer support, features, and app performance shape user sentiment. Despite these advances, there remains a lack of focused research in the Saudi context that systematically compares user sentiment across iOS and Android platforms. The present study fills this gap by providing a structured sentiment analysis of customer reviews for Saudi mobile banking apps, uncovering cross-platform UX trends and informing best practices for service improvement.
The present study investigated ten banking applications in Saudi Arabia. The sample includes Alrajhi Bank, Arab National Bank (ANB), Alinma Bank, Saudi Awwal Bank (SAB), Riyad Bank, Banque Saudi Fransi (BSF), Bank Albilad, Saudi National Bank (SNB), Bank AlJazira, and the Saudi Investment Bank (SAIB). Including a diverse set of bank apps reveals subtle differences in user experiences and provides useful insights for users, developers, and bank providers. The primary objectives of the research are:
- (1) To provide a clear comparison of user satisfaction levels for mobile banking apps in Saudi Arabia across two platforms, iOS and Android.
- (2) To analyze sentiment trends in user feedback.
- (3) To explore differences in user expectations between iOS and Android users during 2022 and 2023.
- (4) To identify and highlight the major issues observed within the tested apps.
- (5) To provide actionable recommendations for bank providers alongside app developers to improve the overall user experience in mobile banking apps.
This research will benefit multiple stakeholders including banking institutions, app developers, and customers in Saudi Arabia. Banks can utilize the findings to enhance their mobile apps, thereby improving user satisfaction and increasing customer loyalty. App developers can gain insights into user preferences and pain points, leading to better design and functionality. Consumers will ultimately benefit from improved mobile banking experiences that address their needs and expectations.
The article is structured as follows: ‘Background’ provides the background of the study which covers the ISO 9241-11 usability standard and the Retrieval-Augmented Generation (RAG) framework. ‘Literature Review’ reviews the related literature by highlighting previous research findings and their implications. ‘Methodology’ details the methodology employed for data collection and analysis. ‘Results’ presents the results of the sentiment analysis and usability assessment for both iOS and Android apps. ‘Discussion’ discusses the implications of the findings. ‘Limitations and Future Work’ identifies the limitations and provides suggestions for future work. ‘Recommendations’ provides some recommendations to improve the user experience for the apps. Finally, ‘Conclusions’ concludes the research and highlights some insights for future studies in the field.
Background
The ISO 9241-11 usability standard
In our study, we applied ISO 9241-11 to identify the usability issues associated with each app. According to ISO 9241-11, usability is defined as the degree to which specified users can achieve specified goals effectively, efficiently, and with satisfaction in a given context of use. In this framework, effectiveness concerns how accurately and completely users reach their goals while limiting errors or other adverse outcomes; efficiency relates to the resources required, such as time, effort, cost, and materials, to finish a task; satisfaction captures user comfort plus positive attitudes during interaction (ISO 9241-11, 2018; Bevan, Carter & Harker, 2015). The ISO 9241-11 usability standard has been widely recognized and applied as a leading framework for evaluating the usability of mobile apps across multiple studies (Alhejji et al., 2022; Alsanousi et al., 2023; Alghareeb, Albesher & Asif, 2023).
Retrieval-augmented generation
Retrieval-augmented generation (RAG) has been applied to many tasks such as user review queries, scientific literature search, and open-domain question answering (Byun et al., 2024). For user review queries, RAG can yield more accurate results by retrieving relevant documents from an extensive collection of user reviews (Dong et al., 2025). For example, a RAG system can be designed to answer queries on user reviews such as “What are the most common usability issues?” (Byun et al., 2024). RAG techniques can be effective for user review queries because they enhance contextual understanding by grounding responses in relevant review evidence (Vizniuk et al., 2025). RAG can also handle noisy queries, which are common in real-world user reviews. For instance, user queries may contain spelling errors, ambiguities, or multiple intents, which can make it challenging for traditional language models to answer the query accurately (Dong et al., 2025). Overall, RAG has the potential to transform the analysis of user reviews by enabling large-scale evaluation that yields accurate insights and exposes fine-grained issues in mobile apps.
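The retrieve-then-generate idea described in this section can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: a bag-of-words cosine similarity stands in for a learned embedding retriever, no language model is called, and the function names (`retrieve`, `build_prompt`) and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, reviews, k=2):
    """Return up to k reviews most similar to the query (zero scores dropped)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(r))), i) for i, r in enumerate(reviews)]
    # Highest score first; ties are broken by original position.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [reviews[i] for score, i in ranked[:k] if score > 0]


def build_prompt(query, evidence):
    """Assemble a prompt that grounds the question in the retrieved reviews."""
    bullet_list = "\n".join(f"- {review}" for review in evidence)
    return (
        "Answer using only the reviews below.\n"
        f"Reviews:\n{bullet_list}\n"
        f"Question: {query}"
    )


# Hypothetical sample reviews, invented for illustration only.
REVIEWS = [
    "The app crashes every time I open it",
    "Login fails after the update",
    "Great design and fast transfers",
]
QUERY = "Which issues involve app crashes or login?"

evidence = retrieve(QUERY, REVIEWS, k=2)
prompt = build_prompt(QUERY, evidence)
print(prompt)
```

In a full system, the assembled prompt would be sent to a generative model, so that its answer is constrained by the retrieved review evidence rather than by the model's prior knowledge alone.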
Literature review
Mobile banking apps have become a central tool for delivering financial services in the digital era. As the use of smartphones continues to rise, research has increasingly focused on evaluating mobile banking apps in terms of usability, user satisfaction, security, and performance. Many studies have investigated factors influencing user adoption and continued usage, including trust, ease of use, perceived usefulness, and app design. While several approaches have relied on surveys and conceptual models, fewer studies have examined actual user feedback collected from app stores. This section reviews recent research that focuses specifically on mobile banking apps, highlighting key findings related to user experience, service quality, and evaluation methods, which form the basis for our comparative analysis of user reviews in the Saudi context.
Shahid et al. (2022) investigated consumer experience (CX) in mobile banking apps using the Stimulus–Organism–Response (S–O–R) framework. They identified convenience, trust, and social influence as key drivers of a positive user experience with the app, while finding that app attributes had limited impact. Although the study contributes to understanding post-adoption behavior from actual users, it treats app design features as high-level constructs and does not systematically assess interface-level usability or interaction quality. Moreover, it does not differentiate between types of usability breakdowns or compare usability across multiple apps.
Zhou et al. (2021) focused explicitly on mobile banking apps by examining how service quality dimensions such as interface design, system quality, and perceived security influence user loyalty. Using SEM analysis, they demonstrated that app interface design indirectly affects loyalty intention by shaping perceptions of security and service quality. While their model offers valuable insights into user attitudes toward app-based banking services, it remains centered on a single app and addresses usability only at a conceptual level, without evaluating specific interaction design issues or task-level usability challenges.
Ubam, Hipiny & Ujir (2021) conducted a focused usability study of a mobile banking app tailored to senior citizens in Sarawak, Malaysia. Their work combined heuristic evaluation with user testing to identify age-related design issues, including small touch targets, unclear iconography, and insufficient guidance. By addressing accessibility needs in a specific demographic, the study contributes valuable insights to inclusive UI/UX design. However, its narrow scope, a single prototype app for one user group, limits generalizability. Additionally, the evaluation is geared toward redesign, with less emphasis on cross-app usability patterns or comparative interface analysis across existing commercial platforms.
Saprikis, Avlogiaris & Katarachia (2022) examined mobile banking app adoption by comparing behavioral factors influencing adopters vs. non-adopters, using an extended UTAUT model with additional constructs such as reward, anxiety, and recommendation. Their use of SEM provides a strong empirical basis to identify psychological and contextual influences on app uptake. However, the study’s scope is limited to pre-adoption perceptions and does not evaluate the usability of app interfaces or user experience post-adoption. As such, while it contributes to understanding motivational factors behind app adoption, it offers no insight into design quality, interface-level usability, or interaction challenges faced by users during actual app use.
Ivanova & Kim (2022) investigated the adoption and use of mobile banking apps among university students in Central Asia using a modified Unified Theory of Acceptance and Use of Technology (UTAUT) framework enriched with constructs like perceived trust, security, and risk. The study used SEM to validate relationships among behavioral, psychological, and contextual factors influencing app usage. While it provides empirical evidence on the acceptance of mobile banking technology in a developing economy context, it remains limited to survey-based self-reports and does not examine actual app usability, interface interactions, or design-related barriers. As such, it sheds light on adoption intent but lacks insight into task-level usability challenges that affect real-world usage.
Thusi & Maduku (2020) developed an integrated model combining UTAUT2, institution-based trust, and multidimensional perceived risk to examine factors influencing mobile banking app acceptance in South Africa. Their use of PLS-SEM provides strong explanatory power for intention and behavior constructs. However, the study is limited to adoption drivers and does not investigate actual user experiences or issues reflected in real-world app usage. It offers theoretical insight into behavioral intention but not into usability or satisfaction as expressed in actual user feedback or app reviews.
Lopes, Façanha & Viana (2022) conducted a focused accessibility evaluation of mobile banking apps in Brazil, combining automated testing (Accessibility Scanner) with manual inspection using TalkBack and a national accessibility guideline. Their findings reveal widespread violations related to navigation, labeling, and interaction feedback, highlighting critical barriers for blind users. While the study contributes significantly to the literature on inclusive design, it is centered on technical accessibility compliance, not on user-perceived usability or experiential feedback typically found in app store reviews.
De Leon, Atienza & Susilo (2020) investigated the post-adoption service quality of mobile banking apps using the Self-Service Technology service quality (SSTQUAL) framework, which includes seven dimensions: functionality, enjoyment, security/privacy, assurance, design, convenience, and customization. Their structural equation modeling analysis showed that service quality significantly affects both perceived value and customer satisfaction, emphasizing the importance of multi-dimensional app design. While this study advances understanding of how users perceive service quality after adoption, it relies solely on structured survey responses rather than real-world user feedback or experiential data drawn from app store reviews, which limits insight into spontaneous usability issues and pain points.
Poromatikul et al. (2019) investigated continuance intention in mobile banking app usage using the European Customer Satisfaction Index (ECSI) framework, with data from Thai users. They identified satisfaction, trust, and expectancy confirmation as key drivers, and revealed latent user segments influenced differently by performance vs. trust-related factors. While this study contributes to understanding long-term user retention and segmentation, it is grounded in perceptual survey data and does not assess actual interaction experiences or usability concerns as expressed in organic user reviews.
Majumdar & Pujari (2022) explored mobile banking app usage in the UAE, applying the Technology Acceptance Model (TAM) and a categorical regression approach to differentiate user behavior by level of app usage. The study identified perceived usefulness and information availability as the main drivers of adoption. A notable contribution lies in the segmentation of users based on actual usage levels, rather than the traditional user/non-user dichotomy. However, the study relies solely on structured survey data, offering limited insight into users’ post-adoption experiences or organic usability concerns reflected in app store reviews.
Khan, Rana & Hosen (2022) examined how trustworthiness, measured through ability, benevolence, and integrity, influences mobile banking app usage among users in Dhaka, Bangladesh. Using structural equation modeling, the study confirmed a positive relationship between these trust components and app usage. While the research highlights important psychological drivers of adoption, it relies on self-reported perceptions and does not assess interface design, usability, or post-adoption satisfaction based on real user interactions. Thus, it contributes to understanding user sentiment, but not to evaluating practical experience or app performance.
Pal Kapoor et al. (2020) investigated the impact of mobile banking app interface attributes on user ratings in Indian app stores, using a mixed-methods approach. They identified six key factors (login time, visual design, navigational design, information design, collaboration, and service quality) that significantly influenced app ratings. Unlike traditional adoption studies, their focus was on post-adoption behavior using behavioral (not self-reported) data from rating scores, providing practical insights for improving app perception in competitive markets. However, the study did not assess actual user experience during interactions, nor did it include sentiment or contextual review content.
Borowski-Beszta & Kiermas (2019) conducted a large-scale quantitative study on the adoption and usage patterns of mobile banking apps in Poland, using a Computer-Assisted Web Interview (CAWI) survey of 1,012 internet users. The findings revealed that more than half of respondents had a mobile banking app installed, with three-quarters using it regularly, primarily for checking balances and making transfers. The study also noted a high satisfaction rate and a strong upward trend in adoption from 2014 to 2018. While informative from a market penetration and user behavior standpoint, the study did not analyze motivational, usability, or trust-related factors, making it descriptive rather than explanatory in addressing why or how certain features drive usage.
Roy & Shaw (2023) proposed a fuzzy multi-criteria decision-making (MCDM) framework for evaluating mobile banking apps, addressing the growing complexity in app selection due to the proliferation of options in the market. Drawing on expert input, they applied the fuzzy best–worst method (fuzzy-BWM) to weight the evaluation criteria and fuzzy-TOPSIS to rank app alternatives. Their empirical results identified performance quality, functionality, and clarity as the most influential attributes in app selection. While their model contributes a structured decision-support approach for stakeholders, the study does not assess actual user experience or post-adoption usability, limiting its insights into user-centered factors that shape real-world perceptions of mobile banking apps.
Hanif & Lallie (2021) investigated the role of cybersecurity perceptions in shaping the intention to use mobile banking apps among UK users aged 55 and above, a group known for low adoption rates. Employing a modified UTAUT framework enriched with constructs from the Mobile TAM, the study incorporated perceived cybersecurity risk, trust, and overall security. Using a mixed-methods design and data from 191 participants, they found that performance expectancy and perceived cybersecurity risk significantly influenced intention to use, while effort expectancy, trust, and general security perceptions were not statistically significant. Their model explained 87% of the variance in usage intention, outperforming UTAUT and UTAUT2. While the study contributes a valuable lens on older adults’ perceptions, its age-specific focus and lack of direct usability or satisfaction metrics limit its broader generalizability to typical app store users or younger demographics.
Alhejji et al. (2022) evaluated the usability of mobile banking applications in Saudi Arabia by analyzing interface design, content clarity, and performance consistency across apps using the ISO 9241 standard. Their evaluation focused on measuring effectiveness, efficiency, and user satisfaction via heuristic assessment and user reviews. While their study offers insights into usability from a structural and design perspective, it did not explore user sentiment trends, platform-specific feedback, or temporal shifts in user satisfaction. Our study complements this work by employing a large-scale sentiment analysis approach to capture user perceptions directly from app reviews, offering quantified sentiment insights across platforms and over time.
Alismail & Albesher (2023) analyzed developer responses to app reviews for mobile banking apps in both Saudi Arabia and the United States, using content analysis of over 80,000 responses gathered from Google Play and the Apple App Store. They categorized responses into three interaction types—interactive, semi-interactive, and no response—and found significantly stronger engagement from U.S. developers in terms of both quantity and response quality. While this research importantly examines user–developer dynamics within actual app store ecosystems, it focuses on developer behavior rather than the content of reviews themselves or usability issues experienced by users.
Recent research has shifted toward leveraging large-scale user-generated content to better understand user experience in mobile banking. Jamadar et al. (2024) applied sentiment analysis and topic modeling on over 3,000 Indian mobile banking app reviews to uncover drivers of positive and negative sentiment, such as app responsiveness and customer support. While the study effectively extracts recurring UX concerns, it is limited to Android platforms and lacks cross-market comparison, making its findings context-specific and less generalizable across platforms or regions.
Sulistiyani, Handani & Nurchayati (2024) employed KH-Coder to analyze 5,720 user reviews from Indonesia, identifying both complaints (e.g., bugs and slowness) and satisfaction signals (e.g., feature richness, ease of use). The study offers a structured, data-driven perspective on UX factors but is narrowly scoped to a single national context and does not systematically compare multiple apps or platforms. Additionally, it does not perform sentiment classification or link review content to quantitative outcomes.
Bateh & Klaus (2023) conducted a survey-based experiment to investigate how exposure to online reviews affects users’ perceived experience with a mobile banking app. Their findings indicate that reviews shift perceptions related to structured assurance but not perceived simplicity or usefulness. While the work provides novel insight into the influence of reviews on perception formation, it does not analyze actual review content or sentiment patterns, limiting its applicability for understanding real-time user concerns at scale.
Amirkhalili & Wong (2025) focused on Canadian mobile banking apps and qualitatively examined usability aspects such as accessibility, navigation, and trust. Their work provides regional design insights and highlights user concerns related to security and service clarity. However, it is based on a small number of apps and lacks a sentiment-based or large-scale analytical framework, making its findings less comprehensive compared to data-driven studies.
Gupta & Srivastava (2025) evaluated neo-banking UX by comparing user feedback across Android and iOS platforms. Their study highlights platform-specific UX differences and identifies gaps in design consistency and trust-building mechanisms. While this research contributes to the growing body of cross-platform usability evaluation studies, it focuses exclusively on neo-banks and does not examine sentiment progression over time or the impact of bank type and national context.
These studies collectively signal a growing interest in using online reviews to assess mobile banking app quality. However, most are limited in scale, geography, or analytic scope. The current research advances this line of inquiry by applying large-scale sentiment analysis across both iOS and Android platforms in the Saudi banking context, offering a comparative and quantifiable assessment of user satisfaction, usability breakdowns, and platform-specific sentiment trends.
Methodology
To compare user experiences with mobile banking apps in Saudi Arabia, our approach, illustrated in Fig. 1, begins with step A, wherein we selected ten widely used apps: Albilad App, Alinma Bank, AlJazira SMART, Alrajhi Bank, ANB, Fransi Mobile, Riyad Bank Mobile, SAB Mobile, SAIB, and SNB Mobile (Alhejji et al., 2022). These apps were chosen from both major platforms, namely Android and iOS. In step B, we crawled all available data in multiple languages from January 2022 to December 2023 using a Python script. The script leveraged two publicly available Python libraries—google-play-scraper (Google, 2024) for Google Play and app-store-scraper (App-Store-Scraper, 2020) for the App Store—accessible through their respective application programming interfaces (APIs). We collected 15,526 user reviews for iOS and 248,395 user reviews for Android. Subsequently, in step C, we applied automated preprocessing techniques leveraging the Natural Language Toolkit (NLTK) library to yield meaningful content and prepare the collected data for the analysis model by removing noise (e.g., emoji symbols, numbers, duplicates, and empty reviews (Alsanousi et al., 2023; Alsanousi, Ludi & Do, 2024)). After the cleaning process, the dataset comprised 15,459 reviews for iOS and 230,734 for Android. Table 1 lists the app names along with the corresponding number of user reviews for both stores.
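The cleaning stage in step C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' script: it uses only the Python standard library rather than NLTK, it treats Unicode category "So" as a proxy for emoji, and it deduplicates on the cleaned, case-folded text. The function names `clean_review` and `preprocess` are hypothetical.

```python
import re
import unicodedata


def clean_review(text):
    """Remove emoji-like symbols and digits, then collapse whitespace."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # "So" covers most emoji and pictographs; "Nd" covers decimal digits
        # in any script, including Arabic-Indic digits.
        if category in ("So", "Nd"):
            continue
        kept.append(ch)
    return re.sub(r"\s+", " ", "".join(kept)).strip()


def preprocess(reviews):
    """Clean each review, dropping empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in reviews:
        text = clean_review(raw)
        if not text:
            continue  # empty after cleaning
        key = text.casefold()
        if key in seen:
            continue  # duplicate of an earlier review (assumed dedup rule)
        seen.add(key)
        cleaned.append(text)
    return cleaned


# Hypothetical raw reviews, invented for illustration only.
RAW = ["Great app 👍", "Great app 👍", "   ", "12345", "تطبيق ممتاز ١٢٣"]
print(preprocess(RAW))
```

Deduplicating after cleaning is a design choice with a cost: two reviews that differ only in digits or emoji collapse into one. A stricter pipeline might deduplicate on the raw text together with the reviewer and date.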
Figure 1: Methodology overview.
Image/icon credit: Reshot (https://www.reshot.com/license/).

| App name | iOS reviews | Android reviews | Total reviews |
|---|---|---|---|
| Albilad App | 237 | 1,769 | 2,006 |
| Alinma Bank | 746 | 6,256 | 7,002 |
| AlJazira SMART | 92 | 1,222 | 1,314 |
| Alrajhi Bank | 9,524 | 185,403 | 194,927 |
| ANB—Arab National Bank | 654 | 1,218 | 1,872 |
| Fransi Mobile | 255 | 957 | 1,212 |
| Riyad Bank Mobile | 537 | 5,033 | 5,570 |
| SAB Mobile | 1,264 | 4,473 | 5,737 |
| SAIB | 246 | 1,296 | 1,542 |
| SNB Mobile | 1,904 | 23,107 | 25,011 |
| Total | 15,459 | 230,734 | 246,193 |
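The cleaning in step C can be illustrated with a minimal sketch. It uses only Python's standard library rather than NLTK, so it is not the authors' script; the emoji ranges, function names, and sample reviews are assumptions made for illustration, and deduplicating after cleaning (rather than before) is likewise an assumed design choice.

```python
import re

# Assumed emoji ranges for illustration; a real pipeline would need a fuller list.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)
DIGIT_PATTERN = re.compile(r"\d+")  # \d also matches Arabic-Indic digits
SPACE_PATTERN = re.compile(r"\s+")


def clean_review(text):
    """Strip emoji symbols and digits, then collapse whitespace."""
    text = EMOJI_PATTERN.sub(" ", text)
    text = DIGIT_PATTERN.sub(" ", text)
    return SPACE_PATTERN.sub(" ", text).strip()


def clean_reviews(reviews):
    """Clean each review, dropping empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for raw in reviews:
        text = clean_review(raw)
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


# Hypothetical sample reviews, not taken from the dataset.
sample = ["Great app 👍", "Great app 👍", "12345", "", "Login fails after update 2"]
print(clean_reviews(sample))
```

Reviews that consist only of noise (such as a bare number) become empty after cleaning and are dropped, which is one plausible reason the cleaned dataset is smaller than the raw crawl.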
In step D, we used the “twitter-roberta-base-sentiment-latest” model (Carfiffnlp, 2022) for an automated analysis of user sentiment. This model, pre-trained on approximately 124 million tweets, was accessed through TweetNLP. It classified the user reviews into three categories: positive, negative, or neutral. The model achieved approximately 71.3% accuracy on the TweetEval sentiment benchmark (Carfiffnlp, 2022) and has proven effective across multiple peer-reviewed studies (Alhadlaq & Alnuaim, 2023; Patel et al., 2023; Schmidt et al., 2023). To further validate the model’s predictions, three PhD researchers in Computer Science manually annotated the sentiment (positive, negative, or neutral) of 30 user reviews randomly selected from the complete dataset. The outcomes showed substantial agreement between the model’s predictions and expert annotations (Cohen’s κ = 0.748), indicating that the model identifies sentiment patterns reliably. We integrated the model into our approach using a Python script. Since the sentiment analysis model supported only English text, we translated non-English reviews into English. To do this, we used GPT-3.5 Turbo by OpenAI (OpenAI, 2023), which yielded promising results for translation, particularly for Arabic (Al-Khalifa, Al-Khalefah & Haroon, 2024). We accessed GPT-3.5 Turbo through an API that automatically translated all non-English reviews based on a simple prompt, namely “Translate the following text to English,” (Liu et al., 2023) before passing them to the sentiment analysis model for classification. To check for translation bias, we sampled 10 non-English user reviews that had already been translated using our proposed approach. Two reviewers then compared each translated review with its original version to ensure that the meaning, tone, and context were correctly preserved. We found that each translated review remained accurate.
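For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's κ for two raters. The text does not state how the three annotators' labels were combined, so treating the expert side as a single label per review (for example, a majority vote) is an assumption, and the labels below are hypothetical rather than the study's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: model predictions vs. a single expert label per review.
model = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "neu", "neg", "pos"]
human = ["pos", "neg", "neg", "pos", "pos", "neg", "pos", "neu", "neu", "pos"]
kappa = cohens_kappa(model, human)
print(f"kappa = {kappa:.4f}")
```

Because κ subtracts the agreement expected by chance, it is lower than raw percentage agreement whenever the label distribution is skewed, which matters for review data dominated by one sentiment class.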
After the text was translated and classified, we calculated the percentage of each category by dividing the sentiment count for each category by the total number of user reviews.
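The percentage calculation described above amounts to a simple count-and-divide. A minimal sketch follows; the function name, label strings, and sample predictions are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def sentiment_shares(predictions):
    """Return the percentage of reviews in each sentiment category."""
    if not predictions:
        raise ValueError("no predictions supplied")
    counts = Counter(predictions)
    total = len(predictions)
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Hypothetical predictions for one app.
preds = ["positive"] * 6 + ["neutral"] * 1 + ["negative"] * 3
shares = sentiment_shares(preds)
print(shares)
```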
Then, in step E, we evaluated the usability issues of each app by concentrating exclusively on reviews with negative sentiment, as such feedback often reveals app-related problems (Byun et al., 2024). We then applied an automated classifier achieving 96% accuracy to label each review with one or more ISO 9241-11 usability factors: satisfaction, effectiveness, and efficiency. The tool assigns a value of 1 when a usability issue is detected and 0 otherwise (Carfiffnlp, 2022). To validate the model’s predictions, three PhD researchers in Computer Science manually annotated the usability categorization (satisfaction, effectiveness, and efficiency) of 30 negative user reviews randomly selected from the complete dataset. The results demonstrated strong agreement between the model’s predictions and expert annotations (Cohen’s κ = 0.787), indicating high reliability in detecting usability issues. Our approach is in line with prior studies that have utilized ISO 9241-11 to analyze app-review usability problems (Alhejji et al., 2022; Alsanousi et al., 2023; Alghareeb, Albesher & Asif, 2023). For each app, we calculated the percentage for each usability factor by dividing the number of detected issues within that factor by the total number of user reviews (for example, the number of satisfaction issues divided by the total number of user reviews).
Finally, we calculated an overall usability score for each app by summing the percentages of satisfaction, effectiveness, and efficiency issues, in line with prior studies (Alsanousi et al., 2023). A higher score therefore indicates more reported usability issues.
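As a worked illustration of this score, the sketch below sums the three factor percentages for Fransi Mobile on iOS, using the issue counts (151, 118, and 102) and review total (255) reported in the tables of this article. The function name is hypothetical, and reading "combining" as a plain sum is an interpretation that happens to reproduce the tabulated value.

```python
def usability_issue_score(satisfaction, effectiveness, efficiency, total_reviews):
    """Per-factor issue percentages and their sum (the composite score)."""
    if total_reviews <= 0:
        raise ValueError("total_reviews must be positive")
    factors = {
        "satisfaction": satisfaction,
        "effectiveness": effectiveness,
        "efficiency": efficiency,
    }
    percentages = {k: 100.0 * v / total_reviews for k, v in factors.items()}
    return percentages, sum(percentages.values())


# Fransi Mobile (iOS): issue counts and total reviews as reported in the tables.
pcts, score = usability_issue_score(151, 118, 102, 255)
print({k: round(v) for k, v in pcts.items()}, round(score, 1))
```

Because a single review can be flagged for more than one factor, the composite can exceed 100%, as it does for this app.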
In step F, we employed Retrieval-Augmented Generation (RAG), which performs fresh retrieval for every query to systematically identify usability issues across the ten mobile banking apps on both Android and iOS platforms (Gao et al., 2024). In this step, only user reviews with negative sentiment were retained to highlight actionable problems (Wang et al., 2022). Then, the text was normalized and embedded using Sentence-Transformers (all-MiniLM-L6-v2) to prepare it for efficient similarity searches (HuggingFace, 2024). The embeddings were indexed in Facebook AI Similarity Search (FAISS) (Douze et al., 2024) using a Hierarchical Navigable Small World (HNSW) graph to support efficient cosine similarity searches (Malkov & Yashunin, 2018). We then executed a simple query by embedding the phrase ‘the most common issues’ to identify usability issues, and the system retrieved the top-K matches using a dynamic similarity threshold (Lewis et al., 2020). After that, we used a large language model (GPT-4o) to categorize the retrieved reviews into meaningful groups (Aracena et al., 2025) with a temperature setting of 0.7, while preserving review identifiers to ensure traceability. Finally, we visualized the distribution of these issues across apps using heatmaps, which allowed us to identify dominant problem areas at a glance.
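The retrieval part of step F can be sketched as follows. This is a brute-force stand-in written in plain Python: it does not use Sentence-Transformers or a FAISS HNSW index, the three-dimensional vectors are hypothetical, and the "dynamic" threshold is assumed here to be the mean similarity over all candidates because the text does not specify the rule. The GPT-4o categorization step is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, review_vecs, k=3):
    """Return up to k (review_id, score) pairs above a data-dependent cutoff."""
    scored = [(rid, cosine(query_vec, vec)) for rid, vec in review_vecs.items()]
    if not scored:
        return []
    # Assumed dynamic threshold: the mean similarity over all candidates.
    cutoff = sum(s for _, s in scored) / len(scored)
    kept = [(rid, s) for rid, s in scored if s >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Hypothetical embeddings keyed by review ID; real ones would come from a model.
reviews = {
    "r1": [0.9, 0.1, 0.0],
    "r2": [0.8, 0.2, 0.1],
    "r3": [0.0, 0.1, 0.9],
    "r4": [0.1, 0.9, 0.1],
}
query = [1.0, 0.0, 0.0]
hits = retrieve(query, reviews, k=2)
print([rid for rid, _ in hits])
```

Returning the review IDs alongside the scores mirrors the traceability requirement described above: each categorized issue can be followed back to the review that produced it.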
To demonstrate the feasibility of the proposed RAG-based approach, we executed a query focusing on login-related issues within the Fransi Mobile application. Our approach retrieved the most frequently mentioned reviews that were directly relevant to login problems and categorized them into meaningful issue types. For example, one review stated, ‘There is a problem with your program! It refuses to log in even though the information is correct! Fix this,’ categorized under Login Failures (Review ID: 52b1822a-fa89-4c62-a235-904f8e334153). Another review noted, ‘I try all the ways to log in but always receive a slow response and cannot log in. Please fix this bug as soon as possible,’ categorized under Slow Login (Review ID: 9dd99090-031e-4df7-b1a1-7bade9c55cf4). This case study demonstrated that our approach can extract and categorize domain-specific usability concerns in a traceable manner, even without large-scale evaluation.
Results
The results section is organized to provide a clear analysis of user feedback on mobile banking apps in Saudi Arabia, differentiated by year and platform to provide insights into user experiences. This section begins by providing a comprehensive overview of iOS apps for the period between 2022 and 2023. Subsequently, the analysis shifts to Android apps for the same period. Each platform’s results are further delineated into individual years to enable a nuanced comparison of user sentiment.
iOS apps (2022 and 2023)
The analysis of iOS mobile banking app reviews in Saudi Arabia between 2022 and 2023 provided meaningful insights into user experience. Figure 2 highlights the overall positive sentiment expressed in the reviews, while Fig. 3 illustrates the overall negative sentiment. As presented in Fig. 2, the positive percentages range from a high of 79% to a low of 4%. The distribution of positive sentiment revealed that the Arab National Bank app stood out with the highest percentage of positive feedback, at 79%. This indicated a high level of satisfaction among its users. This is closely followed by the SAB bank app, which reported positive feedback of 72%. The Alrajhi Bank app also demonstrated a commendable positive rating of 70%. The Alinma and Riyad bank apps fell in the middle, with positive percentages of 57% and 45%, respectively. In contrast, the SNB and AlJazira apps exhibited lower positive feedback percentages of 10% and 13%, respectively. Lastly, the Fransi and SAIB apps had the lowest positive feedback at only 4%, followed by the Albilad app at 5%. These results highlight significant user experience challenges in the iOS apps, which recorded both the highest and the lowest levels of positive feedback.
Figure 2: Overall positive sentiment for banking apps on iOS.
Figure 3: Overall negative sentiment for banking apps on iOS.
The negative sentiment in Fig. 3 reveals that the Fransi app had the highest user dissatisfaction, with 94% of its feedback classified as negative. The SAIB, Albilad, and SNB apps also recorded notable negative sentiments, at 85%, 80%, and 78%, respectively. These were followed by AlJazira at 68%, Riyad at 37%, Alinma at 30%, and both Alrajhi and SAB at 15%. Overall, the comparison highlights a wide disparity in negative sentiment across Saudi mobile banking apps and reveals challenges in user experience.
We also conducted further analysis using the ISO 9241-11 usability standard to identify the most frequent usability issues across the 10 iOS banking apps. Table 2 presents the usability issues related to the three ISO 9241-11 factors (satisfaction, effectiveness, and efficiency) together with the percentage distribution for each factor. It also reports the composite usability score, which reveals clear differences in usability performance across the apps; a higher score indicates more reported issues. Fransi Mobile and SAIB recorded the highest usability scores, at 145.5% and 138.6%, respectively. These scores arose from persistent dissatisfaction (59% for both apps) coupled with considerable effectiveness issues (46% for Fransi Mobile and 49% for SAIB); efficiency problems (40% for Fransi Mobile and 30% for SAIB) further intensified the overall usability concerns. This evaluation revealed that the apps exhibited significant usability challenges: users were dissatisfied, tasks frequently failed because users often could not complete their intended actions, and app interactions required noticeable time and effort, thereby revealing weaknesses across effectiveness, efficiency, and satisfaction.
| App name | Satisfaction | % | Effectiveness | % | Efficiency | % | Usability score % |
|---|---|---|---|---|---|---|---|
| Albilad App | 111 | 47 | 124 | 52 | 36 | 15 | 114.3 |
| Alinma Bank | 129 | 17 | 155 | 21 | 54 | 7 | 45.3 |
| AlJazira SMART | 27 | 29 | 39 | 42 | 14 | 15 | 87 |
| Alrajhi Bank | 511 | 5 | 572 | 6 | 146 | 2 | 12.9 |
| ANB—Arab National Bank | 24 | 4 | 43 | 7 | 13 | 2 | 12.2 |
| Fransi Mobile | 151 | 59 | 118 | 46 | 102 | 40 | 145.5 |
| Riyad Bank Mobile | 117 | 22 | 134 | 25 | 30 | 6 | 52.3 |
| SAB Mobile | 96 | 8 | 119 | 9 | 43 | 3 | 20.4 |
| SAIB | 146 | 59 | 120 | 49 | 75 | 30 | 138.6 |
| SNB Mobile | 957 | 50 | 936 | 49 | 266 | 14 | 113.4 |
Further, we identified the most frequent issues using RAG with GPT-4o. The results are presented as heatmaps in Fig. 4, which illustrate the usability problems of iOS mobile banking applications. Across the 10 iOS apps, performance issues stood out as the most persistent concern. Crashes occurred across all 10 apps, indicating notable instability. The most severe slowness appeared in the Alrajhi, ANB, Riyad, and SAIB apps, which also faced the most frequent connectivity failures. UI and UX issues were evident across multiple apps and were most intense in the Alinma, ANB, and SNB apps. In addition, customer support shortcomings repeatedly emerged in the Riyad, SAB, and SNB apps. Access and authentication difficulties surfaced in the Alrajhi and Fransi Mobile apps; these were also found in the SNB app, although at a lower level of severity. Feature limitations were most evident in the Fransi Mobile and SAB apps, while update and maintenance complaints remained relatively minor. Overall, reliability, stability, and responsiveness represented the greatest usability concerns, while interface design quality and support services added further frustration for users.
Figure 4: The most common usability issues of iOS mobile banking apps.
Android apps (2022 and 2023)
The analysis of user reviews for Android mobile banking apps in Saudi Arabia for the period between 2022 and 2023 provided important insights into user experience. Figure 5 illustrates the overall positive sentiment toward Android banking apps, while Fig. 6 depicts the overall negative sentiment. Figure 5 presents the distribution of positive sentiment, with the Alrajhi Bank app leading at 77% favorable reviews, followed by the Alinma Bank app at 67% and the Aljazira Bank app at 64%. Other notable apps included SAB and Riyad Bank, which scored 55% and 53%, respectively. The SAIB app received the lowest positive score at 11%, followed by the Fransi Bank app at 26%, ANB app at 28%, Albilad app at 37%, and SNB app at 39%. Overall, the Alrajhi app led in positive review metrics, thus reflecting the highest level of user satisfaction among all bank apps, whereas the SAIB app recorded the lowest level of user satisfaction.
Figure 5: Overall positive sentiment for banking apps on Android.
Figure 6: Overall negative sentiment for banking apps on Android.
In contrast, Fig. 6 displays the distribution of negative sentiment across the evaluated apps, with the SAIB app showing the highest level of negative sentiment at 81%. The Fransi and ANB banks’ apps also showed significant dissatisfaction at 65% and 59%, respectively. The Albilad and SNB apps followed with negative rates of 52% and 50%. The SAB and Riyad apps showed moderate negative feedback at 37% and 38%, respectively. The Alinma Bank and Aljazira SMART apps both presented lower negative sentiment rates of 25%. Finally, Alrajhi Bank achieved a notably low negative review percentage of only 16%, distinguishing it from its competitors by a substantial margin. This dual analysis emphasized the diverse user experiences across Android banking apps and highlighted both strengths and areas for improvement within Saudi Arabia’s mobile banking sector.
Further, we conducted additional analysis to identify usability issues within each of the 10 Android banking apps. Table 3 presents the factor-level percentages for satisfaction, effectiveness, and efficiency, along with a composite usability score that distinguishes overall performance; a higher score indicates more reported issues. The results revealed the two apps with the weakest usability: SAIB recorded the highest score at 126.5%, followed by Fransi Mobile at 94.9%. SAIB’s app exhibited high dissatisfaction at 54%, significant effectiveness issues at 48%, and the highest level of efficiency issues at 25%. Fransi’s app exhibited elevated dissatisfaction at 38% and effectiveness problems at 38%, with efficiency issues also evident at 20%. ANB (82.1%) was mainly affected by high effectiveness issues at 43% and dissatisfaction at 32%. The Albilad app (73%) also showed notable issues, with effectiveness issues at 35% and dissatisfaction at 31% as the main problems, while efficiency issues remained low at 7%. SNB (69.6%) had a balanced but elevated distribution across dissatisfaction at 30% and effectiveness issues at 29%, with fewer efficiency issues at 10%. Apps in the middle of the score range, Riyad (53.2%) and SAB (52.6%), had relatively few efficiency issues at 4–7% but still showed notable levels of effectiveness issues and dissatisfaction, with Riyad at 26% and 23% and SAB at 24% and 21%, respectively. The AlJazira SMART (35.4%) and Alinma (33.5%) apps fell within the lower score range, with effectiveness issues and dissatisfaction between 12% and 17% and efficiency issues remaining low at 3–4%. At the leading edge, Alrajhi (21.4%) achieved the lowest composite score, with effectiveness issues at 9%, dissatisfaction at 9%, and efficiency issues at 3%, all of which were minimal levels.
This evaluation revealed that the apps faced considerable usability issues: users expressed dissatisfaction, tasks frequently failed or appeared unreliable, and interactions demanded noticeable time and effort, thereby highlighting weaknesses in effectiveness, efficiency, and satisfaction.
| App name | Satisfaction | % | Effectiveness | % | Efficiency | % | Usability score % |
|---|---|---|---|---|---|---|---|
| Albilad App | 548 | 31 | 613 | 35 | 130 | 7 | 73 |
| Alinma Bank | 780 | 12 | 1,064 | 17 | 254 | 4 | 33.5 |
| AlJazira SMART | 180 | 15 | 210 | 17 | 42 | 3 | 35.4 |
| Alrajhi Bank | 16,880 | 9 | 17,085 | 9 | 5,761 | 3 | 21.4 |
| ANB—Arab National Bank | 387 | 32 | 520 | 43 | 93 | 8 | 82.1 |
| Fransi Mobile | 359 | 38 | 359 | 38 | 190 | 20 | 94.9 |
| Riyad Bank Mobile | 1,153 | 23 | 1,312 | 26 | 215 | 4 | 53.2 |
| SAB Mobile | 936 | 21 | 1,092 | 24 | 325 | 7 | 52.6 |
| SAIB | 695 | 54 | 616 | 48 | 328 | 25 | 126.5 |
| SNB Mobile | 7,038 | 30 | 6,657 | 29 | 2,377 | 10 | 69.6 |
Additionally, we identified the most frequent issues using RAG with GPT-4o; the results are visualized as heatmaps in Fig. 7 and highlight the usability challenges of Android mobile banking apps. Across the 10 Android banking apps, crashes and instability emerged as the clearest system-wide pain point. Severity was high in almost every app, and the peak appeared in the SAB app. Access and authentication problems appeared pervasively at a moderate level across the apps, with notable presence in the ANB and SNB apps. UI and UX complaints also appeared widely at a moderate level, with the ANB app showing a clear spike and the Alrajhi app showing noticeably lower intensity, which suggested uneven design quality. Performance slowness was concentrated in SAB, indicating a local issue rather than an industry-wide latency problem. Feature and functionality gaps were most visible in the ANB and Fransi Mobile apps; transaction and operational failures stood out in the Riyad app; updates and maintenance were notable only in the Alrajhi app; customer support pressure was central in the Fransi Mobile app, with lighter signals in the Alinma and SAB apps; comparisons and benchmarking remained minor issues in the SAIB app; and general dissatisfaction appeared only in the SNB app.
Figure 7: The most common usability issues of Android mobile banking apps.
iOS 2022 vs. iOS 2023
The analysis of iOS mobile banking app user reviews in Saudi Arabia during 2022 and 2023 revealed notable changes that provide important insights into user experience. Figure 8 provides an overview of the sentiment analysis of iOS user reviews for Saudi mobile banking apps in 2022, while Fig. 9 depicts the sentiment analysis for 2023. A chi-square test was performed to examine whether the distribution of sentiment categories (positive, neutral, and negative) varied significantly across all 10 iOS banking apps. The results indicated a highly statistically significant difference: χ²(18) = 4,321.36, p < 0.001, Cramér’s V = 0.427. The outcome confirmed that the sentiment distribution changed meaningfully over time and across apps, with the overall weighted normalized sentiment (inverse-N) being positive in 18.73% of the cases, neutral in 13.99% of the cases, and negative in 67.28% of the cases. The ANB app showed a significant improvement, as its positive sentiment rose from approximately 17% to almost 79%, which placed it among the top-performing apps. The SAB and Alrajhi apps maintained high levels of positive reviews in both years. The Riyad app showed a modest improvement in positive sentiment, rising from 43% to 51%, which may reflect gradual progress in addressing user concerns. AlJazira also recorded a modest improvement between the two years. In contrast, the Alinma app showed a slight reduction in positive sentiment. Moreover, SNB, Albilad, SAIB, and Fransi Mobile, listed in descending order, consistently recorded low levels of positive feedback across both years, placing them among the least favored applications.
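To make the reported statistics concrete, the sketch below computes the Pearson chi-square statistic, its degrees of freedom, and Cramér's V for an apps-by-sentiment contingency table. The counts are hypothetical and the function name is illustrative; the p-value is not computed here (it would come from the chi-square survival function in a statistics library), and the inverse-N weighting mentioned above is not reproduced. The sketch also assumes every expected cell count is non-zero.

```python
import math


def chi_square_cramers_v(table):
    """Pearson chi-square, degrees of freedom, and Cramér's V for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of app and sentiment.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    rows, cols = len(table), len(table[0])
    dof = (rows - 1) * (cols - 1)
    v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))
    return chi2, dof, v


# Hypothetical counts: rows = apps, columns = positive, neutral, negative.
counts = [
    [30, 10, 10],
    [10, 10, 30],
]
chi2, dof, v = chi_square_cramers_v(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, V = {v:.3f}")
```

With 10 apps and 3 sentiment categories, the same formula gives (10 − 1) × (3 − 1) = 18 degrees of freedom, which matches the value reported in the text.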
Figure 8: Sentiment analysis of iOS user reviews in 2022.
Figure 9: Sentiment analysis of iOS user reviews in 2023.
Among the best-performing apps, the percentage of negative reviews for the ANB app dropped sharply from 70% to 1%, which reflected a major improvement in user satisfaction. The SAB app also saw a decline in negative sentiment, from 17% to 12%, and the SAIB app experienced a modest decline from 85% to 83%. In contrast, negative reviews rose slightly for the Fransi app, from 93% to 95%, and for the SNB app, from 76% to 80%. The Alrajhi, Alinma Bank, Riyad Bank, and AlJazira SMART apps also experienced a rise in negative sentiment compared to 2022, reflecting a substantial decline in user satisfaction. At the same time, the SNB, Albilad, SAIB, and Fransi Mobile apps showed consistently high levels of negative reviews across both years. Overall, the analysis of iOS user reviews from 2022 and 2023 revealed improvements in certain apps, while others either declined or remained at similar levels. The ANB app demonstrated a significant reduction in negative sentiment, which corresponded with a rise in positive feedback.
Android 2022 vs. 2023
The analysis of Android mobile banking app user reviews in Saudi Arabia for the period between 2022 and 2023 revealed key differences between the apps and across the two years, thereby providing insight into the overall user experience. Figure 10 provides a detailed breakdown for the year 2022, and Fig. 11 provides a succinct overview for the year 2023. The Alrajhi Bank app recorded a slight decrease in positive reviews from 65% to 61% but remained among the top-performing apps in both years. The AlJazira SMART app also witnessed a decrease in positive reviews from 70% to 52%, while the positive reviews for the Riyad Bank app dropped from 63% to 38%. The positive reviews for the SAB Mobile app declined from 65% to 38%, while those for the Albilad app fell from 47% to 21%. The positive reviews for Fransi Mobile fell from 34% to 13%, while ANB recorded a decrease from 31% to 25%. The worst-performing bank app over the two years, the SAIB app, witnessed its positive reviews drop from 12% to 9%. In contrast, the SNB Mobile app’s positive reviews improved from 36% to 47%.
Figure 10: Sentiment analysis of Android user reviews in 2022.
Figure 11: Sentiment analysis of Android user reviews in 2023.
The trend in negative reviews for Saudi mobile banking apps on Android from 2022 to 2023 reveals a notable increase in dissatisfaction among users. The percentage of the Alrajhi Bank app’s negative reviews rose from 24% to 31%, while the Alinma Bank app’s negative reviews increased significantly from 15% to 56%. The AlJazira SMART app experienced a rise in negative reviews from 20% to 38%, and the Riyad Bank app showed an increase from 28% to 53%. SAB Mobile’s negative reviews rose from 28% to 52%, and those of the Albilad app climbed from 42% to 71%. The SNB Mobile app’s negative reviews declined sharply from 53% to 41%, while the Fransi Mobile app’s negative reviews increased from 54% to 82%. In addition, dissatisfaction with the ANB app rose from 55% to 62%. Lastly, SAIB’s negative reviews rose slightly from 80% to 82%. Overall, the analysis of Android user reviews during 2022 and 2023 revealed slight improvements in certain apps, while others either declined or remained at similar levels, which indicated a challenging landscape for user satisfaction in mobile banking apps during this period.
Discussion
The analysis of Saudi mobile banking apps across Android and iOS for the period between 2022 and 2023 revealed several interesting findings. One finding was that performance levels and user sentiment varied among the banks’ apps. On Android, the Alrajhi Bank app led with a notably strong positive sentiment, which indicates a high quality of user experience. The Alinma app also performed well and achieved significant positive feedback. Conversely, the Fransi Bank app and SAIB app struggled on both platforms, receiving a high volume of negative sentiment. These findings align with those of previous research. For example, Alhejji et al. (2022) revealed that the apps of Alrajhi and Alinma Banks ranked at the top, while the apps of the Fransi and SAIB Banks were among those with the lowest ratings on both platforms.
Another finding of this study demonstrated that the ANB app ranked the best in performance on iOS. However, this bank’s app showed the opposite performance on Android, where it had one of the highest negative percentages. This disparity can be attributed to one or more factors. The first factor could be device fragmentation on Android. The wide range of screen sizes, hardware capabilities, and OS versions makes it challenging to optimize performance across all devices. In contrast, iOS operates on a limited number of Apple devices (Lamhaddab, Lachgar & Elbaamrani, 2019). Thus, Android’s fragmentation could have increased complexity and hindered ANB’s ability to avoid poor customer experience. The second factor could be development complexity. Android apps typically require 30–40% more time to develop due to the need for compatibility across various devices (Lamhaddab, Lachgar & Elbaamrani, 2019). This complexity may have created implementation challenges for ANB’s Android version and negatively impacted interaction quality. The third factor could be platform-specific UI and UX differences. The design strategies optimized for iOS may not have translated effectively to the Android platform. iOS apps generally follow strict Apple design guidelines, which promote a smoother user experience. In contrast, Android allows for greater customization, which can lead to inconsistencies that adversely affect both user experience and performance (Kuusinen & Mikkonen, 2014).
Another finding of this study revealed that the SNB and SAIB apps face serious difficulties due to high customer dissatisfaction on both iOS and Android platforms. Both banks must address the shortcomings in their apps. Studies indicate that update-related problems, such as app crashes, incompatibility with devices, and loss of key functionalities, are major concerns for users of Saudi banks (Alhejji et al., 2022). Research indicates that decreasing app crashes and improving response times can increase retention rates by up to 30% (Majumder, 2025). Additionally, Alhejji et al. (2022) mentioned the lack of customer support as a major factor affecting UX outcomes and proposed live chat as a way to better understand customer concerns regarding the app. This aligns with the findings of another study (Alismail & Albesher, 2023) that emphasized the essential role of developer responses in shaping user sentiment for mobile banking apps. Overall, this research demonstrates that ignoring user feedback increases negative user perceptions.
Broader implications and ethical considerations
Understanding the success of mobile banking apps in Saudi Arabia requires attention not only to technical usability but also to broader socio-cultural and ethical dimensions. Our findings, drawn from sentiment analysis of user reviews and expert evaluations, revealed patterns shaped by gendered expectations, language accessibility, and trust in platform behavior, factors that are strongly embedded in the Saudi socio-cultural context. Prior research supports these observations. For example, Alnemer (2022) highlighted the significance of perceived ease of use and trust in determining Saudi users’ acceptance of digital banking, while Baabdullah et al. (2019) revealed how cultural values such as collectivism and power distance influence m-banking behaviors in the Kingdom.
Ethical concerns emerge in parallel, particularly as sentiment analysis and AI tools are increasingly used to process and interpret user-generated content. Although our methodology respected user privacy by analyzing only publicly available reviews, ethical data use extends beyond legal accessibility. In the Saudi context, the National Data Governance Interim Regulations highlight accountability, transparency, informed consent, and data minimization as key principles for protecting personal data (Saudi Data and AI Authority (SDAIA), 2020). These principles require that data processing be proportionate to its intended purpose, respect individual privacy, and remain subject to institutional oversight. Even publicly available content can raise privacy concerns when aggregated at scale, making responsible governance essential to maintain compliance and user trust. Recent scholarship on data ethics reinforces this perspective. Asgarinia et al. (2023) emphasize that effective anonymization, limited retention, and clear consent mechanisms are vital to preserve both privacy and public confidence in open data ecosystems. Similarly, Lichtenauer et al. (2023) argue that transparent data-sharing frameworks and ethical stewardship practices strengthen accountability and user trust in AI-driven analytics. Together, these findings support the view that privacy in sentiment analysis requires not only technical safeguards but also governance structures grounded in fairness, cultural sensitivity, and legal responsibility.
The broader implications of AI-led sentiment classification warrant discussion. Beyond privacy, two additional ethical aspects warrant attention: translation bias and AI interpretability. Recent evidence shows that sentiment analysis of Arabic reviews can be affected by translation and dialectal bias. Large language models trained primarily on Modern Standard Arabic (MSA) tend to misinterpret dialectal expressions, idioms, and culturally grounded sentiments, leading to polarity drift or loss of meaning (Al-Owais & Elnagar, 2025; Al-Monef et al., 2025). Alanazi (2024) further demonstrates that Arabic speakers are native in their regional dialects rather than in MSA, which behaves as a second linguistic system; this structural divergence often causes semantic distortion in translation tasks. Addressing this challenge requires dialect-balanced datasets, bilingual evaluation, and iterative validation on Saudi-Arabic corpora to ensure fairness and accuracy in model outputs.
Equally important is the issue of AI interpretability. Sentiment and retrieval-augmented models should provide transparent rationales for how results are generated. Explainable-AI methods such as SHAP, LIME, and attention visualization help reveal which linguistic or contextual features drive polarity predictions and enable human auditing of automated outcomes (Hassija et al., 2024). Embedding interpretability within analytical pipelines not only supports reproducibility and accountability but also aligns with the transparency and fairness principles of the Saudi National Data Governance Regulations (Saudi Data and AI Authority (SDAIA), 2020). Taherdoost & Madanchian (2023) emphasized how sentiment algorithms can misrepresent nuanced emotional cues when not trained on culturally grounded data. In the Saudi context, where language dialects and cultural expressions vary, ethical handling of sentiment data is crucial to avoid biases and misinterpretations.
Our findings also suggest implications for app developers and policymakers in the Kingdom. To promote inclusivity, mobile banking interfaces must reflect local preferences for layout, security messaging, and language accessibility. For instance, observed complaints related to limited Arabic interface customization point to a need for greater localization. In line with the recommendations of Asgarinia et al. (2023), developers must integrate AI personalization features responsibly, balancing automation with transparency and fairness. Policymakers should further guide these efforts by ensuring that AI-driven financial tools comply with national data protection standards while addressing the diverse needs of Saudi users.
Limitations and future work
Although this work sheds light on critical user experience facets of mobile banking apps in Saudi Arabia, its limitations must be considered to understand the scope of our findings and to identify directions for future studies. First, we used publicly available scraper libraries to crawl large-scale user reviews. These scrapers offered a practical way to collect data but could not retrieve all available reviews (Khder, 2021). This resulted in fewer reviews for some apps and significantly more for others, which might have affected our results. For instance, iOS apps such as Fransi Mobile, SAIB, and Alazira SMART were limited to 92 to 255 reviews, while Alrajhi Bank and SNB Mobile accumulated thousands. On the Android platform, the situation differed markedly, with Alrajhi Bank and SNB Mobile garnering tens of thousands of reviews, whereas apps such as Fransi Mobile obtained fewer than a thousand. This imbalance may have influenced the reliability of cross-application comparisons. In addition, our findings were based on app store user reviews, which may not reflect the perceptions of the apps' entire user base. To achieve a more comprehensive evaluation, future research should consider data sources beyond app store reviews of Saudi mobile banking apps, such as direct user interviews (Mahmoud et al., 2021), app performance analytics (Madhavan, 2024), and structured questionnaires (Rosala, 2024). Furthermore, the sentiment analysis model used in this study might have introduced some false-positive classifications that did not reflect the user's intended meaning, owing to the model's limited domain knowledge. To mitigate this concern, future work could fine-tune the model or leverage other advanced models trained on the same domain (Ploscă, Curiac & Curiac, 2024).
Additionally, because reviews in languages other than English were translated to ensure compatibility with the sentiment analysis tool, translation may have introduced inaccuracies that altered the original tone of users' feedback (Mohammad, Salameh & Kiritchenko, 2016; Shalunts, Backfried & Commeignes, 2016). To mitigate this limitation, we manually validated a subset of translations. In future work, combining a multilingual sentiment analysis tool (Anwar et al., 2024) with manual validation (Corizzo & Hafner, 2024) could further reduce the risk of such inaccuracies. Finally, harnessing AI for feedback analysis can uncover emerging challenges while enabling proactive improvements to enhance user satisfaction (Lee, Chakraborty & Banerjee, 2023).
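The effect of the review-count imbalance noted above can be made concrete with a confidence interval on a sentiment proportion. The sketch below uses the Wilson score interval, which this study did not report. The sample size of 92 comes from the lower bound mentioned above, whereas the positive counts and the larger sample of 5,000 are hypothetical values chosen only for illustration.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = positives / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: about 40% positive reviews in both samples.
small = wilson_interval(positives=37, n=92)
large = wilson_interval(positives=2000, n=5000)

print(f"n=92:   {small[0]:.3f} to {small[1]:.3f}")
print(f"n=5000: {large[0]:.3f} to {large[1]:.3f}")
```

Under these assumed counts, the interval for the 92-review app is several times wider than the one for the 5,000-review app, which is why differences between sparsely and heavily reviewed apps should be read with caution.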
Recommendations
Based on the analysis of UX with mobile banking in Saudi Arabia, the following points offer strategic guidance for bank providers to enhance user satisfaction:
(1) Pay attention to user insights to detect and solve app obstacles.
(2) Develop platform-specific strategies by customizing the app to align with the expectations of users on each mobile platform.
(3) Regularly engage real users in testing sessions to uncover essential details about their interactions with the app.
(4) Consistently review competitors' apps to understand user expectations, identify service gaps, and drive innovation to stay ahead in the market.
(5) Analyze sentiment changes over time, especially during major app updates and changes in banking regulations.
(6) Monitor and respond to user reviews in real time to foster a more engaged customer base and improve overall user satisfaction.
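As an illustration of the recommendation to track sentiment over time, the following minimal sketch computes the monthly share of negative reviews, a figure a provider could compare before and after a major update. The review records, dates, and function name are hypothetical, and the labels are assumed to come from an upstream sentiment classifier.

```python
from collections import defaultdict


def negative_share_by_month(reviews):
    """Return {YYYY-MM: fraction of reviews labelled negative} in chronological order."""
    counts = defaultdict(lambda: [0, 0])  # month -> [negative count, total count]
    for date, label in reviews:
        month = date[:7]  # ISO dates (YYYY-MM-DD) are assumed
        counts[month][1] += 1
        if label == "negative":
            counts[month][0] += 1
    return {month: neg / total for month, (neg, total) in sorted(counts.items())}


# Hypothetical labelled reviews around an app update released in February.
reviews = [
    ("2023-01-05", "positive"), ("2023-01-17", "negative"),
    ("2023-01-20", "positive"), ("2023-01-28", "neutral"),
    ("2023-02-02", "negative"), ("2023-02-09", "negative"),
    ("2023-02-21", "positive"), ("2023-02-25", "negative"),
]

shares = negative_share_by_month(reviews)
for month, share in shares.items():
    print(month, f"{share:.0%}")
```

A sharp rise in the negative share after a release, as in this invented data, would flag the update for closer inspection of the underlying reviews.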
Conclusions
This article presented a large-scale comparison of user sentiment in Saudi mobile banking apps across iOS and Android. Results show clear platform differences: Alrajhi earned the highest satisfaction, while Banque Saudi Fransi and SAIB scored lowest. Using ISO 9241-11, we identified recurring usability issues related to satisfaction, effectiveness, and efficiency. We also applied RAG with GPT-4o to pinpoint the most frequent problems for each app. Our results underscore the necessity of ongoing refinements in Saudi mobile banking to accommodate evolving user demands and contribute to Saudi Vision 2030's objectives. Future investigations that harness AI for feedback analysis, including user sentiment, will facilitate the rapid detection of emerging challenges in these apps.
Supplemental Information
Codebook.
Translations for non-English text in Google_Play.zip and iOS.zip.










