Hybrid TF–IDF and user-based collaborative filtering for product recommendation in an offline Indonesian grocery store


Abstract

Most Indonesian micro-retail grocery stores lack the digital infrastructure to capture and analyse customer purchase behaviour. As a result, customers rarely receive personalized product suggestions, and many items remain under-exposed on the shelves. This problem is particularly evident in mom-and-pop stores such as Toko Solo Latri, where interactions are recorded only as semi-digital transaction logs without explicit ratings or reviews. The resulting feedback is sparse and implicit, which challenges traditional recommender algorithms. This study proposes a hybrid recommendation model tailored for small offline grocery retailers, combining Term Frequency–Inverse Document Frequency (TF–IDF)-based Content-Based Filtering (CBF) with User-Based Collaborative Filtering (UBCF) within the CRISP-DM framework. Product descriptions are constructed from name, brand, category, packaging, and price information and transformed into TF–IDF vectors, from which content similarity is computed as cosine similarity. Customer purchase histories are converted into user–item frequency matrices to estimate behavioural similarity between customers. To mitigate sparsity and improve stability, K-Means clustering is applied for customer segmentation. The outputs of CBF and UBCF are then integrated into a weighted hybrid scoring function. The model is evaluated on real transaction data from Toko Solo Latri comprising 102,735 transaction records, 320 products, and 200 customers. Performance is assessed using Precision@k, Recall@k, F1-Score@k, and NDCG@k under both global (80:20 train–test split) and per-user evaluation schemes. Despite the highly sparse and implicit nature of the data, the hybrid model exhibits stable ranking performance. In the global 80:20 evaluation, the system achieves Precision@5 = 0.1574, Recall@5 = 0.0103, F1-Score@5 = 0.0193, and NDCG@5 = 0.1835, with comparable trends in the per-user setting.
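
The four ranking metrics named above can be illustrated with a minimal sketch. It assumes binary relevance (an item counts as relevant if the customer bought it in the held-out data) and the common log2-discounted form of NDCG; the abstract does not state which variants the study uses, so the function names and definitions below are illustrative assumptions rather than the authors' implementation.

```python
import math


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG with a log2 position discount (assumed variant)."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(rank + 2)
                for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


# Hypothetical toy case: five recommendations, four held-out purchases.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "x", "y"}
p5 = precision_at_k(recommended, relevant, 5)
r5 = recall_at_k(recommended, relevant, 5)
n5 = ndcg_at_k(recommended, relevant, 5)
```

Under this definition, the reported F1-Score@5 is consistent with the harmonic mean of the reported Precision@5 and Recall@5: `round(f1_score(0.1574, 0.0103), 4)` gives 0.0193. The large gap between precision and recall is what one would expect when each customer has many held-out purchases but only five items are recommended.
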
While the absolute scores are modest, they are consistent with prior findings on low-density transactional datasets, and the hybrid approach outperforms pure CBF and pure UBCF in terms of ranking quality. These results indicate that combining content similarity with behavioural similarity offers a practical and deployable solution for micro-retail grocery recommendation under severe data sparsity and implicit feedback. For Indonesian UMKM (micro, small, and medium enterprises) undergoing digital transformation, the proposed TF–IDF-based hybrid model, implemented with lightweight tooling and a Flask-based web interface, provides a feasible path towards data-driven product personalization. Future work may explore deep learning, matrix factorization, or graph-based methods to further improve recommendation accuracy in similar low-resource retail settings.
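
The weighted hybrid scoring described in the abstract can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the toy catalogue, the purchase counts, the smoothed IDF formula, the max-normalisation step, and the weight `alpha` are all hypothetical, and the K-Means segmentation and price features are omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical toy catalogue: product id -> description (name, brand, packaging).
products = {
    "p1": "indomie goreng instant noodle sachet",
    "p2": "indomie soto instant noodle sachet",
    "p3": "sedaap goreng instant noodle sachet",
    "p4": "ultra milk chocolate carton",
}

# Hypothetical user-item frequency data: customer id -> {product id: count}.
purchases = {
    "u1": {"p1": 3, "p4": 1},
    "u2": {"p1": 2, "p2": 4},
    "u3": {"p4": 5},
}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tfidf_vectors(docs):
    """TF-IDF vectors with an assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for text in docs.values() for term in set(text.split()))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return {pid: {term: tf * idf[term] for term, tf in Counter(text.split()).items()}
            for pid, text in docs.items()}


def cbf_scores(user, vecs):
    """Content score: purchase-weighted mean similarity to items already bought."""
    bought = purchases[user]
    total = sum(bought.values())
    return {pid: sum(freq * cosine(vecs[pid], vecs[b]) for b, freq in bought.items()) / total
            for pid in vecs if pid not in bought}


def ubcf_scores(user):
    """Behavioural score: other customers' counts weighted by user similarity."""
    scores = Counter()
    for other, history in purchases.items():
        if other == user:
            continue
        sim = cosine(purchases[user], history)
        for pid, freq in history.items():
            if pid not in purchases[user]:
                scores[pid] += sim * freq
    return dict(scores)


def normalise(scores):
    """Divide by the maximum so both score sources lie in [0, 1]."""
    top = max(scores.values(), default=0.0)
    return {pid: (value / top if top else 0.0) for pid, value in scores.items()}


def hybrid_recommend(user, alpha=0.5, k=2):
    """Rank unseen products by alpha * CBF + (1 - alpha) * UBCF."""
    cbf = normalise(cbf_scores(user, tfidf_vectors(products)))
    ubcf = normalise(ubcf_scores(user))
    combined = {pid: alpha * cbf.get(pid, 0.0) + (1 - alpha) * ubcf.get(pid, 0.0)
                for pid in set(cbf) | set(ubcf)}
    return sorted(combined, key=lambda pid: (-combined[pid], pid))[:k]


top_items = hybrid_recommend("u1")
```

In this toy data, `p2` and `p3` are equally close in content to what `u1` has bought, so the behavioural signal from `u2` (who also buys `p1`) breaks the tie in favour of `p2`. This is the kind of complementarity the hybrid weighting is meant to exploit when either signal alone is too sparse.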
