Enhancing healthcare data privacy and interoperability with federated learning

PeerJ Computer Science

Introduction

The digitalization of the modern world has led to major technological advances that impact various aspects of human life. Today, many devices are interconnected via the Internet in a network called the Internet of Things (IoT) (Aledhari et al., 2022). Connectivity not only improves functionality but also revolutionizes the way we interact with technology on a daily basis. These networks help solve many problems, with healthcare being a major sector requiring cutting-edge solutions and specific care (Famá, Faria & Portugal, 2022). The digitalization of healthcare opens immense opportunities for improving the quality of life through electronic mobile health applications, medical imaging, medical records, low-cost genetic sequencing, and the diffusion of sensors and wearable devices (Lehne et al., 2019; Gupta & Gupta, 2019). Combined with emerging technologies such as artificial intelligence (AI), big data analytics, and cloud computing, this wealth of digital health data promises to improve the lives of millions of patients worldwide (Lehne et al., 2019). Individuals can take part in this digitalization by using wearable devices that measure weight, calories, sleep, exercise, heart rate, body temperature, and other vital signs (White, Liang & Clarke, 2019; Pardamean et al., 2020; Sharma & Rani, 2021). The wearable technology market is currently booming, and such devices have become very affordable (White, Liang & Clarke, 2019; Pardamean et al., 2020; Akdevelioglu, Hansen & Venkatesh, 2021). This has allowed the many people interested in digitally tracking their health and behavior for personal improvement to form a community of enthusiasts known as the Quantified Self (QS) (White, Liang & Clarke, 2019; Akdevelioglu, Hansen & Venkatesh, 2021; Liang, 2022). The term generally encompasses anything about oneself that can be measured (Sharma & Rani, 2021). Recently, growing academic interest in machine learning has given rise to an interdisciplinary research field called personal informatics, which draws on consumer informatics, health informatics, ubiquitous computing, and human-computer interaction (HCI) (Liang, 2022).

Despite the rapid evolution of connected objects, analyzing large amounts of QS data remains a challenge. QS practice remains primarily focused on data collection, while data analysis is limited to basic visualization and correlation analysis (Liang, 2022). Making more effective use of the large amounts of collected QS data requires machine learning (ML) algorithms (Sharma & Rani, 2021). However, a significant portion of medical data lacks interoperability: it is stored in isolated databases, hosted in incompatible systems, and enmeshed in proprietary software, making it difficult to share, analyze, and interpret. Current healthcare systems use varied data formats, individualized specifications, and unclear semantics, which hinders the development of technologies that rely on this data, such as machine learning, AI, and big data (Lehne et al., 2019; Xu et al., 2021). Therefore, data interoperability is essential to fully harness the capabilities of smart, interconnected healthcare, which has the potential to improve patient outcomes and reduce costs (Seneviratne, 2023).

Syntactic interoperability ensures that data from diverse sources, including sensors and wearable devices, follow a standard format, which is critical for seamless integration and analysis. By adopting standardized data formats through protocols such as Fast Healthcare Interoperability Resources (FHIR), healthcare systems can share and use patient data more effectively. Such standardization allows machine learning models to receive and process precise, well-structured datasets. The benefits extend beyond data integration: standardization also enhances the predictive capabilities of ML algorithms as well as the scalability and reliability of Federated Learning (FL), because FL models can be trained on varied datasets from different institutions without infringing on data privacy. Syntactic interoperability is therefore at the heart of the successful deployment of machine learning and AI in healthcare, serving as the critical bridge between heterogeneous data sources and holistic, actionable intelligence.
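
To make the notion of a standard format concrete, the sketch below shows how a single heart-rate reading from a wearable could be encoded as a FHIR R4 Observation resource, written here as a Python dict mirroring the JSON wire format. The patient reference and timestamp are placeholders; the LOINC code 8867-4 is the standard code for heart rate.

```python
# A wearable heart-rate sample as a FHIR R4 Observation (illustrative).
# Any FHIR-conformant system can parse this without custom adapters.
heart_rate_observation = {
    "resourceType": "Observation",
    "status": "final",
    "category": [{"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/observation-category",
        "code": "vital-signs"}]}],
    "code": {"coding": [{"system": "http://loinc.org",
                         "code": "8867-4", "display": "Heart rate"}]},
    "subject": {"reference": "Patient/example"},   # placeholder patient id
    "effectiveDateTime": "2024-01-15T08:30:00Z",   # placeholder timestamp
    "valueQuantity": {"value": 72, "unit": "beats/minute",
                      "system": "http://unitsofmeasure.org", "code": "/min"},
}
```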

This study compares FL and centralized learning (CL), leading to the development of a web application for predictive analysis of data from sensors and mobile devices, interoperable in a clinical setting. It involves the analysis of multiple datasets through machine learning algorithms trained using federated and centralized methods. The results are evaluated using appropriate metrics to propose a solution that improves data interoperability. The main objective of this research is to strengthen the ability of healthcare systems to effectively manage and analyze data, thereby raising the quality of care and operational efficiency through the integration of advanced AI technologies.

Our proposed system significantly improves the performance and privacy of wearable IoT sensors by adopting federated learning for decentralized processing. This approach avoids the centralization of sensitive data, thus improving privacy and interoperability. The main objective of our approach is to reduce the vulnerabilities associated with centralizing sensitive patient records for analytics. Furthermore, the proposed solution combines FL and FHIR to extract valuable insights from dispersed health data without centralizing all patient information. This is a key distinction from traditional centralized learning and a significant privacy improvement in the healthcare context. Additionally, it supports real-time applications that use both real and synthetic data, ensuring compliance with FHIR standards, which promotes more dynamic and secure interactions within health technology systems.

The main contributions of this study are summarized as follows: We developed and validated an innovative application that uses AutoML to train machine learning models on interoperable healthcare data, using both FL and CL. This advancement improves data privacy and interoperability within healthcare systems. Our study provides an in-depth comparative analysis of FL and CL, evaluating their performance and effectiveness. Furthermore, it presents an intuitive tool for healthcare professionals to optimize and personalize patient care through cutting-edge AI technology.

The remainder of this article is organized into several sections for in-depth analysis. “Related Works” reviews the existing literature, focusing on key subtopics such as connected objects and IoT, Quantified Self, electronic medical records, and federated learning. “Methodology” details the programming stack used and describes the implementation process of the proposed solution. “Simulation Setup and Evaluation” presents the results, including a comparative analysis of the evaluation indicators of models trained with CL and FL. Finally, “Conclusion” summarizes the entire study, highlighting the critical need to address interoperability challenges in the healthcare sector through the application of FL, and also explores the implications for future research in this area.

Related works

This section covers several critical topics, including wearable devices and IoT, Quantified Self, electronic health records, and federated learning. There are a number of studies on distributed intelligence in smart cities (Hashem et al., 2024) and in healthcare using FL and FHIR (Naithani et al., 2024). Hashem et al. (2024) highlight how integrating distributed intelligence into IoT improves the efficiency and functionality of smart cities, covering a wide range of architectures, methodologies, and applications. Similarly, Naithani et al. (2024) highlight the key challenges of implementing FL in healthcare, such as data heterogeneity, computational resources, and regulatory compliance. These technologies are fundamental to a comprehensive understanding of the problems to be solved and crucial to designing effective solutions.

Wearable devices and IoT

Just as the Internet and smartphones have transformed our lives, wearables and IoT devices are swiftly reshaping how we live (Lo, Ip & Yang, 2016). Today, individuals not only share photos of their activities but also detailed data such as heart rate metrics, step counts, and maps of their travel routes, all facilitated by wearable technology. Therefore, understanding the nature of IoT and wearable devices is crucial for exploring the extent of their potential applications.

Wearable biosensors, often referred to as “wearables,” represent a significant interdisciplinary effort within health services to harness mobile health (mHealth) technology for improved data collection, diagnosis, treatment monitoring, and enhanced health insights. These devices, now readily available as consumer products, are increasingly used to collect and analyze basic physiological data such as weight, calorie intake, exercise, sleep patterns, body temperature, and heart rate (White, Liang & Clarke, 2019; Pardamean et al., 2020; Sharma & Rani, 2021), as well as more complex data such as energy expenditure, arrhythmia and other cardiovascular signals, and continuous glucose monitoring for diabetes mellitus (Witt et al., 2019). Due to the complexity of the field, the topics researchers study vary significantly. Below, some of the applications of wearables are discussed.

Wearable health technology has been the focus of research in recent years. Witt et al. (2019) analyze the algorithms used in popular wearable devices from 2014 to 2017, while Dian, Vahidnia & Rahmati (2020) survey recent developments in wearable IoT. Another study (Haleem et al., 2022) introduces Medical 4.0 as a significant advancement in healthcare, utilizing IoT for data collection and promoting patient-centred therapy. Pathak, Mukherjee & Misra (2023) develop SemBox, a wireless system that enhances interoperability among diverse wearable health monitoring devices. Ahmed et al. (2024) discuss the opportunities and challenges of the Internet of Medical Things (IoMT), highlighting the benefits of patient empowerment, healthcare collaboration, and data sharing. Fei & Ur-Rehman (2020) develop a low-power smart wristband that can monitor heart rate, count steps, and detect abnormal hand movements, together with a user-end program for data analysis and presentation. In a study involving 46 participants (de la Casa Pérez et al., 2022), the Xiaomi Mi Band 4 fitness tracker is found to be accurate and precise in tracking step count and heart rate. Another study (Liang & Chapa Martell, 2018) shows that consumer sleep tracking devices such as the Fitbit Charge 2 and the Neuroon eye mask are adequate for non-clinical use but not suitable for diagnosing sleep disorders.

Quantified-Self

Quantified-Self involves using technology to track various aspects of one’s life, such as biology, physical activity, behaviour, or environmental factors (Swan, 2013). The term ‘Quantified-Self’ was coined in 2007 by Wired writers Gary Wolf and Kevin Kelly, Kelly being the founder of the QS movement, a global community focused on self-tracking for gaining self-knowledge through data (Sharma & Rani, 2021; Akdevelioglu, Hansen & Venkatesh, 2021). Self-monitoring, measuring, and recording for self-improvement or reflection have historical roots, but the advent of digital technologies has sparked new interest and expanded the domains and applications of self-tracking. This subsection discusses the practical applications of the QS community and the potential impact it may have on improving mHealth.

Swan (2013) notes the rise of the QS movement, where individuals track various aspects of their well-being; the long-term vision is to optimize personal performance in real time. QS encompasses quantitative and qualitative data and is evolving into a “qualified self” by tracking qualitative aspects and promoting behavior change. Another article discusses the rise of self-tracking applications and devices and how they can be used to improve personal productivity (White, Liang & Clarke, 2019). A netnographic study was conducted on the Fitbit user community to explore wearable tech’s impact on the quantified self (Akdevelioglu, Hansen & Venkatesh, 2021); the study uncovers social engagement mechanisms, such as empowering motivation and friendly rivalry, that are built on this foundation. Billis, Batziakas & Bamidis (2015) explore how seniors have integrated self-tracking and how it can become a habitual practice; they developed a web application to visualize sensor data collected over six months in real-life settings. Almalki, Gray & Sanchez (2015) introduce the concept of Personal Health Information Self-Quantification Systems (PHI-SQS) and systematically review data management processes in eleven self-quantification tools, suggesting that self-quantification for personal health maintenance holds promise but that further research is needed to support its use in this context. Sharon (2017) studies how self-tracking devices can be used for individual health management and raise ethical concerns, and suggests a practice-based theory for revealing diverse enactments of values by self-trackers within the QS community. Erdeniz et al. (2020) introduce three novel recommendation methods for QS applications: Virtual Coach, Virtual Sleep Regulator, and Virtual Nurse. These methods aim to improve individuals’ health status through personalized exercise planning, physical activity recommendations that consider medical history and intended health status, and support for sleep quality issues. Lastly, Pardamean et al. (2020) discuss recent literature on the use of wearable devices to promote physical activity, mental wellness, and health awareness.

Federated learning

Federated learning is a technological innovation in distributed learning, addressing growing concerns about data security in the era of big data. As an alternative to centralized deep learning frameworks, federated learning offers a privacy-friendly approach by avoiding large-scale data centralization. Compared to traditional machine learning solutions, federated learning allows machine learning models to be trained directly on devices. This is achieved through coordination between multiple devices, called FL clients, and a central server, called an FL aggregator, without the exchange of raw data. This approach is particularly crucial for protecting sensitive information (Yang et al., 2023).

Tyagi, Rajput & Pandey (2023) present a survey of FL, a decentralized solution for collaborative model training with data privacy guarantees. Another article (Xu et al., 2021) explains that FL can be an effective way of linking and analyzing private health data from different sources. The method can be used to solve the problems caused by privacy concerns in exchanging electronic health records among different organizations. In their work, Khan et al. (2024) propose Fed-Inforce-Fusion, a privacy-preserving FL-based Intrusion Detection System (IDS) for IoMT. The model utilizes reinforcement learning for finding relationships in medical data and federatively trains a composite IDS model on the nodes of a smart healthcare system (SHS). An FL approach to depression detection with the guarantee of patient data privacy is proposed in Gupta et al. (2022). The cluster-based model is an enhancement over conventional machine learning models and outperforms other collaborative learning approaches.

Interoperability of electronic health records

Electronic medical records (EMRs) have revolutionized the healthcare industry by establishing common representations of health data, making it easier to share. EMRs do not define exchange standards themselves; rather, they structure and standardize health information. Standardization is a prerequisite for interoperability, that is, seamless communication between different applications without loss of data (Bhartiya, Mehrotra & Girdhar, 2016). EMRs consolidate patient data for efficient sharing, in accordance with ISO standards. To bridge the gap in data interoperability, application standards such as OpenEHR and Fast Healthcare Interoperability Resources (FHIR) (Famá, Faria & Portugal, 2022) play a vital role in IoT architectures for digital health. This subsection explores the various applications of EMRs and their importance.

Roehrs et al. (2019) propose a personal health record (PHR) interoperability model named OmniPHR that utilizes a standard ontology and AI with natural language processing (NLP) to achieve interoperability. The model is evaluated using a real database of anonymized patient records, demonstrating the feasibility of harmonizing data from various standards into a unified format. Another semantic ontology-based model that achieves interoperability in EHRs is proposed in Adel et al. (2022). The model unifies data formats, accommodating five healthcare data standards and allowing physicians to interact with diverse systems through a single interface; the integrated ontology facilitates improved patient care. Data from 68 oncology sites using five different EHR vendor products are analyzed in Bernstam et al. (2022) to quantify the interoperability of real-world EHR implementations concerning clinically relevant structured data. The results show that intra-EHR-vendor interoperability is notably higher than intersystem interoperability, highlighting the lack of standardization for clinically relevant data. Bhartiya, Mehrotra & Girdhar (2016) discuss the challenges of accessing and sharing EHRs in team-based healthcare, highlighting issues related to privacy, data security, and interoperability caused by heterogeneity in EHR systems, and identify different approaches and challenges in achieving interoperability in EHR sharing. Evans (2016) analyzes the evolution of EHRs from 1992 to 2015 and speculates on their expected state in the next 25 years. EHRs have been used for a long time, but technical, procedural, social, political, ethical, and security concerns overshadow their usage. Although current EHRs do not entirely meet the evolving needs of the healthcare landscape, emerging EHR technology will help establish international standards for interoperable applications, facilitating precision medicine and a learning health system. Daliya & Ramesh (2019) propose a hybrid approach to handle heterogeneous data from various sources in an IoT-based healthcare system; the approach extracts data meaning from diverse sources while ensuring uniformity in data format, contributing to enhanced data interoperability. Another study (Khalique, Khan & Nosheen, 2019) proposes a framework for managing public healthcare data based on standardized protocols: it utilizes EHRs at basic health facilities, consolidates data from multiple sources, and incorporates contextual information using HL7 as an interoperability standard. Researchers from Italy developed a national IT platform for the interoperability of EHR systems (Silvestri et al., 2019), introducing a Big Data architecture for EHRs that offers valuable insights for healthcare professionals, patients, and decision-makers.

In contrast to earlier research, the current study stands out because it aims to achieve interoperability by employing FL architecture and EHR protocol. The proposed architecture functions not just with refined datasets but also with wearable devices.

Methodology

This section describes the design and implementation of the system, which includes data sources (agents), a FHIR server, a web user interface, and an FL server.

System design: centralized vs federated learning

Traditionally, CL requires all edge devices to share their data with some centralized node or server, which then trains the model on the aggregated data. This approach is efficient but raises privacy concerns, as the data is centralized. Figure 1 shows the design of CL.


Figure 1: Centralized learning design.

FL, on the other hand, allows edge devices to train the model locally and share only the model updates with the FL aggregator (or FL server), as shown in Fig. 2. The FL aggregator takes these model parameters and aggregates them (e.g., by averaging), converging all updates into one global model, which is then redistributed to the clients. This approach is more privacy-preserving, but it is more complex and may require more communication between devices.


Figure 2: Federated learning design.
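
To illustrate the aggregation step, below is a minimal sketch of FedAvg-style weighted averaging, the canonical FL aggregation rule, in Python/NumPy. The toy weights and sample counts are invented for illustration; a production aggregator (e.g., an FL framework's built-in FedAvg strategy) handles this internally.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client model parameters (FedAvg).

    client_weights: one list of np.ndarray layers per client
    client_sizes: number of local training samples per client
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]

# Toy usage: two clients, one weight matrix each
w1 = [np.array([[1.0, 2.0]])]
w2 = [np.array([[3.0, 4.0]])]
global_w = fedavg([w1, w2], client_sizes=[100, 300])
# client 2 holds 3x the data, so the average leans toward its weights:
# global_w == [array([[2.5, 3.5]])]
```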

System implementation

The application adheres to free and open-source software (FOSS) principles, avoiding non-FOSS dependencies, and features a modular architecture with replaceable components. The proof-of-concept solution focuses on Mi Band and CSV file agents, is low-code, configurable, includes an understandable GUI, and supports supervised learning for classification and regression.

Figure 3 illustrates an FL system integrating FHIR for processing healthcare data, highlighting interactions between edge devices, the FL server, and the FHIR server. Each edge device, representing a local system such as a wearable device, holds local data and models and includes an FL client for local model training, an FHIR client for managing data, and a data service (DS) agent for translating and managing data exchange. Currently, the mapping and translation specifications within the DS agents are implemented manually for each type of wearable data we integrate (e.g., steps, heart rate, sleep data). The FHIR server stores healthcare data using FHIR standards, while the FL server aggregates local model parameters from edge devices, trains a global model, and distributes global model parameters back to the edge devices. Data flows involve edge devices sending FHIR data to the FHIR server, using local data for training, and exchanging model parameters to ensure privacy and collective learning. The system ensures syntactic interoperability through FHIR, enabling seamless data exchange, and leverages FL to train models across institutions without sharing sensitive data, maintaining scalability and privacy. The integration of FHIR and FL facilitates effective data utilization, privacy preservation, and improved model accuracy, supported by a graphical user interface (GUI) for user interaction and monitoring. For a more comprehensive analysis, we use classification to determine the activity, such as walking or running, whereas regression quantifies continuous values such as speed and calorie expenditure.


Figure 3: The diagram of the proposed solution.

Methodology for privacy and interoperability

The approach used on our platform to improve the privacy and interoperability of health information uses FL in combination with FHIR. This section describes the integration of the two technologies to achieve these goals.

Privacy preservation with federated learning

FL underpins data privacy in our system. It allows machine learning models to be trained locally on the original data (e.g., from connected objects), so sensitive data is never transmitted to a central server. Instead, only model updates, such as weights or gradients, are sent to the server, which averages them to generate a global model. This decentralized training process inherently eliminates the need for data centralization. Keeping data at its source and never transmitting raw patient records between facilities significantly enhances data privacy.

Data interoperability with FHIR

To address interoperability challenges, our system uses the FHIR standard. FHIR provides a standardized and widely accepted model for the digital representation and transmission of health information. With FHIR, our system enables consistent interpretation and processing of health data from diverse sources, including wearable devices and disparate health systems. Our platform integrates FHIR by mapping all health data in the system to the standardized FHIR model. Data is stored and retrieved via the FHIR protocol, ensuring consistency and interoperability.
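
As a concrete illustration, the following sketch shows how a client could store and retrieve such standardized resources against a FHIR server using FhirPy, the client library adopted in our stack (see “Experiments Setup”). The server URL, patient reference, and values are placeholders.

```python
from fhirpy import SyncFHIRClient

# Placeholder endpoint; in our setup this would point to the HAPI FHIR server
client = SyncFHIRClient("http://localhost:8080/fhir")

# Store a heart-rate reading as a standardized FHIR Observation
observation = client.resource(
    "Observation",
    status="final",
    code={"coding": [{"system": "http://loinc.org",
                      "code": "8867-4", "display": "Heart rate"}]},
    subject={"reference": "Patient/example"},
    effectiveDateTime="2024-01-15T08:30:00Z",
    valueQuantity={"value": 72, "unit": "beats/minute"},
)
observation.save()

# Any FHIR-conformant consumer can retrieve it with a standard search
readings = (client.resources("Observation")
                  .search(subject="Patient/example", code="8867-4")
                  .fetch())
```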

Integration of fast healthcare interoperability resources and federated learning

Our system provides seamless integration of FL and FHIR through a modular design. This integration is achieved through local agents running on devices (e.g., portable devices as FL clients). These agents have the following features:

  • FHIR client: Each agent has a FHIR client module that enforces all data management within the agent to meet FHIR standards.

  • Data service (DS) agent for FHIR translation: The DS agent is arguably the most crucial element, as it converts the device’s local data formats to the standardized FHIR format. Once a mapping specification has been defined for a data type, the translation runs automatically for every record, so individual readings do not need to be translated manually on a case-by-case basis.

This tight integration of FHIR within our federated learning platform enables seamless data exchange and significantly improves interoperability between heterogeneous healthcare systems and wearable devices without losing the intrinsic privacy benefits of federated learning and minimizing the need for manual data intervention.
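
The sketch below illustrates what such a DS-agent translation might look like for a Mi Band step-count sample. The input field names are hypothetical stand-ins for the device's export schema; the output follows the FHIR Observation structure, with LOINC code 55423-8 ("Number of steps").

```python
from datetime import datetime, timezone

def mi_band_to_fhir(sample: dict, patient_id: str) -> dict:
    """Translate a raw Mi Band step-count sample into a FHIR Observation.

    `sample` is assumed to look like {"steps": 8500, "timestamp": "..."};
    these field names are illustrative, not the device's actual schema.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "activity"}]}],
        "code": {"coding": [{"system": "http://loinc.org",
                             "code": "55423-8",
                             "display": "Number of steps"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": sample.get(
            "timestamp", datetime.now(timezone.utc).isoformat()),
        "valueQuantity": {"value": sample["steps"], "unit": "steps"},
    }
```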

Dataset

Since the solution focuses on two types of machine learning problems, classification and regression, we consider only datasets that are suitable for these tasks and collected by wearable devices.

Classification

For classification tasks, we use well-known datasets for predicting physical activities. Activity recognition is a classic classification task where the data is labeled, allowing for the training and evaluation of machine learning models.

Harvard’s Apple Watch dataset

The dataset (https://www.kaggle.com/datasets/aleespinosa/apple-watch-and-fitbit-data/data) from Harvard University’s research in Fuller et al. (2020) is utilized for solving classification problems. The research aims to investigate if commercial wearable devices can effectively forecast sitting, lying, and varying levels of physical activity. Scientists enlisted 46 volunteers, including 26 women, in a convenience sample to utilize three gadgets: a GENEActiv, an Apple Watch Series 2, and a Fitbit Charge HR2. The research shows that Apple Watch and Fitbit, popular wearable devices, can accurately predict the type of physical activity. The findings endorse the utilization of real-time data from Apple Watch and Fitbit with machine learning methods for large-scale classification of physical activity types among the general population. The Apple Watch dataset exhibits the following characteristics: 18 variables, 3,656 observations, no missing cells (0.0%) or duplicate rows (0.0%), a total size in memory of 514.2 KiB, comprising 16 numeric and two categorical variable types.

HARTH dataset

The HARTH dataset (https://www.kaggle.com/datasets/joebeachcapital/harth-dataset) (Logacjov et al., 2023) comprises 3-axial accelerometer data from 22 participants, with sensors placed on the thigh and lower back, recording acceleration in three dimensions at a high sampling rate for detailed motion tracking. The time-series data is accompanied by annotations of various activities and includes raw signals that allow for custom processing and feature extraction. Additionally, the dataset provides metadata on participant demographics and experiment protocols, supporting comprehensive analysis for human activity recognition research. The HARTH dataset exhibits the following characteristics: seven variables, 110,116 observations, no missing cells (0.0%) but a small number of duplicate rows (618, representing 0.6% of the data), a total size in memory of 5.9 MiB, and comprising seven numeric variable types.

Regression

For regression tasks, we use data for estimating calorie expenditure from physical activity. The targets are continuous values, allowing models to predict numerical outcomes.

Mi band

Compared to classification, regression requires less data preparation: regression models do not need manually labeled target categories, so raw data from the device itself can be used without heavy processing. The solution is tested using activity data from one of the authors’ Mi Band 3, 4, and 7 devices. The models attempt to predict the number of calories burned during activity from features that include step count, walking distance, and running distance. The Mi Band dataset exhibits the following characteristics: five variables, 2,454 observations, no missing cells (0.0%) or duplicate rows (0.0%), a total size in memory of 96.0 KiB, and comprising one DateTime and four numeric variable types.

DAT263x

To demonstrate the robustness of the results, we include another similar dataset, DAT263x. This dataset accompanies the edX course “Microsoft DAT263x: Introduction to Artificial Intelligence (AI)”, is intended for use with Azure, and is available on Kaggle (https://www.kaggle.com/datasets/fmendes/fmendesdat263xdemos). It contains each participant’s gender, age, weight, height, exercise duration, heart rate, body temperature, and calories burned during physical activity. The DAT263x dataset exhibits the following characteristics: eight variables, 12,000 observations, no missing cells (0.0%) but a minimal number of duplicate rows (1, representing less than 0.1% of the data), a total size in memory of 750.1 KiB, and comprising one categorical and seven numeric variable types.

Synthetic dataset

Since real data may be non-independent and identically distributed (non-IID), partitioning can strongly affect the results of CL and FL analyses. Therefore, it was decided to synthesize data via oversampling. Specifically, the synthetic minority oversampling technique (SMOTE) and adaptive synthetic sampling (ADASYN) were chosen as practical methods to address potential class imbalance, a key characteristic of non-IID data, in our simulated FL environment. As shown in Chua, Sii & Ellyza Nohuddin (2022), these techniques can significantly improve the performance of ML models on fitness data, providing a valuable starting point for mitigating non-IID effects. SMOTE is used to synthesize data for classification tasks: it balances the class distribution within each client’s data partition before FL, thus aiming to create more robust local models despite possible non-IID class distributions across clients. ADASYN is used for regression problems, adaptively generating synthetic samples for underrepresented target values and focusing on hard-to-learn examples. For the Mi Band dataset, ADASYN is applied to address potential imbalances in the distribution of the regression target variable (calorie expenditure) among clients in a non-IID setting. Although SMOTE and ADASYN are not exhaustive solutions to all non-IID challenges, they represent a pragmatic and effective initial strategy for mitigating class imbalance, a common manifestation of non-IID data in wearable sensor applications.

SMOTE is a commonly used method for dealing with imbalanced datasets by generating synthetic samples for the minority class. This technique involves creating new instances by interpolating between existing minority class samples. SMOTE identifies the nearest neighbors of each minority class instance and creates new synthetic examples by interpolating between these neighbors. These synthetic samples are designed to balance the class distribution, thus avoiding overfitting that can occur with standard oversampling methods.
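
A minimal sketch of this oversampling step is shown below, using the SMOTE implementation from the imbalanced-learn library (one common implementation, shown here for illustration); the toy dataset stands in for one client's imbalanced activity labels.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced dataset standing in for a client's activity labels
X, y = make_classification(n_samples=1000, n_classes=2,
                           weights=[0.9, 0.1], random_state=42)
print(Counter(y))  # roughly {0: 900, 1: 100}

# Interpolate between minority-class neighbors until classes are balanced
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print(Counter(y_res))  # both classes now equally represented
```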

In our study, we applied SMOTE to the Apple Watch dataset. Figure 4 represents the data distribution in the Age column. As the density plot shows, the synthetic and real data fit almost perfectly.


Figure 4: Classification: SMOTE vs real data distribution.

SMOTE cannot be used for regression problems because the target column is continuous rather than categorical. Therefore, other oversampling techniques were tested; among them, adaptive synthetic sampling (ADASYN) proved the most suitable for the real data, and it is therefore used for the Mi Band dataset.

ADASYN is an algorithm used to mitigate the problem of unbalanced class distributions in machine learning. It works by adaptively generating synthetic samples, primarily for the minority class, focusing on the samples that are most difficult to learn. Unlike simple oversampling techniques that can replicate minority class data, ADASYN improves on this process by creating new synthetic data points that are similar, but not identical, to existing data. The number of synthetic samples generated for each minority class example is weighted based on how difficult that example was to learn, thus promoting better model generalization.

The adaptive nature of ADASYN makes it particularly useful in situations where some minority class examples are more difficult to classify than others. By generating more synthetic data for these complex examples, ADASYN results in a more balanced dataset, contributing to better performance in classification tasks.

The proposed solution utilizes the ADASYN implementation from the ImbalancedLearningRegression library (Wu, Kunz & Branco, 2022), applied to the Mi Band dataset. As shown in Fig. 5, even though the distribution is not identical, the synthetic data closely resembles the original data.


Figure 5: Regression: ADASYN vs real data distribution.
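
A minimal sketch of how this could be invoked is given below; the CSV file name and column names are hypothetical, and the adasyn function's keyword arguments should be checked against the installed version of ImbalancedLearningRegression.

```python
import pandas as pd
import ImbalancedLearningRegression as iblr

# Mi Band-style activity log; file and column names are illustrative
df = pd.read_csv("mi_band_activity.csv")  # steps, walk_dist, run_dist, calories

# Adaptively oversample rare calorie-expenditure values so the regression
# target is better represented before partitioning data across FL clients
df_balanced = iblr.adasyn(data=df, y="calories")
```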

Simulation setup and evaluation

This section describes the simulation environment setup and the methods used to evaluate the performance of FL and CL. The setup details the hardware, software, and settings used, as well as the evaluation metrics applied to assess the model accuracy and efficiency.

Experiments setup

The proof-of-concept uses open-source, low-code technologies. The GUI is built with Streamlit, an open-source Python library that enables straightforward, script-like code and integrates well with popular libraries such as Pandas and PyCaret. PyCaret is the chosen AutoML framework, simplifying model development through a simple API. Flower is used as the FL framework, known for handling large-scale experiments and supporting diverse edge devices. For FHIR, HAPI FHIR is used for data sharing, with FhirPy as the lightweight client for RESTful API access. The source code is available in a GitHub repository (https://gitfront.io/r/weeebdev/gnm1dXbMqiHh/auto-fl-fit) under the MIT license.
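
To show how these pieces fit together, below is a minimal sketch of a Flower client (using the Flower 1.x NumPyClient API) wrapping a scikit-learn model whose coefficients serve as the exchanged parameters. The random data, model choice, and server address are placeholders, not the platform's actual configuration.

```python
import numpy as np
import flwr as fl
from sklearn.linear_model import SGDClassifier

# Stand-in local data; on the platform each edge device loads its own partition
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 4)), rng.integers(0, 2, size=200)
model = SGDClassifier(loss="log_loss")
model.partial_fit(X, y, classes=np.array([0, 1]))  # initialize coef_/intercept_

class WearableClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return [model.coef_, model.intercept_]

    def fit(self, parameters, config):
        model.coef_, model.intercept_ = parameters  # load the global model
        model.partial_fit(X, y)                     # one round of local training
        return [model.coef_, model.intercept_], len(X), {}

    def evaluate(self, parameters, config):
        model.coef_, model.intercept_ = parameters
        acc = model.score(X, y)
        return 1.0 - acc, len(X), {"accuracy": float(acc)}

# Connect to a running Flower server (address is a placeholder)
fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                             client=WearableClient())
```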

The experiments were conducted on a MacBook Pro 14 M1. The machine is powered by an 8-core Apple M1 Pro processor, including six performance cores and two efficiency cores, 16 GB of unified memory, and a 14-core GPU. It runs macOS Sonoma 14.3.

To avoid the complexity of network constraints (latency and bandwidth limitations), the heterogeneity of devices with different computing resources, security challenges, and the overhead of a distributed environment, we prioritize local simulation, focused on validating the key features of the proposed approach (i.e., the feasibility of FHIR with FL). Local simulation also ensures the reproducibility of this work. Since experiments are simulated locally using a single dataset at a time, it is crucial to clarify the approach for partitioning the data. A main advantage of using a single dataset is that it allows us to rigorously establish and evaluate the proposed methodology and platform for FL integrated with FHIR. Moreover, a single, well-characterized dataset allows us to systematically study the main performance characteristics of our approach and ensures the robustness of our experimental setup. Assessing the solution requires utilizing various partition proportions and considering different numbers of clients. To guarantee a fair comparison between centralized and FL models, the data is divided into pairs of 60%/40%, 70%/30%, 80%/20%, and 90%/10%, resulting in two distinct partitions. These partitioning proportions are used to assess the sensitivity of model performance to different volumes of training data and to simulate scenarios with varying degrees of data sparsity at each client. This allows us to explore the robustness of federated and centralized learning under different data availability conditions and to observe the effect of the proportions on the comparative performance of the two learning paradigms. Another reason for these proportions is to provide a range of partitionings, from more balanced to highly imbalanced, presenting a range of data distribution challenges. The first partition is used to train a model with traditional centralized techniques. The resultant model, termed the base model, is then tuned and tested with the second partition for the CL evaluation, as shown in Fig. 6.


Figure 6: Evaluation of centralized learning.

As shown in Fig. 7, for the FL evaluation the base model acts as the initial global model and is distributed to k clients. In this study, we set k to 2, 4, and 8 to demonstrate the fundamentals of FL in a simulated environment and to observe performance trends as the number of participating clients increases. The second partition is then split using KFold (StratifiedKFold is excluded because it does not represent realistic scenarios, where data is non-IID and any one client may not reflect the entire dataset) and distributed among these k clients. Each client refines the global model with its own data fold. The results are aggregated and compared to the performance of the optimized centralized models.


Figure 7: Evaluation of federated learning.
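
Under these assumptions, the partitioning procedure can be sketched as follows; df, the split ratio, and k are placeholders matching the configurations described above.

```python
from sklearn.model_selection import KFold, train_test_split

# First/second partitions, e.g. the 80%/20% pair
part1, part2 = train_test_split(df, train_size=0.8, random_state=42)

# part1 -> centralized training of the base model (not shown here)

# Split the second partition among k FL clients. Plain KFold (not stratified)
# is used, so per-client label distributions may differ, mimicking non-IID data.
k = 4
client_folds = [
    part2.iloc[test_idx]
    for _, test_idx in KFold(n_splits=k, shuffle=True,
                             random_state=42).split(part2)
]
```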

Regarding synthetic data, the evaluations are performed exactly as in CL and FL, but the initial dataset followed the synthetic data generation process as shown in Fig. 8.


Figure 8: Generation of synthetic data.

Performance evaluation

Not all models can be trained in a federated manner, since this depends on support for incremental (or online) learning, in which models are continually updated as new data becomes available, without retraining from scratch. PyCaret provides five such models: Extra Trees, LightGBM, Extreme Gradient Boosting, Random Forest, and CatBoost. Thus, the results are obtained using those models. Additionally, to simplify the presentation of results, models are referred to by codenames instead of their full names: et stands for Extra Trees, xgboost for Extreme Gradient Boosting, rf for Random Forest, lightgbm for LightGBM, and catboost for CatBoost. While the experiments are conducted on various data partitions, only the 80%/20% split is discussed here. Note that in the upcoming figures, the x-axis represents the number of clients k and the y-axis represents the metric.
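
A minimal sketch of creating these five models through PyCaret's API is given below; the DataFrame and target column name are placeholders, and the xgboost and catboost codenames additionally require their respective packages to be installed.

```python
from pycaret.classification import setup, create_model

# df holds one client's labeled activity data; "activity" is the target column
setup(data=df, target="activity", session_id=42)

# The five incremental-learning-capable models, by PyCaret codename
models = {code: create_model(code)
          for code in ["et", "rf", "lightgbm", "xgboost", "catboost"]}
```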

Classification

This subsection discusses the simulation results for classification experiments, using the metrics described in Table 1 for evaluation. Accuracy indicates the overall proportion of correct predictions, precision and recall focus on the model’s ability to correctly identify positive cases, the F1 score balances precision and recall, and Kappa and the Matthews correlation coefficient (MCC) measure agreement beyond chance. For a detailed explanation of these evaluation metrics, see Chua, Sii & Ellyza Nohuddin (2022).

Table 1:
Evaluation metrics for classification.
Metric Formula
Accuracy $\frac{TP+TN}{TP+TN+FP+FN}$
Precision $\frac{TP}{TP+FP}$
Recall $\frac{TP}{TP+FN}$
F1 $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision}+\text{Recall}}$
Log Loss $-\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(p_i) + (1-y_i)\log(1-p_i)\right]$
Kappa $\frac{p_o - p_e}{1 - p_e}$, where $p_o$ is the observed agreement and $p_e$ is the expected agreement
MCC $\frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$
DOI: 10.7717/peerj-cs.2870/table-1

Note:

TP, true positive; FP, false positive; TN, true negative; FN, false negative.
Apple Watch dataset

In the analysis of the Apple Watch dataset, the FL approach demonstrated performance comparable to CL across multiple evaluation metrics and data partitions. For instance, in the 80%/20% partition, the FL models consistently achieved higher F1-scores than their CL counterparts, with the Random Forest and CatBoost algorithms exhibiting F1-scores of 0.8168 and 0.76065 for FL vs. 0.7951 and 0.7277 for CL, as shown in Table 2. Note that the values highlighted in bold represent the best results. Additionally, the FL approach exhibited higher MCC scores, as shown by the Extra Trees model’s MCC of 0.8233 for FL, outperforming the CL model’s MCC of 0.8196. Furthermore, the FL models demonstrated an advantage in accuracy, with the Extra Trees algorithm achieving 0.8750 accuracy for FL compared to 0.8503 for CL (k = 8). Lastly, the FL models’ performance matches their CL counterparts in terms of Kappa values, exemplified by the CatBoost model’s Kappa of 0.7699 (FL) vs. 0.6714 (CL).

Table 2:
Different metrics comparison of CatBoost, Extra Trees, LightGBM, Random Forest, and XGBoost using CL and FL on Apple Watch dataset with partition 80%/20%.
Type Model k Accuracy F1 Kappa MCC
CL xgboost 2 0.7823 0.7829 0.7371 0.7377
FL xgboost 2 0.7905 0.7801 0.7471 0.7506
CL lightgbm 2 0.7007 0.7029 0.6376 0.6383
FL lightgbm 2 0.7095 0.6967 0.6493 0.6537
CL catboost 2 0.7279 0.7277 0.6714 0.6715
FL catboost 2 0.7635 0.7457 0.7142 0.7195
CL rf 2 0.7959 0.7951 0.7532 0.7535
FL rf 2 0.8176 0.8156 0.7789 0.7825
CL et 2 0.8503 0.8518 0.8189 0.8196
FL et 2 0.8244 0.8134 0.7878 0.7910
CL xgboost 4 0.7823 0.7829 0.7371 0.7377
FL xgboost 4 0.7770 0.7734 0.7293 0.7346
CL lightgbm 4 0.7007 0.7029 0.6376 0.6383
FL lightgbm 4 0.6892 0.6909 0.6215 0.6267
CL catboost 4 0.7279 0.7277 0.6714 0.6715
FL catboost 4 0.7635 0.7607 0.7123 0.7156
CL rf 4 0.7959 0.7951 0.7532 0.7535
FL rf 4 0.8176 0.8168 0.7785 0.7827
CL et 4 0.8503 0.8518 0.8189 0.8196
FL et 4 0.8514 0.852 0.8198 0.8233
CL xgboost 8 0.7823 0.7829 0.7371 0.7377
FL xgboost 8 0.7566 0.7467 0.7044 0.7158
CL lightgbm 8 0.7007 0.7029 0.6376 0.6383
FL lightgbm 8 0.7631 0.7516 0.711 0.7203
CL catboost 8 0.7279 0.7277 0.6714 0.6715
FL catboost 8 0.8092 0.8008 0.7671 0.7775
CL rf 8 0.7959 0.7951 0.7532 0.7535
FL rf 8 0.7961 0.7908 0.7516 0.7588
CL et 8 0.8503 0.8518 0.8189 0.8196
FL et 8 0.875 0.8694 0.8486 0.8571
DOI: 10.7717/peerj-cs.2870/table-2

Note:

Values in bold represent the best results.

Experiments on the Apple Watch dataset reveal that FL outperforms or matches CL in key metrics such as F1 score, MCC, accuracy, and Kappa values across various algorithms and data partitions, demonstrating its viability for processing wearable sensor data.

HARTH dataset

In the next experiment, utilizing the HARTH dataset, FL models are slightly outperformed by CL models in terms of F1-score, with the FL Random Forest and Extra Trees models coming closest, achieving F1-scores of 0.92075 and 0.9222 against the CL models’ 0.9293 and 0.9287, as shown in Table 3. FL models also demonstrated analogous Kappa scores, with the FL Random Forest and Extra Trees models scoring 0.89915 and 0.9028 against the CL models’ 0.9089 and 0.9089. In terms of accuracy, FL models remained comparable to CL, with the FL Extra Trees model reaching an accuracy of 0.93495 compared to the CL model’s 0.9397. Finally, FL models achieved similar MCC values, exemplified by the FL Random Forest model’s MCC of 0.905825, almost equal to the CL model’s 0.9105.

Table 3:
Different metrics comparison of CatBoost, Extra Trees, LightGBM, Random Forest, and XGBoost using CL and FL on HARTH dataset with partition 80%/20%.
Type Model k Accuracy F1 Kappa MCC
CL xgboost 2 0.9339 0.9263 0.9009 0.9020
FL xgboost 2 0.9281 0.9203 0.8923 0.8933
CL lightgbm 2 0.8547 0.8521 0.7863 0.7865
FL lightgbm 2 0.7421 0.7450 0.6469 0.6581
CL catboost 2 0.9339 0.9270 0.9011 0.9020
FL catboost 2 0.9294 0.9218 0.8942 0.8952
CL rf 2 0.9395 0.9293 0.9089 0.9105
FL rf 2 0.9314 0.9207 0.8966 0.8982
CL et 2 0.9397 0.9287 0.9089 0.9108
FL et 2 0.9349 0.9222 0.9018 0.9038
CL xgboost 4 0.9339 0.9263 0.9009 0.9020
FL xgboost 4 0.9212 0.9140 0.8821 0.8829
CL lightgbm 4 0.8547 0.8521 0.7863 0.7865
FL lightgbm 4 0.7607 0.7617 0.6510 0.6528
CL catboost 4 0.9339 0.9270 0.9011 0.9020
FL catboost 4 0.9274 0.9197 0.8910 0.8921
CL rf 4 0.9395 0.9293 0.9089 0.9105
FL rf 4 0.9332 0.9214 0.8992 0.901
CL et 4 0.9397 0.9287 0.9089 0.9108
FL et 4 0.9357 0.9242 0.9028 0.9049
CL xgboost 8 0.9339 0.9263 0.9009 0.9020
FL xgboost 8 0.9198 0.9094 0.8791 0.8808
CL lightgbm 8 0.8547 0.8521 0.7863 0.7865
FL lightgbm 8 0.5807 0.5754 0.4213 0.4303
CL catboost 8 0.9339 0.9270 0.9011 0.9020
FL catboost 8 0.9263 0.9169 0.8890 0.8907
CL rf 8 0.9395 0.9293 0.9089 0.9105
FL rf 8 0.9363 0.9262 0.9039 0.9058
CL et 8 0.9397 0.9287 0.9089 0.9108
FL et 8 0.9367 0.9246 0.9042 0.9064
DOI: 10.7717/peerj-cs.2870/table-3

Note:

Values in bold represent the best results.

The experiment with the HARTH dataset shows that while FL slightly trails CL in F1-score, it maintains competitive performance in Kappa, accuracy, and MCC values, demonstrating the robustness and near-equivalence of FL in wearable sensor data analysis.

Regression

This subsection discusses the simulation results for regression experiments, using the metrics described in Table 4 for evaluation. Root mean square error (RMSE) and mean absolute error (MAE) measure the average magnitude of prediction errors, R-squared (R2) indicates the proportion of variance explained by the model, and mean absolute percentage error (MAPE) expresses the error as a percentage.

Table 4:
Evaluation metrics for regression.
Metric Formula
R2 $1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$
MAE $\frac{1}{N}\sum_{i=1}^{N}|y_i - \hat{y}_i|$
MSE $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$
RMSE $\sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}$
MAPE $\frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$
MPE $\frac{1}{N}\sum_{i=1}^{N}\frac{y_i - \hat{y}_i}{y_i}$
DOI: 10.7717/peerj-cs.2870/table-4
Mi Band dataset

Observing the Mi Band dataset results, FL models are not uniformly comparable with CL models; note that for RMSE, MAE, and MAPE, lower values are better. In the first experiment, with emphasis on RMSE, the FL versions of Random Forest and Extra Trees performed notably better than the CL versions, as shown in Table 5. When comparing MAE, the FL Random Forest model consistently shows better results, with an MAE of 35.833938, much lower than the CL version’s MAE of 58.9688. When evaluating the R-squared metric, the FL version of Random Forest demonstrates a superior R-squared value of 0.84735, suggesting improved model fit and generalization compared to the CL version’s R-squared of 0.5712. Finally, the FL versions of Random Forest and XGBoost surpass the CL versions in terms of MAPE, with scores of 0.162725 and 0.289463 vs. 0.1808 and 0.299, respectively.

Table 5:
Different metrics comparison of CatBoost, Extra Trees, LightGBM, Random Forest, and XGBoost using CL and FL on Mi Band dataset with partition 80%/20%.
Type Model k MAE MAPE RMSE R2
CL xgboost 2 80.7234 0.2990 175.7207 0.3732
FL xgboost 2 84.6606 0.2887 190.6013 0.1757
CL lightgbm 2 56.2537 0.2494 102.3229 0.7875
FL lightgbm 2 55.0917 0.2616 104.5867 0.7604
CL catboost 2 59.5378 0.2006 133.4747 0.6384
FL catboost 2 66.4891 0.2168 177.1278 0.2835
CL rf 2 58.9688 0.1808 145.3486 0.5712
FL rf 2 54.4051 0.1673 136.3432 0.5863
CL et 2 60.8821 0.1798 178.1365 0.3559
FL et 2 63.1281 0.1867 167.8168 0.3367
CL xgboost 4 80.7234 0.2990 175.7207 0.3732
FL xgboost 4 80.5563 0.2619 185.8783 0.0215
CL lightgbm 4 56.2537 0.2494 102.3229 0.7875
FL lightgbm 4 68.4518 0.3542 116.4871 0.7082
CL catboost 4 59.5378 0.2006 133.4747 0.6384
FL catboost 4 72.1597 0.2821 141.9439 0.6008
CL rf 4 58.9688 0.1808 145.3486 0.5712
FL rf 4 57.8445 0.1788 112.599 0.7283
CL et 4 60.8821 0.1798 178.1365 0.3559
FL et 4 69.8300 0.2215 131.7045 0.6446
CL xgboost 8 80.7234 0.2990 175.7207 0.3732
FL xgboost 8 68.2938 0.2895 137.1146 0.3150
CL lightgbm 8 56.2537 0.2494 102.3229 0.7875
FL lightgbm 8 45.2727 0.2716 68.7009 0.7929
CL catboost 8 59.5378 0.2006 133.4747 0.6384
FL catboost 8 48.6732 0.1928 87.6797 0.6603
CL rf 8 58.9688 0.1808 145.3486 0.5712
FL rf 8 35.8339 0.1627 62.9584 0.8474
CL et 8 60.8821 0.1798 178.1365 0.3559
FL et 8 44.2294 0.201 82.3557 0.7238
DOI: 10.7717/peerj-cs.2870/table-5

Note:

Values in bold represent the best results.

The analysis of the Mi Band dataset indicates that FL models generally outperform CL models in RMSE, MAE, R-squared, and MAPE metrics, showcasing their superior performance and efficiency in wearable sensor data processing.

DAT263x dataset

Based on the results for the DAT263x dataset, we observe that FL models generally outperform their CL counterparts across different partitions and evaluation metrics. Focusing on RMSE, the FL Random Forest model demonstrates better results than the CL one, as shown in Table 6. For MAE, the FL versions of Random Forest and Extra Trees can outperform CL, while the FL variant of CatBoost achieves an MAE of 0.4802, slightly higher than its CL counterpart’s 0.4466. Assessing the R-squared metric, the FL variants of Random Forest and Extra Trees exhibit impressive R-squared values of 0.9982 and 0.998725, indicating good model fit and generalization compared to their CL counterparts. Lastly, focusing on MAPE, the FL variants of Random Forest sometimes outperform their CL counterparts, achieving lower MAPE values, while for other values of k the results remain roughly equal.

Table 6:
Different metrics comparison of CatBoost, Extra Trees, LightGBM, Random Forest, and XGBoost using CL and FL on DAT263x dataset with partition 80%/20%.
Type Model k MAE MAPE RMSE R2
CL xgboost 2 1.6568 0.0295 2.3841 0.9986
FL xgboost 2 1.6566 0.0278 2.4216 0.9985
CL lightgbm 2 1.1604 0.0237 1.6357 0.9993
FL lightgbm 2 1.3611 0.0277 1.9127 0.999
CL catboost 2 0.4466 0.0116 0.5867 0.9999
FL catboost 2 0.4802 0.0126 0.6343 0.9999
CL rf 2 1.7734 0.0285 2.7522 0.9981
FL rf 2 1.703 0.0277 2.6049 0.9983
CL et 2 1.5011 0.0234 2.3334 0.9987
FL et 2 1.5064 0.024 2.3356 0.9986
CL xgboost 4 1.6568 0.0295 2.3841 0.9986
FL xgboost 4 1.7225 0.032 2.4611 0.9985
CL lightgbm 4 1.1604 0.0237 1.6357 0.9993
FL lightgbm 4 1.3370 0.0299 1.8577 0.9992
CL catboost 4 0.4466 0.0116 0.5867 0.9999
FL catboost 4 0.512 0.0159 0.7042 0.9999
CL rf 4 1.7734 0.0285 2.7522 0.9981
FL rf 4 1.7735 0.0312 2.7088 0.9982
CL et 4 1.5011 0.0234 2.3334 0.9987
FL et 4 1.4987 0.0257 2.2598 0.9987
CL xgboost 8 1.6568 0.0295 2.3841 0.9986
FL xgboost 8 1.8606 0.0327 2.6008 0.9983
CL lightgbm 8 1.1604 0.0237 1.6357 0.9993
FL lightgbm 8 1.4603 0.0283 1.9754 0.999
CL catboost 8 0.4466 0.0116 0.5867 0.9999
FL catboost 8 0.5355 0.0132 0.7166 0.9999
CL rf 8 1.7734 0.0285 2.7522 0.9981
FL rf 8 1.6954 0.0264 2.4565 0.9985
CL et 8 1.5011 0.0234 2.3334 0.9987
FL et 8 1.4682 0.0221 2.2063 0.9988
DOI: 10.7717/peerj-cs.2870/table-6

Note:

Values in bold represent the best results.

The results from the DAT263x dataset indicate that FL models tend to outperform CL models on most metrics, such as RMSE, MAE, R-squared, and MAPE, demonstrating their better model fit and generalization performance in processing wearable sensor data.

Synthetic datasets

The evaluation is also performed on synthetic data from the Apple Watch dataset (Table 7) and the Mi Band dataset (Table 8). Performance on these synthetic datasets is broadly similar to that on the real data, albeit with a slight drop-off. This drop-off can be attributed to the inherent limitations of synthetic data, which does not capture the complexities of real-world variability as well. Models trained on synthetic datasets may therefore generalize less well to new real-world settings, leading to issues such as overfitting or poor generalization. This makes it difficult to rely on synthetic data for training robust models and requires great care when simulating data for model testing.

Table 7:
Different metrics comparison of CatBoost, Extra Trees, LightGBM, Random Forest, and XGBoost using CL and FL on Apple Watch Synthetic dataset with partition 60%/40%.
Type Model k Accuracy F1 Kappa MCC
CL xgboost 2 0.8624 0.8626 0.8347 0.8357
FL xgboost 2 0.8254 0.8240 0.7897 0.7903
CL lightgbm 2 0.8598 0.8607 0.8316 0.8323
FL lightgbm 2 0.7936 0.7914 0.7516 0.7522
CL catboost 2 0.8704 0.8709 0.8443 0.8449
FL catboost 2 0.8307 0.8304 0.7961 0.7970
CL rf 2 0.8307 0.8304 0.7966 0.7971
FL rf 2 0.8175 0.8163 0.7806 0.7816
CL et 2 0.8333 0.8352 0.7999 0.8003
FL et 2 0.8571 0.8569 0.8283 0.8291
CL xgboost 8 0.8624 0.8626 0.8347 0.8357
FL xgboost 8 0.7786 0.7779 0.7323 0.7364
CL lightgbm 8 0.8598 0.8607 0.8316 0.8323
FL lightgbm 8 0.7266 0.7238 0.6693 0.6722
CL catboost 8 0.8704 0.8709 0.8443 0.8449
FL catboost 8 0.7995 0.7985 0.7574 0.7618
CL rf 8 0.8307 0.8304 0.7966 0.7971
FL rf 8 0.8255 0.8235 0.7897 0.7947
CL et 8 0.8333 0.8352 0.7999 0.8003
FL et 8 0.8281 0.8259 0.7929 0.7964
DOI: 10.7717/peerj-cs.2870/table-7

Note:

Values in bold represent the best results.
Table 8:
Different metrics comparison of CatBoost, Extra Trees, LightGBM, Random Forest, and XGBoost using CL and FL on Mi Band Synthetic dataset with partition 60%/40%.
Type Model k MAE MAPE RMSE R2
CL xgboost 2 61.0546 0.2505 111.8685 0.6926
FL xgboost 2 61.2773 0.2709 113.4315 0.6851
CL lightgbm 2 51.9401 0.2293 92.1887 0.7913
FL lightgbm 2 54.9092 0.2253 96.0003 0.7660
CL catboost 2 47.1213 0.1725 94.2360 0.7819
FL catboost 2 56.4724 0.2162 132.6429 0.5359
CL rf 2 51.3832 0.2004 103.1989 0.7384
FL rf 2 51.8509 0.192 105.7146 0.7183
CL et 2 55.2975 0.2249 105.8004 0.7251
FL et 2 48.528 0.1739 97.7883 0.7513
CL xgboost 8 61.0546 0.2505 111.8685 0.6926
FL xgboost 8 65.8528 0.2936 116.7600 0.6813
CL lightgbm 8 51.9401 0.2293 92.1887 0.7913
FL lightgbm 8 58.0431 0.3915 98.2643 0.7832
CL catboost 8 47.1213 0.1725 94.2360 0.7819
FL catboost 8 49.5009 0.2178 95.9313 0.7838
CL rf 8 51.3832 0.2004 103.1989 0.7384
FL rf 8 48.9094 0.1706 98.8223 0.7783
CL et 8 55.2975 0.2249 105.8004 0.7251
FL et 8 51.9627 0.1757 99.5822 0.7653
DOI: 10.7717/peerj-cs.2870/table-8

Note:

Values in bold represent the best results.

Random Forest and Extra Trees (or Extremely Randomized Trees) are two ensemble learning methods that often produce robust outcomes and are frequently mentioned together due to their similarities and effectiveness. Both methods are based on constructing multiple decision trees, which collectively contribute to more stable and accurate predictions than a single tree could achieve.

The core reason behind their strong performance is their foundational approach, where both use an ensemble of decision trees to perform classification or regression tasks. However, they differ in how they sample the data used to build these trees. Extra Trees introduces additional randomness into the model by selecting random thresholds for each feature rather than searching for the best possible thresholds like Random Forest does. Random Forest, on the other hand, uses bootstrapping to create different subsets of the original data for training each tree, meaning each tree in a Random Forest model learns from a slightly different sample of the data points.

Together, these methods leverage the strengths of decision trees while mitigating their tendency toward overfitting through averaging, resulting in consistent and reliable predictions. This explains why Random Forest and Extra Trees are both popular and effective for a wide range of data science tasks.
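
The difference described above can be seen directly in scikit-learn's standard hyperparameters, as the sketch below illustrates on a toy dataset; both classifiers and their settings are stock scikit-learn components, used here purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Random Forest: bootstrap resampling + searching for the best split threshold
rf = RandomForestClassifier(n_estimators=200, bootstrap=True, random_state=0)
# Extra Trees: full sample by default + randomly drawn split thresholds
et = ExtraTreesClassifier(n_estimators=200, bootstrap=False, random_state=0)

for name, clf in [("rf", rf), ("et", et)]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```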

Statistical significance testing

We performed Wilcoxon signed-rank tests to rigorously test for performance differences between FL and CL models. This nonparametric test was chosen because it is well suited to comparing paired data without assuming a normal distribution of differences, making it appropriate for our performance metric comparisons. We employed a one-sided test (alternative='greater') to test whether FL demonstrated a statistically significant performance improvement over CL for each measure. Our analysis revealed many cases where FL performed statistically significantly better. In the case of the Apple Watch dataset, using the 60%/40% split and k = 4 clients, FL models performed significantly better on several classification measures, namely recall, precision, F1-score, Kappa, and MCC (p < 0.05). In the case of the HARTH dataset, using the 60%/40% split, FL demonstrated statistically significantly smaller values for Log Loss at both k = 2 and k = 8 (p < 0.05), indicating better-calibrated models. Additionally, for the Mi Band dataset using the 60%/40% split, FL models achieved significantly smaller error rates for MAE, MSE, and RMSE at k = 4, and for MSE at k = 8 (p < 0.05), indicating smaller prediction error. In the case of the DAT263x dataset, using the 70%/30% and 90%/10% splits, FL significantly outperformed CL, with statistically significant decreases in MAE, MSE, RMSE, MAPE, and root mean squared logarithmic error (RMSLE) across all client configurations (p < 0.05). While these specific examples indicate statistically significant gains for FL, for other combinations of measures and configurations the Wilcoxon signed-rank tests did not show statistically significant differences, indicating equivalent performance in those scenarios.
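
A minimal sketch of this test using SciPy is shown below; the paired scores are illustrative values in the style of Table 2, not the exact vectors used in our analysis.

```python
from scipy.stats import wilcoxon

# Paired per-configuration F1-scores for FL and CL (illustrative values)
fl_f1 = [0.8168, 0.7607, 0.8520, 0.8694, 0.7908]
cl_f1 = [0.7951, 0.7277, 0.8518, 0.8518, 0.7951]

# One-sided test: is FL's F1 statistically significantly greater than CL's?
stat, p = wilcoxon(fl_f1, cl_f1, alternative="greater")
print(f"W = {stat}, p = {p:.4f}")  # reject the null hypothesis when p < 0.05
```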

Combining classification and regression algorithms

The integration of classification and regression models provides a sophisticated analytical framework that could significantly enhance the usefulness of wearable technologies in health monitoring. Using classification models to determine the type of physical activity and regression models to quantify specific outcomes such as calorie expenditure creates a multi-layered analytical process that optimizes accuracy and personalization. In other words, classification determines the activity (e.g., walking, running), while regression quantifies aspects such as intensity or duration (e.g., calorie expenditure, speed). Integrating the two allows a more comprehensive and nuanced analysis of the user’s behavior and physiological response, providing richer insights than either task alone. While empirical testing is beyond the scope of this work, this combination represents a logical step toward fully exploiting the analytical potential of our platform and covering a broader spectrum of real-world healthcare monitoring applications.

In the absence of a suitable dataset, we can only hypothesize a potential integration of classification and regression models. One solution would be to use the output of classification models as input to regression models, effectively combining the strengths of both approaches. The pipeline could begin with a classification model that leverages sensor data from wearable devices to identify the type of physical activity being performed, such as walking, running, or cycling. Classification could be based on movement patterns such as speed, frequency, and pace. Once classified, the activity would provide valuable information to the regression model, complemented by other parameters that may influence calorie expenditure, such as activity duration, heart rate, and environmental factors like temperature or altitude. The regression model would then estimate the calories burned from this data. The estimate would be tailored not only to the type of activity but also to individual differences such as the user’s age, weight, and physical condition, making it more personalized and accurate.
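To make this hypothesized pipeline concrete, the sketch below chains a classifier and a regressor: the predicted activity label is appended to the sensor features before the regression step. All data, feature layouts, and model choices here are illustrative assumptions, not part of the platform.

```python
# Hypothetical two-stage pipeline: classify the activity, then feed the
# predicted label (plus the raw sensor features) into a calorie regressor.
# All data and models below are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Stage 1: activity classification from sensor features
# (e.g., speed, movement frequency, heart rate, duration).
X_sensor = np.random.rand(1000, 4)             # placeholder sensor matrix
activity = np.random.randint(0, 3, size=1000)  # 0=walk, 1=run, 2=cycle
calories = np.random.rand(1000) * 500          # placeholder target

clf = RandomForestClassifier(random_state=0).fit(X_sensor, activity)
predicted_activity = clf.predict(X_sensor)

# Stage 2: calorie regression, with the predicted activity as an extra feature.
X_aug = np.column_stack([X_sensor, predicted_activity])
reg = RandomForestRegressor(random_state=0).fit(X_aug, calories)

# At inference time the same two steps run in sequence.
new_sample = np.random.rand(1, 4)
new_aug = np.column_stack([new_sample, clf.predict(new_sample)])
print(f"Estimated calories burned: {reg.predict(new_aug)[0]:.1f}")
```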

In addition to individual fitness and health monitoring, this approach can also be used in clinical practice, where accurate monitoring of physical activity and energy expenditure is crucial for patient care, such as in rehabilitation or chronic disease management. It can also be used in sports science, where analyzing individual data is essential for optimizing athletes’ performance and recovery.

Discussion

An intuitive application integrating AutoML technologies was designed and built to facilitate the training of machine learning models on interoperable datasets using both FL and CL methodologies. The application was designed specifically for end users without technical expertise in machine learning who want a user-friendly interface that simplifies model development, improvement, and testing. Through intuitive design principles and automation capabilities, it makes complex machine learning workflows manageable without specialized skills.

The application not only simplifies the complexity of model training but also offers robust support for managing various datasets and model parameters, improving the overall user experience. Thanks to its integration with AutoML, the platform automatically selects the best algorithms and tuning parameters based on the data, significantly reducing the barriers to efficient model development and deployment. This approach allows even novice users to achieve high-quality results, making sophisticated data analysis tools more accessible.
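As an indication of what this AutoML step looks like in practice, the sketch below uses PyCaret's regression module to rank candidate algorithms automatically; the input file and target column are hypothetical placeholders.

```python
# Minimal sketch of an AutoML workflow with PyCaret's regression module.
# "wearable_data.csv" and the "calories" target are hypothetical placeholders.
import pandas as pd
from pycaret.regression import setup, compare_models, tune_model

wearable_df = pd.read_csv("wearable_data.csv")  # hypothetical tabular dataset

# setup() handles preprocessing (imputation, encoding, train/test split).
exp = setup(data=wearable_df, target="calories", session_id=42)

# compare_models() trains and cross-validates the available regressors
# and returns the best one according to the default metric.
best = compare_models()

# Optional: automated hyperparameter tuning of the selected model.
tuned = tune_model(best)
```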

Additionally, the application’s environment encourages experimentation and continuous learning. Users can visualize the effects of their adjustments in real time, fostering a better understanding of machine learning processes and driving innovation. The application also includes features for advanced users, such as options to manually adjust or extend AutoML configurations, providing flexibility and control where needed.

Overall, this application represents a significant step toward democratizing machine learning, making it accessible and manageable for users of varying skill levels while ensuring that the models produced are both powerful and relevant to the user’s specific needs. However, future work could extend this study to additional datasets, a wider range of partition proportions, and a larger number of clients to enhance generalizability and robustness.

Overall, FL offers distinct advantages over CL. This study shows that FL not only matches CL but also outperforms it in some cases. Table 9 compares FL and CL across various features and aspects, highlighting the strengths of each approach. FL excels at preserving data privacy because it avoids central data collection, supports reduced data movement by training models locally, and offers model personalization based on local data characteristics. It also scales well to a large number of users since the learning process is distributed across multiple nodes. Moreover, FL promotes decentralized data repositories and utilizes distributed and parallel computation, which can optimize bandwidth usage by transmitting only model updates instead of raw data.

Table 9:
Comparison between federated and centralized learning.
Feature/Aspect	FL	CL
Data privacy	✓	✗
Reduced data movement	✓	✗
Model personalization	✓	✗
Scalability to large number of users	✓	✗
Data repository is decentralized	✓	✗
Distributed and parallel computation	✓	✗
Bandwidth optimization	✓	✗
Easier to implement and manage	✗	✓
Consistent data quality	✗	✓
Simpler data access patterns	✗	✓
Consistent data aggregation	✗	✓
Streamlined model training process	✗	✓
DOI: 10.7717/peerj-cs.2870/table-9

On the other hand, CL tends to be easier to implement and manage due to its centralized nature. It generally ensures consistent data quality, benefits from simpler access patterns to the centralized data repository, and provides uniform data distribution, which can lead to more stable model training. Additionally, CL can offer higher model training efficiency due to the centralized processing power and streamlined data handling.
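To ground the bandwidth and privacy points above, here is a minimal sketch of an FL client using Flower, in which only model parameters, never raw local records, are exchanged with the server. The model, local data, and server address are hypothetical placeholders, and Flower's actual API offers far more than is shown.

```python
# Minimal sketch of an FL client in Flower: only model parameters are
# exchanged with the server, never the raw local records. The model,
# data, and server address are hypothetical placeholders.
import flwr as fl
import numpy as np
from sklearn.linear_model import SGDRegressor

X_local, y_local = np.random.rand(200, 4), np.random.rand(200)  # local data
model = SGDRegressor(random_state=0)
model.partial_fit(X_local, y_local)  # initializes coef_ and intercept_

class WearableClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return [model.coef_, model.intercept_]

    def fit(self, parameters, config):
        model.coef_, model.intercept_ = parameters  # load global weights
        model.partial_fit(X_local, y_local)         # one local training pass
        return self.get_parameters(config), len(X_local), {}

    def evaluate(self, parameters, config):
        model.coef_, model.intercept_ = parameters
        mse = float(np.mean((model.predict(X_local) - y_local) ** 2))
        return mse, len(X_local), {}

# Connect to a (hypothetical) Flower server that aggregates the updates.
fl.client.start_numpy_client(server_address="127.0.0.1:8080",
                             client=WearableClient())
```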

Conclusion

This study demonstrated how FL can significantly improve the privacy and interoperability of wearable sensor data in the healthcare sector. The research focused on integrating FL with the Fast Healthcare Interoperability Resources (FHIR) standard to enable seamless, privacy-compliant data analysis across heterogeneous healthcare systems. A comprehensive comparison showed that FL achieves predictive performance comparable to CL while offering clear advantages in privacy. The web application created with tools such as Streamlit, PyCaret, and Flower effectively illustrates a practical implementation of FL in the healthcare sector.

Although this study demonstrated the effectiveness of FL and FHIR in a local simulation environment, several important limitations need to be addressed by future research. Chief among them is the reliance on a single dataset per experiment and on local execution. Future work could strengthen this approach by integrating multiple heterogeneous datasets in a genuinely distributed environment, subject to network constraints and device heterogeneity; this extension would more realistically capture the challenges and potential of deploying FL in diverse real-world settings. Furthermore, exploring alternative FL approaches, including more advanced cryptographic methods such as homomorphic encryption and differential privacy, would strengthen privacy protection without unnecessarily compromising data utility. Another important avenue of research is the development of adaptive learning algorithms capable of reacting dynamically to shifts in data distribution or device activity, thereby preserving the relevance and accuracy of models over time. Addressing these areas of improvement will be essential to make FL-based healthcare systems more robust, secure, and ultimately feasible for large-scale deployment in heterogeneous medical applications.