ORIGINAL RESEARCH
Sari Puspita, S.Si, M.Si; Gusrino Yanto, S.Kom, M.Kom; Rifa Turaina, S.Kom, M.Kom; Nency Extise Putri, S.Kom, M.Kom
Program Study of Information System, Faculty of Information Technology and Creative Industries, Universitas Metamedia, Padang, West Sumatera, Indonesia
Keywords: ensemble of machine learning, LIME, multi-method XAI, preeclampsia, SHAP
Preeclampsia is a pregnancy complication that endangers the mother and fetus. Early detection is necessary to prevent serious complications. Here, the authors employ mixed methods, combining quantitative and qualitative approaches, to design a preeclampsia risk prediction system for pregnant women. The system uses four machine learning algorithms: logistic regression, decision tree, support vector machine (SVM), and random forest (RF). The models were evaluated and interpreted using a multi-method explainable AI (XAI) approach, comprising Shapley additive explanations, local interpretable model-agnostic explanations, and permutation feature importance, to enhance the transparency and ease of interpretation of the results. Clinical variables included systolic and diastolic blood pressure, blood glucose, body temperature, heart rate, age, and urine protein. The results show that RF and SVM achieved the highest accuracy (78%) with relatively stable performance across risk categories. Multi-method XAI analysis indicated that blood pressure and blood glucose frequently appeared among influential features, although their relative importance varied depending on the model and explainability method. However, because of the limited dataset size and the use of internal validation only, these findings should be interpreted as preliminary. The multi-method XAI approach is intended to support early identification of preeclampsia risk factors, not to replace clinical diagnosis or function as a standalone clinical decision-making tool.
Preeclampsia is a serious pregnancy complication that can endanger both mother and baby. Early risk identification is essential to support timely monitoring and care. This study analyzed anonymized data from 299 pregnant women at a community health center in Padang, Indonesia. Several machine learning models were used to classify preeclampsia risk into low, medium, and high categories. Random forest and support vector machine achieved the most consistent results, with 78% accuracy. Explainable artificial intelligence (XAI) methods were applied to clarify how predictions were made. Blood pressure and blood sugar frequently influenced the results. The proposed model is intended to support early risk screening and does not replace clinical diagnosis or medical judgment. Further validation is required before clinical implementation.
Citation: Telehealth and Medicine Today © 2026, 11: 663 - https://doi.org/10.30953/thmt.v11.663
DOI: https://doi.org/10.30953/thmt.v11.663
Copyright: © 2026 The Authors. This is an open-access article distributed in accordance with the Creative Commons Attribution Non-Commercial (CC BY-NC 4.0) license, which permits others to distribute, adapt, enhance this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See http://creativecommons.org/licenses/by-nc/4.0. The authors of this article own the copyright.
Received: November 25, 2025; Accepted: January 30, 2026; Published: March 31, 2026.
Corresponding Author: Sari Puspita, Email: saripuspita@metamedia.ac.id
Competing interests and funding: The authors declare no conflicts of interest.
The authors received financial support for publication from the Ministry of Higher Education, Science, and Technology.
Preeclampsia is a hypertensive disorder that occurs during pregnancy. It afflicts approximately 5% to 8% of high-risk pregnancies and is one of the leading causes of maternal and perinatal morbidity and mortality worldwide.1–4 Preeclampsia usually occurs after 20 weeks of gestation and is characterized by high blood pressure, which is often accompanied by proteinuria or signs of other organ damage.2,5,6
If undetected or untreated, preeclampsia can progress to eclampsia. Eclampsia is a more severe condition characterized by seizures and can lead to fatal complications for the mother and fetus.7,8 The high rates of maternal and infant mortality due to preeclampsia highlight the need for an early detection system that is both more accurate and more efficient.9–11 Today, identifying the risk of preeclampsia requires manual clinical and laboratory examinations, which are time-consuming and resource-intensive.12,13 Additionally, the numerous interrelated risk factors, including maternal age, history of hypertension, previous pregnancies, body mass index, and genetic conditions, make the diagnostic process more complex.14
Previous studies on predicting preeclampsia used ensembles of machine learning (ML) algorithms.15–17 They demonstrated the high accuracy of models such as random forest (RF) and eXtreme Gradient Boosting.18–20 The main obstacle to their clinical use is a lack of transparency. Multi-method explainable AI (XAI) techniques, including Shapley additive explanations (SHAP), local interpretable model-agnostic explanations (LIME), and glass-box explainable boosting machine models, are used to explain variables and improve interpretability while maintaining accuracy. This supports model interpretability and transparency within clinical research contexts.21,22 Previous preeclampsia prediction studies largely relied on a single explainability technique and focused primarily on predictive accuracy, which limits a comprehensive understanding of model behavior. This study addresses that limitation by integrating multiple complementary XAI methods.22
Although several physiological variables, such as body temperature and heart rate, were included in the predictive models, these features are not well-established independent clinical risk factors for preeclampsia. In this study, these variables were considered supportive indicators of general maternal physiological conditions, rather than primary clinical determinants. Therefore, conclusions regarding their clinical relevance were drawn cautiously, and the predictive importance of these non-standard variables should not be extrapolated beyond the context of early risk screening.
In Padang City, Indonesia, there were approximately 17,376 pregnant women in a population of 942,938 in 2022.23,24 In the same year, the maternal mortality rate dropped to 17 cases per 100,000 live births from 30 cases in 2021. It then increased to 23 cases in 2023.24 Preeclampsia in pregnant women is one cause of maternal and infant mortality. Healthcare professionals aim to identify preeclampsia early to ensure the safety of mothers during pregnancy, childbirth, and postpartum. According to the operational definition, approximately 20% of pregnant women are considered high-risk, which includes those with preeclampsia.
Figure 1 shows that the detection rate among high-risk pregnant women in 2023 was 113.4%.25 These detection data suggest that artificial intelligence (AI) technology has the potential to support decision-making,26,27 especially in predicting preeclampsia. One frequently used method is ML, a branch of AI that allows computers to learn from data and make decisions or predictions without explicit programming. This study uses four ML algorithms (logistic regression, decision tree, support vector machine [SVM], and RF) to classify preeclampsia risk levels in pregnant women.

Fig. 1. Detection rates of high-risk pregnant women by health workers in 2023.25
Source: DINAS KESEHATAN KOTA PADANG [Internet], 2025.25
RF is an ensemble-based classifier that combines multiple decision trees to improve predictive stability.28–35 Logistic regression is a probabilistic linear classifier commonly used as a baseline model in medical prediction.32,36,37 A decision tree is a rule-based model that provides interpretable classification decisions,38–40 and an SVM is a margin-based classifier that is effective for handling high-dimensional clinical data.41–44
To ensure a transparent understanding of the prediction results from each model, this study applies a multi-method XAI approach,45,46 which helps explain how models make decisions.47 Three explainability methods were employed: SHAP, LIME, and permutation feature importance (PFI). SHAP uses principles of game theory to measure how much each feature contributes to the predicted result.48 LIME fits simple local surrogate models to explain individual predictions, and PFI assesses each feature’s influence on model performance.49–51 These techniques provide technical transparency regarding model behavior and feature contributions. However, the interpretability benefits were assessed at a methodological level and were not formally evaluated through clinician usability testing.
This study uses an ML modeling approach combined with multi-method XAI to predict preeclampsia risk in pregnant women. The modeling process is carried out in several systematic stages, as shown in Figure 2.

Fig. 2. Stages of the machine learning modeling process for preeclampsia risk prediction. AI: artificial intelligence; AUC ROC: area under the receiver operating characteristic curve; LIME: local interpretable model-agnostic explanations; PFI: permutation feature importance; SHAP: Shapley additive explanations; SMOTE: synthetic minority oversampling technique.
The preeclampsia prediction modeling process using ML algorithms proceeds through the following stages:
The collected data contain the medical information and clinical characteristics of pregnant women. The initial data undergo a filtering stage to remove incomplete, inconsistent, or irrelevant entries. This ensures that the data are of high quality, accurate, and suitable for valid ML modeling and analysis.
The data preprocessing stage includes handling missing values, encoding categorical variables into numerical form, and normalizing continuous variables using min-max scaling to ensure comparable feature ranges. Outliers were retained to preserve clinically meaningful extreme values, and no domain-specific thresholds were applied in order to avoid introducing subjective bias. Class imbalance was addressed using the synthetic minority oversampling technique (SMOTE).52–54 SMOTE generates synthetic data for minority classes, balancing the dataset and preventing ML models from favoring the majority class.55–57
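The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `min_max_scale` and `smote_like`, the neighbour count `k`, and the example values are all assumptions introduced here. The second function implements only the core SMOTE idea of interpolating between a minority sample and a nearby minority neighbour.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

def smote_like(X_min, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    rows = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))          # pick a minority sample
        j = neighbours[i, rng.integers(k)]    # pick one of its neighbours
        gap = rng.random()                    # position along the segment
        rows.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(rows)

# Illustrative values only (systolic BP, diastolic BP); not study data.
X = np.array([[130, 80], [140, 90], [90, 70], [120, 60]])
X_scaled = min_max_scale(X)
synthetic = smote_like(X_scaled[:3], n_new=5)
```

Because each synthetic row lies on a line segment between two real minority samples, it stays inside the range of the scaled data. In practice a maintained implementation, such as the one in the imbalanced-learn package, would normally be preferred over a hand-written version.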
This study uses four ML algorithms—logistic regression, decision tree, SVM, and RF—to classify preeclampsia risk levels in pregnant women.58–60
Model performance: Performance is evaluated using quantitative metrics that capture the model’s capacity for correct predictions and its ability to balance the detection of positive and negative cases, especially in datasets with skewed class distributions. The following metrics are used:
Accuracy: The proportion of all predictions that are correct.61,62
Precision: The proportion of positive predictions made by the model that are actually positive.63,64
Recall (i.e., sensitivity or true positive rate): The proportion of actual positive examples in the dataset that the model detects.10
F1-score: The harmonic mean of precision and recall, which combines them into a single value. It is useful when classes are imbalanced or when both false positives and false negatives must be considered simultaneously.65
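The four metrics listed above can be computed for a three-class problem as in the sketch below. The source does not state how per-class values were averaged, so macro averaging (an unweighted mean over classes) is an assumption here, as are the function name `macro_metrics` and the toy labels.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1, 2)):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }

# Toy labels (0 = low, 1 = medium, 2 = high); not study data.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
scores = macro_metrics(y_true, y_pred)
```

For these toy labels, four of six predictions are correct, so accuracy is about 0.67, while macro precision is about 0.72 because the high-risk class has no false positives.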
Model evaluation was conducted using an internal validation framework. Due to dataset size constraints, nested cross-validation and external validation with an independent, untouched test set were not performed. Formal statistical uncertainty estimation, such as confidence intervals for accuracy or area under the curve (AUC), and hypothesis-based statistical comparisons between models were also not performed, owing to the limited dataset size and the exploratory screening nature of this study. Consequently, the reported metrics should be interpreted as descriptive indicators of preliminary internal screening performance rather than statistically validated estimates of generalizable clinical accuracy. Accordingly, no claims of model superiority or clinical equivalence are made.
Feature importance and elimination: This step analyzes how much each feature contributes to the prediction. Identifying the important features determines which variables influence the model output the most.66 Features with low importance values are considered less relevant and are eliminated from the modeling process. This elimination simplifies the model structure, reduces computational complexity, and improves interpretability without sacrificing prediction performance.67
Application of multi-method XAI: To improve the transparency and interpretability of the prediction model, this study uses three methods, namely, SHAP, LIME, and PFI.48,68,69
The results of the model interpretation process are presented through two types of plots: summary plots and force plots. Summary plots offer an overall view of how each feature influences the model’s predictions, while force plots provide specific explanations at the individual level.48,68,69
The final result of the estimation and interpretation process is compiled into comprehensive findings that support decision-making in medical contexts. This information is designed to help healthcare professionals detect the risk of preeclampsia earlier.6,22,34
Data were collected using field research methods from pregnant women. Then, the data were processed in Google Colaboratory with the Python programming language.70,71
This study has several limitations related to the size of the dataset. With 299 records, the dataset increases the risk of overfitting and limits the generalizability of the findings to broader clinical populations. Although SMOTE was applied to address class imbalance, the use of synthetic samples might introduce noise and potentially distort underlying clinical relationships, especially in small datasets. Consequently, the reported performance should be interpreted with caution. Future studies should prioritize larger, multicenter datasets and external validation to ensure model robustness and clinical reliability. Table 1 shows an overview of the data used.
| No. | Age (yrs) | Systolic BP (mmHg) | Diastolic BP (mmHg) | Blood sugar (mg/dL) | Body temp (°F) | Heart rate (bpm) | Urine protein (mg/dL) | Risk* |
|---|---|---|---|---|---|---|---|---|
| 0 | 34 | 130 | 80 | 15.0 | 98 | 86 | 109.99 | High |
| 1 | 34 | 140 | 90 | 121.0 | 98 | 70 | 117.97 | High |
| 2 | 29 | 90 | 70 | 118.0 | 100 | 80 | 147.26 | High |
| 3 | 33 | 140 | 85 | 7.0 | 98 | 70 | 112.28 | High |
| 4 | 40 | 120 | 60 | 6.1 | 98 | 76 | 197.08 | Low |

*Risk stratification (low, medium, and high risk). The three-class risk stratification reflects operational screening categories used in routine antenatal care at the study site. These labels were derived from existing clinical records and were not independently validated by specialist clinicians. BP: blood pressure; bpm: beats per minute; °F: degrees Fahrenheit; mg/dL: milligrams per deciliter; mmHg: millimeters of mercury; yrs: years.
To facilitate the processing of the classification algorithm, the risk level as a target variable is converted from the categories of low, medium, and high risk to the numerical values 0, 1, and 2, respectively. The three-class risk stratification (low, medium, and high risk) reflects operational screening categories used in routine antenatal care at the study site. These labels were derived from existing clinical records and were not independently validated by specialist clinicians; therefore, they should be interpreted as screening-level risk indicators rather than diagnostic classifications.
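The label conversion described above amounts to a fixed ordinal mapping. A minimal sketch follows; the names `RISK_TO_CODE` and `encode_risk` are hypothetical, and the input list reuses the Risk column of the five rows shown in Table 1.

```python
# Ordinal encoding of the target variable: low -> 0, medium -> 1, high -> 2.
RISK_TO_CODE = {"low": 0, "medium": 1, "high": 2}

def encode_risk(labels):
    """Map risk category strings to integer codes, ignoring case and spaces."""
    return [RISK_TO_CODE[label.strip().lower()] for label in labels]

encoded = encode_risk(["High", "High", "High", "High", "Low"])
# encoded == [2, 2, 2, 2, 0]
```

An explicit dictionary keeps the low-to-high ordering under the analyst's control, whereas an automatic encoder would typically assign codes alphabetically.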
As shown in Figure 3, the class distribution is imbalanced before balancing. The low-risk category (0) has the most records, followed by the high-risk category (2), and the medium-risk category (1) has the fewest.

Fig. 3. Distribution of preeclampsia risk labels before balancing.
The data distribution shows class imbalance, which affects model performance. This issue can be addressed through stratified sampling, oversampling, or weighting the minority class to improve the balance of risk prediction.
Applying SMOTE balanced the previously uneven number of records per risk label, reducing bias toward the dominant class. After oversampling, each category (Label 0, Label 1, and Label 2) has 108 samples (Figure 4), which is intended to help the model learn patterns from all risk categories evenly.

Fig. 4. Distribution of preeclampsia risk labels after SMOTE implementation. SMOTE: synthetic minority oversampling technique.
A confusion matrix evaluates the performance of a classification model by showing the number of correct and incorrect predictions in each category. This facilitates analysis of the model’s accuracy and classification error.
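As an illustration of how such a matrix is built and read, the sketch below counts predictions per true class and derives per-class sensitivity from each row. The function names and the toy labels are assumptions for illustration and do not reproduce the study's results.

```python
def confusion_matrix(y_true, y_pred, n_classes=3):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Diagonal count divided by the row total for each true class."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]

# Toy labels (0 = low, 1 = medium, 2 = high); not study data.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred)
recalls = per_class_recall(cm)
# cm == [[2, 1, 0], [0, 1, 1], [1, 0, 2]]
```

Off-diagonal entries in the middle row correspond to medium-risk cases assigned to another category, which is the type of error a screening setting is most concerned with.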
Figure 5 shows that the classification performance varied among the models. The medium-risk group was the most challenging to predict. The RF and SVM models demonstrated the most balanced performance across all risk categories, whereas the decision tree and logistic regression models showed reduced sensitivity in the medium-risk class. These results suggest that ensemble and margin-based models are more effective for screening-oriented risk stratification. The comparison of model performance is shown in Table 2.

Fig. 5. Confusion matrix for four classification models. The y-axis is the number of the data based on risk level. SVM: support vector machine.
The receiver operating characteristic (ROC) curve evaluates a model’s ability to distinguish between classes by plotting the true positive rate against the false positive rate. Performance is summarized by the AUC.
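For a three-class problem, one common way to obtain an AUC per risk category is a one-vs-rest comparison, sketched below. The source does not state which multiclass scheme was used, so one-vs-rest is an assumption, as are the function names and the toy probabilities. The AUC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def binary_auc(is_positive, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(y_true, proba, n_classes=3):
    """One AUC per class, treating that class as positive and the rest as negative."""
    return [
        binary_auc([t == c for t in y_true], [row[c] for row in proba])
        for c in range(n_classes)
    ]

# Toy predicted probabilities per class (low, medium, high); not study data.
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.5, 0.1, 0.4],
]
aucs = one_vs_rest_auc(y_true, proba)
# aucs == [0.875, 0.875, 1.0]
```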
As shown in Figure 6, RF and SVM had the most consistent discriminative performance across all risk categories, whereas decision tree and logistic regression showed greater variability. Overall, the ensemble-based and margin-based models demonstrated better class separability under the applied internal validation framework.

Fig. 6. Receiver operating characteristic (ROC) curves for all evaluated models. AUC: area under the curve; SVM: support vector machine.
SHAP is used to explain how each feature contributes to a model’s prediction. It provides a transparent, in-depth interpretation of the decisions that are made by an ML algorithm.
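The game-theoretic idea behind SHAP can be shown with an exact Shapley computation on a tiny model. This is a sketch under simplifying assumptions, not the SHAP library's algorithm: features outside a coalition are replaced by a single baseline value, all coalitions are enumerated (feasible only for a handful of features), and `toy_score`, its coefficients, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance. Features outside a
    coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical linear risk score over (systolic BP, diastolic BP, glucose).
def toy_score(z):
    sbp, dbp, glucose = z
    return 0.02 * sbp + 0.01 * dbp + 0.005 * glucose

x = [140, 90, 121]          # instance to explain (illustrative)
baseline = [120, 80, 100]   # reference values (illustrative)
phi = shapley_values(toy_score, x, baseline)
```

For a linear model each contribution reduces to the coefficient times the feature's distance from the baseline, and the contributions sum to the difference between the prediction for `x` and the prediction for `baseline`. That additivity is the property that summary and force plots visualize.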
The SHAP analysis (Figure 7) revealed that the dominant features varied across models, reflecting differences in learning mechanisms. Although their relative importance differed depending on the algorithm, blood pressure and blood glucose variables frequently appeared among the top contributors.

Fig. 7. SHAP visualization illustrating the contribution of each feature to the model’s prediction. BP: blood pressure; SHAP: Shapley additive explanations; SVM: support vector machine.
LIME is used to explain how features contribute to model predictions at the local level. This facilitates interpretation of complex decisions and reveals differences in focus between algorithms for each preeclampsia risk category.
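The local-surrogate idea behind LIME can be sketched as follows: perturb the instance, weight the perturbed points by their proximity to it, and fit a weighted linear model whose coefficients serve as the local explanation. This is a simplified illustration rather than the LIME library's procedure, which differs in details such as how perturbations are sampled. The function `local_surrogate`, its parameters, and `toy_model` are assumptions introduced here.

```python
import numpy as np

def local_surrogate(predict, x, n_samples=500, scale=0.1,
                    kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x.
    Returns (intercept, per-feature local weights)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Sample perturbed points in a small neighbourhood of x.
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)
    # Exponential kernel: closer samples get larger weights.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares on [1, Z - x].
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

# Hypothetical black-box model on two scaled features.
def toy_model(Z):
    return 1.0 / (1.0 + np.exp(-(3.0 * Z[:, 0] - 1.0 * Z[:, 1])))

intercept, weights = local_surrogate(toy_model, [0.5, 0.5])
```

In this toy case the first feature receives a positive local weight and the second a smaller negative one, mirroring the signs and relative sizes of the coefficients inside `toy_model` near the explained point.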
In Figure 8, the LIME analysis reveals differences in focus between the algorithms. It highlights model-specific local decision patterns, reinforcing the complementary role of local explanations alongside global interpretability methods. Although blood pressure variables were consistently among the most influential features across models, their relative importance varied depending on the algorithm and explainability method.

Fig. 8. LIME visualization for explaining the model’s prediction of preeclampsia risk. BS: blood sugar; BT: body temperature; DBP: diastolic blood pressure; HR: heart rate; LIME: local interpretable model-agnostic explanations; SBP: systolic blood pressure.
PFI is used to assess the contribution of each feature to model performance. A feature’s importance is determined by how much accuracy decreases when its values are randomly shuffled. The larger the decrease, the greater the feature’s influence on preeclampsia risk prediction.
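The shuffling procedure described above can be sketched in a few lines. The function name `permutation_drop`, the number of repeats, and the toy classifier are assumptions for illustration; scikit-learn provides a maintained equivalent in `sklearn.inspection.permutation_importance`.

```python
import numpy as np

def permutation_drop(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each column is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base = np.mean(predict(X) == y)  # accuracy on intact data
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature-target link
            drops[j] += base - np.mean(predict(Xp) == y)
    return drops / n_repeats

# Hypothetical classifier that relies only on the first feature.
def toy_classifier(X):
    return (X[:, 0] > 0.5).astype(int)

data_rng = np.random.default_rng(1)
X = data_rng.random((200, 2))
y = toy_classifier(X)
importance = permutation_drop(toy_classifier, X, y)
```

Shuffling the first column destroys the information the toy classifier depends on, so accuracy falls sharply, while shuffling the unused second column leaves accuracy unchanged and its importance is exactly zero.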
PFI (Figure 9) shows that feature sensitivity differed across models. While blood pressure variables frequently contributed to performance changes, other features, such as blood glucose and body temperature, played model-dependent roles.

Fig. 9. Permutation feature importance (PFI) visualization showing the decrease in model performance when each feature is randomly permuted. SVM: support vector machine.
The consistent model performance across multiple evaluation metrics, including accuracy, recall, the F1 score, and the AUC, indicates stable internal performance. A key strength lies in the focused evaluation of the medium-risk group, which is the most clinically relevant and challenging to classify. Furthermore, integrating multi-method XAI (SHAP, LIME, and PFI) enhances the interpretability of the model, with comprehensive XAI visualizations facilitating an understanding of the prediction mechanisms.
Notably, the relative importance of features varies depending on the explainability method and learning algorithm used, reflecting the fundamental differences between global interpretability frameworks, such as SHAP and PFI, and local interpretability frameworks, such as LIME. These variations suggest that feature dominance depends on the model and method rather than being absolute. Therefore, XAI findings should be viewed as different perspectives on model behavior rather than definitive rankings of clinical importance.
From a clinical screening perspective, misclassification, particularly false-negative predictions in the medium-risk group, might delay closer monitoring or early intervention. This highlights the importance of cautious interpretation and reinforces that the proposed models are intended to support early risk screening rather than definitive diagnosis.
The RF and SVM algorithms deliver the most accurate predictions, achieving an accuracy of 78%, a precision of 0.77, and an F1-score of 0.77. The main advantage of this study lies in the application of the multi-method XAI approach, which combines SHAP, LIME, and PFI to improve the technical transparency of model behavior and feature contributions. However, because no formal usability testing or expert evaluation involving clinicians was conducted, interpretability claims are limited to methodological assessment rather than validated clinical usability.
Due to differences in datasets, outcome definitions, and validation strategies, direct comparisons with previous studies should be interpreted cautiously. Nevertheless, the proposed models demonstrate balanced internal performance within the context of the available data. Studies (22) and (34) each used a single algorithm, achieving accuracies of 79.5% and 65.22%, respectively. Studies (31) and (16) emphasized the role of AI in improving accuracy; the best model (SGB) achieved 97.3% accuracy but was not highly interpretable. Therefore, this study’s advantage lies not only in the stability and balance of the model’s performance but also in its ability to identify the most influential variables (e.g., blood pressure and blood glucose levels) in predicting preeclampsia risk.
All the predictors used in this study were collected at a single time point using a cross-sectional design. While this approach is suitable for creating a baseline screening model, it is not adequate to support long-term clinical inference or analysis of disease progression. Future research should therefore incorporate longitudinal or trend-based clinical data in order to capture the dynamic physiological changes that occur during pregnancy. This could enhance risk stratification and improve the predictive performance of preeclampsia screening models.
Although RF and SVM achieved the highest overall accuracy (78%) and stable classification across all risk groups, this remains a moderate level for clinical application. False negative predictions within the medium-risk group, in particular, may delay early intervention and allow progression to high-risk preeclampsia without timely monitoring. From a clinical perspective, the balanced performance observed across risk categories, especially within the medium-risk group, suggests that the proposed models could serve as screening aids rather than definitive diagnostic systems. Future work should involve testing the model’s transportability across diverse populations.
This study used four ML algorithms: logistic regression, decision tree, SVM, and RF. The models were evaluated and interpreted with a multi-method XAI approach that combines SHAP, LIME, and PFI. The results showed that RF and SVM provided the best performance, achieving 78% accuracy and making consistent predictions across all risk categories. Multi-method XAI analysis indicated that blood pressure variables frequently appeared among influential features, although their relative importance varied across models and explainability methods. The multi-method XAI approach complements predictive performance by enhancing methodological transparency and supporting model interpretability.
These findings might support future decision-making based on screening by providing transparent insight into model behavior. However, further clinical validation is required before real-world implementation. Therefore, the proposed model should be interpreted as a tool to support early risk screening and assist healthcare professionals in identifying potential preeclampsia risk, rather than as a diagnostic system or a substitute for clinical judgment.
Ethical clearance for this study was obtained from the Anak Air Community Health Centre (Puskesmas Anak Air) in the Koto Tangah district of Padang city. A written informed consent was obtained from all participants prior to data collection. To ensure confidentiality and compliance with ethical research standards, all personal identifiers were removed, and the data were fully anonymized before analysis.
Sari Puspita supervised the research, administered the project, provided resources, and critically revised the manuscript. Gusrino Yanto conceptualized the study, collected and processed the data, performed machine learning analysis, implemented the multi-method XAI, and drafted the original manuscript. Rifa Turaina and Nency Extise Putri contributed to manuscript refinement and critical review.
The data that support the findings of this study are available from the corresponding author upon reasonable request.
AI-assisted tools, including language refinement and grammar correction software, were used to improve the linguistic clarity and formatting of the manuscript. No part of the analytical work, dataset processing, or result generation was performed using AI-generated content.
The study used retrospective secondary data collected in 2024 from the Anak Air community health center. The dataset was fully anonymized prior to being provided to the authors, and no identifiable personal information was accessed during the analysis. According to institutional research guidelines, retrospective analysis of anonymized secondary data does not require formal IRB approval. Therefore, an ethics approval number is not applicable. Informed consent was waived due to institutional data protection policies, but may be made available from the corresponding author upon reasonable request. The study was not prospectively registered because it was not a prospective clinical study.