Abstract
Backgrounds Observational studies plays an important role in urology studies, But few studies have paid attention to the statistical reporting quality of observational studies. The purpose of this study was to investigate the frequency and evaluate the reporting quality of statistical methods of the published observational studies in urology.
Methods The five urology journals were selected according to the 5-year impact factor. A systematic literature search was performed in PubMed for relevant articles. The quality of statistical reporting was assessed according to assessment criteria.
Results A total of 193 articles were included in this study. The mean statistical reporting score of included articles was 0.42 (SD=0.15), accounting for 42% of total score. The items that must be reported with a reporting rate more than 50% were: alpha level (n=122, 65.2%), confidence intervals (n=134, 69.4%), name of statistical package (n=158, 84.5%) and exact P-values (n=161, 86.1%). The items with a reporting rate less than 50% were: outliers (n=2, 1.0%) and sample size (n=13, 6.7%). For multivariable regression models (liner, logistic and Cox), variables coding (n=27, 40.7%), validation analysis of assumptions (n=58, 40.3%), interaction test (n=43, 30.0%), collinearity diagnostics (n=5, 3.5%) and goodness of fit test (n=6, 5.9%) were reported. Number of authors more than 7(OR=2.06, 95%CI=1.04-4.08) and participation of statistician or epidemiologist (OR=1.73, 95%CI=1.18-3.39) were associated with the superior reporting quality.
Conclusion The statistical reporting quality of published observational studies in 5 high-impact factor urological journals was alarming. We encourage researchers to collaborate with statistician or epidemiologist. The authors, reviewers and editors should increase their knowledge of statistical methods, especially new and complex methods.
Introduction
Nowadays, more and more medical researchers notice the phenomenon that the main results of some research cannot be reproduced (1, 2), and improper use of statistical methods or inadequate statistical reporting may be the important reasons for this phenomenon. The low-quality statistical reporting may not make full use of research results, resulting in a waste of valuable information and varying degrees of bias. In addition, it may also make editors and readers unable to measure the reliability of research, and make readers misinterpret the results of the study, thus leading to errors in the secondary studies (3).
Truthly, the poor statistical reporting problem is long-standing, but few people pay attention to it. In 1966, the first paper on the quality of medical literature statistical report was published (4), and then dozens of similar studies were published (5). These studies found that the application of statistical methods is becoming more and more complicated, but the problem of insufficient statistical reporting has always existed. To make matters worse, the literature evaluated by these studies was from influential general medical and professional journals.
In 2015, T.A. Lang et al. published the “Statistical Analyses and Methods in the Published Literature’’ (SAMPL) guidelines to improve the quality of basic statistical reporting (6). The principle of SAMPL is that “authors should describe statistical methods with sufficient detail to enable readers in the professional domain to access raw data to verify the results of the report”. In 2017, Pentti Nieminen et al. made the SIMA (Statistical Intensity of Medical Articles) tool and assessed the statistical intensity in the high impact factor respiratory journal’s articles, and found that approximately one third of the respiratory papers provided incomplete description of their statistical reports (7, 8). Even though the SAMPL guidelines and SIMA broaden the standards for the scrutiny of statistical methods, there is still a void in requiring or assessing the reporting of statistical methods in observational studies.
Therefore, we carried out this study to describe the frequency and trends of statistical methods used in high-impact urology journals, and evaluate the reporting quality of statistical methods. We hope this study can identify the major quality deficiencies in the statistical reporting of urological observational studies, and promote the improvement of statistical reporting quality.
Material and Methods
Journals selection and search strategy
Top five urology journals were selected according to 5-year impact factor (excluding review journals), including European Urology (IF=17.581), Journal of Urology (IF=5.157), BJU International (IF=4.688), Prostate Cancer and Prostatic Diseases (IF=4.099), and Prostate (IF=3.820). The relevant studies from January 1, 2009 to September 30, 2019 were searched in the PubMed. The search strategy was shown in Figure. 1.
Articles selection
Articles that met the following criteria were selected: (1) original articles; (2) observational studies, including cross-sectional studies, case-control studies, and cohort studies; (3) studies on humans, including both adults and children. The exclusion criteria were as follows: (1) review articles and case reports, (2) quasi-randomized trial, randomized controlled trials, and other interventional studies, (3) unpublished data and published abstracts only.
The articles retrieved were preliminary reviewed according to titles and abstracts by two investigators independently. Any disagreement was resolved by consulting with senior authors. After the initial screening, two investigators retrieved the full texts of relevant researches and determined the final list based.
Frequency and trends of statistical methods used in the included articles
Two investigators extracted data independently from the included articles. Statistical methods used in the article and general characteristics were extracted, including name of journal, publication time, origin of corresponding author, type of study, number of authors, impact factor, funding support, the affiliation of corresponding author, international collaborative authorship, and participation of statistician or epidemiologist.
A standardized evaluation form (see Appendix 1) based on Emerson et al.(9) was used to record the frequency and trends of statistical methods for observational studies published in the selected urology journals. The original study divided statistical methods into 21 categories, 20 categories with the exception of benefit analysis have been incorporated into this checklist with slightly modified. The items containing the following information have been added to this checklist: multiple comparisons, repeated measurement data analysis, consistent measurement, Bayesian analysis.
Statistical reporting quality of the included articles
Two reviewers applied the assessment criteria (see Appendix 2) independently to appraise the statistics reporting quality of included studies. This checklist was established based on the SAMPL guidelines (6) and other previously published studies (10-12), and the items were modified to be listed in a simple and readable manner. All of the logistic regression assessment items droved from Zhang’s research (13), and Cox regression items were from Zhu’s research (14). The checklist consists of 7 items (marked *) that must be reported and 39 items that are subject to selective reporting based on the statistical methods used. The proportions of items that were adequately reported in the statistical methods used were calculated as statistical reporting quality score for each article. The total score of statistical reporting score of each article was 100% (1). The evaluations that two investigators argued over were resolved by discussion with the senior author.
Statistics analysis
Continuous variables with normal distribution were represented by mean ± standard deviation (SD), non-normal variables by medians (interquartile range), and categorical variables by frequency (percentage). The comparisons of means between the two groups were performed by independent Student’s t-test, and multi-group by one-way analysis of variance (ANOVA). The Mann-Whitney U test and Kruskal-Wallis test were used to analyze variables for non-normal variables. According to a cutoff value of the 75 percentile of the statistical reporting quality score, the articles were divided into high and low quality groups. Univariate and multivariate logistic regression analyses were performed to identify the factors affecting statistical reporting quality. The variables with a P ≤0.05 for univariate logistic analyses were included in the multivariable logistic regression model. All the statistical analyses were performed using SPSS version 21.0. All significance tests were two-sided, and P ≤ 0.05 was considered as significant.
Results
Search results
The initial search of databases confirmed 8,605 potentially relevant articles without duplication. After the screening of titles and abstracts, 8,326 articles were excluded. In total, 279 full-text articles were further reviewed, with 84 articles were subsequently eliminated for various reasons. Finally, 193 relevant articles were included. The process of literature retrieval was shown in the flow diagram (Figure.2).
General characteristics of the included articles
Of the 193 articles, 58 (30.05%) were cross-sectional studies, 56 (29.02%) were case-control studies, and 79 (40.93%) were cohort studies. Characteristics of included articles were shown in Table.1.
The frequency of statistical methods applied in the included articles
The frequency of statistical methods in included articles was shown in Table.2. 6 (3.1%) articles didn’t use statistical methods or used descriptive statistics only. 36 (18.7%) articles only used simple methods such as t-test and chi-square test, 54 (28.0%) articles only adopted multivariable analyses like logistic regression and cox regression, 97 articles (50.3%) both employed simple and multivariable analyses. Logistic regression (n=93, 48.2%) and Chi-square test (n=90, 46.6%) were the most frequently used statistical methods in included studies. 7 articles (3.6%) used propensity score method. None of the articles used Bayesian methods, artificial neural networks and machine learning. The number of statistical methods in articles with 7 or more authors was more than others. The trend of the percentages of the different statistical methods used over time was shown in Figure.3.
The reporting quality of statistical methods of included articles
The mean score of reporting quality in included articles was 0.42 (range: 0.11-0.82) with a standard deviation of 0.15. 132 (68.4%) articles have a score less than 0.5. The statistical reporting score has not been improved over the past decade. The articles that number of author greater than or equal to 7 had a higher score than others, but the difference was not statistically significant. The statistical reporting score was similar in three type studies. The articles that have participation of statistician or epidemiologist tend to have a higher statistical reporting score.
As shown in Table.3, three of the seven items that must be reported were often missed: outliers (n=2, 1.0%), sample size (n=13, 6.7%), missing data (n=19, 9.8%). Two items were reported suboptimal: one or two tailed (n=84, 44.9%), confidence intervals (n=134, 69.4%). In addition, 158 articles (84.5%) reported the name of statistical package or program with 161 articles (86.1%) reported exact P-values of significance test.
The quality of the multiple analyses reporting left much to be desired. The mean of reporting score of the multiple logistic regression was 0.24, 0.27 for multiple cox regression, and 0.29 for multiple linear regression. The item that was reported most often in multiple logistic regression was ‘‘sufficient events’’ (n=92, 98.9%) followed by conformity with linear gradient (n=25, 26.9%) and interactions between independent variables (n=24, 25.8%). The least-reported item was colinearity analysis (n=3, 3.2%), and the goodness of fit test was slightly better (n=5, 5.4%). Methods for variable selection (n=23), coding of variables (n=19), reason of selection of variables (n=8) and validation of the statistical model (n=6) were reported in 24.7%, 20.4%, 8.6% and 6.5% of the articles, respectively. Of the reporting of Cox regression models, assumption of proportional hazard were reported in 66.7% studies (n=28). However, important information such as interaction test (n=17, 40.5%), methods for variable selection (n=7, 16.7%), variables coding (n=7, 16.7%), reason of selection of variables (n=5, 11.9%) and colinearity test (n=1, 2.4%) were rarely reported. Seven articles (17.1%) assessed assumptions for t-test and 1 article for ANOVA (4.0%). Nine articles (23.1%) described the reason for conducting non-parametric tests.
The included articles were divided into inferior (n=144) and superior reporting quality groups (n=49) based on the cut-off value (0.50). As shown in table.4, univariate logistic regression analyses showed the following factors were related to the high reporting quality: participation of a statistician or epidemiologist (OR=1.92, 95%CI=1.04-3.71) and number of authors (OR=2.23, 95%CI=1.13-4.37).
Multivariate logistic regression analysis likewise demonstrated that participation of a statistician or epidemiologist (OR=1.73, 95%CI=1.18-3.39) and number of authors (OR=2.06, 95%CI=1.04-4.08) were associated with the superior reporting quality. The multivariate logistic model has sufficient events (the ratio of outcome events to independent variables was 24.5). The likelihood ratio test was used to validate the goodness of fit of the model, it was found that the fitting degree of the model was good (χ2=0.84, P=0.656). The max variance inflation factor (VIF) was 1.026 to indicate no multicollinearity among the independent variables.
Discussion
The study estimated the frequency, trend and reporting quality of statistical methods of 193 observational studies published in five high-impact urology journals after 2008. The number and complexity of statistical methods have not changed in recent years, and the reporting quality of statistical methods has not improved either.
Compared with other disciplines in the medical field, fewer statistical methods were used in urology studies (15, 16). This is an alarming phenomenon, because researchers need to use several different statistical methods when facing different distribution data or different study purposes. Logistic regression models and chi-square test are most frequently used statistical methods in urological observational studies, because observational studies make heavy use of binary endpoints. Different from previous research results, our study indicated that the statistical methods of observational studies in urology have not changed more complicated in the past years (15-17). During recent years, statisticians have introduced new and complex methods that are attributable to the rapid expansion in computing capability, such as, Bayesian methods, artificial neural networks, and machine learning. However, we found no reference to these methods in any of the evaluated articles. These may be on account of the new complex statistical methods are not included in the introductory or second-level statistics courses, few authors are familiar with these methods. 36 (18.7%) articles only used simple tests, however, the use of simple statistical methods only, may be generally insufficient. Sophisticated and valid methods should be used based on study designs and data properties to avoid the bias and inflation of the type I error rate for the treatment effect.
Standardized statistical reporting of medical papers not only help editors or reviewers better understand research design to improve the quality of journal papers, but also enable readers in related fields to better understand the content and results of research to enrich their professional knowledge. However, the reporting quality of statistical methods in this study was inferior and still needs to be improved in many aspects. The average score of reporting quality in included articles was 0.42, and 68.4% articles have a score less than 0.5, this means that the statistical reporting adherence of most articles was less than 50%.
Pre-study calculation of the sample size is necessary. The correct sample size of a study has the advantage of enhancing feasibility, reducing costs, and also has ethical implications. If the sample size is too small, it can not detect the effect, causing type II errors. Too large sample size may not only waste time, resources and money, but also difficult to implement. The Strengthening the Reporting of Observation Studies in Epidemiology (STROBE) statement also recommends researchers should explain how the study size was arrived at when they conducting observational studies (18, 19). However, only 6.7% articles reported the calculation of sample size or power analysis in this study. Data missing is a common and unavoidable problem in the medical studies. If the missing data are not handled properly, it will cause bias or insufficient use of data, thus reducing the efficiency of the study or biased inferences (20, 21). Nevertheless, we found that only 9.8% included studies indicated number of variables with missing data. Multivariate analyses are very sensitive to outliers, as outliers may cover or cause multicollinearity between independent variables and affect the parametric estimation of the model. However, just 1% articles detected outliers of the original data. The encouraging finding in this study was that the statistical software (84.5%) and exact P values (86.1%) were reported at high rates as essential requirements for statistical analysis. But there is still room for improvement. In statistical inference, it is advisable to report the exact P-values and confidence intervals. A large confidence interval means that the small sample size and large random error. In this way, the conclusion drawn by the exact P is questionable.
We found that 133 articles (68.9%) employed simple statistical methods and several problems were occurred in the reporting of these methods. The use of the t test and ANOVA requires that the data follow normal distribution and homogeneity of variance, whereas very fewer articles mention it. Using non-parametric tests on data suitable for parametric tests will reduce statistical power. Therefore, the reason for using non-parametric tests should be stated. However, only 23.1% articles described the reasons.
The confounders controlling is a crucial step in analytical observational studies, and multivariable analyses are widely used as statistical adjustment techniques (22). 151 articles (78.2%) applied multivariable regression models such as logistic, Cox, linear, or Poisson regression. Several previous studies have shown that there was still room for improvement in quality of the reporting of multivariate regression (13, 22-24), which was consistent with our findings. Only 6 articles (5.9%) described goodness of fit test. By contrast, Casals et al. conducted a systematic review of 108 articles, and found testing for goodness-of-fit was reported in 15.7% (23). 96.5% included articles did not consider collinearity diagnostics, which may result unstable regression coefficients or wide confidence intervals, and even affect the selection of variables. Real, J et al. found that 26.2% articles that used multivariable analyses described linear gradient for continuous or rank variables (22). This reporting rate in our study was 40.3%, higher than their findings, but far from ideal as well. This may be due to the lack of automatic options for this test in current common statistical software.
The result of multivariate logistic regression analysis demonstrated that the participation of a statistician or epidemiologist was associated with high reporting quality. Professional statisticians or epidemiologists can not only provide correct guidance for data processing in research, but also better understand what important information in statistical analysis should be reported. We recommend that researchers seek the assistance of statistical professionals when processing data and writing articles.
Some limitations were present in our study as well. First, we only searched five top journals and not the full breadth of the articles related to urology. However, the highest impact factor journals have the utmost visibility in the urology literature and are likely more relied upon by doctors to inform practice. Second, only the most commonly used statistical methods are included in our checklists, and some rarely used statistical methods are not included.
Conclusion
The statistical reporting of published observational studies in 5 high-impact factor urological journals was often ambiguous, especially for multivariable regression models, and there is room for improvement. The participation of statistician or epidemiologist may improve statistical reporting quality. The authors, reviewers and editors should increase their knowledge of statistical methods, especially new and complex methods. We recommend editorial board of journal publish guidelines to guide and facilitate critical appraisal of statistical reporting.
Supporting Information
Appendix S1 Common statistical methods in medical studies (Docx)
Appendix S2 Reporting of common statistical methods in evaluated journals (Doc)
Author contribution
Conceived and designed the experiments: DSY ZXB LBB. Performed the experiments: DSY XH ZJG. Analyzed the data: DSY LBB. Contributed reagents/materials/analysis tools: DSY ZXB LBB. Contributed to the writing of the manuscript: DSY LBB ZXB.