Abstract
Background
Direct empirical evidence for the existence of outcome reporting bias is accumulating, and this source of bias is recognised as a potential threat to the validity of meta-analysis of randomised clinical trials.
Methods
A method for calculating the maximum bias in a meta-analysis due to publication bias is adapted for the setting where within-study selective non-reporting of outcomes is suspected, and compared to the alternative approach of missing data imputation. The properties of both methods are investigated in realistic small sample situations.
Results
The results suggest that the adapted Copas and Jackson approach is the preferred method for reviewers to apply as an initial assessment of robustness to within-study selective non-reporting.
Conclusion
The Copas and Jackson approach is a useful method for systematic reviewers to apply to assess robustness to outcome reporting bias.
Background
Publication bias, whereby an entire study is either published or not depending on its results, is recognised as a potential threat to the validity of any meta-analysis. Empirical research suggests that published work is more than twice as likely to be statistically significant (p < 0.05) as unpublished research [1]. Several different methods for the identification of and adjustment for this source of bias are available [2]. More recently, asymptotic results have been presented that allow the meta-analyst to estimate the maximum bias possible in the treatment effect estimate assuming a certain number of trials have not been published [3]. Although the method is proposed as a sensitivity analysis, by varying the number of trials assumed to be missing, rather than a correction for bias, it is still important to investigate how well the asymptotic result holds, since meta-analyses typically include only 5–10 studies. Copas and Jackson suggest bootstrapping as a method for doing this; however, this is unlikely to be implemented by systematic reviewers unless made available in meta-analysis software.
Within-study selective reporting, or outcome reporting bias, has been defined as 'the selection of a subset of the original variables recorded for inclusion in publication of trials' [4]. This may include the situation where only one of several outcomes measuring similar things is reported, e.g. weight gain at three or six months, or it may relate to a subtly different selection issue where a particular outcome is not reported on the basis of the results obtained [5]. Direct empirical evidence for the existence of such bias is accumulating [6,7]. In a meta-analysis it is often the case that a total of k eligible studies are identified but only n (k > n) report the data of interest. The reviewer needs to examine the remaining (k − n) studies and try to establish whether the particular outcome has been collected but not reported. This should ideally involve contact with the original trialists, which may result in missing data being made available or may confirm that the outcome data were not recorded [8]. However, it is likely that in a subset of these studies, m (≤ k − n) say, no such information is forthcoming. If the level of suspicion that selective non-reporting has occurred in these m studies is high, it has been recommended that a sensitivity analysis be undertaken assuming such bias has occurred [8].
We have previously proposed and applied a method for imputing data in this setting when the outcome of interest is binary and each trial compares two treatments [8]. The Copas and Jackson bound with known m is easier to compute than the imputation method and is therefore potentially very useful as an initial sensitivity analysis to assess robustness to extreme within-study selective non-reporting. In this paper we present a simulation study undertaken to assess how the two methods perform when bias arises as a result of various realistic suppression models.
Methods
In sections 2.1 and 2.2 we describe the two methods of bias adjustment, based on theoretical considerations and imputation respectively. Each method is applied to five meta-analyses from a cohort previously described [8]. Some discrepancies between results from the two methods were noted. In section 2.3 we describe a simulation study undertaken to gain an understanding of how the two methods perform in realistic meta-analysis and trial suppression model settings.
2.1 Maximum bias bound
For a given number of unpublished studies, m, the maximum bias, b, in the treatment effect estimate is given by the formula

$$ b = \frac{n+m}{n}\,\phi\left\{\Phi^{-1}\left(\frac{n}{n+m}\right)\right\}\,\frac{\sum_{i=1}^{n}\sigma_{i}^{-1}}{\sum_{i=1}^{n}\sigma_{i}^{-2}} \qquad (2.1) $$

where n is the number of studies reporting data, σ_{i} is the standard error of the treatment effect estimate in study i, and φ and Φ are the standard normal density and distribution functions [3]. The approach assumes that larger studies (with small standard error) are more likely to be published than smaller studies (with larger standard error). The number of unpublished studies is usually unknown and thus sensitivity analysis, varying the unknown number of unpublished studies, m, is recommended. In the context of within-study selective non-reporting, the number of studies found to be eligible, where the outcome is known or suspected to have been measured but no results were presented, is known. If we take m to equal this number, (2.1) can be used to assess the robustness to this form of bias. A pooled effect estimate is first calculated from the n studies reporting data, using either a fixed or random effects model as appropriate. The bias-adjusted estimate is calculated by adding the value of this bound to the pooled effect estimate. Either +b or −b is added, depending on the direction of effect, such that the estimate is moved closer to the null.
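As a rough illustration of how the bound might be computed, the sketch below assumes the Copas and Jackson bound takes the form b = ((n + m)/n) · φ{Φ⁻¹(n/(n + m))} · Σσ_i⁻¹ / Σσ_i⁻², with φ and Φ the standard normal density and distribution functions. The function names `copas_jackson_bound` and `adjust_towards_null` are hypothetical and are not taken from the paper or from any meta-analysis package.

```python
from statistics import NormalDist


def copas_jackson_bound(std_errors, m):
    """Maximum bias b for n reporting studies and m suppressed studies.

    std_errors: standard errors of the n reported effect estimates.
    m: number of studies assumed to have suppressed the outcome.
    Assumes the bound has the form stated in the lead-in above.
    """
    n = len(std_errors)
    if n == 0:
        raise ValueError("at least one reporting study is required")
    if m < 0:
        raise ValueError("m must be non-negative")
    if m == 0:
        return 0.0  # nothing suppressed, so no bias allowance
    std_normal = NormalDist()  # mean 0, standard deviation 1
    fraction_reported = n / (n + m)
    density = std_normal.pdf(std_normal.inv_cdf(fraction_reported))
    precision_ratio = sum(1 / s for s in std_errors) / sum(1 / s**2 for s in std_errors)
    return (n + m) / n * density * precision_ratio


def adjust_towards_null(pooled_estimate, bound):
    """Add +b or -b so that the pooled estimate moves closer to zero.

    A pooled estimate of exactly zero is treated as positive here,
    which is an arbitrary choice made for this sketch.
    """
    if pooled_estimate < 0:
        return pooled_estimate + bound
    return pooled_estimate - bound
```

For example, `copas_jackson_bound([0.3, 0.4, 0.5], m=2)` would give the bound for three reporting trials with two suspected of suppression, and `adjust_towards_null` would then shift a pooled log odds ratio by that amount.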
2.2 Imputation of missing data
The maximum possible value for the pooled log odds ratio was also estimated by imputing missing data, specifically the number of events in each treatment group, for each of the m studies under the extreme assumption that the reason they are missing is that the two-tailed p-value from the trial was greater than 0.05, and then combining this with observed data [8]. All possible imputation combinations are enumerated and in turn are pooled with the available data from the other n studies to produce a histogram of the meta-analysis estimates. For those examples where τ^{2} > 0, random effects estimates were also calculated incorporating the between-study variance estimated from the observed data. The maximum imputation-adjusted estimate for the pooled log odds ratio is taken from the distribution of all possible values under this imputation method. An assessment is then made as to whether the inference is robust to the extreme value of the histogram or not. This approach is attractive since establishing robustness to the most extreme scenario avoids the need to undertake more complex analyses.
The imputation can be constrained by using information, either in the report or from clinical knowledge, to reduce the range of the possible number of events in a particular treatment group. For example, reported data from an associated outcome such as cancer-specific mortality may be taken as the lower limit for all-cause mortality. Unconstrained data were imputed for all examples. Constrained data were imputed for two reviews as follows [8]. In the cancer example, information from trial reports on subgroups and infection-related mortality was used to limit the range of the possible number of overall deaths. In the immunoglobulin review, by definition the number of serious infections had to be at least equal to the number of events reported for either sepsis or death from infection, providing a lower bound.
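The enumeration idea can be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in [8]: non-significance is judged here by a Wald test on the log odds ratio with a 0.5 continuity correction, pooling is by fixed-effect inverse-variance weighting, and all function names and the `min_events` argument (a lower limit on events per arm, standing in for the constraints described above) are hypothetical.

```python
import math
from itertools import product


def log_or_and_se(events_t, n_t, events_c, n_c):
    """Log odds ratio and standard error with a 0.5 continuity correction."""
    a, b = events_t + 0.5, n_t - events_t + 0.5
    c, d = events_c + 0.5, n_c - events_c + 0.5
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def nonsignificant_results(n_t, n_c, min_events=(0, 0), z_crit=1.959964):
    """All (log OR, SE) pairs whose two-tailed Wald p-value exceeds 0.05."""
    results = []
    for events_t in range(min_events[0], n_t + 1):
        for events_c in range(min_events[1], n_c + 1):
            y, se = log_or_and_se(events_t, n_t, events_c, n_c)
            if abs(y / se) < z_crit:
                results.append((y, se))
    return results


def pooled_fixed(effects):
    """Fixed-effect inverse-variance pooled estimate from (estimate, SE) pairs."""
    weights = [1 / se**2 for _, se in effects]
    return sum(w * y for w, (y, _) in zip(weights, effects)) / sum(weights)


def imputation_range(observed, missing_sizes):
    """Smallest and largest pooled log OR over every imputation combination.

    observed: (log OR, SE) pairs from the n reporting studies.
    missing_sizes: (n_t, n_c) arm sizes for each of the m non-reporting studies.
    """
    candidates = [nonsignificant_results(n_t, n_c) for n_t, n_c in missing_sizes]
    pooled = [pooled_fixed(list(observed) + list(combo)) for combo in product(*candidates)]
    return min(pooled), max(pooled)
```

The number of combinations grows multiplicatively with the number and size of the non-reporting studies, which is consistent with the computational burden noted for this method.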
2.3 Simulation study
Data for the treatment group were simulated from a Bin(N, pt) distribution, and for the control group from a Bin(N, pc) distribution. The choice of pt and pc determines the log odds ratio, log(OR). The sample size, N, was varied across trials according to a normal distribution with mean N_{μ} and standard deviation υ × N_{μ}, where υ could take values between 0 and 1, with increasing values providing greater sample size variability between trials within a meta-analysis. A lower limit of 10 was applied to N by setting any generated value below 10 equal to 10. Data were simulated from k = n + m trials.
The true relative treatment effect was varied, with values of the odds ratio taken as 1, 0.7, 0.5, and 0.3, by setting pt equal to 0.25 and varying the value of pc. The total number of trials before suppression, k = n + m, was taken as 5, 6, 7, 8, 10, and 15, with m allowed to take values up to, but not greater than, n. Trial size was varied by taking values for N_{μ} of 25, 50, 75, 100, 150, 200, and 300 and for υ of 0.05, 0.15, and 0.35.
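A minimal sketch of this data-generating step is given below. It assumes that the odds ratio is defined as the odds of an event on treatment divided by the odds on control, that both arms of a trial share the same size N, and that the normally distributed trial size is rounded to the nearest integer; none of these details is stated explicitly in the text. The names `control_probability` and `simulate_trials` are hypothetical.

```python
import random


def control_probability(p_t, odds_ratio):
    """Control-group event probability implied by p_t and the odds ratio.

    Assumes OR = [p_t / (1 - p_t)] / [p_c / (1 - p_c)].
    """
    odds_t = p_t / (1 - p_t)
    odds_c = odds_t / odds_ratio
    return odds_c / (1 + odds_c)


def simulate_trials(k, p_t, odds_ratio, n_mu, v, rng):
    """Simulate k two-arm trials as (N, events on treatment, events on control)."""
    p_c = control_probability(p_t, odds_ratio)
    trials = []
    for _ in range(k):
        # Trial size is normal with mean n_mu and SD v * n_mu, floored at 10.
        size = max(10, round(rng.gauss(n_mu, v * n_mu)))
        # Binomial draws built from Bernoulli trials (standard library only).
        events_t = sum(rng.random() < p_t for _ in range(size))
        events_c = sum(rng.random() < p_c for _ in range(size))
        trials.append((size, events_t, events_c))
    return trials
```

For instance, `simulate_trials(8, 0.25, 0.5, 50, 0.15, random.Random(1))` would generate one meta-analysis of eight trials with a true odds ratio of 0.5 before any suppression is applied.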
Three selection mechanisms were considered: suppression based on standard errors, on one-tailed p-values and on two-tailed p-values. None of these is the same as the selection mechanism under which the Copas and Jackson bias bound is attained. However, our objective in this paper is to assess how robust the methods are to realistic suppression models. A selection mechanism based on standard errors alone will not systematically bias the results of a meta-analysis, although there will be some loss of precision. Selection mechanisms based on one- and two-tailed p-values will provide increasingly similar results the further the true effect is from the null. However, there will be differences in the studies suppressed, and in the resulting bias, when the true odds ratio is close to or equal to one. In this situation, suppression based on one-tailed p-values will systematically bias the results in one direction by suppressing a corner of the funnel plot, producing an asymmetrical funnel, while suppression using two-tailed p-values will produce a hollow funnel similar to suppression based on standard errors and will not systematically bias results in a particular direction. Therefore, of the three selection mechanisms, one-tailed p-value selection will produce the strongest bias and is arguably the most realistic suppression mechanism, as it allows for differential selection according to the direction of the treatment effect.
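The three mechanisms could be sketched as below. The text does not say whether suppression is deterministic or probabilistic, so this illustration assumes the simplest deterministic rule: the m trials ranked worst on the chosen criterion are suppressed. It also assumes that a beneficial effect corresponds to a negative log odds ratio, so that the one-tailed p-value is Φ(z). The function names are hypothetical.

```python
from statistics import NormalDist


def selection_scores(trials, mechanism):
    """Score each (log OR, SE) pair; higher scores are suppressed first."""
    std_normal = NormalDist()
    scores = []
    for estimate, se in trials:
        z = estimate / se
        if mechanism == "one_tailed":
            # One-tailed p-value for benefit, assuming benefit means log OR < 0.
            scores.append(std_normal.cdf(z))
        elif mechanism == "two_tailed":
            scores.append(2 * (1 - std_normal.cdf(abs(z))))
        elif mechanism == "standard_error":
            scores.append(se)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
    return scores


def suppress(trials, m, mechanism):
    """Return (reported, suppressed) trials after removing the m worst-ranked."""
    scores = selection_scores(trials, mechanism)
    order = sorted(range(len(trials)), key=lambda i: scores[i])
    n_reported = len(trials) - m
    reported = [trials[i] for i in sorted(order[:n_reported])]
    suppressed = [trials[i] for i in sorted(order[n_reported:])]
    return reported, suppressed
```

Under this rule, one-tailed selection removes trials from one side of the funnel only, whereas two-tailed selection removes trials nearest the null regardless of direction.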
The true bias in the treatment effect estimate was calculated as the difference between the pooled log(OR) from all n + m trials (unsuppressed) and the pooled log(OR) based on the selected trials. The Copas and Jackson bias bound was calculated using (2.1) above. For each simulated dataset, the difference between the pooled log(OR) from all n + m trials (unsuppressed) and the Copas and Jackson adjusted estimate was calculated. This simulation exercise was repeated 10,000 times for each combination of parameters pt, pc, N_{μ}, and υ. For settings where the alternative method of imputation may be considered, namely small study sizes and few trials not reporting the data of interest, results were also obtained after adjustment via the imputation method for comparison. In settings where larger numbers of larger studies are missing, imputation was not considered due to the computational time involved [8].
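For a single simulated dataset, the two quantities described above might be computed as in the following sketch. It assumes fixed-effect inverse-variance pooling of (log OR, standard error) pairs and takes the bias bound b as an input; the names `pooled_fixed` and `bias_summary` are hypothetical.

```python
def pooled_fixed(trials):
    """Fixed-effect inverse-variance pooled estimate from (estimate, SE) pairs."""
    weights = [1 / se**2 for _, se in trials]
    return sum(w * y for w, (y, _) in zip(weights, trials)) / sum(weights)


def bias_summary(all_trials, selected_trials, bound):
    """True bias and the residual difference after adjusting by the bound."""
    pooled_all = pooled_fixed(all_trials)            # all n + m trials
    pooled_selected = pooled_fixed(selected_trials)  # the n reported trials
    # Move the selected-trials estimate towards the null by the bound.
    if pooled_selected < 0:
        adjusted = pooled_selected + bound
    else:
        adjusted = pooled_selected - bound
    return {
        "true_bias": pooled_all - pooled_selected,
        "adjusted": adjusted,
        "residual": pooled_all - adjusted,
    }
```

Repeating this over many simulated datasets and summarising the `residual` values by their centiles would give the kind of distribution reported in the results.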
Results
3.1 Results for real examples
Table 1 shows the pooled estimates for each example following bias adjustment via both imputation and the Copas and Jackson approach (original data available on request from the first author). There are some differences in the results from the two methods. In the cancer meta-analysis, there is both a small number of studies presenting results in the trial publication and a large variability in the size of the standard errors of the reported effect estimates, and the difference between the results from the two methods of adjustment is marked. The albumin example, however, includes a larger number of trials of similar size, and the bias-adjusted estimates are almost identical.
Table 1. Estimates of bias-adjusted pooled effect in five meta-analysis case studies. FE: fixed effects, RE: random effects.
The difference between the two estimates is most marked in the cancer example with the imputation approach resulting in a much greater bias adjustment. Clearly the smallest studies are not suppressed here but neither are they in the other examples where the methods gave more similar results. Motivated by this example in particular, our aim in the rest of this paper is to understand how the two methods perform in a variety of settings via a simulation study.
3.2 Simulation study
Figures 1 to 6 show the results of the simulation study, illustrating the effects of increasing the treatment effect size and the variability in trial size around the average when suppression is based on the one-tailed p-value. The median difference between the true bias and the Copas and Jackson bias bound is plotted against the median true bias, where median values have been found from the distribution of values across the 10,000 simulated datasets.
Figure 1. The numbers on each line represent the number of studies suppressed. The y-axis is the difference between the estimated bias and the Copas and Jackson bias bound. Simulation results: OR = 1, selection based on one-tailed p-values, trial size variability = 0.05.
Figure 2. The numbers on each line represent the number of studies suppressed. The y-axis is the difference between the estimated bias and the Copas and Jackson bias bound. Simulation results: OR = 0.7, selection based on one-tailed p-values, trial size variability = 0.05.
Figure 3. The numbers on each line represent the number of studies suppressed. The y-axis is the difference between the estimated bias and the Copas and Jackson bias bound. Simulation results: OR = 0.5, selection based on one-tailed p-values, trial size variability = 0.05.
Figure 4. The numbers on each line represent the number of studies suppressed. The y-axis is the difference between the estimated bias and the Copas and Jackson bias bound. Simulation results: OR = 1, selection based on one-tailed p-values, trial size variability = 0.35.
Figure 5. The numbers on each line represent the number of studies suppressed. The y-axis is the difference between the estimated bias and the Copas and Jackson bias bound. Simulation results: OR = 0.7, selection based on one-tailed p-values, trial size variability = 0.35.
Figure 6. The numbers on each line represent the number of studies suppressed. The y-axis is the difference between the estimated bias and the Copas and Jackson bias bound. Simulation results: OR = 0.5, selection based on one-tailed p-values, trial size variability = 0.35.
The plots show that, under the selection model based on one-tailed p-values, the bias increases as the number of suppressed trials increases, and the Copas and Jackson bias bound overestimates the bias, as expected. This conservatism increases as the number of suppressed trials increases. These results were similar across all three suppression models. Importantly, these results provide information on the degree of overestimation in realistic settings.
Table 2 shows the 2.5th, 50th and 97.5th centiles of the distribution of the difference between the pooled log(OR) from all n + m trials (unsuppressed) and the bias-adjusted estimates from the two methods for selected simulation parameters. The number of trials has been chosen to be five or 10 since most meta-analyses fall into this range. The results indicate that for a meta-analysis that includes a small number of small trials, the bias could be substantially underestimated in a proportion of cases. This was not the case, however, for selection based on standard errors. For suppression based on one- and two-tailed p-values, the accuracy of the Copas and Jackson estimate increased with increasing number of trials, larger trial size, decreased trial size variability and larger treatment effects. For suppression based on standard errors, the accuracy of the Copas and Jackson estimate increased with increasing number of trials, larger trial size, increased trial size variability and smaller treatment effects.
Table 2. Selected simulation study results. C&J: Copas and Jackson bias bound approach
The simulation results demonstrate that the imputation method leads to systematically greater over-adjustment for bias compared to the Copas and Jackson method. We believe this to be the explanation for the difference between the results of the two methods in the cancer example shown in Table 1.
Discussion
In this paper we advocate that robustness of the meta-analysis to outcome reporting bias be assessed where there is a high level of suspicion that within-study selective reporting has occurred. We recommend sensitivity analysis rather than adjustment, since correction for bias is impossible without knowledge of the exact selection mechanism operating. Understanding selection bias is made all the more difficult when one recognises that the process may vary across different fields; for example, mechanisms operating in genetic epidemiology may differ from those in clinical trials [9].
In some instances it may be obvious, given the other outcomes included in a trial publication, that an outcome was measured even if not reported. For example, if cause-specific mortality is reported then overall mortality must have been measured, even if not reported. In other situations it may be that a battery of tests or measurements are usually undertaken together, for example systolic and diastolic blood pressure, such that if one outcome is reported but another is not, one should be suspicious that the latter may have been selectively not reported. However, it is probable that it will often be difficult to assess whether the outcome was measured, and judgment will be required. The ORBIT (Outcome Reporting Bias In Trials) project, funded by the UK Medical Research Council, will attempt to further our understanding of the processes resulting in selective outcome reporting through interviews with clinical trialists. The sensitivity and specificity of a method for assessing outcome reporting bias in a trial will be estimated by comparing the assessment based on all trial reports with the information obtained directly from trialists.
Researchers often rely on the shape of funnel plots to detect publication bias; however, empirical studies suggest that this may be misleading [10]. Tests for asymmetry in a funnel plot, including that of Egger, have low power in typical meta-analyses involving 5–10 trials [11]. It may be of interest to undertake such a test, and a significant result may be taken as evidence of asymmetry; however, the sources of bias, be they methodological quality, outcome reporting bias or some other small study effects, need further investigation. We prefer not to rely on a non-significant result as indicative of a lack of bias when evidence from clinical opinion or the trial paper, e.g. a p-value for the outcome reported to be >0.05, clearly raises our level of suspicion.
Assessing the robustness of a meta-analysis to extreme within-study selective non-reporting is a useful first stage. Imputation of missing data has the advantage that it can be constrained by information given in the report. However, the disadvantages include the programming and computational time required as well as the limitation to binary outcomes. The approach suggested in this paper, based on the method of Copas and Jackson, has the advantage of being simple to compute. However, information given in the report cannot be used to provide a tighter bound.
The simulation results provide useful information for practical meta-analysts. The approach taking the extreme estimate under imputation has been shown to perform poorly because of severe over-adjustment. The Copas and Jackson adjustment works well for most cases investigated under a variety of true suppression models. In situations where the treatment effect is small, trial sizes are small and/or variable, the number of studies with available data is small and the number with missing outcome data large, the approach was found to be less accurate. However, in these situations the adjustment is conservative, and hence a meta-analysis which is found to be robust after this degree of adjustment can be considered to be robust to this form of bias.
There are several issues in this work worthy of further development and investigation. Firstly, the application of the method to the problem of within-study selective non-reporting makes no allowance for the possibility that there are also further studies that may have collected data on the outcome of interest that are simply not known about. As with other papers in this area [8,12], this should be viewed as an initial exploratory analysis. If the meta-analysis is not robust to within-study selective reporting bias, this needs to be recognised. If the results are robust, further work should be undertaken regarding bias due to unpublished studies. Further work is needed to evaluate this two-stage approach and also to consider methods to allow for both within- and between-study selective reporting simultaneously. Secondly, the method has only been investigated for trial settings in the absence of heterogeneity of treatment effect across studies. Thirdly, in their work related to between-study selection bias, Copas and Jackson comment that "In reality, significance is not the only factor taken into account by editors and referees, or even by authors in deciding whether to write up their study and submit the article in the first place". It may be, however, that statistical significance plays a larger part in the decision about which outcomes to present within a report. Further work is needed to understand the selection mechanisms operating, and there may be more than one in any particular setting, in order to better inform statistical modelling in this area.
Conclusion
The Copas and Jackson approach is a useful method for reviewers to apply to assess robustness to within-study selective non-reporting. A question for further research is whether an improved method can be developed in situations where the trials are small or there are few trials with missing outcome data.
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
PW conceived of the study, designed and coordinated the study, and drafted the manuscript. CG participated in the design of the study, carried out the simulation study and performed the statistical analysis. Both authors read and approved the final manuscript.
Acknowledgements
We would like to thank Jane Hutton for helpful comments on an earlier version of this paper.
References

1. Dickersin K, Chan S, Chalmers T, Sacks H, Smith H: Publication bias and clinical trials. Controlled Clinical Trials 1987, 8:343–353.

2. Sutton AJ, Song F, Gilbody SM, Abrams KR: Modelling publication bias in meta-analysis: a review. Statistical Methods in Medical Research 2000, 9:421–445.

3. Copas J, Jackson D: A bound for publication bias based on the fraction of unpublished studies. Biometrics 2004, 60:146–153.

4. Hutton JL, Williamson PR: Bias in meta-analysis due to outcome variable selection within studies.

5. Williamson PR, Gamble C, Altman DG, Hutton JL: Within-study selective reporting bias in meta-analysis. Statistical Methods in Medical Research 2005, 14:515–524.

6. Hahn S, Williamson PR, Hutton JL: Investigation of within-study selective reporting in clinical research: follow-up of applications submitted to an LREC. Journal of Evaluation in Clinical Practice 2002, 8:353–360.

7. Chan AW, Hrobjartsson A, Haahr M, Gotzsche PC, Altman DG: Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to publications. Journal of the American Medical Association 2004, 291:2457–2465.

8. Williamson PR, Gamble C: Identification and impact of outcome selection bias in meta-analysis. Statistics in Medicine 2005, 24:1547–1561.

9. Ioannidis J, Trikalinos TA: Early extreme contradictory estimates may appear in published research: The Proteus phenomenon in molecular genetics research and randomised trials. Journal of Clinical Epidemiology 2005, 58:543–549.

10. Terrin N, Schmid CH, Lau J: In an empirical evaluation of the funnel plot, researchers could not visually identify publication bias. Journal of Clinical Epidemiology 2005, 58:894–901.

11. Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L: Comparison of two methods to detect publication bias in meta-analysis. Journal of the American Medical Association 2006, 295:676–680.

12. Jackson D, Copas J, Sutton AJ: Modelling reporting bias: the operative mortality rate for ruptured abdominal aortic aneurysm repair. Journal of the Royal Statistical Society A 2005, 168:737–752.