Prentice-Wilcoxon Test for Paired Time-to-Event Data

Written by Statistical Consultancy Team | Fri, May 16, 2014

In survival analyses we conventionally compare a time-to-event endpoint between two or more strata; each patient is represented in exactly one stratum, and the strata are independent of each other.

In this poster we are interested in the within-patient time-to-improvement of psoriasis symptoms in various parts of the body; within-patient data from the same timepoint are not independent, and conventional survival analysis methodology is therefore not appropriate.

The severity of plaque psoriasis and the response to treatment are most commonly assessed using the Psoriasis Area and Severity Index (PASI). PASI assesses the extent of body surface involvement of psoriasis in four anatomical regions (head, trunk, upper extremities and lower extremities) and the severity of desquamation, erythema and plaque induration (thickness/infiltration) in each corresponding anatomical region. The endpoint of interest was the time-to-75%-improvement (response) in PASI and, in particular, whether one type of symptom or anatomical region improved more rapidly than another.

Prentice-Wilcoxon Test

The Prentice-Wilcoxon Test allows paired (correlated) time-to-event data within a patient to be compared. The data are right-censored, meaning that by the end of follow-up not all patients will have experienced 75% improvement in a given PASI subscale.
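To make the right-censored endpoint concrete, the following is a minimal Python sketch of how a time-to-75%-improvement and its event indicator might be derived for one patient and one subscale. The function name, the input layout (visit weeks and subscale scores in visit order, baseline first) and the definition of improvement as a percentage reduction from the baseline score are assumptions made for illustration; they are not taken from the poster.

```python
def time_to_response(weeks, scores, improvement=0.75):
    """Return (time, event) for one patient and one subscale.

    weeks  -- visit weeks in increasing order, baseline first (assumed layout)
    scores -- subscale scores at those visits
    event is True if the improvement threshold was reached, False if the
    patient is right-censored at the last visit. Returns None when there
    are no symptoms at baseline, because improvement is then undefined.
    """
    baseline = scores[0]
    if baseline <= 0:
        return None
    # Score at or below which the patient counts as a responder.
    target = baseline * (1 - improvement)
    for week, score in zip(weeks[1:], scores[1:]):
        if score <= target:
            return (week, True)
    # Threshold never reached: censor at the end of follow-up.
    return (weeks[-1], False)
```

A patient whose score falls from 8 to 2 at week 8 would be recorded as a responder at week 8, whereas a patient who never reaches the threshold is censored at the final visit.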

The method for comparing censored, paired time-to-event data was originally described by Kalbfleisch and Prentice (1980), and a SAS macro was developed by Lashley, et al. (2000). Both the method and the macro assume that data exist for both subscales for every patient.

For a given pair of subscales, a score is calculated for each patient, taking into account the time-to-response in both subscales and the censored nature of the data.

These scores are combined across patients to give the test statistic Z, which approximately follows a standard normal distribution under the null hypothesis of no difference, and from which a p-value can be calculated. The p-value indicates whether there is a difference between subscales, but gives neither the direction nor the magnitude of this difference.
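The calculation can be sketched in Python as follows. This is an illustrative sketch of one commonly described formulation of the paired test, not a reproduction of the Lashley, et al. macro: generalised Wilcoxon scores are computed from the pooled sample of both subscales, the within-patient score differences are summed, and the sum is divided by the square root of the sum of squared differences. The function names, the handling of tied event times and the convention that a censored time tied with an event time is treated as occurring after it are all assumptions.

```python
import math

def prentice_wilcoxon_scores(times, events):
    """Generalised Wilcoxon score for each observation in a pooled sample."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = {}
    s = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        # Survival-type estimate used for the scores (assumed tie handling).
        s *= (at_risk - n_events + 1) / (at_risk + 1)
        surv[t] = s
    scores = []
    for t, e in zip(times, events):
        if e:
            scores.append(1 - 2 * surv[t])
        else:
            # Use the estimate at the last event time at or before t.
            earlier = [u for u in event_times if u <= t]
            s_t = surv[earlier[-1]] if earlier else 1.0
            scores.append(1 - s_t)
    return scores

def paired_prentice_wilcoxon(time_a, event_a, time_b, event_b):
    """Return (Z, two-sided p-value) for paired, right-censored times."""
    n = len(time_a)
    pooled = prentice_wilcoxon_scores(
        list(time_a) + list(time_b), list(event_a) + list(event_b)
    )
    # One difference per patient: score on subscale A minus score on B.
    diffs = [pooled[i] - pooled[n + i] for i in range(n)]
    denom = math.sqrt(sum(d * d for d in diffs))
    if denom == 0:
        return 0.0, 1.0  # no patient shows any difference
    z = sum(diffs) / denom
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p
```

The inputs are four equal-length sequences, ordered so that position i refers to the same patient in each: the times and event indicators (True for response, False for censored) for the two subscales being compared.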

Initial Kaplan-Meier Analyses

Longitudinal PASI body-specific and component-specific data for 800 patients (two treatment regimens) were simulated, based on the distributions and correlations of the body- and component-specific subscales estimated from real-life data.

Initial Kaplan-Meier analysis indicated that there was a significant difference between the two regimens in the time-to-response in PASI total score (which combines all body- and component-specific subscales).

Of interest was whether there were differences between the regimens in the time-to-response for the individual body- and component-specific subscales, and whether one subscale improved significantly more rapidly than another. The analysis was repeated within each of the eight subscales (strata = regimen), and the Kaplan-Meier curves, by regimen, are plotted below.

[Figure: Kaplan-Meier curves by regimen, body-specific subscales]

[Figure: Kaplan-Meier curves by regimen, component-specific subscales]

Application of Prentice-Wilcoxon

The SAS macro supplied by Lashley, et al. (2000) (see handout) was run for every pairwise comparison within the body-specific subscales and within the component-specific subscales. All body-specific pairwise comparisons were significant except for locations 3 and 4. Similarly, all component-specific pairwise comparisons were significant except for components 1 and 4 and components 2 and 3.
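Enumerating the pairwise comparisons within a group of subscales can be sketched as below. The helper name and the data layout are hypothetical, and `test` stands in for whatever paired test is applied (in the poster, the SAS macro); the sketch only shows the iteration, not the test itself.

```python
from itertools import combinations

def pairwise_comparisons(subscales, test):
    """Apply a paired test to every unordered pair of subscales.

    subscales -- dict mapping subscale name to that subscale's data
    test      -- callable taking the data for two subscales (placeholder)
    Returns a dict keyed by (name_a, name_b), names in sorted order.
    """
    results = {}
    for a, b in combinations(sorted(subscales), 2):
        results[(a, b)] = test(subscales[a], subscales[b])
    return results
```

With four subscales in a group this yields six comparisons per group.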

For example, in regimen A the median time-to-response was 12 weeks for both the location 1 and location 2 subscales, yet the Prentice-Wilcoxon test indicated a significant difference. Only by examining the Kaplan-Meier plot can we see that PASI symptoms at location 1 improve significantly more rapidly than at location 2.
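For readers who want to see how the Kaplan-Meier quantities used in this interpretation arise, here is a minimal Python sketch of the product-limit estimate and the median read from it. The function names are hypothetical and the sketch ignores confidence intervals; in practice a validated procedure would be used.

```python
def kaplan_meier(times, events):
    """Return [(event_time, proportion not yet responded), ...]."""
    curve = []
    s = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if e and u == t)
        s *= 1 - n_events / at_risk
        curve.append((t, s))
    return curve

def median_time(curve):
    """First event time at which the estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up
```

Two subscales can share the same median while their curves differ at earlier or later weeks, which is why the full curve, and not only the median, is needed to judge which subscale responds sooner.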

At baseline, between 2% and 12% of patients had no PASI symptoms in one or more of the body-specific subscales. With no symptoms at baseline for a subscale, the time-to-response cannot be calculated, so these patients are omitted from the initial Kaplan-Meier analysis of that subscale. Similarly, in the Prentice-Wilcoxon analysis these patients are omitted from any pairwise comparison involving that subscale, even though they may have valid time-to-response data for the other subscale in the pair.
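The omission rule for a single pairwise comparison can be sketched as follows. The layout is an assumption for illustration: each subscale is a dict from patient identifier to a (time, event) tuple, with None marking a patient who had no symptoms at baseline for that subscale.

```python
def complete_pairs(subscale_a, subscale_b):
    """Keep only patients with valid data on both subscales.

    Returns (ids, records_a, records_b), aligned by patient.
    """
    ids = [
        pid for pid, rec in subscale_a.items()
        if rec is not None and subscale_b.get(pid) is not None
    ]
    return ids, [subscale_a[p] for p in ids], [subscale_b[p] for p in ids]
```

Because the filter is applied per pair, the set of patients analysed can differ from one pairwise comparison to the next.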

Sensitivity analyses were performed, imputing response/censored data at week 0 for those subscales where there were no symptoms at baseline. In general the Prentice-Wilcoxon results were similar. Methods for such sensitivity analyses could be explored further.
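One way such an imputation could be expressed is sketched below. The poster does not give the exact rule, so this assumes two alternative scenarios: treating a patient with no baseline symptoms either as a responder at week 0 or as censored at week 0. The function name and the use of None to mark missing records are hypothetical.

```python
def impute_week0(records, as_response):
    """Fill in missing (time, event) records with a week-0 value.

    records     -- dict of patient id -> (time, event), or None if the
                   patient had no symptoms at baseline
    as_response -- True imputes a response at week 0,
                   False imputes censoring at week 0
    """
    return {
        pid: (0, as_response) if rec is None else rec
        for pid, rec in records.items()
    }
```

Running the paired comparison once under each scenario and comparing the results with the complete-pair analysis gives a simple check of how sensitive the conclusions are to the omitted patients.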

Conclusion and References

The Prentice-Wilcoxon test is easy to apply using the SAS macro supplied by Lashley, et al. (2000). It relies on patients having valid time-to-event data for both variables; patients without such data are omitted from the analysis. The result is a p-value, which gives neither the direction nor the magnitude of the difference. Kaplan-Meier median estimates and plots are therefore essential for interpreting the direction of any significant difference.

Kalbfleisch, J.D. and Prentice, R.L. (1980). The Statistical Analysis of Failure Time Data. New York, NY: John Wiley & Sons.
Lashley, R., et al. (2000). A Nonparametric Approach for the Statistical Analysis of Time to First Event in Censored Paired Data. May 7-10, 2000, Seattle, Washington. PharmaSUG: LexJansen.
