<img alt="" src="https://secure.perk0mean.com/171547.png" style="display:none;">

SAS Proc Transpose In Clinical Trial Programming

By Clinical Programming Team
October 22, 2024

SAS Proc Transpose

What is PROC Transpose in SAS?

In clinical programming and data analysis, datasets often come in a variety of structures that don’t always align with the needs of analysis or reporting. For example, you may receive data in a "long" format where each observation (row) represents a single event or time point for a subject, but your analysis requires the data in a "wide" format where each subject has a single row with multiple variables representing different time points or conditions.

In SAS, one of the most efficient ways to restructure your data is by using the PROC TRANSPOSE procedure. PROC TRANSPOSE allows you to convert data from rows to columns and vice versa quickly and easily, without having to manually manipulate the dataset. This procedure is especially useful when preparing datasets for clinical trial reporting, where you need to generate summary statistics or cross-tabulations that require data in a specific format.

 

PROC TRANSPOSE Syntax

This section explains the essential and optional syntax of PROC TRANSPOSE, outlining how it can be applied to restructure datasets in SAS. By specifying the appropriate options, PROC TRANSPOSE allows for flexible data transformation. Here’s the general syntax:

proc transpose data=<input-dataset> out=<output-dataset>;
   by <grouping-variables>;
   id <identifier-variable>;
   var <variables-to-transpose>;
run;

Key Components of the Syntax:

  • data=: Specifies the input dataset to be transposed.
  • out=: Defines the name of the output dataset that will contain the transposed data.
  • by: Lists the variables by which the data will be grouped before transposing. This means that the data will be transposed within each group defined by these variables.
  • id: Indicates the variable whose unique values will become the column headers in the transposed dataset.
  • var: Specifies the variables whose values will be transposed into the new columns.

In addition to these key components, PROC TRANSPOSE also offers several optional arguments that provide greater control over the process, allowing users to customize the output as needed.

PROC TRANSPOSE offers several optional arguments that provide greater control over the transposition process. Some of the more frequently used optional arguments include:

  • DELIMITER=: Allows users to define a character that will separate components in the names of the transposed variables in the resulting dataset.
  • LABEL=: Allows you to assign a custom name for the variable that contains the label of the variable being transposed.
  • PREFIX=: Specifies a prefix for the names of the transposed variables in the output dataset.
  • SUFFIX=: Specifies a suffix to append to the names of the transposed variables.

 

A comprehensive list of all optional arguments and their functionalities can be found in the SAS documentation (PROC TRANSPOSE documentation).

 

PROC TRANSPOSE Example: Restructuring Data in SAS

The code below will create a dataset showing the number of hours worked by each employee. The structure of the original dataset, workweek, is one record per employee per day.

data workweek;
     input ID Employee $ Day $ 13-21 Hours;
     datalines;
table for blog3
;
run;

In this dataset, the goal is to restructure the data so that each employee has a single record, with separate variables for each day of the week. This transformation can be accomplished using the PROC TRANSPOSE procedure as follows:

proc transpose data=workweek out=workweek2(drop=_name_);
   by id employee;
   var hours;
   id day;
run;

This could also be achieved by using an array:

Here is the dataset created by the PROC TRANSPOSE:

table for blog

 

Handling Multiple Variables with PROC TRANSPOSE

PROC TRANSPOSE in SAS is not only capable of restructuring a single variable but also allows users to handle multiple variables simultaneously. This feature is particularly useful in clinical data analysis, where multiple measurements for different parameters need to be organised into a more compact format.

To transpose multiple variables, the VAR statement can include a list of variables that you want to transform into new columns. When using this approach, each variable specified will have its own set of transposed columns in the output dataset.

Consider a dataset that records the scores of students across different subjects for various assessment periods. The initial dataset, scores, is structured as follows:

data scores;
input StudentID $ Subject $ Assessment1 Assessment2

     datalines;  

S1 Math 85 88
S1 Science 90 92

S2 Math 78 80

S2 Science 83 85;
run;

sas transpose table 01

 

In this dataset, each student has scores for two assessments in different subjects. To restructure this dataset such that each student has a single record with scores for each subject and assessment, PROC TRANSPOSE can be employed as shown below:

proc transpose data=scores out=transposed_scores(drop=_name_);
   by StudentID;
   var Assessment1 Assessment2;
   id Subject;
run;

sas transpse table 02

In this example:

The BY statement groups the data by StudentID.

The VAR statement lists Assessment1 and Assessment2 as the variables to be transposed.

The ID statement indicates that the values of the Subject variable will become the new column headers in the transposed_scores dataset.

The resulting dataset has columns for StudentID, Math and Science.

This flexibility in handling multiple variables allows researchers and analysts to tailor their datasets to better meet the requirements of their analyses and reporting needs. PROC TRANSPOSE simplifies the data restructuring process, enabling efficient data management and analysis in SAS.


Conclusion

PROC TRANSPOSE is a highly efficient and user-friendly tool for reshaping data in SAS, especially when converting datasets from long to wide format. It requires fewer lines of code, is quick to implement, and can automatically handle both numeric and character data types, making it versatile for a range of tasks. Its simplicity and speed make it an ideal choice for clinical programmers working with large datasets that need restructuring.

However, the procedure does have some limitations, particularly around controlling variable names, as it often defaults to generic names like COL1, COL2, unless customized with options like PREFIX=. It also requires clean, well-structured data to function smoothly. Despite these challenges, PROC TRANSPOSE remains an excellent option when your primary goal is to efficiently reshape data with minimal effort.