A member of the Quanticate Programming team writes about their opinions of the INTO statement in PROC SQL and other applications.
I do not like PROC SQL. I don’t like the fact that it is neither pure SQL nor is it SAS, and that SAS programmers need to jump between SAS datasteps and PROC SQL, depending on which will perform a particular operation in the most efficient manner. Most of all, I do not like the fact that as SAS programmers we cannot live without it.
In the SAS documentation PROC SQL is described as a procedure that:
“. . . can perform some of the operations that are provided by the DATA step and the PRINT, SORT, and SUMMARY procedures.”
What the documentation does not tell you is that PROC SQL can do some of these operations more elegantly than SAS. There are even operations the SQL procedure performs that would take vast amounts of complicated code to implement in SAS.
Before using PROC SQL to create macros or perform complex data manipulations, it’s essential to understand the structure and contents of your datasets. Two useful tools for this purpose are PROC CONTENTS and PROC DATASETS.
PROC CONTENTS provides a detailed summary of the dataset, including the number of observations, variables, and their types. It helps you gain insights into the data you are working with, which is important when planning your SQL queries or when you need to create a macro using PROC SQL.
proc contents data=your_dataset;
run;
While PROC CONTENTS gives you an overview, PROC DATASETS allows you to manage and modify your datasets. You can view the list of datasets available in a library, delete or rename datasets, and more. It’s particularly useful for organising your data before you dive into the specifics of SQL operations.
proc datasets library=work;
run;
By using PROC CONTENTS and PROC DATASETS, you ensure that you have a thorough understanding of your data’s structure and contents. This knowledge is key to effectively utilizing PROC SQL for tasks such as creating macros, filtering data, and applying advanced SQL techniques.
One such operation is performed using the INTO statement, which is similar in purpose to the SYMPUT function, as both are used to store query results into macro variables. The INTO statement in PROC SQL is used to put values from datasets into macro variables. There is nothing special about this on its own, but the INTO statement also has the ability to populate the macro variable with a delimited list of multiple values. The code below extracts names from a dataset that begin with the letter “P”. These names are then put into a macro variable using the INTO statement. This macro variable is then used as the criteria to make an extraction from another dataset.
data name_list;
length name $10;
input name $;
datalines;
Peter
John
Paul
David
;
run;
data likes;
length name $10 likes $10;
input name $ likes $;
datalines;
Peter Cakes
Simon Basketball
Philip Apples
Sam Oranges
Paul Bananas
John Cricket
Frank Cats
;
run;
%macro example1;
proc sql noprint;
select name
into :names separated by '" "'
from name_list
where substr(name,1,1) = 'P';
quit;
data output;
set likes;
where name in ("&names");
run;
%mend;
%example1;
Although this example does not offer much over the alternative of sorting and merging the two datasets together, when the datasets in use are very large, this method becomes hugely attractive because of its efficiency over the merging method.
Another example where the INTO statement can prove useful is creating a macro variable containing the number of observations within a dataset. The code below counts the observations in a dataset before taking each value in turn and using it to create a new dataset.
%macro example2;
proc sql noprint;
select count(*)
into :obs
from name_list;
quit;
%do i = 1 %to &obs;
proc sql noprint;
select name
into :name
from name_list (firstobs = &i obs = &i);
quit;
data &name;
set likes;
where name = "&name";
run;
%end;
%mend;
%example2;
These are just a couple of ways the INTO statement can solve problems that, although solvable in SAS, the solutions are not quite as concise and elegant. The examples given here are very simple because they are intended as a starting point to demonstrate the ease with which the INTO statement can be used.
The examples above are an excellent introduction to using SQL to create macro variables, but with some minor changes to example 2, we can show off an additional facet of the INTO statement - the ability to create multiple macro variables at once.
Let's take a slightly different situation, where we only have the 'likes' dataset and we want to create subsets for all the names contained in it. This is straightforward enough to begin with - we simply run our Proc SQL steps on the 'likes' dataset, rather than the 'name_list' dataset, and everything else follows through as before. However, what if Peter is not a one-dimensional character, who can be summed up by a single observation in a SAS dataset? What if Peter likes cakes *and* chocolate?
Technically the code will still function in this case too. On the first run through the do loop, with &i = 1, it will pull the name "Peter" from the first observation, set it as the value for &name and then use that to create the data subset. Then on the second run through the loop, with &i = 2, it will pull out the name "Peter" again and recreate the exact same dataset. The resulting output will be correct, but there is unnecessary repeat processing, which will only get worse if Peter also like cookies, pizza, beer, etc.
So, in a slight variation (which we will imaginatively call Example 2a), let's have a look at what happens if we use INTO to create all the macro variables up front. Instead of the do loop creating the &name macro variable each time, we can first create multiple macro variables - &name1, &name2, &name3, and so on - containing the unique names, so that we don't re-run on any duplicates. These can then be run through the loop, pretty much as before, making use of double ampersands to let the macro variables resolve correctly.
data likes;
length name $10 likes $10;
input name $ likes $;
datalines;
Peter Cakes
Peter Chocolate
Simon Basketball
Philip Apples
Sam Oranges
Paul Bananas
John Cricket
Frank Cats
Frank Evita
;
run;
%macro example2a;
proc sql noprint;
select count(distinct name)
into :obs
from likes;
quit;
proc sql noprint;
select distinct name
into :name1-
from likes;
quit;
%do i = 1 %to &obs;
data &&name&i;
set likes;
where name = "&&name&i";
run;
%end;
%mend;
%example2a;
The CASE statement in SQL is a powerful tool for applying conditional logic to your queries. It allows you to create new columns or modify existing ones based on specified conditions. Here’s how you can use the CASE statement to categorise different types of "likes" into broader categories:
proc sql; select name, likes, case when likes = 'Cakes' then 'Sweet' when likes = 'Chocolate' then 'Sweet' when likes = 'Basketball' then 'Sport' when likes = 'Apples' then 'Fruit' when likes = 'Oranges' then 'Fruit' else 'Other' end as category_label from likes;
quit;
In this example, we use the CASE and multiple when statements to assign a category label to each type of "likes". The new column category label categorises each entry as 'Sweet', 'Sport', 'Fruit', or 'Other', based on the value in the ‘likes’ column. This approach allows you to perform more nuanced data analysis and reporting.
Mastering the INTO statement in PROC SQL is pivotal for SAS programmers who seek to leverage the full potential of macro variables. The primary benefit of this feature lies in its ability to seamlessly store query results within the macro language, allowing these values to be utilized throughout your entire program. This capability not only enhances data manipulation and retrieval but also enables more dynamic and flexible SAS programming. By integrating the INTO statement into your workflow, you can significantly simplify data handling and improve the maintainability of your code. We encourage you to experiment with different applications of INTO to see firsthand how it can transform your data analysis processes, making them more efficient and insightful.
Learn more about Quanticate's statistical programming services and how our experts can assist you with your clinical trial to produce more efficient outcomes using SAS programming best practices and macro developments.
Related Blogs