Codementor Events

Building blocks of a T-SQL statement

Published Nov 08, 2019

Transact-SQL, or T-SQL, as it has come to be commonly known, is the language that is used to communicate with Microsoft SQL Server. Any actions a user wishes to perform in a server, such as retrieving or modifying data in a database, creating objects, changing server configurations, and so on, are all done via a T-SQL command.
This article is taken from the book Learn T-SQL Querying by Pam Lahoud and Pedro Lopes. This book will be a useful guide to learning the art of writing efficient T-SQL code in modern SQL Server versions, as well as the Azure SQL Database.
In this article, we will be introduced to the typical components of a T-SQL statement, including the logical order with which SQL Server processes a statement.

Writing a T-SQL statement

When writing a T-SQL statement, the following three actions are required:

1.	Express the intended operation, such as reading or changing data
2.	Provide a target or source list of affected tables
3.	Provide a condition that filters the affected records

The intended operation is determined by the presence of the following clauses:

• The SELECT clause lists columns or expressions that will be displayed in the result set
• The DELETE, INSERT, or UPDATE clauses state the target table or view for these logical operations

As for the affected tables and filters, they are determined by the following clauses:

• The FROM clause lists the source tables, views and/or sub-queries that contain the data to be queried
• The WHERE clause states one or more conditions that will serve to filter the result set to the desired rows

The formatting of the results can be further modified by adding any of the following parts:

• The ORDER BY clause defines the order in which the rows will be returned
• The GROUP BY clause aggregates rows together based on the criteria provided (typically combined with aggregate functions in the SELECT clause)
• The HAVING clause applies a predicate to the results (different than the WHERE clause, which applies a predicate to the source rows)

In the following sections we will look at each of these clauses in detail:

SELECT

The SELECT clause defines the columns and expressions that will be returned in the results and is the only element that is required to form a valid T-SQL data retrieval statement. Elements in the SELECT statement can be as simple as a single constant value, or as complex as a full T-SQL sub-query, but generally it is a comma-separated list of columns from tables and views in a database.
The following query will return a single row with a single column:

SELECT 1;

The SELECT clause can also be used to format the results by providing column aliases or using expressions to modify the data. Aliases are created with the optional keyword AS, followed by the intended column name to be displayed in the result set:

SELECT Name AS ProductName, LEFT(ProductNumber, 2) AS ProductCode, ISNULL(color, 'No Color') AS Color [...]

Note that, in the results, any row that has a value for Color will display that value, whereas any row that has a null color will display No Color instead:
Capture1.JPG

DISTINCT

DISTINCT specifies that repeated rows in the result set are collapsed into a single row.

SELECT DISTINCT Name AS ProductName, LEFT(ProductNumber, 2) AS ProductCode, ISNULL(color, 'No Color') AS Color [...]

TOP

The TOP clause specifies that from the applicable rows, the results set only produces a predetermined number of rows, set in percentage or absolute number.

SELECT TOP 25 Name AS ProductName, LEFT(ProductNumber, 2) AS ProductCode, ISNULL(color, 'No Color') AS Color [...]

FROM

The FROM clause specifies the tables or views used in the SELECT, DELETE, and UPDATE statements. It is required unless a SELECT list contains only constants, variables or arithmetic expressions, or an UPDATE clause does not contain references to other tables other than the target. It can be a single table, a derived table, a Table-Valued Function (TVF), or it can be several tables and/or views joined together.

If the desired result set contains data from more than one table or view, joins can be used to link rows from one table to another. There are essentially the following three types of logical joins that are expressed when writing a query:

INNER JOIN

Inner joins compare the rows from two tables based on conditions specified in the query. Typically, this type of join would be used to intersect rows that have the same value in a specific column or set of columns. The only rows that would be returned are the ones that have matching rows in both tables, as represented in black in the following diagram:
Capture2.JPG
For example, the AdventureWorks sample database has a Product table that contains the ProductID and Name columns and a ProductInventory table that contains the ProductID and Quantity columns. To write a query that returns the product name and the quantity together, an inner join can be used to combine rows from the Product table with rows from the ProductInventory table based on matching values in the ProductID column. In this case, only products that have rows in both tables will be returned. The query would look like the following:

SELECT Name AS ProductName, Quantity
FROM Production.Product
INNER JOIN Production.ProductInventory ON Product.ProductID = ProductInventory.ProductID;

OUTER JOIN

Outer joins are used to return all the data in one table, plus any matching rows in the other table. In the left outer join, the entire left table is returned along with any matching rows from the right table. If there is no matching row on the right, null values will be returned for these columns:
Capture3.JPG
Building on the preceding example, there are some rows in the Product table that are not currently in inventory; therefore, there are no rows with these product IDs in the ProductInventory table. To return all the products, whether they have matching rows in the ProductInventory table or not, a left outer join can be used.
In this case, rows with no inventory will return NULL for the Quantity column:

SELECT Name AS ProductName, Quantity
FROM Production.Product
LEFT OUTER JOIN Production.ProductInventory ON Product.ProductID = ProductInventory.ProductID;

So, following on from what was covered previously in the SELECT clause section, NULL can be replaced by zeros in the results, if desired, by using an expression:

SELECT Name AS ProductName, ISNULL(Quantity, 0) AS Quantity
FROM Production.Product
LEFT OUTER JOIN Production.ProductInventory ON Product.ProductID = ProductInventory.ProductID;

In a right outer join, all the rows from the right table are returned along with any matching rows from the left table (and NULL for the left columns if no match exists):
Capture4.JPG
If there are products in the ProductInventory table that are not in the Products table for some reason, a right outer join would return all the quantities whether they have a corresponding name or not, and a NULL value for the Name column if no matching row exists in the Products table:

SELECT Name AS ProductName, Quantity
FROM Production.Product
RIGHT OUTER JOIN Production.ProductInventory ON Product.ProductID = ProductInventory.ProductID;

In this case, it might be good to display the ProductID column if the Name column is null:

SELECT ISNULL(Name, ProductInventory.ProductID) AS ProductName, Quantity
FROM Production.Product
RIGHT OUTER JOIN Production.ProductInventory ON Product.ProductID = ProductInventory.ProductID;

For full outer joins, all rows are returned from both tables; if rows are matched, they are combined into a single row in the results:
Capture5.JPG
Using a full outer join, all the rows from both the Product and the ProductInventory tables will be returned. The Name column will be NULL for rows that appear only in the ProductInventory table, and the Quantity column will be NULL for rows that appear only in the Product table:

SELECT ISNULL(Name, ProductInventory.ProductID) AS ProductName, ISNULL(Quantity, 0) AS Quantity
FROM Production.Product
FULL OUTER JOIN Production.ProductInventory ON Product.ProductID = ProductInventory.ProductID;

CROSS JOIN

Cross joins are also called Cartesian products. In a cross join, every row in the left table is returned, and each of these rows is combined with all the rows from the right table. If the left table has 10 rows and the right table 100 rows, then the cross join produces 1000 rows.

APPLY

APPLY is like a cross join in the type of result set that it produces, but usable only with functions. In a cross join, both inputs (left and right) are tables or views that already exist in the database, with a fixed definition. However, APPLY is used in scenarios where a join cannot be used. In APPLY, one of the inputs (the right) is not physically materialized in the database because its output is dependent on input parameters, such as in the case of a table-valued function (TVF).

For example, the AdventureWorks sample database has a SalesPerson table that contains the BusinessEntityID and SalesYTD columns, and a ufnGetContactInformation TVF that returns the FirstName, LastName, and JobTitle columns. The TVF creates a runtime abstraction for columns that exist in multiple underlying tables, like building a table on-the-fly. To write a query that returns the year-to-date (YTD) sales per sales person, together with their name and job title, a cross apply can be used to return all rows from the SalesPerson table, and each of those rows is combined with the rows coming from the ufnGetContactInformation TVF.

The query would look like the following code block:

SELECT SP.SalesYTD, P.FirstName, P.LastName, P.JobTitle
FROM Sales.SalesPerson AS SP
CROSS APPLY dbo.ufnGetContactInformation (SP.BusinessEntityID) AS P;

In the following screenshot, the results of the ufnGetContactInformation function are displayed alongside the SalesYTD column, just as if they came from another table using a simple inner or outer join:
Capture6.JPG
However, the following query produces an error (ID 4104) because a join cannot be used directly with a TVF:

SELECT SP.SalesYTD, P.FirstName, P.LastName, P.JobTitle
FROM Sales.SalesPerson AS SP
CROSS JOIN dbo.ufnGetContactInformation (SP.BusinessEntityID) AS P;

WHERE

The WHERE clause specifies the search condition that determines whether a row should be returned in the result set. Rows will be returned only if the entire WHERE clause evaluates to TRUE. Each condition within the WHERE clause is referred to as a predicate. There is no limit to the number of predicates that can appear in a WHERE clause, and predicates are combined using the AND, OR, and NOT logical operators.
For example, the AdventureWorks sample database has a Product table that contains the Name and ProductID columns, a ProductInventory table that contains the Quantity, LocationID, and ProductID columns, and a Location table that contains the LocationID and Name columns. A query that returns the current product inventory per location, for the entire Touring line of products would look like the following code block:

SELECT P.Name AS ProductName, [PI].Quantity, L.Name AS LocationName
FROM Production.Product AS P
INNER JOIN Production.ProductInventory AS [PI] ON P.ProductID = [PI].ProductID
INNER JOIN Production.Location AS L ON [PI].LocationID = L.LocationID
WHERE P.Name LIKE 'Touring%';

The following screenshot shows that all the ProductName values in the result set begin with the word Touring:
Capture7.JPG

ORDER BY

The ORDER BY clause orders the results. The default order is ascending, meaning that the ASC keyword can be omitted. To produce the result set in descending order, DESC must be set explicitly. Building on the same example from the WHERE section, the following code block is used to explicitly present results in descending order of product name and location name:

SELECT P.Name AS ProductName, [PI].Quantity, L.Name AS LocationName
FROM Production.Product AS P
INNER JOIN Production.ProductInventory AS [PI] ON P.ProductID = [PI].ProductID
INNER JOIN Production.Location AS L ON [PI].LocationID = L.LocationID
WHERE P.Name LIKE 'Touring%'
ORDER BY P.Name DESC, L.Name DESC;

The following screenshot shows the results in the specified order:
Capture8.JPG

GROUP BY

GROUP BY aggregates the results on the required column names or expressions. Building on the same example from the ORDER BY section, we want to know the overall product quantity per product name and location, from the following code snippet. The Quantity column is using the aggregate function SUM. Therefore, the remaining columns need to be contained in the aggregation GROUP BY clause:

SELECT P.Name AS ProductName, SUM([PI].Quantity) AS Total_Quantity, L.Name AS LocationName
FROM Production.Product AS P
INNER JOIN Production.ProductInventory AS [PI] ON P.ProductID = [PI].ProductID
INNER JOIN Production.Location AS L ON [PI].LocationID = L.LocationID
WHERE P.Name LIKE 'Touring%'
GROUP BY P.Name, L.Name
ORDER BY P.Name DESC, L.Name DESC;

The following screenshot shows the results with one row per set as defined by the GROUP BY clause:
Capture9.JPG

HAVING

HAVING further filters the result based on values in the results, rather than the actual data. A HAVING clause only applies to columns that are included in the GROUP BY clause or in an aggregate function. Building on the same example used in the WHERE, ORDER BY, and GROUP BY sections, here we want to additionally know which locations carry an inventory of over 100 items per product. For that, after the GROUP BY clause, the query has a HAVING clause over the aggregate function, where its result is greater than 100:

SELECT P.Name AS ProductName, SUM([PI].Quantity) AS Total_Quantity, L.Name AS LocationName
FROM Production.Product AS P
INNER JOIN Production.ProductInventory AS [PI] ON P.ProductID = [PI].ProductID
INNER JOIN Production.Location AS L ON [PI].LocationID = L.LocationID
WHERE P.Name LIKE 'Touring%'
GROUP BY P.Name, L.Name
HAVING SUM([PI].Quantity) > 100
ORDER BY P.Name DESC, L.Name DESC;

The following screenshot shows the results as containing only rows with an aggregate Total_Quantity greater than 100:
Capture10.JPG

Logical statement processing flow

When writing T-SQL, it is important to be familiar with the order in which the SQL Server Database Engine interprets queries, to later create an execution plan. This helps anticipate possible performance issues from poorly written queries. However, it also helps to understand cases of unintended results. The following steps outline the process that the database engine follows to process a T-SQL statement:

  1. Parse the query for correctness.
  2. Build a structure that represents the logic of the query as expressed by the developer—a query tree, also called a sequence tree.
  3. Process all the source and target objects stated in the FROM clause, together with the intended logical operation to perform on those objects.
  4. Apply whatever pre-filters are defined in the WHERE clause that can reduce the number of incoming rows from those objects.
  5. Apply any aggregation defined in GROUP BY, followed by any filters that can only be applied to the aggregations.
  6. Keep only the required columns for the output, and account for any limits stated in a TOP or DISTINCT clause.
  7. Order the resulting row set as specified by the ORDER BY clause, and make the result set available for the client.
    The following diagram summarizes the logical statement processing flow for the building blocks discussed in this chapter:

Capture11.JPG

This article introduced to the basic building blocks that make up a T-SQL statement. We also saw an overview of how SQL Server interprets those blocks to begin the process of executing the user's query.
If you found this post useful, do check out the book, Learn T-SQL Querying by Packt Publishing. This book will help you troubleshoot query performance issues, identify anti-patterns in code, and write efficient T-SQL queries.

Discover and read more posts from PACKT
get started