We are excited to share that there are several Accelerated Database Recovery (ADR) enhancements in SQL Server 2022 that further improve the overall availability and scalability of the database, primarily around persistent version store (PVS) cleanup and management.
Overview of Accelerated Database Recovery (ADR)
ADR improves database availability, especially in the presence of long running transactions, by redesigning the SQL database engine recovery process. ADR is introduced in SQL Server 2019 (15.x) and improved in SQL Server 2022 (16.x).
ADR is also available for databases in Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse SQL. ADR is enabled by default in SQL Database and SQL Managed Instance and cannot be disabled.
The primary benefits of Accelerated Database Recovery (ADR) are:
Fast and consistent database recovery
With ADR, long running transactions do not impact the overall recovery time, enabling fast and consistent database recovery irrespective of the number of active transactions in the system or their sizes.
Instantaneous transaction rollback
With ADR, transaction rollback is instantaneous, irrespective of the time the transaction has been active or the number of updates it has performed.
Aggressive log truncation
With ADR, the transaction log is aggressively truncated, even in the presence of active long running transactions, which prevents it from growing out of control.
ADR completely redesigns the database engine recovery process to:
Make it constant time and instant, since there is no longer a need to scan the log from the beginning of the oldest active transaction. With ADR, the transaction log is only processed from the last successful checkpoint (or oldest dirty page log sequence number (LSN)). As a result, recovery time is not impacted by long running transactions.
Minimize the required transaction log space since there is no longer a need to process the log for the whole transaction. As a result, the transaction log can be truncated aggressively as checkpoints and backups occur.
At a high level, ADR achieves fast database recovery by versioning all physical database modifications and only undoing logical operations, which are limited and can be undone almost instantly. Any transactions that were active at the time of a crash are marked as aborted and, therefore, any versions generated by these transactions can be ignored by concurrent user queries.
Note: For more details, see the Accelerated Database Recovery documentation, which includes a high-level overview of ADR and its components.
SQL Server 2022
Learn about the new features on security, platform, management, and more.
In SQL Server 2019 (15.x), the ADR cleanup process is single threaded within a SQL Server instance. Beginning with SQL Server 2022 (16.x), this process uses multi-threaded version cleanup (MTVC), which allows multiple databases under the same SQL Server instance to be cleaned in parallel.
MTVC is enabled by default in SQL Server 2022 and uses one thread per SQL instance. To adjust the number of threads for version cleanup, set ADR Cleaner Thread Count with sp_configure.
USE master;
GO
-- Enable 'show advanced options' to see ADR Cleaner Thread Count
EXEC sp_configure 'show advanced options', '1';
RECONFIGURE;
-- List all advanced options
EXEC sp_configure;

-- The following example sets the ADR Cleaner Thread Count to 4
EXEC sp_configure 'ADR Cleaner Thread Count', '4';
RECONFIGURE WITH OVERRIDE;

-- Run RECONFIGURE, then verify the number of threads allocated to the ADR version cleaner
RECONFIGURE;
EXEC sp_configure;
In the above example, if you configure the ADR Cleaner Thread Count to four on a SQL Server instance with two databases, the ADR cleaner allocates only one thread per database, leaving the remaining two threads idle.
Note: The maximum number of ADR cleaner threads is capped at the number of cores used by the SQL Server instance. For example, if you are running SQL Server on an eight-core machine, the maximum number of ADR cleaner threads the engine can use is eight, even if the sp_configure value is set higher.
User transaction cleanup
This improvement allows user transactions to run cleanup on pages that could not be addressed by the regular cleanup process due to lock conflicts. This helps ensure that the ADR cleanup process works more efficiently.
Reducing memory footprint for PVS page tracker
This improvement tracks persisted version store (PVS) pages at the extent level, in order to reduce the memory footprint needed to maintain versioned pages.
Accelerated Database Recovery cleaner improvements
The ADR cleaner now tracks and records aborted versions of a page more efficiently, improving both memory use and capacity.
Transaction-level persisted version store
This improvement allows ADR to clean up versions belonging to committed transactions independent of whether there are aborted transactions in the system. With this improvement PVS pages can be deallocated, even if the cleanup cannot complete a successful sweep to trim the aborted transaction map.
The result of this improvement is reduced PVS growth even if ADR cleanup is slow or fails.
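The effect of these improvements can be observed with the sys.dm_tran_persistent_version_store_stats dynamic management view, which reports PVS size and cleanup progress per database. A sketch follows; the column list here is a subset, so consult the DMV documentation for the full set:

```sql
-- Monitor PVS size and aborted-transaction cleanup per database
SELECT DB_NAME(database_id) AS database_name,
       persistent_version_store_size_kb,
       oldest_aborted_transaction_id,
       aborted_version_cleaner_start_time
FROM sys.dm_tran_persistent_version_store_stats;
```

A steadily growing persistent_version_store_size_kb alongside an old oldest_aborted_transaction_id is the classic symptom of a cleaner falling behind.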
New extended event
A new extended event, tx_mtvc2_sweep_stats, has been added for telemetry on the ADR PVS multi-threaded version cleaner.
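A sketch of an event session that captures this event is shown below; the session name and file target are illustrative choices, and only the event name comes from the text above:

```sql
-- Hypothetical session collecting MTVC sweep telemetry
CREATE EVENT SESSION [adr_mtvc_sweeps] ON SERVER
ADD EVENT sqlserver.tx_mtvc2_sweep_stats
ADD TARGET package0.event_file (SET filename = N'adr_mtvc_sweeps');
GO
ALTER EVENT SESSION [adr_mtvc_sweeps] ON SERVER STATE = START;
```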
Summary
In this blog post, we covered all the exciting ADR improvements that we are including with SQL Server 2022 that further improve the overall availability and scalability of your databases.
Figure 1: 5M rows bulk insert and recovery times after SQL restart with ADR on and off (side by side comparison)
Stay tuned as we are currently working on further improvements of the multi-threaded version cleaner that will enable parallelizing version cleanup within databases.
Learn more
For more information, and to get started with SQL Server 2022, check out the following references:
In SQL Server 2008 Enterprise Edition, backup compression was introduced, adding WITH COMPRESSION as an option in the T-SQL backup command. SQL Server backup compression provides the compressed backup option for all streaming SQL Server backup types: full, differential, and transaction log. Using the metadata in the backup file header, the RESTORE command decompresses compressed backup files. Compressing backup data benefits customers in two ways: first, the backup files consume less storage, and second, since fewer bytes are transferred, the backups run faster than uncompressed backups.
The default configuration in SQL Server is to not compress backups, but many users change this default globally using sp_configure.
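For example, compressed backups can be made the instance-wide default with:

```sql
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE;
```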
The challenge many customers have with the current compression implementation is that the compression ratios are frequently too low to justify the additional compute overhead required to compress the backup files.
Additionally, customers cannot offload the compute overhead of the compression task. Many customers in high-volume online transaction processing (OLTP) environments put their business continuity and disaster recovery (BCDR) strategy at risk by delaying their regular backup schedules in order to remain operational for customer workloads.
Intel QuickAssist (Intel QAT) technology in SQL Server 2022 solves these challenges by providing hardware acceleration and CPU offload capabilities for enhanced compression and decompression functions. This feature is designed to give tangible improvements in terms of reduced CPU usage and backup completion times, as well as a reduction in storage consumption.
The interface for Intel QAT devices uses a software library called QATzip where the QATzip API maintains a built-in software fallback mechanism. This fallback mechanism allows administrators to hot-add/remove devices, better resist hardware failure scenarios, and provide software-based compression where needed.
The SQL Server 2022 feature provides integrated compression/decompression capabilities by using the QATzip library and the associated Intel Intelligent Storage Acceleration Library (ISA-L). Intel ISA-L is installed alongside QATzip where the presence and enablement of both the QATzip and Intel ISA-L assemblies is a prerequisite for enabling instance-level server configuration for Intel QAT hardware acceleration.
QATzip currently supports offloading backups to hardware devices, either through a peripheral component interconnect express (PCIe) adapter or as part of the host server chipset, and also supplies a software-based compression capability that can be used when no hardware is available.
The SQL Server 2022 server-scope T-SQL syntax extends the current BACKUP DATABASE command to make it easy for database administrators to choose the desired algorithm to compress backups.
Additionally, the sp_configure configuration option is available to adjust backup compression defaults while introducing a new dynamic management view in order to query the configuration state.
The implementation of this feature surfaces appropriate error log and informational messaging for troubleshooting the Intel QAT configuration, and records the compression algorithm history in the msdb database so that the benefits of the Intel QAT compression feature can be tracked over time.
Business cases for backup compression with Intel QAT
The organizations that will benefit most from Intel QAT are those with demanding customer workloads, such as mission-critical OLTP and online analytical processing (OLAP) environments, that must maintain application workloads to meet customer expectations while ensuring the organization's data recovery strategy continues to meet its service level agreements (SLAs) and is never put at risk.
Often these organizations are highly virtualized and are running SQL Server instances at, or very close to, 100 percent processing time to maximize hardware and software licensing investments.
Customers should download the Intel drivers themselves directly from the Intel QuickAssist supportability page. The supportability page will give an overview of the Intel QAT, list driver versions per operating system, and maintain the release notes for the Intel QAT drivers.
Note: The official Intel QAT Driver for Windows, which at the time of writing is version 1.8.0-0010, must be installed separately from the SQL Server 2022 installation.
For the initial release of QAT-enabled backups for SQL Server 2022, the feature will be available for SQL Server 2022 on Windows Server only. As a host operating system, Windows Server 2016, Windows Server 2019, and Windows Server 2022 are supported.
When the drivers are installed and the server-scope configuration option is enabled for SQL Server, the QATzip library is expected to be available in the C:\Windows\System32\ path, and the Intel ISA-L in the C:\Program Files\Intel\ISAL\ path.
This path is true for both hardware and software-only deployment scenarios.
Customers can install the drivers before the SQL Server 2022 installation (in preparation for a disaster recovery site or for system standardization) or after the SQL Server 2022 installation, even without QAT hardware. In either case, customers are responsible for keeping the driver version up to date.
Preparing a disaster recovery server site
It's possible to install the QATzip library on a server with no QAT hardware acceleration devices available (virtual or physical). In such cases, SQL Server will still load the QATzip.dll assembly, provided the ALTER SERVER CONFIGURATION HARDWARE_OFFLOAD option is enabled for QAT.
In this scenario, users can specify the ALGORITHM for COMPRESSION as QAT_DEFLATE and rely on the software fallback provided by the Intel ISA-L to perform the compression operation using the same format and encoding as provided by QAT devices.
Below are links to the currently supported Windows Server driver version, the Intel QAT release notes, and the Intel QAT landing page, which will be used as the primary location for Intel QAT support.
SQL Server 2022 implements different levels of Intel QAT support based on the SQL Server edition. The SQL Server 2022 Enterprise edition will use hardware offloading if an Intel QAT device is available, otherwise, it will fall back to software-based compression if the Intel QAT device is not available for any reason.
SQL Server 2022 Standard Edition will only allow Intel QAT backups with software-only compression, even if an Intel QAT device is available.
SQL Server 2022 Express Edition will allow Intel QAT backups to be restored but will only allow backups to be performed with the default MS_XPRESS algorithm.
Note: The Intel QAT drivers are required to be installed in order to perform backups and restores using the Intel QAT feature.
SQL Server 2022 configuration for Intel QAT for backup compression
In order to back up databases with the Intel QAT compression algorithm, it is required to have the drivers installed, and the SQL Server must then be permitted to load the drivers into the SQL Server 2022 process space.
To complete this task, a new option has been added to ALTER SERVER CONFIGURATION in SQL Server 2022 with the HARDWARE_OFFLOAD syntax to enable or disable the use of hardware acceleration for a SQL Server instance. With this configuration being applied, after a restart of the SQL Server service, the Intel QAT hardware acceleration technology can be used to provide a new optional backup compression algorithm.
Note: The HARDWARE_OFFLOAD option requires a restart of the SQL Server instance to take effect.
After the feature is enabled, every time the SQL Server service starts, the SQL Server process looks for the required user space software library that interfaces with the hardware acceleration device driver API and loads the software assemblies if they are available. That library is QATzip, a user space software API that interfaces with the QAT kernel driver API; it is used primarily by applications looking to accelerate the compression and decompression of files using one or more QAT devices.
In the case of the Windows operating system, there is a complementary software library to QATzip, the Intel Intelligent Storage Acceleration Library (ISA-L). This serves as a software fallback mechanism for QATzip in the case of hardware failure, as well as a software-based option when the hardware is not available.
Note: The unavailability of a QAT hardware device will not prevent SQL instances from performing backup or restore operations using the new QAT_DEFLATE algorithm.
Software compression or decompression using Intel ISA-L still requires that the QATzip software library is installed and that HARDWARE_OFFLOAD server configuration has been enabled for QAT. This is because SQL Server loads the QATzip.dll assembly provided by QATzip and it handles the fallback to Intel ISA-L.
Additionally, software-based compression will be available for SQL Server 2022 Standard Edition, allowing Standard Edition customers to take advantage of software-based acceleration, while SQL Server 2022 Enterprise Edition customers will have the extended capability of offloading backups to an external device.
Enabling Intel QAT backup compression
The ALTER SERVER CONFIGURATION statement will be used to enable or disable access to QAT devices. Using the HARDWARE_OFFLOAD statement, as shown below, will persist a bit on the SQL Server boot page to load QAT binaries at startup.
ALTER SERVER CONFIGURATION SET HARDWARE_OFFLOAD = ON (ACCELERATOR = QAT)
After executing the statement, users would then restart the SQL Server service to have the QAT binaries loaded into the SQL Server process space.
Note: The default configuration will be to use the Microsoft default MS_XPRESS compression capabilities.
The ALTER SERVER CONFIGURATION statement below will disable the Intel QAT feature for SQL Server 2022.
ALTER SERVER CONFIGURATION SET HARDWARE_OFFLOAD = OFF (ACCELERATOR = QAT)
Hardware acceleration configuration and availability group replicas
The HARDWARE_OFFLOAD server-scope configuration option will be compatible with Availability Groups. If the setting is enabled on a primary replica, the configuration option will be a persistent server-scope property such that upon failover from a primary to a secondary replica, an attempt will be made to load the correct assemblies on the new primary replica. The recommendation will be to enable the server-scope setting on all replicas after installing the latest Intel QAT drivers on each replica.
Verifying Intel QAT accelerator configuration
In order to verify the Intel QAT accelerator configuration, a number of methods can be used including a new dynamic management view (DMV), sp_configure and the sys.configurations catalog view, and the SQL Server 2022 error log.
sys.dm_server_hardware_offload_config
The SQL Server sys.dm_server_hardware_offload_config dynamic management view can be used to verify the Intel QAT configuration state.
The addition of this dynamic management view will make it much easier to verify the SQL Server accelerators that are configured for use, and those that have been confirmed to be loaded. This dynamic management view (DMV) should be the primary method to verify the configuration state of the available accelerators.
Example:
SELECT * FROM sys.dm_server_hardware_offload_config;
GO
sp_configure and the sys.configurations catalog view
The SQL Server backup compression default behavior can be adjusted. Using sp_configure, you can enable or disable hardware acceleration, enable backup compression as a default, and change the default compression algorithm.
The status of these options is reflected in the sys.configurations table.
As mentioned previously, you can discover the hardware offload configuration by querying the sys.dm_server_hardware_offload_config dynamic management view.
Backup compression default
To discover the backup compression default state, you can use the query below:
SELECT * FROM sys.configurations
WHERE name = 'backup compression default';
Changing this configuration is permitted through the sys.sp_configure stored procedure.
No restart of SQL Server is required for this change to take effect. When this default option is changed to 1 (i.e., WITH COMPRESSION), MS_XPRESS will continue to be the default compression option, provided the backup compression algorithm has not been changed.
Backup compression algorithm
To discover the backup compression algorithm being used, you can use the query below:
SELECT * FROM sys.configurations
WHERE name = 'backup compression algorithm';
The "configure backup compression algorithm" configuration allows customers to change the default compression algorithm so they can determine if they want Intel QAT to be the default compression algorithm for SQL Server.
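As a sketch, the default algorithm could be changed with sp_configure; the numeric value shown here (2 for QAT_DEFLATE) is an assumption and should be verified against the configuration documentation:

```sql
-- Assumed mapping: 1 = MS_XPRESS, 2 = QAT_DEFLATE
EXEC sp_configure 'backup compression algorithm', 2;
RECONFIGURE;
```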
Backup compression enabled
There will be a global sp_configure option to enable/disable hardware offloading.
Note: The value will be either 0 or 1 and requires a restart of the SQL Server instance to take effect.
sp_configure 'hardware offload enabled', 0
GO
Creating a backup using QAT compression
The T-SQL BACKUP command WITH COMPRESSION has been extended to allow for a specified backup compression algorithm. When using Intel QAT for backup compression acceleration, the algorithm QAT_DEFLATE will initiate an Intel QAT compressed backup if the drivers are available and the SQL Server configuration has been completed.
Note: The standard compression algorithm is referred to as MS_XPRESS and remains the default compression option.
The ALGORITHM option is used to specify either of these two algorithms (QAT_DEFLATE, MS_XPRESS) for backup compression.
The example below will perform backup compression using Intel QAT hardware acceleration.
BACKUP DATABASE testdb TO DISK='D:\Backups\testdb.bak' WITH COMPRESSION (ALGORITHM = QAT_DEFLATE);
Either of these statements will use the default MS_XPRESS compression engine.
BACKUP DATABASE testdb TO DISK='D:\Backups\testdb.bak' WITH COMPRESSION (ALGORITHM = MS_XPRESS);
BACKUP DATABASE testdb TO DISK='D:\Backups\testdb.bak' WITH COMPRESSION;
The extension to the T-SQL syntax for backup compression provides the addition of the ALGORITHM option, allowing the options MS_XPRESS (default) or QAT_DEFLATE (Intel QAT-based compression) in SQL Server 2022.
The table below gives a summary of the BACKUP DATABASE with COMPRESSION options in SQL Server 2022.
BACKUP DATABASE <database_name> TO {DISK|TAPE|URL}
Backup with no compression, or with compression, depending on the backup compression default setting.

BACKUP DATABASE <database_name> TO {DISK|TAPE|URL} WITH COMPRESSION
Backup using the default MS_XPRESS compression algorithm.

BACKUP DATABASE <database_name> TO {DISK|TAPE|URL} WITH COMPRESSION (ALGORITHM = MS_XPRESS)
Backup with compression, specifying the MS_XPRESS algorithm explicitly.

BACKUP DATABASE <database_name> TO {DISK|TAPE|URL} WITH COMPRESSION (ALGORITHM = QAT_DEFLATE)
Backup with compression using the QATzip library (QZ_DEFLATE_GZIP_EXT, compression level 1).
SQL Server 2022 RESTORE process
The RESTORE command does not include a COMPRESSION option as the backup header metadata specifies if the database is compressed and, therefore, the storage engine can restore from the backup file(s) accordingly. The backup metadata will be extended to include the compression type.
Running RESTORE HEADERONLY on a backup compressed with QAT_DEFLATE and on a backup compressed with the default MS_XPRESS algorithm, as in the commands below, returns result sets that include the compression information:
RESTORE HEADERONLY FROM DISK = 'C:\temp\QAT-DEFLATE.bak';
GO
RESTORE HEADERONLY FROM DISK = 'C:\temp\MS-XPRESS.bak';
GO
The RESTORE HEADERONLY command will display the compression algorithm along with other information such as the backup name, backup description, backup type, compression state, device type, and other columns. In SQL Server 2022, we will be adding the compression algorithm column.
Backups that have been compressed with either QAT_DEFLATE or MS_XPRESS, and uncompressed backups will all use the same T-SQL syntax RESTORE DATABASE DDL commands.
In the example below, the T-SQL restore syntax is valid for restoring from the testdb.bak backup file, regardless of whether it is an uncompressed backup, or a compressed backup created using QAT_DEFLATE or MS_XPRESS.
RESTORE DATABASE testdb FROM DISK = 'E:\Backups\testdb.bak' WITH RECOVERY;
SQL Server backups compressed using QAT_DEFLATE will support all T-SQL RESTORE operations. The RESTORE {DATABASE|LOG} statements for restoring and recovering databases from backups will support all recovery arguments, such as WITH MOVE, PARTIAL, STOPAT, KEEP REPLICATION, KEEP CDC, and RESTRICTED_USER.
Auxiliary RESTORE commands will also be supported for all backup compression algorithms. Auxiliary RESTORE commands include RESTORE FILELISTONLY, RESTORE HEADERONLY, RESTORE VERIFYONLY, and more.
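For example, verifying a QAT-compressed backup and listing its files uses the same syntax as for any other backup (the path is illustrative):

```sql
RESTORE VERIFYONLY FROM DISK = 'D:\Backups\testdb.bak';
RESTORE FILELISTONLY FROM DISK = 'D:\Backups\testdb.bak';
```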
Note: If the server-scope configuration HARDWARE_OFFLOAD option is not enabled, and/or the Intel QAT drivers have not been installed, an error will be surfaced instead of attempting to perform the restore.
To restore an Intel QAT compressed backup, the correct assemblies must be loaded on the SQL Server instance initiating the restore operation.
Best practices
A BLOCKSIZE of 65536 (64 KB) is recommended, but options such as BUFFERCOUNT and MAXTRANSFERSIZE remain dependent on the user's environment if changes from the defaults are made.
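Combining these options, a QAT-compressed backup with the recommended block size might look like the following sketch (database name and path are illustrative):

```sql
BACKUP DATABASE testdb TO DISK = 'D:\Backups\testdb.bak'
WITH COMPRESSION (ALGORITHM = QAT_DEFLATE),
     BLOCKSIZE = 65536;
```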
Learn more
For more information, and to get started with SQL Server 2022, check out the following references:
Go to the Intel QAT portal for the latest Intel information on Intel QAT device drivers and application support along with technical documentation and whitepapers.
Time series data is a set of values organized in the order in which they occur and arrive for processing. Unlike transactional data in SQL Server, which is not time-based and may be updated often, time series data is typically written once and rarely, if ever, updated.
Some examples of time series data include stock prices, telemetry from equipment sensors on a manufacturing floor, and performance metrics from SQL Server, such as CPU, memory, I/O, and network utilization.
Time series data is often used for historical comparisons, anomaly detection and alerting, predictive analysis, and reporting, where time is a meaningful axis for viewing or analyzing data.
Time series capabilities in SQL Server were introduced in Azure SQL Edge, Microsoft's version of SQL Server for the Internet of Things (IoT) which combines capabilities such as data streaming and time series with built-in machine learning and graph features.
With SQL Server 2022 and Azure SQL, we've brought time series capabilities to the entire SQL Server family. Time series capabilities in SQL Server consist of enhancements to existing T-SQL functions for handling NULL values, plus two new functions that make working with temporal-based data easier than ever.
Create contiguous ranges with GENERATE_SERIES
When analyzing time series data, it's common to create a contiguous set of datetime values in evenly spaced intervals (for example, every second) within a specific range. One way to accomplish this is by creating a numbers table, also known as a tally table, which contains a set of consecutive numbers between a lower and upper bound. The numbers in the table can then be used in combination with the DATEADD function to create the range of datetime values.
Prior to SQL Server 2022, creating a numbers table usually involved some form of common table expressions, CROSS JOIN of system objects, looping, or some other creative T-SQL. These solutions are neither elegant nor efficient at scale, with additional complexity when the step between interval values is larger than 1.
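For comparison, one common pre-2022 pattern derives a numbers table by ranking rows from a system catalog (one of many variants):

```sql
-- Pre-2022 style: generate the values 0-59 without GENERATE_SERIES
WITH numbers AS (
    SELECT TOP (60) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM sys.all_objects
)
SELECT DATEADD(minute, n, '2022-12-10 13:00') AS [Interval]
FROM numbers;
```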
The GENERATE_SERIES relational operator in SQL Server 2022 makes creating a numbers table simple by returning a single-column table of numbers between a start and stop value, with an optional parameter defining the number of values to increment/decrement between steps in the series:
GENERATE_SERIES (start, stop [, step ])
This example creates a series of numbers between 1 and 100 in steps of 5:
SELECT value
FROM GENERATE_SERIES(1, 100, 5);
Taking this concept one step further, the next example shows how GENERATE_SERIES is used with DATEADD to create a set of values between 1:00 PM and 2:00 PM in 1-minute intervals:
SELECT DATEADD(minute, s.value, 'Dec 10, 2022 1:00 PM') AS [Interval]
FROM GENERATE_SERIES(0, 59, 1) AS s;
If the step argument is omitted, a default value of 1 is used when computing interval values. GENERATE_SERIES also works with decimal values, with a requirement that the start, stop, and step arguments must all be the same data type. If start is greater than stop and the step is a negative value, the resulting series will be a decrementing set of values. If start is greater than stop and the step is positive, an empty table will be returned.
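For example, a descending series, and a decimal series where all three arguments are cast to the same type:

```sql
-- Descending: 100, 95, 90, ..., 5
SELECT value
FROM GENERATE_SERIES(100, 1, -5);

-- Decimal series: 0.0, 0.5, 1.0, ..., 5.0 (start, stop, and step share a type)
SELECT value
FROM GENERATE_SERIES(CAST(0.0 AS decimal(4, 1)),
                     CAST(5.0 AS decimal(4, 1)),
                     CAST(0.5 AS decimal(4, 1)));
```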
Finally, GENERATE_SERIES requires a compatibility level of 160 or higher.
Group data in intervals with DATE_BUCKET
Time series data is often grouped into fixed intervals, or buckets, for analytical purposes. For example, sensor measurements taken every minute may be averaged over 15-minute or 1-hour intervals. While GENERATE_SERIES and DATEADD are used to create the buckets, we need a way to determine which bucket/interval a measurement belongs to.
The DATE_BUCKET function returns the datetime value corresponding to the start of each datetime bucket for an arbitrary bucket size, with an optional parameter to define the origin from which to calculate each bucket. If no origin is provided, the default value of Jan 1, 1900, will be used as the origin date:
DATE_BUCKET (datepart, number, date [, origin ])
The following example shows the buckets for Dec 10, 2022, for several date parts with a bucket size of 1 and an origin date of Jan 1, 2022:
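The comparison table itself isn't reproduced here, but a query along these lines produces it (variable names are illustrative):

```sql
DECLARE @date   datetime2 = '2022-12-10';
DECLARE @origin datetime2 = '2022-01-01';

SELECT DATE_BUCKET(DAY,   1, @date, @origin) AS [Day bucket],
       DATE_BUCKET(WEEK,  1, @date, @origin) AS [Week bucket],
       DATE_BUCKET(MONTH, 1, @date, @origin) AS [Month bucket],
       DATE_BUCKET(YEAR,  1, @date, @origin) AS [Year bucket];
```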
Notice how the date bucket value for the Week date part is Dec 10, 2022, which is a Saturday. That's because the provided origin date (Jan 1, 2022) is also a Saturday. (Note the default origin date of Jan 1, 1900, is a Monday). Therefore, when working with the Week date part, if you want your Week bucket to begin on a Sunday then be sure to use a known origin that falls on a Sunday.
Where DATE_BUCKET becomes especially useful is for bucket sizes larger than 1, for example when grouping data in 5-minute or 15-minute buckets.
DATE_BUCKET provides an easy way to determine which time-based interval a timestamped measurement belongs to using any arbitrary-sized interval.
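For example, averaging sensor readings into 15-minute buckets (the MachineTelemetry table and SensorReading column follow the gap analysis example later in this post and are illustrative here):

```sql
SELECT DATE_BUCKET(MINUTE, 15, [timestamp]) AS bucket_start,
       AVG(SensorReading) AS avg_reading
FROM MachineTelemetry
GROUP BY DATE_BUCKET(MINUTE, 15, [timestamp])
ORDER BY bucket_start;
```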
Gap analysis with FIRST_VALUE and LAST_VALUE
FIRST_VALUE and LAST_VALUE are not new functions to SQL Server 2022; what is new is how NULL values are handled. In previous versions of SQL Server, NULL values are preserved.
When working with time series data, it's possible to have gaps between measurements. Ideally, gaps are filled in with an imputed value. When using FIRST_VALUE and LAST_VALUE to compute the value corresponding to an interval, preserving NULL values isn't ideal.
In the following example, a series of sensor readings taken at 15-second intervals has some gaps:
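The sample data isn't reproduced here; a minimal MachineTelemetry table with hypothetical gaps (NULL readings) might look like:

```sql
CREATE TABLE MachineTelemetry (
    [timestamp]   datetime2 NOT NULL,
    SensorReading decimal(5, 2) NULL
);

INSERT INTO MachineTelemetry ([timestamp], SensorReading)
VALUES ('2022-12-10 08:00:00', 21.50),
       ('2022-12-10 08:00:15', NULL),   -- gap
       ('2022-12-10 08:00:30', 21.75),
       ('2022-12-10 08:00:45', NULL),   -- gap
       ('2022-12-10 08:01:00', 22.00);
```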
If analyzing the data in 1-minute intervals (using DATE_BUCKET), the default value returned by FIRST_VALUE will include the null values:
SELECT [timestamp]
     , DATE_BUCKET(MINUTE, 1, [timestamp]) AS [timestamp_bucket]
     , SensorReading
     , FIRST_VALUE(SensorReading) OVER (
           PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp])
           ORDER BY [timestamp]
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [Default (RESPECT NULLS)]
FROM MachineTelemetry
ORDER BY [timestamp];
FIRST_VALUE and LAST_VALUE include new syntax (IGNORE NULLS or RESPECT NULLS) in SQL Server 2022 which allows you to decide how NULL values should be handled:
RESPECT NULLS is the default behavior and will include null values in the result when computing the first or last value within a partition. Specifying IGNORE NULLS will cause NULL values to be excluded when computing the first or last value over a partition.
SELECT [timestamp]
     , DATE_BUCKET(MINUTE, 1, [timestamp]) AS [timestamp_bucket]
     , SensorReading
     , FIRST_VALUE(SensorReading) IGNORE NULLS OVER (
           PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp])
           ORDER BY [timestamp]
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [IGNORE NULLS]
FROM MachineTelemetry
ORDER BY [timestamp];
The new options for IGNORE NULLS and RESPECT NULLS allow you to decide how null values should be treated when analyzing your data.
Learn more
Get started with SQL Server 2022 today. For more information and additional examples, visit the following resources:
Cardinality estimation (CE) is a process used by the query optimizer to estimate the number of rows returned by a query or sub-part of a query. These estimates rely on two underlying things: statistics about the data and a set of assumptions, known as the model. The model determines how we interpret statistics and data coming from various subexpressions, and so, for any given set of statistics, different models will arrive at different cardinality estimates.
Until SQL Server 2022, CE could only have one model. The set of assumptions used was baked into the code of the server, and so whatever model was picked is what was used for all queries. However, we know that not all queries and workloads perform best under one single model. For some queries, the model we use works out well, but for others, a different model would perform better. With CE Feedback in SQL Server 2022, we can now tailor the model used to generate a query plan to the specific query.
CE has always had three basic assumptions that comprise the model: independence (or partial independence), uniformity, and containment. These three assumptions determine how we interpret histograms, and they determine how we combine data during joins or in the presence of multiple predicates. In this blog, I will explain these model variants and what they mean in more detail.
Uniformity
Let's begin by discussing the uniformity assumption. This assumption is used when interpreting data from the on-disk histograms, the abstracted data about the columns being queried. We assume that all data within a step, or bucket, of a histogram is uniformly distributed at an average frequency for that bucket. Thus, when we query data, this allows us to estimate the number of rows that satisfy the predicate.
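As a rough illustration (with made-up numbers, not the engine's exact bookkeeping), uniformity turns a histogram step into a per-value estimate by dividing the rows covered by the step by its distinct value count:

```python
def estimate_equality_rows(step_rows, step_distinct):
    """Under the uniformity assumption, every distinct value in a
    histogram step is assumed to occur at the step's average frequency,
    so an equality predicate is estimated at rows / distinct values."""
    return step_rows / step_distinct

# A hypothetical step covering 10,000 rows over 200 distinct values:
print(estimate_equality_rows(10_000, 200))  # 50.0 estimated rows per value
```

If the real data is skewed within the step (one value accounts for most of the 10,000 rows), this average-frequency estimate can be badly wrong in either direction.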
Now, CE Feedback modifies the uniformity assumption in only one special case: that of row goal queries. These queries look like TOP n, FAST n, or IN, and there is a special optimization for row goal queries that relies on the uniformity assumption. Whenever we believe that a particular value occurs at a high enough frequency (based on our interpretation of the histogram using the uniformity assumption), we choose to do a quick scan of a few pages, assuming that we will get enough qualifying rows very quickly. However, if the data is skewed, we may have falsely assumed more qualifying values than were actually present. This means we scan far more pages than expected to get the requisite number of rows.
CE Feedback can detect such scenarios and turn off the special row goal optimization. If it turns out that the query is indeed faster without this assumption, we keep this change by persisting it in the query store in the form of a query store hint, and the new optimization will be used for future executions of the query.
Independence
Consider a WHERE clause with two predicates combined with an AND, something like City='Seattle' AND State='WA'. Under the model assumption of independence, we would take the selectivities of the individual predicates (City='Seattle', State='WA') and multiply those probabilities together. Under the model assumption of correlation, we would take the most selective predicate (City='Seattle') and use the selectivity of that predicate alone to determine the selectivity of the conjunctive clause. There is a third model of partial correlation, in which we multiply the selectivity of the most selective predicate with weakened selectivities (each raised to a power less than 1, making it closer to 1) of the successive predicates.
Figure 1: Independence. P1 and P2 are independent – that is, the truth (or falsehood) of P1 tells us nothing about the truth or falsehood of P2.
Figure 2: Partial Correlation. In cases where we are evaluating P1 = T and P2 = T, you can see that P1 being true gives a higher likelihood of P2 being true. The cases where P1 is false are greyed out because they do not satisfy P1=T.
Figure 3: Complete correlation. When evaluating P1=T and P2 = T, we can see that anytime P1 is true, P2 is also true (P1 implies P2).
CE always starts out with this last model of partial correlation (referred to elsewhere as exponential backoff). With CE Feedback, we can see if our estimates are too high, meaning the predicates are more independent, or too low, meaning that there is more correlation than expected, and adjust the model used for that query and those predicates accordingly for future executions. If this makes the plan or performance better, we persist the adjustment using a query store hint and use it for future executions.
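The three ways of combining predicate selectivities can be sketched directly. This is an illustration, not the optimizer's source: the halving exponents (1, 1/2, 1/4, ...) applied from most to least selective predicate are the commonly documented form of exponential backoff.

```python
def independence(sels):
    """Full independence: multiply all selectivities together."""
    p = 1.0
    for s in sels:
        p *= s
    return p

def full_correlation(sels):
    """Full correlation: the most selective predicate alone decides."""
    return min(sels)

def exponential_backoff(sels):
    """Partial correlation: most selective predicate at full strength,
    each subsequent one weakened by raising it to 1/2, 1/4, ..."""
    ordered = sorted(sels)  # most selective (smallest) first
    p = 1.0
    for i, s in enumerate(ordered):
        p *= s ** (0.5 ** i)
    return p

sels = [0.1, 0.2]  # e.g. City='Seattle' (10%), State='WA' (20%)
print(independence(sels))         # 0.02
print(full_correlation(sels))     # 0.1
print(exponential_backoff(sels))  # 0.1 * 0.2**0.5, about 0.0447
```

Note that the partial-correlation result always lands between the other two, which is exactly why it is a reasonable default before feedback says otherwise.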
Containment
The model assumption of containment means that users query data that is actually in the table. Meaning, if there is a column = constant predicate in the query, we assume that the constant actually exists in the table, at the frequency appropriate for where it falls within the histogram. However, we also assume that there is a containment relationship between joins. Basically, we assume that users wouldn't join two tables together if they didn't think there would be matches. There are two ways of looking at the containment relationship between joins: base containment or simple containment.
Base containment assumes that there is an inherent relationship between the tables participating in a join but doesn't make assumptions about the filters occurring on top of those tables before the join occurs. A good example might be a table of store sales and a table of store returns. We would assume that all things returned were also sold, but we would not assume that any filtering on the sales or returns tables makes containment more or less likely; we just assume containment at the base-table level and scale the size of the estimated result up or down based on the filters in play.
Simple containment is a bit different: instead of assuming some inherent relationship between the base tables, it assumes that the filters applied to those tables create a containment relationship. For example, consider querying for graduating seniors from a specific high school and joining that with a query for athletes in a given zip code. While there may be some inherent relationship between the two tables a priori, it is the applied filters that specifically limit the rows and create the containment relationship.
CE starts with the base containment model for all queries. However, if the estimates for the join are 'off' in some way (the incoming estimates are good, but the outgoing estimates are bad), we try the alternate containment model. When the query is executed again, we try out the other model, and if it is better, we persist it with a query store hint and use it for future executions.
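A simplified way to see the difference between the two models (a hypothetical formula; the real estimator is more involved): both estimate join cardinality from distinct-value counts on the join column, but base containment takes those counts from the base tables while simple containment takes them from the filtered inputs.

```python
def join_estimate(rows_a, rows_b, distinct_a, distinct_b):
    """Classic containment-style join estimate: divide the cross
    product by the larger distinct count on the join column, so each
    row on the side with fewer distinct values is assumed to match."""
    return rows_a * rows_b / max(distinct_a, distinct_b)

# 1,000 filtered sales rows join 100 filtered returns rows on item_id.
# Base containment: distinct counts come from the base tables
# (say 5,000 distinct items on each side).
print(join_estimate(1_000, 100, 5_000, 5_000))  # 20.0

# Simple containment: distinct counts come from the filtered inputs
# (say 500 and 80 distinct items survive the filters).
print(join_estimate(1_000, 100, 500, 80))       # 200.0
```

Because simple containment assumes the filtered sets overlap, it produces a larger estimate here; which model is right depends on whether the filters really do create that overlap, and that is precisely what CE Feedback tests.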
Conclusion
In summary, CE requires a basic set of assumptions that are used to interpret and combine statistical data from histograms and sub-parts of a query. Those assumptions work well for some queries, and less well for others. In SQL Server 2022, we have introduced CE Feedback, which adjusts those assumptions on a per-query basis, based on the actual performance of the query over time.
In the face of changing market conditions and pressure to accelerate growth, the ability for customers to do more with less is critical. And while the cloud has long been the north star for realizing efficiencies and accelerating innovation, at Microsoft, we understand that these benefits also need to happen outside of the cloud. In 2021 we announced the general availability of Azure Arc-enabled SQL Managed Instance. Previously only available in Azure, Azure Arc-enabled SQL Managed Instance allows customers to build new cloud native applications on any infrastructure, in their on-premises environments and across other public clouds. Now, we are offering a way for customers to make the most of their legacy applications, with Azure Arc-enabled SQL Server.
Yesterday, we announced the general availability of SQL Server 2022, the latest milestone in the 30 year history of SQL Server. This newest edition of SQL Server delivers continued innovation with new hybrid and multicloud capabilities that allow customers to manage and protect their SQL environments in more ways than ever before, no migration required. Together with these innovations, we are also introducing a consumption-based cloud billing model to purchase SQL Server enabled by Azure Arc.
Bring Azure Innovation to SQL Servers Anywhere
Azure Arc is a bridge that extends the Azure platform to help customers build applications and services with the flexibility to run on-premises, at the edge, and in multicloud environments. With Azure Arc, customers can manage their entire SQL estate through a single pane of glass, and it delivers multilayered security and improved data governance through a single integrated solution.
When customers enable Azure Arc on SQL servers, they can unlock more robust functionalities:
Manage
Manage their SQL estate in the Azure portal and get a single, unified view of their entire SQL Server estate to better manage inventory and licenses. Leverage SQL Best Practice Analyzer to automatically evaluate SQL Server environments for performance, scalability, and business continuity, all at no additional cost.
Govern
Microsoft Purview is a family of data governance, risk, and compliance solutions that can help organizations govern, protect, and manage their environments. With Azure Arc, SQL Server customers will have Purview access policies readily available for those on-premises environments to centralize insights and governance across their entire data estate.
Secure
We also enabled Microsoft Defender for Cloud for SQL Server. Microsoft Defender enhances the security posture of a customer's SQL Server environments, no matter where they reside. Microsoft Defender continuously scans and proactively identifies any vulnerabilities or attacks that are happening within SQL Server environments. This multi-layered security is made possible through Azure Arc, so data is protected both at the powerful database engine level and by Azure security capabilities from the cloud.
SQL Server 2022 customers will also be able to benefit from Single Sign-On with Azure Active Directory to secure identities. These are security capabilities that customers have long asked for in their on-premises environments, and we are excited to bring them to our SQL Server customers.
Cloud billing model for all SQL workloads*
We are excited to provide customers with a new consumption-based, cloud billing model that provides flexibility to innovate quickly and save on costs. With Pay-as-you-go billing on SQL Server, customers can benefit from:
Flexibility to choose between consumption-based licensing or the perpetual SQL Server licenses.
Improved cost efficiencies for both Standard Edition and Enterprise Edition. Eliminate the need to pay for full licenses upfront and pay only for what is used, by the hour, which is ideal for spiky workloads or ad hoc usage.
Support for multicloud and hybrid deployments, with consistent billing for a customer's entire SQL environment, no matter where it resides.
The PAYG model is enabled by Azure Arc, so connection of the SQL Server to Azure via Azure Arc is a prerequisite. For SQL Server 2022 customers, the Azure Arc connection is a default part of the SQL Server 2022 setup process and the PAYG billing is available now. For customers with prior versions of SQL Server (2014 editions and above), this billing model will be enabled in the Azure Portal (coming soon) once an Azure Arc connection is established.
Innovate anywhere with Azure Arc today
Azure Arc was built to bring Azure innovation anywhere. With this week's announcements, our SQL Server customers can unlock cloud-native features on-premises, while benefiting from a more secure and streamlined management experience for their SQL Server environments, no matter where they reside. For SQL Server 2022 customers, these benefits will come already integrated with SQL Server. But customers with SQL Server 2014 and above can unlock many of the same innovations with the help of Azure Arc.
With Azure Arc, SQL Server customers will be able to maximize the value of their investments with technology that allows them to do more with less.
Today, we announced the general availability of SQL Server 2022, the most Azure-enabled release of SQL Server yet, with continued innovation across performance, security, and availability.[1] This marks the latest milestone in the more than 30-year history of SQL Server.
SQL Server 2022 is a core element of the Microsoft Intelligent Data Platform. The platform seamlessly integrates operational databases, analytics, and data governance. This enables customers to adapt in real-time, add layers of intelligence to their applications, unlock fast and predictive insights, and govern their data, wherever it resides.
SQL Server 2022's connections to Azure,[2] including Azure Synapse Link and Microsoft Purview, make it easier for customers to drive deeper insights, predictions, and governance from their data at scale. Azure integration also includes managed disaster recovery (DR) to Azure SQL Managed Instance, along with near real-time analytics, allowing database administrators to manage their data estates with greater flexibility and minimal impact to the end user.[3]
Performance and scalability are automatically enhanced via built-in query intelligence. Security innovation, building on SQL Server's track record as the least vulnerable database over the last 10 years, continues with Ledger for SQL Server, which uses blockchain technology to create a tamper-evident record over time of all changes to the database.
Watch how one of our customers, Mediterranean Shipping Company, is already taking advantage of the new capabilities in SQL Server 2022.
Azure-enabled features
Link feature for Azure SQL Managed Instance: To ensure uptime, SQL Server 2022 is fully integrated with the new link feature in Azure SQL Managed Instance. With this new capability, you benefit from a PaaS environment applied to disaster recovery, allowing you to spend less time on setup and management even when compared to an IaaS environment. This works by using a built-in Distributed Availability Group (DAG) to replicate data to a previously deployed Azure SQL Managed Instance as a DR replica site. The instance is ready and waiting for whenever you need it, with no lengthy configuration or maintenance required. You can also use this link feature in read scale-out scenarios to offload heavy requests that might otherwise affect database performance. We are working on building out more capabilities to support online disaster recovery.
Azure Synapse Link for SQL: Previously, moving data from on-premises databases, like SQL Server, to Synapse required you to use extract, transform, and load (ETL). Configuring and running an ETL pipeline is time-consuming, and insights often lag behind what is happening at any moment. Azure Synapse Link for SQL Server 2022 provides automatic change feeds to capture the changes within SQL Server and feed them into Azure Synapse Analytics. Synapse Link provides near real-time analysis and hybrid transactional and analytical processing with minimal impact on operational systems. Once the data arrives in Synapse, you can combine it with many different data sources, regardless of their size, scale, or format, and run powerful analytics over all of it using your choice of Azure Machine Learning, Spark, or Power BI. Because the automated change feeds push only what is new or different, data transfer occurs much faster and allows for near real-time insights, all with minimal impact on the performance of the source database in SQL Server 2022.
"Synapse Link for SQL Server 2022 helps us to seamlessly replicate operational data in near real-time to be able to have more powerful analytics." Javier Villegas, IT Director for DBA and BI Service, Mediterranean Shipping Company.
Microsoft Purview integration: Microsoft Purview is a unified data governance and management service. We are excited to highlight that SQL Server is also integrated with Microsoft Purview for greater data discovery, allowing you to break down data silos. Through this integration you will be able to:
Automatically scan your on-premises SQL Server for free to capture metadata.
Classify your data using built-in and custom classifiers and Microsoft Information Protection sensitivity labels.
Set up and control specific access rights to SQL Server.
Additional Azure-connected features: SQL Server 2022 has a number of additional Azure-enabled features. A simple connection to the Azure Arc agent, part of the default setup process for SQL Server 2022, enables additional capabilities, including:
Single view of all SQL Servers deployed on-premises, in Azure and other clouds.
Fully automated technical assessment for SQL Server at no additional cost, to help you optimize your database's performance, scalability, security, business continuity, and more.
Protect your on-premises data using Microsoft Defender for Cloud.
Secure identities with Single Sign-On and Azure Active Directory.
Pay-as-you-go billing.
Continued innovation to the core SQL Server engine
Performance: Performance is critical. On the SQL Server Engineering team, our core engine feature principles are: do no harm, no app changes required. With SQL Server 2022, performance enhancements come without requiring code changes by the end user.
SQL Server continues to offer differentiated performance, with #1 OLTP performance, #1 Non-Clustered DW performance on 1TB, 3TB, 10TB, and 30TB according to the independent Transaction Processing Performance Council. In SQL Server 2022:
With Query Store, we are adding support for read replicas and enabling query hints to improve performance and quickly mitigate issues without having to change the source T-SQL.
With Intelligent Query Processing, we're expanding more scenarios based on common customer problems. For example, the "parameter sensitive plan" problem refers to a scenario where a single cached plan for a parameterized query is not optimal for all possible incoming parameter values. With SQL Server 2022's Parameter Sensitive Plan optimization feature, we automatically enable the generation of multiple active cached plans for a single parameterized statement. These cached execution plans will accommodate different data sizes based on the provided runtime parameter values.
"As a company with 24/7 availability requirements, we are looking forward to embracing all SQL Server 2022 features that can make database failover faster, such as Buffer Pool Parallel Scan, Parallel Redo, and Accelerated Database Recovery (ADR) enhancements. On the development side, we expect to further utilize continuous improvements in the Intelligent Query Processing package. In an environment with a lot of servers and huge databases, even when you have people and resources to deal with performance issues, each feature that can improve performance or fix performance issues automatically or without touching the code is very valuable." Milos Radivojevic, Head of MSSQL Database Engineering at Entain.
Security: Over the past ten years, SQL Server has had few vulnerabilities. Building on this, the new Ledger for SQL Server feature creates a tamper-evident record of data modifications over time. This detects tampering by malicious actors and is beneficial for scenarios such as internal and external audits.
Availability: With Contained Availability Groups, you can create an Always On availability group that manages its own metadata objects (users, logins, permissions) at the availability group level in addition to the instance level. Additionally, SQL Server 2022 keeps multi-write environments running smoothly when you have users across multiple locations: we are automating the last-writer-wins rule so that when a conflict is detected, the modification with the most recent time is chosen and persisted on all replicas.
New pay-as-you-go SQL Server billing model, enabled by Azure Arc
Today, we are also excited to announce a new billing model that provides flexibility to innovate quickly and move as fast as you do.
Via a simple connection to Azure Arc, a default part of the SQL Server 2022 setup process, you now have access to a new cloud-enabled billing model for SQL Server, providing you with cost efficiency to pay only for what you use. Pay by the hour for consumption spikes and ad hoc usage without the need for upfront investment. Learn more in the announcement blog post.
Azure SQL migration offer
If you're ready to start your journey to the cloud, Microsoft can help. Today we are announcing a new offer, the SQL + Apps Migration Factory. This program can assess and migrate qualifying low complexity SQL Server applications and databases to Azure SQL at no cost to you.[4] Talk to your Microsoft account team or apply now at aka.ms/SQLAppsMigrationFactory to get started.
SQL Server IoT 2022
We are also announcing SQL Server IoT 2022, which is designed for fixed function use cases and licensed through the OEM channel under special dedicated use rights. You can read more about SQL Server IoT 2022.
Learn more and get started with SQL Server 2022 today
Learn more about SQL Server 2022 at the PASS Data Community Summit, including Microsoft and community-delivered keynote and general sessions.
[1] SQL Server 2022 free editions (Developer edition, Express edition) are available to download starting today. SQL Server 2022 paid editions (Enterprise edition, Standard edition) will be available in Volume Licensing (Enterprise Agreement, Enterprise Agreement Subscriptions) and MPSA starting today, which represents the majority of SQL Server customers. Customers purchasing via CSP, OEM, and SPLA can begin purchasing SQL Server 2022 in January 2023.
[2] For all Azure-connected features of SQL Server 2022, customers can optionally enable these capabilities based on business requirements.
[3] The bidirectional disaster recovery capability of the link feature for Azure SQL Managed Instance is available in limited public preview. Sign up for early access. General availability will occur at a future date.
[4] Subject to the limitations described in the full SQL + Apps Migration Factory program specifications here, and provided that the SQL Server workloads are low complexity with no code changes, Microsoft agrees to assess and migrate SQL Server databases and SQL Server-associated applications from your datacenter or AWS EC2 to Azure at no cost to customer. Migrations must be completed by June 30, 2023.
We are excited to announce general availability of the link feature for Managed Instance for SQL Server 2022, the most Azure-enabled release of SQL Server yet. We are also excited to announce a limited preview of managed disaster recovery (DR) to Azure SQL Managed Instance.
Link feature for SQL Server 2022
Overview
The link feature, announced in the March SQL Server 2022 preview, reimagines the connection between SQL Server hosted anywhere and the fully managed platform as a service (PaaS) Azure SQL Managed Instance, providing unprecedented hybrid flexibility and database mobility. With an approach that uses near real-time data replication to Azure via Always On technology, you can now scale workloads or offload analytics to read-only secondaries in Azure to take advantage of a fully managed database platform's performance and scale.
Modernize in Azure
The link can be operated for as long as you need it, whether months or years at a time, empowering you to get the modern benefits of Azure today without migrating to the cloud. For example, you can use integrated machine learning services in Managed Instance to run analytics on SQL Server databases replicated to the instance using the link. You can also rely on the automated short-term and long-term backups (with retention of up to 10 years) of your replicated SQL Server databases on Managed Instance in Azure Backup, reducing the management and operational costs of on-premises infrastructure.
Security
The link is secure: VPN or Azure ExpressRoute is used to connect SQL Server to Managed Instance via a private endpoint. Data transmitted over the wire is encrypted, and we also support replication of transparent data encryption (TDE)-encrypted databases, providing an additional layer of security. The link was also built to be resilient: if the network goes down, if SQL Server is rebooted or undergoing maintenance, or if some other issue occurs, the link will automatically resume replicating where it left off once the issue is resolved.
Migration benefits
On your modernization journey, when and if you are ready to migrate to the cloud, the link de-risks your migration by allowing you to validate your workloads in Azure prior to migrating, seamlessly and at your own pace. The link also provides a minimum-downtime solution for migrating from SQL Server to Managed Instance. Further de-risk your migrations with backup portability, which makes it possible to take backups on Managed Instance and restore them back to SQL Server 2022.
Experience the link today with a fully integrated easy-to-use experience in SQL Server Management Studio (SSMS) 19.0 and above. Learn more at the Link feature for Azure SQL Managed Instance.
Figure 1: Link feature for Managed Instance—the ultimate hybrid experience for SQL Server
Managed Disaster Recovery (preview)
We have seen tremendous interest in the disaster recovery (DR) feature through our SQL Server 2022 Early Adoption Program (EAP), with hundreds of companies applying to preview the feature. With the managed disaster recovery feature, you can ensure uptime by connecting SQL Server 2022 to Managed Instance in Azure, a fully managed PaaS. Managed DR allows you to spend less time on setup and management compared to an infrastructure as a service (IaaS) environment.
Figure 2: Managed DR in the cloud between SQL Server 2022 and Managed Instance
The DR feature can also be used to further de-risk your migrations to Azure, as it allows an online failback from Managed Instance back to SQL Server 2022 with minimal downtime. With a fully integrated SSMS experience, at this time available to preview participants, the complexity of running your own DR is reduced to just a few clicks.
Figure 3: DR for the link feature in preview version of SSMS
To apply for participation in the limited public preview of DR for the link feature, sign up below:
I look forward to seeing you at PASS Data Community Summit, a conference curated for the Microsoft data platform professional. Hosted by Redgate, this hybrid event will include the latest innovation across SQL Server and the entire Microsoft Intelligent Data Platform, as well as practical training and networking to empower you to transform your career and your organization.
Join us live in Seattle or virtually online, but don't miss this unique opportunity to learn from the experts how to bring the full value of Microsoft Intelligent Data Platform to your organization.
This year, we continue the tradition of hosting a Microsoft day one keynote. We will also deliver a record-breaking 38 sessions, covering a wide array of themes and learning paths.
We kick off the first official day of Summit on Wed., November 16. I invite you to join me and the Microsoft engineering team to learn how Microsoft's Intelligent Data Platform, including data, analytics, and governance, can come together to help you get the full benefit of your data estate and do more with less. During the keynote, I will share new product announcements that you won't want to miss, along with demos of some of the latest innovations from engineers like Anna Hoffman, Bob Ward, and Patrick LeBlanc.
The keynote will be both live and in person at the Seattle Convention Center and live-streamed online and recorded, meaning registered attendees will have on-demand access. We highly encourage you to watch it live as we'll discuss fresh topics and introduce some surprises along the way. Keep your eyes and ears open.
The event also connects you to peers and professionals while giving you the chance to talk with our engineers and advocates to learn about our latest solutions during our sessions and at our booth.
To register for Summit, visit the website and sign up today. Don't forget to use the code AZURE175 at checkout to receive $175 off on the 3-Day, in-person conference pass.
Learn More
Learn more about our other sessions and Pre-cons in our post.
My team and I look forward to seeing you next week!
Memory grant feedback (MGF) is an existing feature in SQL Server, with two critical improvements available in SQL Server 2022: feedback persistence and percentile grant feedback. These two features enhance the benefits of memory grant feedback as it already existed in SQL Server, allowing for less re-learning of appropriate grants and preventing fluctuating grant requirements from blocking the benefit of memory grant feedback.
Memory grant feedback
This existing feature aims to prevent costly spills or wasteful memory allocation of queries by remembering the memory usage of previous executions of the query and adjusting the grant based on previous data. This helps to prevent spills for underestimates, and to increase throughput of a workload by trimming overly large memory grants to a size that better fits a query.
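In spirit (a sketch with made-up thresholds, not the engine's exact rules), the single-execution feedback loop adjusts the next grant from the previous execution: grow the grant after a spill, trim it when most of the grant went unused.

```python
def next_grant(granted_kb, used_kb, spilled_kb):
    """Adjust the next execution's memory grant based on the last
    execution. Thresholds and growth factors here are illustrative."""
    if spilled_kb > 0:
        # Under-estimate: grow to cover what actually spilled, plus headroom.
        return int((used_kb + spilled_kb) * 1.5)
    if used_kb < granted_kb * 0.5:
        # Over-estimate: trim toward actual usage, keeping a buffer.
        return int(used_kb * 1.25)
    return granted_kb  # close enough; leave the grant alone

# A spill of 4,000 KB grows the grant; a 5 MB run under an 800 MB grant trims it.
print(next_grant(granted_kb=10_000, used_kb=10_000, spilled_kb=4_000))  # 21000
print(next_grant(granted_kb=800_000, used_kb=5_000, spilled_kb=0))      # 6250
```

This works well when successive executions need similar amounts of memory; the sections below cover what happens when they don't.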
Memory grant persistence
In prior versions of SQL Server, memory grant feedback data was stored only in the query plan cache. Thus, whenever the plan was evicted from cache, or in the case of a failover or server restart, the system would have to re-learn the memory grant feedback from scratch. In SQL Server 2022, the relevant feedback information is instead persisted in the query store, so it can be retrieved from disk at any time, with no concern for performance degradation after cache eviction or server restart. Because query store is on by default in SQL Server 2022, this feature is an obvious win for improving the overall effectiveness of memory grant feedback.
Percentile grant feedback
Prior to SQL Server 2022, memory grant feedback determined the current grant adjustment based only on the single most recent execution of the query. However, in some cases, especially those where a cached plan for a stored procedure sees different parameters causing vastly different result set sizes (and thus vastly different memory requirements), this can trigger a severe anti-pattern of alternating request sizes and always-wrong memory grant adjustments. This pattern is shown in the image below:
In this example, the first query execution needs 800 MB and is given 800 MB. On the second execution, the query requires only 5 MB but is given 800 MB. MGF realizes that this was a massive over-grant and adjusts the subsequent execution's grant to 5 MB, but by then we are back to the original parameter, for which 800 MB is needed. When MGF only looks at the single prior execution, this undesirable pattern can occur. Before SQL Server 2022, MGF would detect this scenario and disable itself, noticing that it was not helping to improve the workload.
SQL Server 2022 introduces percentile grant, a way of looking at more than just the single prior execution of a query. With percentile grant, we look back over the history of executions and adjust memory grants based on a larger set of data points. We always err toward providing more memory to avoid spills, as spills are typically substantially more impactful to the customer than the throughput cost of an oversized grant. Thus, in this parameter-sensitive scenario, we might end up with a pattern of grants and executions that looks more like the following:
In this example, you can see that the first execution required a very low memory grant, but the second execution required a much higher grant, which it was not given due to the first execution. The third execution requires a smaller grant size, but percentile grant (blue line) gives it a grant higher than the last required grant, and the algorithm that we had previously (orange line) would grant only the amount needed in the prior execution. Over time, the blue line fine-tunes itself to the upper limit of the memory grants required by the query, whereas the prior grant algorithm finds itself out of phase with the grant required.
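Percentile grant can be sketched as sizing the next grant at a high percentile of the recent history of required grants instead of just the last one. The percentile and history length below are illustrative values, not the product's internal settings:

```python
def percentile_grant(history_kb, pct=0.9):
    """Size the next grant at a high percentile of recent grant
    requirements, erring toward more memory to avoid spills."""
    ordered = sorted(history_kb)
    idx = min(len(ordered) - 1, int(pct * len(ordered)))
    return ordered[idx]

# Alternating parameter-sensitive requirements: 800 MB, 5 MB, 800 MB, ...
history = [800_000, 5_000, 800_000, 5_000, 800_000]
print(percentile_grant(history))  # 800000 -- covers the large-parameter case
```

Unlike the last-execution rule, the high percentile stays pinned near the upper end of the requirement range, so the large-parameter executions stop spilling even as small-parameter executions keep arriving in between.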
Taken together, these two improvements make memory grant feedback more reliable and more robust. By persisting the feedback, it becomes robust to failovers, restarts, and cache evictions. By using the new percentile grant algorithm, we can respond much more effectively to queries with widely vacillating grant requirements, allowing the feature to continue to benefit the customer even in pathologically bad scenarios. These two improvements to memory grant feedback are just two of the many intelligent query processing improvements in SQL Server 2022.
Current DOP methods are inefficient
The degree of parallelism (DOP) with which a query executes can greatly impact its performance. Any time a query uses parallelism, there is always the question of whether it is using the right amount. If the degree of parallelism is too high, it can introduce inefficiencies into the query execution. If it is too low, we may miss out on some of the speed-up that parallelism can provide. Users can manually set a maximum degree of parallelism for a query or for a server using the MAXDOP setting or hint. However, it has been a constant challenge for users to manually determine and tweak the correct degree of parallelism for each query. At most, they set MAXDOP when they notice a problem; they don't typically try to determine the optimal degree of parallelism for every query in their workload.
Degree of parallelism feedback: feature overview
In SQL Server 2022, we introduced a new feature called DOP feedback. This feature looks at any parallel query and determines whether it might perform better with a lower degree of parallelism than it currently uses. For example, perhaps 16 threads will perform better than 20 if there are a lot of waits on other threads. DOP feedback tests the new degree of parallelism and either decides the change was good and keeps the 16 threads, or reverts to the previous level of parallelism and goes back to 20 threads. If the new degree of parallelism is good, the optimization is persisted inside Query Store and is applied to the query on future executions.
Figure 1: DOP feedback reduces the degree of parallelism in a stepwise fashion, incrementally decreasing the degree of parallelism and verifying at each step.
DOP feedback never increases the degree of parallelism; at best, it reverts to a stable previous DOP. It also works incrementally: instead of drastically lowering the degree of parallelism all at once, it tries a slightly lower degree of parallelism. If that works well, it might try a slightly lower degree again, and so on, down to a degree of parallelism of two, although it will never make a parallel plan become serial. If a new, lower DOP is not as good, the feature goes back to the previous known good DOP and keeps the query at that level.
Example
A query is compiled with a degree of parallelism of 32, meaning the query splits its work across 32 threads. If DOP feedback detects a fair amount of wait time between threads and CPU overhead, it suggests a lower DOP, say, 20. On the next execution, the query runs with a DOP of 20. If performance is better over the next several executions, the DOP of 20 is considered stabilized. DOP feedback may then determine that there are still too many waits and attempt a DOP of 16. Again, several executions are used to verify the feedback. Then, perhaps, a DOP of 8 is tried. If after several executions the DOP 8 performance is not better, the system returns to a DOP of 16 as the most recent stable, verified DOP.
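The walkthrough above can be sketched as a simple loop: lower the DOP a step, verify over several executions, and either keep the new DOP or revert to the last verified one. The following is an illustrative Python sketch, not SQL Server's actual algorithm; the `is_better` verification callback and the candidate DOP list are hypothetical stand-ins for the engine's internal checks.

```python
# Illustrative sketch (not SQL Server's internal algorithm): the stepwise
# "lower, verify, keep or revert" loop that DOP feedback follows.

def tune_dop(initial_dop, is_better, candidates):
    """Walk down through candidate DOPs; keep each step only if the
    `is_better` verification passes, otherwise revert to the last
    verified DOP. Never go below 2: a parallel plan is never made serial."""
    stable = initial_dop
    for dop in candidates:
        if dop < 2:              # never reduce a parallel plan to serial
            break
        if is_better(dop, stable):
            stable = dop         # verified: this lower DOP is the new baseline
        else:
            break                # regression: stop and keep `stable`
    return stable

# Walkthrough from the example: 32 -> 20 (better) -> 16 (better) -> 8 (worse),
# so the feature settles back on the last verified DOP of 16.
better_at = {20, 16}  # hypothetical: these DOPs verified as improvements
result = tune_dop(32, lambda dop, _: dop in better_at, [20, 16, 8])
print(result)  # 16
```

The key design point the sketch captures is that every step is reversible: a regression at any candidate DOP costs only the verification executions, after which the query settles back on the last DOP that was actually proven better.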
Simple setup and easy optimization
With this feature enabled, all of this happens without triggering query recompiles and without any user action.
DOP feedback in SQL Server 2022 addresses a long-held challenge for our customers: finding the right degree of parallelism for each query without having to manually test and tweak each one for optimal performance. This is one of a handful of intelligent query processing features that provide customers with the best performance and a low-touch to no-touch experience. You can read more about other similar features in the SQL Server 2022 blog series.