Latest Blogs from the Microsoft Tech Community

What is causing the Always On AG issue? Is it cluster, AD, DNS, or SQL?

Do any of these issues sound familiar?

 

- My AG fails to failover

- Why did my AG failover? 

- Why did my AG fail?

- My AG is missing!

- Why am I seeing stale data on secondary? 

- What is with the sudden log growth alerts?

 

You may be hearing from end users that they are seeing stale data or losing connections to the availability database or listener, or you may have monitoring solutions in place that catch these issues proactively and mitigate them while you look for a root cause. In either scenario, it helps to understand the high-level architecture of Availability Groups and the troubleshooting resources available for each component.

 

1.0 Overview of Availability Group

 

AG Overview.png

 

A picture is worth a thousand words. The picture above depicts the different high-level components that are at play with Always On Availability Groups. This article attempts to give a high-level conceptual overview of the different components, a map to common failure troubleshooting articles, and the specific support topics/components to use when calling Microsoft support for further assistance. This article is not an all-encompassing map of all the issues that one could run into with AG, as there could be various issues depending on the specific environment and workload. However, it is an attempt to include the common areas of failure and the logs to look at for narrowing down the issue and component.

 

2.0 Why WSFC with AG? 

 

As you can see from the picture, in a Windows environment, when AG is deployed for HA, a Windows Server Failover Cluster (WSFC) is required. WSFC monitors the health of the AG cluster role and also controls operations such as taking the AG cluster role offline and bringing it online.

 

  • AG is a cluster role on a Windows Server Failover Cluster (WSFC).
  • Cluster service has a Resource Control Manager (RCM) component that is responsible for bringing the cluster resource group to a persistent state. RCM also negotiates with all the RCM instances on all the cluster nodes to ensure that the AG cluster role is online on at most one cluster node at a time. RCM is responsible for bringing the resources in the cluster resource group online in the order of dependencies.
  • In order to keep the cluster process stable and not load the custom resource DLLs into the cluster process, a child process RHS (Resource Host Subsystem), also called a Resource Monitor, hosts the DLLs for cluster resources.  SQL Server Always On Availability group has a resource DLL (hadrres.dll) that is loaded into the RHS process. 
  • If you are interested in reading more about failover cluster software components, here is a good article - Creating a Cluster Resource DLL (Part 1).

 

The WSFC component mainly performs the following functions:

 

  • Heartbeat communication between cluster nodes

Heartbeat messages are sent between cluster nodes to ensure that the cluster is in a healthy and consistent state. IPv6 is the default protocol that Failover Clustering will use for its heartbeats. The heartbeat itself is a UDP unicast network packet that communicates over Port 3343. 

 

- Delay defines the frequency in seconds at which cluster heartbeats are sent between nodes (default = 1 second in same subnet and cross subnet)

- Threshold defines the number of heartbeats that are missed before the cluster takes recovery action. (default = 10 heartbeats in same subnet, 20 in cross subnet)

- Reference: Detailed heartbeat thresholds are documented here Tuning Failover Cluster Network Thresholds - Microsoft Community Hub
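
The Delay and Threshold values can be viewed (and, if needed, adjusted) through the FailoverClusters PowerShell module. A minimal sketch; the threshold value shown is only an example, not a recommendation:

# View the current heartbeat settings for the cluster
Import-Module FailoverClusters
Get-Cluster | Format-List SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold

# Example only: relax the same-subnet threshold (missed heartbeats tolerated before recovery action)
(Get-Cluster).SameSubnetThreshold = 20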

 

So, with the default delay of 1 second and the default threshold of 10 heartbeats in a single-subnet WSFC environment, what do you expect to see when a cluster node does not respond to heartbeats for 10 seconds? That node is removed from active failover cluster membership. You will see a 1135 event in your system log with a message like "Cluster node 'N1' was removed from the active failover cluster membership... " For more details on how to troubleshoot this issue, refer to A problem with deleting a node | Microsoft Learn
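
One quick way to confirm whether a node was removed from membership is to query the System event log for event 1135 from the Failover Clustering provider. A minimal sketch:

# List occurrences of event 1135 (node removed from active failover cluster membership)
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-FailoverClustering'; Id = 1135 } |
    Select-Object TimeCreated, Message | Format-List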

 

  • Cluster Quorum

The quorum mechanism is used to prevent split brain and to ensure a consistent and healthy state of the WSFC. The cluster goes offline in the event of quorum loss. When the cluster service shuts down due to quorum loss, Event ID 1177 is logged in Event Viewer under Microsoft-Windows-FailoverClustering with a message like "The Cluster service is shutting down because quorum was lost. ... " For more information on troubleshooting this, please refer to: Event ID 1177 — Quorum and Connectivity Needed for Quorum | Microsoft Learn
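
To review the current quorum configuration and which nodes carry votes, the FailoverClusters module exposes this directly; a minimal sketch:

# Show the quorum configuration and per-node vote assignment
Get-ClusterQuorum | Format-List *
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight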

 

  • Lease (Looks-Alive) 

This is a continuous handshake between the SQL Server instance and the SQL Server resource DLL to ensure that the AG resource looks alive. A lease timeout may result in the following errors:

 

2016-02-16 11:37:03.05 Server      Error: 19419, Severity: 16, State: 1.
2016-02-16 11:37:03.05 Server      Windows Server Failover Cluster did not receive a process event signal from SQL Server hosting availability group 'ag' within the lease timeout period.
2016-02-16 11:37:03.07 Server      Error: 19407, Severity: 16, State: 1.
2016-02-16 11:37:03.07 Server      The lease between availability group 'ag' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

 

Other errors with lease timeout: Improved Always On Availability Group Lease Timeout Diagnostics - Microsoft Community Hub
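
The lease timeout (along with the health check settings described in the next bullet) is stored as a private property of the AG cluster resource, so it can be inspected with Get-ClusterParameter. In the sketch below, "MyAG" is a placeholder for your AG resource name:

# Inspect the AG resource's lease and health check settings
Get-ClusterResource -Name "MyAG" | Get-ClusterParameter LeaseTimeout, HealthCheckTimeout, FailureConditionLevel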

 

  • Health Check (IsAlive) - 

Two values control the Always On health check: FailureConditionLevel and HealthCheckTimeout. The FailureConditionLevel indicates the tolerance level to specific failure conditions reported by sp_server_diagnostics, and the HealthCheckTimeout configures the time the resource DLL can go without receiving an update from sp_server_diagnostics. The update interval for sp_server_diagnostics is always HealthCheckTimeout / 3. If SQL Server does not respond with the results from executing sp_server_diagnostics within the HEALTH_CHECK_TIMEOUT (default is 30 sec), then the availability group will transition to the RESOLVING state and fail over if configured to do so. (Reference/Credit: Diagnose Unexpected Failover or Availability Group in RESOLVING State - Microsoft Community Hub)
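
Both settings can also be changed from the primary replica with ALTER AVAILABILITY GROUP. The sketch below runs the T-SQL through Invoke-Sqlcmd (SqlServer module assumed); "PRIMARYNODE" and the values shown are placeholders and examples, not recommendations. HEALTH_CHECK_TIMEOUT is specified in milliseconds:

# Example only: raise the health check timeout to 60 seconds and set failure condition level 3
Invoke-Sqlcmd -ServerInstance "PRIMARYNODE" -Query "ALTER AVAILABILITY GROUP [ag] SET (HEALTH_CHECK_TIMEOUT = 60000);"
Invoke-Sqlcmd -ServerInstance "PRIMARYNODE" -Query "ALTER AVAILABILITY GROUP [ag] SET (FAILURE_CONDITION_LEVEL = 3);"

Level 3 is the default failure condition level; lower levels are more tolerant and higher levels are stricter.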

 

3.0 How is AD/DNS involved?

 

When a cluster is created, a CNO (Cluster Name Object) is created in Active Directory. The CNO has IP address(es) as dependencies, and the CNO and IP are used for management and cluster operations. The listener, also referred to as a Virtual Network Name (VNN), has a network name and IP address(es), and the listener is also created in Active Directory. If the account used to create the cluster or listener in AD does not have full permissions to perform the operation, the CNO and listener can be pre-staged in AD.

 

Both the CNO network name and listener VNN network name are also present on DNS for mapping between the network name and IP address(es). Reference on DNS: DNS Registration with the Network Name Resource - Microsoft Community Hub
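
A quick way to verify that the listener name is registering and resolving correctly is to query DNS and test the listener port from a client machine. In the sketch below, the listener name and port are placeholders:

# Check DNS registration of the listener and test connectivity to the SQL listener port
Resolve-DnsName -Name "aglistener.contoso.com" -Type A
Test-NetConnection -ComputerName "aglistener.contoso.com" -Port 1433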

 

Some common issues on AD/DNS are permissions related:

 

  • Listener creation failure with error 19471 or 19476

Msg 19471, Level 16, State 0, Line 2
The WSFC cluster could not bring the Network Name resource with DNS name '<DNS name>' online. The DNS name may have been taken or have a conflict with existing name services, or the WSFC cluster service may not be running or may be inaccessible. Use a different DNS name to resolve name conflicts, or check the WSFC cluster log for more information.

 

Msg 19476, Level 16, State 4, Line 2
The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator.

 

Troubleshoot article: Create Listener Fails with Message 'The WSFC cluster could not bring the Network Name resource online' - Microsoft Community Hub
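
When listener creation fails, it can also help to check the state of the network name and IP address resources that WSFC created (or failed to create) for the AG role. A minimal sketch using the FailoverClusters module:

# List the network name and IP address resources, their state, and the owning cluster group
Get-ClusterResource |
    Where-Object { $_.ResourceType.Name -in @('Network Name', 'IP Address') } |
    Format-Table Name, State, OwnerGroup, ResourceType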

 

  • Connection timeouts in multi-subnet availability group

Connection Timeouts in Multi-subnet Availability Group - Microsoft Community Hub

Connect to an availability group listener - SQL Server Always On | Microsoft Learn
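
Many of these timeouts go away once the client connection string enables MultiSubnetFailover (supported by the newer drivers listed in the next bullet). A hedged sketch that tests a connection from PowerShell using System.Data.SqlClient; the listener name and database are placeholders:

# Example only: connect to the listener with MultiSubnetFailover enabled
$connectionString = "Server=tcp:aglistener.contoso.com,1433;Database=MyAgDb;Integrated Security=SSPI;MultiSubnetFailover=True"
$connection = New-Object System.Data.SqlClient.SqlConnection $connectionString
$connection.Open()      # throws if the listener cannot be reached in time
$connection.Close()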

 

  • Driver and Client support 

Driver and client connectivity support for availability groups - SQL Server Always On | Microsoft Learn (This table shows the driver versions that have support for Always On parameters like Application Intent, multi-subnet failover, read-only routing etc.)

 

4.0 How is SQL Server involved?

 

Data movement from the primary replica to the secondary replica(s) is done through log block transmission and redo process.

 

AG Data Flow.png

Reference for the above image: Troubleshooting data movement latency between synchronous-commit Always On Availability Groups - Microsoft Community Hub (This article is also an excellent resource to read further)
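
To see where data movement is lagging, the per-database replica DMV exposes the send and redo queue sizes and rates. A minimal sketch run through Invoke-Sqlcmd (SqlServer module assumed; "PRIMARYNODE" is a placeholder), executed against the primary replica:

# Check log send queue and redo queue per availability database (sizes are reported in KB)
Invoke-Sqlcmd -ServerInstance "PRIMARYNODE" -Query "
SELECT DB_NAME(database_id) AS database_name,
       synchronization_state_desc,
       log_send_queue_size,
       log_send_rate,
       redo_queue_size,
       redo_rate,
       last_commit_time
FROM sys.dm_hadr_database_replica_states;"

A growing log_send_queue_size points to transport or secondary acknowledgement latency (log growth on the primary), while a growing redo_queue_size points to redo latency (stale reads on the secondary).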

 

5.0 My AG has an issue. Where do I look? 

 

The reference below is organized per issue: the typical result, the logs to look at, common error numbers, the causing component, and troubleshooting articles.

 

Issue: Windows cluster issues (CNO, CNO IP, listener, listener IP, cluster service crash)
Result: AG resource offline/online, failover
Logs to look at: cluster.log, system log, Microsoft-Windows-FailoverClustering log, SQL Server errorlog
Common error numbers: 1069, 1207, 1135
Causing component: WSFC
Articles:
  • Troubleshooting resource for WSFC resources (such as IP or disk) failed/offline issues: Can't bring a clustered resource online troubleshooting guidance - Windows Server | Microsoft Learn
  • Troubleshooting resource for 1135: Troubleshooting cluster issue with Event ID 1135 | Microsoft Learn
  • Cluster system log event IDs and explanations: Failover Clustering system log events | Microsoft Learn

Issue: Listener creation issues
Result: Listener unavailable
Logs to look at: cluster.log, system log, SQL Server errorlog
Common error numbers: 19471, 19476
Causing component: WSFC and AD/DNS (the CNO of the WSFC cluster must have Create Computer Objects permissions)
Articles:
  • Configure availability group listener - SQL Server Always On | Microsoft Learn
  • Troubleshooting listener creation failures and common issues: KB2829783 - Troubleshooting Always On availability group listener creation in SQL Server 2012 - Microsoft Support

Issue: Listener connectivity issues
Result: Client connection failures, timeouts, and errors
Logs to look at: Client connection logs, network traces, SQL Server errorlog, cluster.log, system log
Common error numbers: 0x80131904, 0x80004005, "Login timeout expired."
Causing component: Driver multi-subnet parameter support, RegisterAllProvidersIP cluster setting, DNS
Articles:
  • Listener connection times out - SQL Server | Microsoft Learn
  • Troubleshoot SQL Server connectivity issues: Troubleshoot connectivity issues in SQL Server - SQL Server | Microsoft Learn
  • Connectivity issues caused by SQL Server: Connectivity Problems Caused by Issues in SQL Server - Microsoft Community Hub
  • Connection timeouts in multi-subnet AG: Connection Timeouts in Multi-subnet Availability Group - Microsoft Community Hub

Issue: Driver and client connectivity issues with AG
Result: Client connection failures, timeouts, and errors
Logs to look at: Client-side connection logs, driver logs, network traces, SQL Server errorlogs
Causing component: Driver, SQL Server configuration, networking
Articles:
  • Driver and client connectivity support for availability groups - SQL Server Always On | Microsoft Learn

Issue: Lease timeout
Result: AG resource offline/online, failover
Logs to look at: Cluster log, system log, System Health events, Always On Health events, SQL Server errorlog, any dumps in the SQL log folder
Common error numbers: 19407, 19419, 19420, 19421, 19422, 19423, 19424
Causing component: OS not responding, low virtual memory, working set paging, SQL Server generating a dump, pegged CPU, WSFC down (loss of quorum)
Articles:
  • How It Works: SQL Server Always On Lease Timeout - Microsoft Community Hub
  • Lease timeout common errors and corrective actions: Improved Always On Availability Group Lease Timeout Diagnostics - Microsoft Community Hub

Issue: Cluster health check timeout
Result: AG resource offline/online or failover, FCI restart/failover
Logs to look at: Cluster log, system log, System Health events, Always On Health events, SQL Server errorlog, any dumps in the SQL log folder
Common error numbers: In cluster.log: [RES] SQL Server Availability Group <ag>: [hadrag] Resource Alive result 0
Causing component: Failure conditions met, OS not responding, low virtual memory, working set trim, SQL Server generating a dump, WSFC (loss of quorum), scheduler issues (deadlocked schedulers)
Articles:
  • Diagnose Unexpected Failover or Availability Group in RESOLVING State - Microsoft Community Hub

Issue: Cluster quorum loss
Result: AG in RESOLVING state
Logs to look at: Cluster log, system log, network traces
Common error numbers: 1177 ("The Cluster service is shutting down because quorum was lost.")
Causing component: Network connectivity issues, node instability issues, OS not responding
Articles:
  • WSFC quorum modes and voting configuration (SQL Server): WSFC quorum modes & voting configuration - SQL Server Always On | Microsoft Learn
  • Troubleshooting Event ID 1177: Event ID 1177 — Quorum and Connectivity Needed for Quorum | Microsoft Learn

Issue: Session timeout
Result: Secondary disconnected
Logs to look at: Cluster log, system log, System Health events, Always On Health events, SQL Server errorlog, any dumps in the SQL log folder
Common error numbers: 35206, 35201, 35267
Causing component: Network communication; issues on the secondary (down, OS not responding, resource contention)
Articles:
  • Reasons for connectivity failures between availability replicas - SQL Server Always On | Microsoft Learn
  • Troubleshooting intermittent connection time-outs between availability group replicas - SQL Server | Microsoft Learn
  • MSSQLSERVER_35267 - SQL Server | Microsoft Learn

Issue: Cluster service permissions on SQL Server
Result: AG creation failure
Logs to look at: SQL Server errorlog
Common error numbers: 41131
Causing component: SQL Server permissions
Articles:
  • Error 41131 when creating availability group - SQL Server | Microsoft Learn

Issue: Data synchronization latency due to log send queuing
Result: Log file growth, low disk space alerts, stale data on secondary replicas
Logs to look at: Perfmon, sys.dm_hadr_database_replica_states, Always On dashboard
Common error numbers: 9002
Articles:
  • Troubleshooting long send queueing in an Always On availability group - SQL Server | Microsoft Learn
  • Error 9002 when transaction log is large - SQL Server | Microsoft Learn

Issue: Data synchronization latency due to recovery queuing
Result: Stale data
Logs to look at: Perfmon, sys.dm_hadr_database_replica_states, Always On dashboard
Articles:
  • Troubleshooting recovery queueing in an Always On availability group - SQL Server | Microsoft Learn

Issue: Availability group removed or AG replica removed
Articles:
  • Issue: Replica Unexpectedly Dropped in Availability Group | Microsoft Learn

Issue: Create availability group fails with error 35250 'Failed to join the database'
Result: Endpoint not created or started, endpoint connection issues, name resolution issues, endpoint permissions
Logs to look at: DMVs, Telnet, Test-NetConnection
Common error numbers: 35250
Articles:
  • MSSQLSERVER_35250 - SQL Server | Microsoft Learn

This is not an all-encompassing list of issues, but includes some common issues observed with Always On. Hope this helps!

 

Thanks to @Joseph Pilov  and Kathleen Carter for reviewing this article. 

 

Till next time,

Dharshana

Export to Logic Apps Standard – Latest Improvements

The Logic Apps Standard extension in VS Code provides a feature that allows customers to export workflows from Integration Services Environment (ISE) or Consumption SKUs to a Logic Apps Standard VS Code project.

 

This experience allows customers to group workflows that should be deployed together as a Logic Apps Standard application, validates the workflows against the export logic – guaranteeing that workflows, once exported, work successfully in the new platform – and generates a local VS Code project, allowing users to test the code locally before packaging it for deployment.

This tool is a core part of the strategy to migrate workflows off ISE, which is retiring on August 31, 2024, providing a way to export ISE workflows into Logic Apps Standard projects, allowing customers to plan their migration on their own terms.

 

For more details on this export feature, including a walkthrough and known issues, access the Microsoft Learn documentation for ISE or Consumption.

 

What’s new in the Export Tool

 

Updated Converter List

 

The latest version of the Export Tool implements a series of converters that automatically replace actions based on Azure connections with their service provider counterparts. The following is the list of converters implemented today:

 

  • Azure API Management
  • Azure Automation
  • Azure Blob Storage
  • Azure File Storage
  • Azure Functions
  • Azure Queue Storage
  • Azure Table Storage
  • Batch Operations
  • DB2
  • Event Grid Publisher
  • Event Hubs
  • File System
  • Flat File
  • FTP
  • Integration Account
  • Key Vault
  • Liquid Operations
  • MQ
  • RosettaNet
  • SAP
  • Service Bus
  • SFTP
  • SMTP
  • SQL Server
  • Workflow Operations
  • XML Operations

 

Custom Connector Export

 

Logic apps containing Custom Connector actions are now able to be exported. The exported workflow action will be converted to an HTTP action, with the endpoint parameterized. This allows the workflows to be exported and adds flexibility for the user to decide how that action will be configured.

 

Workflow Operations Export

 

One of the differences between Consumption and Standard is related to how the workflow operations action works. While in Consumption you could refer to any workflow in a subscription, with Standard you are restricted to workflows in the same application. This would restrict how workflows with a “call child workflow” action could be exported. The export tool now handles this situation automatically, with the following rules:

 

  • If a child workflow is part of the logic apps workflows selected to export, the tool will export that action as is.
  • If a child workflow is not part of the logic apps workflows selected to export, the tool will export that action as an HTTP action, with the endpoint parameterized, so you can point to a different application.

HLS Teams Productivity Summit- Empowering Frontline Workers on shared mobile devices

Thank you for being a part of the HLS Teams Productivity Summit. We are pleased to provide you with a copy of today's presentation and recording, which you can find below.

Session Title: Empowering frontline workers on shared mobile devices

Session Details: Most frontline workers today use shared devices while at work. This means they need to sign in and out of their devices one or more times per shift. This can be time-consuming and, if not done correctly, a security risk. To simplify authentication for shared devices, Microsoft created Shared Device Mode.

 

With Shared Device Mode, a user only needs to sign in once to be automatically authenticated into all their MSAL-enabled applications, and sign out once to make the device safe to give to the next user. During this session, we will discuss timelines for Shared Device Mode and how to roll it out to your frontline workers.

 

Target Audience: IT Professionals who manage a fleet of shared corporate devices.

Extracting Table data from documents into an Excel Spreadsheet

Documents can contain table data. For example, earnings reports, purchase order forms, technical and operational manuals, etc., contain critical data in tables. You may need to extract this table data into Excel for various scenarios.

  • Extract each table into a specific worksheet in Excel.
  • Extract the data from all the similar tables and aggregate that data into a single table.

Here, we present two ways to generate Excel from a document's table data:

  1. Azure Function (HTTP Trigger based): This function takes a document and generates an Excel file with the table data in the document.
  2. Apache Spark in Azure Synapse Analytics (in case you need to process large volumes of documents).

The Azure function extracts table data from the document using Form Recognizer's "General Document" model and generates an Excel file with all the extracted tables. The following is the expected behavior:

  • Each table on a page gets extracted and stored to a sheet in the Excel document. The sheet name corresponds to the page number in the document.
  • Sometimes, there are key-value pairs on the page that need to be captured in the table. If you need that feature, leverage the add_key_value_pairs flag in the function.
  • Form Recognizer extracts column and row spans, and we take advantage of this to present the data as it is represented in the actual table.

 

Following are two sample extractions.

Pic3.png Pic4.png

The top Excel output includes the key-value pairs added to the table; the bottom one is without the key-value pairs.

 

Pic1.png Pic2.png

The Excel shown above is the extraction of table data from an earnings report. The earnings report file had multiple pages with tables, and the fourth page had two tables. 

 
 
 

 

Solution

The Azure Function and Synapse Spark Notebook are available here in this Git repository.

  • Deployment Steps 
  • Sample Data: The repository has two sample documents to work with:
  • Note on the Excel output: 
    • If there is a page in the main document with no tables, no sheet will be created for that page.
    • The code has been updated to remove the extracted text from check boxes (":selected:", ":unselected:") in the table.
    • If a cell does not have any alphanumeric text, it will be skipped. Please update the code to reflect different behavior.

 

How to leverage this Solution

  • Use this solution to generate an Excel file as mentioned above.
  • Integrate this with Power Automate so that end-users can use this seamlessly from O365 (email, SharePoint, or Teams).
  • Customize this to generate an aggregated table.

 

Contributors: Ben Ufuk Tezcan, Vinod Kurpad, Matt Nelson, Nicolas Uthurriague , Sreedhar Mallangi

Microsoft Purview in the Real World (April 21, 2023) - Sensitivity Labels and SharePoint Sites

James_Havens_1-1682100919511.png

 

Disclaimer

This document is not meant to replace any official documentation, including those found at docs.microsoft.com.  Those documents are continually updated and maintained by Microsoft Corporation.  If there is a discrepancy between this document and what you find in the Compliance User Interface (UI) or inside of a reference in docs.microsoft.com, you should always defer to that official documentation and contact your Microsoft Account team as needed.  Links to the docs.microsoft.com data will be referenced both in the document steps as well as in the appendix.

 

All the following steps should be done with test data, and where possible, testing should be performed in a test environment.  Testing should never be performed against production data.

 

Target Audience

Microsoft customers who want to better understand Microsoft Purview.

 

 

Document Scope

The purpose of this document (and series) is to provide insights into various user cases, announcements, customer driven questions, etc.

 

Topics for this blog entry

Here are the topics covered in this issue of the blog:

  • Sensitivity Labels relating to SharePoint Lists
  • Sensitivity Label Encryption versus other types of Microsoft tenant encryption
  • How Sensitivity Labels conflicts are resolved
  • How to apply Sensitivity Labels to existing SharePoint Sites
  • Where can I find information on how Sensitivity Labels are applied to data within a SharePoint site (i.e. File label inheritance from the Site label)

 

Out-of-Scope

This blog series and entry is only meant to provide information, but for your specific use cases or needs, it is recommended that you contact your Microsoft Account Team to find other possible solutions to your needs.

 

Sensitivity labels and SharePoint Sites – Assorted topics

 

Sensitivity Label Encryption versus other types of Microsoft tenant encryption

 

 

Question #1

How does the encryption of Sensitivity Labels compare to the encryption leveraged in BitLocker?

 

Answer #1

The following table breaks this down in detail and is taken from the following Microsoft Link.

Encryption in Microsoft 365 - Microsoft Purview (compliance) | Microsoft Learn

 

James_Havens_0-1682101199234.png

 

Sensitivity Labels relating to SharePoint Lists

 

 

Question #2

Can you apply Sensitivity Labels to SharePoint Lists?

 

Answer #2

The simple answer is NO while in the list, but YES once the list is exported to a file format.

 

Data in a SharePoint List is stored within a SQL table in SharePoint.  At the time of writing this blog, you cannot apply a Sensitivity Label to SharePoint Online tables, including SharePoint Lists.

 

SharePoint Lists allow for exports of the data in the list to a file format.  An automatic sensitivity label policy can apply a label to those file formats. Here is an example below of those export options.

 

James_Havens_1-1682101270872.png

 

 

How to apply Sensitivity Labels to existing SharePoint Sites

 

Question #3

Can you apply Sensitivity Labels to existing SharePoint sites?  If so, can this be automated (e.g., with PowerShell)?

 

Answer #3

You can leverage PowerShell to apply SharePoint labels to multiple sites.  Here is the link that explains how to accomplish this.

Look for these two sections in the link below for details:

  • Use PowerShell to apply a sensitivity label to multiple sites
  • View and manage sensitivity labels in the SharePoint admin center

 

 

Use sensitivity labels with Microsoft Teams, Microsoft 365 Groups, and SharePoint sites - Microsoft Purview (compliance) | Microsoft Learn
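
For reference, the PowerShell approach described in that article boils down to connecting with the SharePoint Online Management Shell and setting the label on each site. The sketch below is illustrative only; the admin URL, site URLs, and label GUID are placeholders, and the exact parameters are covered in the linked documentation:

# Example only: apply a sensitivity label to a set of existing SharePoint sites
Connect-SPOService -Url "https://contoso-admin.sharepoint.com"

$labelId = "00000000-0000-0000-0000-000000000000"   # sensitivity label GUID (placeholder)
$sites = @(
    "https://contoso.sharepoint.com/sites/HR",
    "https://contoso.sharepoint.com/sites/Finance"
)

foreach ($site in $sites) {
    Set-SPOSite -Identity $site -SensitivityLabel $labelId
}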

 

How Sensitivity Labels conflicts are resolved

 

Question #4

If you have an existing file with an existing Sensitivity Label that is stricter than the Sensitivity Label being inherited from SharePoint Site label, which Sensitivity Label is applied to the file? 

 

Answer #4

Please refer to the link and table below for how Sensitivity Label conflicts are handled.  Notice that any higher-priority label or user-applied label would not be overridden by a site label or an automatic labeling policy.

 

Configure a default sensitivity label for a SharePoint document library - Microsoft Purview (compliance) | Microsoft Learn

 

James_Havens_2-1682101300207.png

 

File label inheritance from the Site label

 

Question #5

Where can you find the documentation on SharePoint Site labels and how label inheritance applies to files in that SharePoint site?

 

Answer #5

 

Here are 2 links that can help you with Sensitivity Labels and how they relate to SharePoint sites:

 

 

 

 

When it comes to default Sensitivity Labels for SharePoint sites/libraries (what I have called “label inheritance” above), this link is of use.

 

 

"When SharePoint is enabled for sensitivity labels, you can configure a default label for document libraries. Then, any new files uploaded to that library, or existing files edited in the library will have that label applied if they don't already have a sensitivity label, or they have a sensitivity label but with lower priority.

 

For example, you configure the Confidential label as the default sensitivity label for a document library. A user who has General as their policy default label saves a new file in that library. SharePoint will label this file as Confidential because of that label's higher priority."

 

 

Appendix and Links

 

 

 

 

 

 

 

 

 

 

 

Synapse Database Templates for airlines & travel services plus seven industries are now GA

 

In response to continued enthusiastic adoption of the twenty previously published Synapse Database Templates (SDTs), we’re pleased to announce today that we are releasing two Industry Data Models (IDMs), for Airlines and for Travel Services, that have not previously been published as SDTs, along with enhanced versions of seven previously published SDTs.

 

The IDM for Airlines is a comprehensive data model that addresses the typical data requirements of organizations operating one or more airlines for passengers and/or cargo. The IDM for Travel Services is a comprehensive data model that addresses the typical data requirements of organizations providing booking services and/or hospitality services for airlines, hotels, car rentals, cruises, and vacation packages.

 

We have released two new SDTs for Travel Services and Airlines, in addition to updated versions of the previously released SDTs for Automotive Industries, Consumer Packaged Goods, Healthcare Insurance, Healthcare Service Providers, Manufacturing, Retail, and Utilities. All twenty-two SDTs can now be accessed in Azure Synapse, either through the Gallery or by creating a new lake database from the Data tab and selecting '+ Table', and then 'From template'.

 

We have also continued to expand the scope and content of the previously published seven SDTs by releasing new versions. Additionally, we are working to ensure that our customers and partners who use selected Microsoft solution offerings can fully integrate their application data into relevant subsets of a data lake created using the new versions of SDTs.

 

The new versions for Healthcare Insurance and Healthcare Providers have been fully mapped from Microsoft’s Industry Cloud for Healthcare, making it easier for our customers to land data from Microsoft’s Healthcare Cloud, along with data from their organization’s many other applications and data sources, into a comprehensive, integrated, and harmonized lake database deployed using the applicable SDT. We have also expanded the IDMs for healthcare to enable customers to provision the specific clinical data content required to support OMOP consumption in the gold layer of their Azure data lake, if that is of interest.

 

We’ve expanded the IDM for Retail to accommodate smart store and shopper journey data generated by solutions from key Microsoft smart store partners such as AiFI in support of current and anticipated future smart store analytics use cases. Similarly, the expanded IDM for Utilities provides full data support for data sourced from Microsoft’s 24x7 Sustainability offering. We’ve also expanded the IDMs for Automotive, Consumer Packaged Goods, and Manufacturing to provide full alignment with the recently announced Microsoft Supply Chain Center solution offering. Each of the SDTs that is part of this latest release also benefits from enhanced and expanded support for readings and data streams acquired from leading edge IoT devices and sensors used in each of those industries.

 

SDTs contain coverage of many different “business areas”, some very industry-specific and others cross-industry, that together comprise each of these very large IDMs. For example, in addition to comprehensive industry-specific business areas (such as Reservations, Ticketing, and Cargo and Departure Control Services for the Airline industry), most SDTs also include cross-industry business areas such as Accounting & Financial Reporting, Human Resources, Inventory, and an Emissions business area that supports the data used to report greenhouse gas emissions (including scope 1, scope 2, and scope 3 emissions). Together, these provide unparalleled coverage of the data typically found in the integrated data estates of large organizations in specific industries, and they serve as comprehensive, best-practices-based accelerators as our customers continue to move their enterprise data estates to the cloud.

 

DataModels_0-1682028606569.png

 

 

 

To learn more, check out the following:

 

Tags: Database templates     Industry    Lake databases

Securely Migrate and Optimize with Azure

Join us for this year’s digital event, Securely Migrate and Optimize with Azure, where you’ll learn how to achieve more efficiency and maximize the value of your IT investments. Get tips to migrate and modernize your infrastructure, apps, and data—with a spotlight from experts on optimizing migration for Windows Server and SQL Server workloads.

 

Here are 5 good reasons to Register Now

 

  1. Get expert guidance to securely migrate and optimize your Windows Server and SQL Server workloads.
  2. Find tools, programs, and resources from Azure to migrate and optimize your cloud investments.
  3. Hear real-life success stories of customers who moved to Azure.
  4. See demos with step-by-step guidance on how to stay secure and manage complex hybrid IT environments.
  5. Get a walkthrough of tools for self-guided migration—including how to discover, assess, and migrate with Azure Migrate.

You’ll also get expert answers to your migration questions during the live chat Q&A.

 

Securely Migrate and Optimize with Azure

Wednesday, April 26, 2023

9:00 AM–11:00 AM Pacific Time (UTC-8)

Unable to delete Azure EventHub Cluster/Namespace/Entity/ConsumerGroup from Portal/ PowerShell/ CLI

Issue:

Unable to delete Azure EventHub Cluster/Namespace/Entity from Portal/PowerShell/CLI.

 

Case 1:

The EventHub tier is Premium, and you are unable to delete the Azure EventHub namespace, getting conflict operation error 409.

anuja_nirula_0-1682046002240.png

Sample error message : "statusMessage":"{\"error\":{\"code\":\"Conflict\",\"message\":\"Namespace provisioning in transition. For more information visit https://aka.ms/eventhubsarmexceptions.

 

Reason:

The stuck state of the EventHub namespace or its provisioning failure is caused by a known race condition between two or more internal microservices of the Premium EventHub architecture across different namespaces. If you trigger a Premium namespace provisioning and an event hub creation right after that, this race condition can start: both provisioning operations may fail, and the runtime creation will fail with internal server error 500.

 

Recommendation:

It is recommended not to perform back-to-back create operations on EventHub until the first EventHub namespace has been created successfully. If you want to delete the namespace just after its creation, it is recommended to perform that operation at least 1 hour after the creation time.

 

Action to be taken:

If your EventHub is stuck in an Activating/Deleting state, raise a support request with Microsoft to fix the state of the namespace and bring it back to an active state.

 

Case 2:

While deleting the EventHub you received conflict error 409, but it is not a Premium EventHub.

 

Reason:

This conflict error could be caused by pending operations executing in the backend against the EventHub components; you might be trying to delete the EventHub while those operations have not yet completed.

 

Recommendation and Action to be taken:

In this situation, wait for the pending operations on the EventHub or its components to finish, and then retry after some time.

 

Case 3:

You get a success message on deletion of an EventHub entity within a namespace, but after some time the entity is recreated and reappears in the portal.

 

Reason:

The recreation of entities in the namespace could be due to diagnostic settings enabled against the namespace entity, or Application Insights might be using the EventHub entity, in which case the Azure Monitor resource provider could be recreating the EventHub.

 

Action to be taken:

In this case, please follow the steps below:

  1. Check whether the entity is operational using PowerShell/CLI. You can test with any Get command, for example Get-AzEventHub (see the sketch after these steps).
  2. If the EventHub is recreated, check the content of the EventHub. You can check the content either through the Processing Data option on the EventHub control pane in the portal or by using the Service Bus Explorer tool.
  3. Once you see any content or records in the EventHub entity, identify the resource ID that is sending events to that EventHub by looking at the content data.
  4. Go to that resource in the Azure portal and disable the diagnostic settings or Application Insights settings that are using the EventHub entity.
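
For step 1, a minimal Az PowerShell sketch (the resource group, namespace, and entity names are placeholders; the Az.EventHub module is assumed):

# Check whether the event hub entity still exists and is operational
Get-AzEventHub -ResourceGroupName "my-rg" -NamespaceName "my-eh-namespace" -Name "my-eventhub"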

Recommendation:

If you want to delete the EventHub entity or namespace, first check that none of the resources from this document are streaming logs to the EventHub.

 

Case 4:

You have deleted the EventHub, and all operations on the deleted EventHub entity are failing, but it is still showing in the portal.

 

Reason:

A non-operational, deleted EventHub may still appear in the portal because of a stale entry in the ARM cache.

 

Action to be taken:

In this case, please follow the steps below:

  1. Check whether the entity is operational using PowerShell/CLI. You can test with any Get command, for example Get-AzEventHub.
  2. If the operation fails with error code 404 (entity not found) but the entity is still showing in the portal, raise a support ticket with Microsoft to perform a synchronize operation on the ARM cache for the resource.

Case 5:

You are unable to delete a Kafka-enabled EventHub topic.

 

Reason:

One reason why you may still see the Kafka-enabled EventHub topic after its deletion is that the Kafka producer keeps re-creating the EventHub because auto topic creation is ON by default.

 

Action to be taken:

  1. Check the activity logs to make sure that you see the deleted operation.
  2. Set the Auto topic creation property as OFF.

anuja_nirula_2-1682046329912.png

Recommendation:

You can either stop the Kafka producers or pick another EventHub name.

 

Case 6:

Unable to delete a Dedicated EventHub cluster; getting the error message "BadRequest".

 

Reason:

It is a known issue that a Dedicated EventHub cluster cannot be deleted until four hours after its creation time.

 

Recommendation and Action to be taken:

Please rerun this operation after that time has elapsed, or contact the EventHub team through a support request if the cluster is stuck in a particular state.

Details to include in the support ticket: resource ID, correlation ID of the operation, and timestamp of the issue.

Earth Day 2023: Evolving Surface design

As we observe and honor Earth Day on April 22, we're mindful of the importance of meeting our customers’ needs alongside responsible environmental stewardship. And here on the Surface team, we’re committed to producing devices with as little impact on the planet as possible. 

 

Our commitment to sustainability goes back years and has evolved into three pillars that help us contextualize our goals for Surface: reducing carbon impact, designing with circularity in mind, and having integrity built in. These elements make up the design language of every major product we launch.

 

Reducing carbon impact

Microsoft Surface devices are integral to achieving the company's commitment to be carbon negative by 2030. We're also delivering technology to help our customers measure and manage their Surface carbon emissions more effectively.

 

Launched earlier this year, the Surface Emissions Estimator is a tool that helps you calculate the carbon footprint of your Surface devices1. It provides an estimate of the carbon emissions associated with the production, use, and disposal of your device. You can use this tool to calculate the carbon footprint of your Surface devices by entering information about your device, such as its model, usage, and power settings. The calculator can even recommend ways of reducing your carbon footprint.

 

Surface Emissions Estimator.png

Sample results showing estimated carbon emissions for three devices

 

Ocean plastic

One of the more promising advances in device manufacturing is the use of ocean-bound plastic, recovered from plastic waste. First, it’s cleaned and processed into recycled plastic resin pellets and then blended in with virgin plastic during manufacturing.

 

Two years ago, we launched the Ocean Plastic Mouse with a shell made with 20% recycled ocean plastic, the first consumer electronics application of this material. Going beyond ocean-bound plastic (plastic collected within 50 km of shorelines), each mouse contains recycled resin derived from recycled water bottles taken directly from oceans, beaches, and waterways.

 

Recycling-Flow-Ocean-Plastic-Mouse.png

We’ve since carried this innovation to our newest accessory, Surface Thunderbolt 4 Dock for Business. The dock and power supply unit enclosures (excluding the AC cable) are attributed to 20% ocean-bound plastic2 and feature lighter materials than our previous docks. Single-use plastics have been removed from its packaging, making the packaging about 99% recyclable in OECD countries.3

 

 

surface-dock-rear.png

 

Design for circularity

The traditional “take, make and waste” model of electronics is becoming unviable. That's why, at Surface, we design products with the circular economy in mind, meaning we follow a reduce, reuse and recover model.

 

By 2025, our goal is for our packaging to contain zero single-use plastics and by 2030 will be 100% recyclable. We continue to integrate innovations from our most recyclable products into the rest of our products. We also make recycling convenient and secure with global recycling programs and data-wiping.

 

Designing for circularity minimizes waste and extends the lifespan of our devices for as long as possible, thanks to a modular design that lets commercial customers replace parts rather than throw away their devices. Surface Pro 9, for example, comes with 14 modular components, including the display, hard drive, motherboard, and battery. 4

 

Integrity in manufacturing

Our design process focuses on building products of the highest craftsmanship with a responsible supply chain that meets higher ethical and environmental standards. Integrity also reflects our commitment to transparency on the impact of our products and supply chain, which is why we produce eco profiles for all our major devices. As the EPEAT requirements become more rigorous, our products and operations are evolving to meet more stringent standards. We plan for our products to meet the new EPEAT requirements at the Gold level. Surface registered products can be found on the EPEAT Registry.

 

Crafting for longevity is vital to long-term sustainability across all three focus areas, as it can reduce emissions and increase circularity by keeping materials in use for longer. It's why our latest Surface products are the most repairable devices in their product lines.

 

This is also where our material innovation can shine as we weave in recycled materials. You'll see it in our packaging, made of sustainably forested material that's 99% recyclable6 for Surface Laptop 5. We're also excited for our latest products to continue to bring hardware and software together to optimize energy performance. All Surface laptop and tablet devices are ENERGY STAR certified with a focus on energy efficiency and battery life. And our Surface Laptop 5 and Pro 9 devices are over twice as energy efficient as the Energy Star recommended limits. They can all also take advantage of new sustainability features in Windows 11, the first PC operating system to offer a carbon-aware feature.7  

 

Ready for a new device?

There are multiple ways to responsibly recycle your device or give it new life.

  • Trade it in: The Microsoft Store Trade-In Program8 offers cash back for certain used devices suitable for refurbishment or reuse. See aka.ms/tradein.
  • Sell or donate: Consider selling or donating your used device to an authorized refurbisher to give it a potential second life for a new user. See aka.ms/refurbishers.
  • Recycle: Microsoft and other device manufacturers offer free mail-back recycling programs for used devices. See aka.ms/recycle.

 

Learn more

 

References

1. Emissions Estimator report provided for informational purposes only. You should not interpret the report you receive to be a commitment on the part of Microsoft; actual emissions may vary based on your location, purchase method, usage, and other factors.

2. Ocean-bound plastic is plastic waste recovered from oceans and waterways, cleaned, and processed into recycled plastic resin pellets. These recycled pellets are blended in with virgin plastic during the manufacturing process. To learn more, see Sustainable Products & Solutions | Microsoft CSR.

3. In OECD countries, Microsoft operates recycling programs either independently or through third parties covering Microsoft Devices. In addition, check local recycling programs for availability.

4. Customer Replaceable Units (CRUs) are components available for purchase through your Surface Commercial Authorized Device Reseller. Components can be replaced on-site by a skilled technician following Microsoft’s Service Guide. Opening and/or repairing your device can present electric shock, fire and personal injury risks and other hazards. Use caution if undertaking do-it-yourself repairs. Device damage caused during repair will not be covered under Microsoft’s Hardware Warranty or protection plans. Components will be available shortly after initial launch; timing of availability varies by component and market.

5. EPEAT rating availability may differ by market.

6. Recyclability dependent on recycling options in markets where products are discarded.  Check local recycling programs for availability. Learn more at aka.ms/recycle

7. See Windows Update is now carbon aware

8. Available in select countries only.

Skilling snack: Using Windows Update for Business

Simplify your update management experience with Windows Update for Business. It's an umbrella term for multiple products and services that we're unpacking in a series of skilling snacks. Today, let's learn about what it is, how to configure it, and how to use it with Intune or other familiar tools. Take it one bite at a time or allow a learning module to take you through the four-course meal. Leave some room for policies at the end!

timer-icon.png Time to learn: 87 minutes

watch icon.pngWATCH

Managing Windows updates in the cloud (The Blueprint Files)

Review the different ways to manage Windows updates in the cloud, how they work, and what's the difference. Find management tools you need and compare them with on-premises management with this easy-to-follow demo.

(24 mins)

WU + WSUS + Group Policy + Intune + App Compat + Safeguard Holds + Driver

 

read-icon.pngREAD

Configure Windows Update for Business

Learn how to configure Windows Update for Business. Follow this guide to group devices, configure them for the appropriate service channel, schedule or pause feature and quality updates, enable features that are off by default, and more.

(14 mins)

Windows 11 + Windows 10 + Windows Server + Windows Insider Preview + GA + MDM + GPO + ConfigMgr + WUfB

     

read-icon.pngREAD

Learn about using Windows Update for Business in Microsoft Intune

Already using Intune? Read how you can easily use Windows Update for Business with your familiar tools. Review Intune policy types to manage updates and follow guidance to switch from update ring deferrals to feature updates policy.

(4 mins)

Windows 11 + Windows 10 + Intune + Policy + Rings + Quality Updates + Feature Updates + WUfB

 

read icon.pngREAD

Integrate Windows Update for Business

Learn how to integrate Windows Update for Business with other management solutions. Specifically, find guidance and examples to integrate with Windows Server Update Services and Microsoft Configuration Manager.

(2 mins)

WSUS + ConfigMgr + Deferral + Office + Drivers + Windows 11 + Windows 10 + WUfB

     

read-icon.pngREAD

Enforce compliance deadlines with policies in Windows Update for Business

Need to set compliance deadlines? Learn how to do so from our official documentation. Configure deadlines, grace period, and reboot behavior for feature and quality updates with a policy made just for that.

(2 mins)

Compliance + Deadlines + Policies + GPO + Windows 11 + Windows 10 + WUfB

 

read icon.pngREAD

Update clients using Windows Update for Business

Leverage Windows Update for Business to control the distribution and methods for Windows Update delivery. Walk step by step with this 800-XP learning module to configure Windows Update for Business, enable Delivery Optimization, and to evaluate different scenarios.

(17 mins)

Maintenance + Deployment + Peer-to-Peer + ConfigMgr + Intune + Active Directory Domain Services + GPO + WUfB

     

watch-icon.pngWATCH

What is a policy? And why shouldn't I set registry keys?

When should you use policies, reg keys, or the graph API? Watch this video to learn about the differences and practical tips for managing Windows updates with Windows Update for Business.

(8 mins)

Policies + Reg Keys + Graph API + Intune + GPO + CSP + WUfB

 

read-icon.pngREAD

The Windows Update policies you should set and why

Your policy needs and practices may be different based on your industry and the types of devices you manage. Find policy names, paths, description, and setting recommendations for your context.

(16 mins)

CSP + GP + Personal Devices + Multi-User + Education + Kiosks + Billboards + Factory Machines + Teams Rooms + WUfB


If that got your appetite going for Windows Update for Business, I'll spill the beans! In the upcoming weeks, we'll package snacks on more control over approval, scheduling, safeguarding of updates (the deployment service), and getting information on compliance (reports).

For more general information on update management, revisit our Skilling snack: Windows monthly updates and Skilling snack: Windows feature update management.

As always, welcome to the table!


Continue the conversation. Find best practices. Bookmark the Windows Tech Community and follow us @MSWindowsITPro on Twitter. Looking for support? Visit Windows on Microsoft Q&A.

4 Essential Conditional Formatting Tips to Make Your Data Pop

Did you know these conditional formatting tips? In this article, you'll discover methods for highlighting patterns in your data, emphasizing insights or trends with the use of colors, icons or formulas. These 4 tips cover common scenarios at home or in small businesses, like managing expenses, home renovation and more.

Highlight the highest expenses in scenarios like personal budgeting

Create icons based on data values in scenarios like product ratings

Highlight rows using formulas in scenarios like home renovation

 

Getting Started

To get started, click on the template.

  1. Go to File › Save As › Download a Copy
  2. Go to OneDrive
  3. Select Upload › Files
  4. Select the Excel file you downloaded in step 1.
  5. Once the upload is complete, you can open the file to use it.

 

Sharing your file with family for free

You can easily collaborate with family and friends using the Share button within Excel on the top right corner of the app. Learn more about sharing here.

 

Share

 

More Personal Templates and Examples

If you liked this, check out more templates for your personal life like Retirement and FIRE Estimators.

Retirement and FIRE Estimators

 

Also see Mr Excel, Bill Jelen’s, post on Unusual Uses of Excel to learn examples of ways people are using Excel to do more in their lives.

Unusual Uses of Excel

 

Comments and Suggestions

Liked the template and want more like this? Or have comments and suggestions? You can leave comments below this blog post.

Or in Excel, go to Help>Feedback. Choose an option from the menu. Copy and paste this “#highlight" as you type a response. 

Happy Highlighting!

Translate documents using Azure Document Translation


Azure Document Translation service enables the translation of multiple and complex documents across all supported languages and dialects while preserving the original document structure and data format. However, it currently does not support the translation of text from images in digital documents. To address this, there are two options:

  • Convert the digital document into a scanned document in its entirety.
  • Split the document into two files and process them separately:
    • One file containing all pages that only have text. The digital pages are preserved in their original form. The Document Translation service takes advantage of the structure/layout information from the digital text page and translates it more accurately than a scanned page.
    • The second file will contain a scanned version of all the pages that contain images.
    • Note: As we have processed these files by parsing the original document page by page, we have the required information to stitch the translated documents into a single translated document. This codebase does not have that code. We will add that code in the near future. If you have the time to help, please let us know.

Dealing with various File Types

You will need to handle different file types as follows:

  • PDF
    • Scanned PDF: You do not need to do anything. The Translator service translates all the text from the scanned pages.
    • Digital PDF: As mentioned earlier, the Translator service does not translate text from images, but it can translate the remaining text.
    • This solution analyzes each page and creates multiple files based on the content on each page (text only page, text plus image page, image only page, etc.). You can configure this behavior through various parameters. The following files will be created:
      • A copy of the original file as is.
      • A scanned version of the whole document.
      • Two other files: One containing all pages that only have text, and a second file with a scanned version of all the pages that contain image(s).
      • Note: The Azure Document Translator has a limit of 40MB per document. The scanned version of the file is typically larger. Therefore, we split the scanned document into multiple files based on the number of pages (configurable).
  • Image files (BMP, PNG, JPG)
    • We need to convert these files into PDF, and this solution takes care of it.
  • Office files (Word, Powerpoint, Excel)
    • The Translator service handles these files in the same way as PDF, meaning it does not translate text from images but translates the remaining text. Therefore, we need to convert/process these files in the same way as PDF. One approach is to convert office files into PDF and leverage the solution we have for PDF.
    • There are several open-source Python packages available to convert office documents, but some of them require Microsoft Office to be installed on the machine where the code runs. You can explore the following options.

Solution Approach

We took the following approach and all the code is shared in this repository. 

 

smallangi_1-1681586940372.png

 

 

Both the functions get triggered using an event subscription from the storage container.

 

Document Convertor Function

We have two options here. One is based on the PyMuPDF Python package, and the other is based on the pdf2image Python package.

  • The pdf2image function requires Docker because we need to install poppler-utils, which is not available as a Python package. This function only supports converting the PDF document to a scanned version.
  • The PyMuPDF-based function has extensive functionality and configurable options (a sketch of the hybrid mode follows this list). The configurations are:
    • "translatordocs_storage": "storage connection string for input and converted data",
    • "pdf_conversion": "all OR scanned OR original OR hybrid",
      • scanned: Creates a scanned version of the document. The name of the scanned file will have "--scanned" appended to the original file name.
      • original: Copies the original document as is.
      • hybrid:
        • For pages with text-only data, creates a file with a digital copy of those pages. The name of the text-pages file will have "--textonlyPage" appended to the original file name. Note that if the document has no text-only pages, this file will not be created.
        • For pages with images, or with images and text, creates a file with a scanned copy of those pages. The name of the image-pages file will have "--ImagePages" appended to the original file name. Note that if the document has no pages with images, this file will not be created.
      • all: Creates all of the above.
    • "pdf_page_limit": "number of pages per file for the scanned version"
      • There is a limit of 40 MB per file for the Azure Document Translator, so adjust this page count based on the type of PDF files you are dealing with.
    • "output_container_name": "output container for converted documents"
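
To make the hybrid behaviour concrete, here is a hedged sketch using PyMuPDF (the fitz package); the DPI, file-name suffixes and page classification rule are assumptions for illustration rather than the exact repository code.

import fitz  # PyMuPDF

def split_hybrid(src_path: str, dpi: int = 150) -> None:
    src = fitz.open(src_path)
    text_doc = fitz.open()    # digital copy of text-only pages
    image_doc = fitz.open()   # rasterized copy of pages containing images

    for page in src:
        if page.get_images(full=True):             # page contains image(s)
            pix = page.get_pixmap(dpi=dpi)         # render the page to a bitmap
            new_page = image_doc.new_page(width=page.rect.width,
                                          height=page.rect.height)
            new_page.insert_image(new_page.rect, pixmap=pix)
        else:                                      # text-only page: keep digital copy
            text_doc.insert_pdf(src, from_page=page.number, to_page=page.number)

    if text_doc.page_count:
        text_doc.save(src_path.replace(".pdf", "--textonlyPage.pdf"))
    if image_doc.page_count:
        image_doc.save(src_path.replace(".pdf", "--ImagePages.pdf"))

Either output file can then be submitted to the Translator service independently.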

Note

  • Hybrid is usually the best option, because the Translator service handles digital pages better than images: it can use the proper word blocks and full layout information.
  • Because we process the original file page by page, we can create a mapping file that maps each original document page to the corresponding page in the converted documents (the image-pages and text-only-pages documents). This mapping can then be used to stitch the translated documents back together into a final translated document. We did not get to this yet and hope to add it soon.

Document Translator Function

This is a simple function that takes a document and submits it to the Translator service, which stores the translated document in the specified container. The configurations are as follows, and a sketch of the submission call is shown after the list:

  • "converteddocs_storage": "storage connection string for input documents",
  • "translator_endpoint":"translator service endpoint",
  • "translator_key":"translator key",
  • "target_blob_url":"target blob url for translated output"
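
As a rough sketch of the submission step (the endpoint, key, SAS URLs and target language below are placeholders), the azure-ai-translation-document SDK can be used like this:

from azure.core.credentials import AzureKeyCredential
from azure.ai.translation.document import DocumentTranslationClient

client = DocumentTranslationClient(
    "https://<your-translator>.cognitiveservices.azure.com/",
    AzureKeyCredential("<translator_key>"),
)

# Source and target are SAS URLs of the blob containers holding the
# converted documents and the translated output respectively.
poller = client.begin_translation(
    "<source container SAS URL>",
    "<target container SAS URL>",
    "fr",  # target language is hardcoded, as noted below
)

for doc in poller.result():
    print(doc.status, doc.translated_document_url)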

Note

  • Currently, we have hardcoded the target language; you can make it a configurable option.
  • We are also leveraging the Translator service's automatic language detection to identify the source language. You get better accuracy if you can specify the source language. Details here
  • As mentioned above, we generate multiple files when dealing with large documents, so you will need to merge them back into a single file (a minimal sketch follows this list).
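
As a minimal sketch (assuming the translated part-files are PDFs and are supplied in page order), PyMuPDF can do the merge:

import fitz  # PyMuPDF

def merge_pdfs(part_paths: list[str], merged_path: str) -> None:
    merged = fitz.open()
    for path in part_paths:          # parts must be passed in page order
        with fitz.open(path) as part:
            merged.insert_pdf(part)  # append all pages of this part
    merged.save(merged_path)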

Solution

All the code is available in this Git repository.

Deployment

To deploy this solution:

  1. Create an Azure Translator service.
  2. Create a storage account to store the original, converted, and translated documents.
  3. Create Azure Functions based on the source code from this repository. Refer to this article to create functions that are triggered by the storage container's event subscription.

 

Contributors

Brandon Rohrer, Krishna Doss Mohan, Narasimhan Kidambi, Nicolas Uthurriague and Sreedhar Mallangi

When 'Force randomization for images (Mandatory ASLR)' is set to 'On by default', the Japanese-language installer for SQL Server Express Edition fails to start

Hello, this is the Microsoft Japan SQL Server support team.

 

Symptom:

When 'Force randomization for images (Mandatory ASLR)' is set to 'On by default', the Japanese-language installers (SQLEXPR_x64_JPN.exe) for SQL Server 2016 and SQL Server 2019 Express Edition fail to start.

Note that the default value of 'Force randomization for images (Mandatory ASLR)' is 'Off by default', so the issue does not occur with the default setting.

The issue also does not occur with the Japanese-language installers (SQLEXPR_x64_JPN.exe) for SQL Server 2014, SQL Server 2017, or SQL Server 2022 Express Edition.
* As of the time of publication

 

The steps to configure 'Force randomization for images (Mandatory ASLR)' are as follows:

 

From the Start menu: [Settings] -> [Update & Security] -> [Windows Security] -> [App & browser control] -> [Exploit protection settings] -> [System settings] -> [Force randomization for images (Mandatory ASLR)]

 

 Figure: Windows Security screen (for reference)

 

 Masafumi_Hori_2-1681952427058.png

 

Cause:

SQL Server setup does not anticipate 'Force randomization for images (Mandatory ASLR)', and the Japanese-language installers (SQLEXPR_x64_JPN.exe) for SQL Server 2016 and SQL Server 2019 Express Edition have not been updated to handle it.

In an environment where 'Force randomization for images (Mandatory ASLR)' is set to 'On by default', there is no problem other than the installer (SQLEXPR_x64_JPN.exe) failing to start.

 

Workaround:

Unfortunately, there are no plans to modify and re-release the Japanese-language installers (SQLEXPR_x64_JPN.exe) for SQL Server 2016 and SQL Server 2019 Express Edition.

Therefore, in an environment where 'Force randomization for images (Mandatory ASLR)' is set to 'On by default', temporarily change the setting to 'Off by default' to extract the installer files, and then install SQL Server as follows:

 

 1) Change 'Force randomization for images (Mandatory ASLR)' to 'Off by default'.

 2) Run the Japanese-language installer (SQLEXPR_x64_JPN.exe) for SQL Server 2016 or SQL Server 2019 Express Edition to extract its files.

 3) Change 'Force randomization for images (Mandatory ASLR)' back to 'On by default'.

 4) Proceed with the SQL Server installation.

 

If you redistribute SQL Server Express, we have confirmed that, at this time, extracting the package and redistributing it is not considered a modification of the modules.

Upcoming April 2023 Microsoft 365 Champion Community Call

Blog banner small.png

Join our next community call on April 25 to dive into the Microsoft 365 Experience Insights dashboard. We will start the calls at five minutes past the hour (8:05 AM and 5:05 PM PT), and they will still end at 9:00 AM and 6:00 PM PT, respectively.

 

If you have not yet joined our Champion community, sign up here to get access to the calendar invites, program assets, and previous call recordings.

We look forward to seeing you there!

ADF private DNS zone overrides ARM DNS resolution causing ‘Not found’ error.

Steps to Migrate:

  1. Navigate to the existing Private DNS zone privatelink.adf.azure.com
    a. Go to portal.azure.com
    b. Type 'private DNS zones' in the search bar and click on the option
    c. Click on the privatelink.adf.azure.com private zone

 

Sachin215_0-1681851013038.png

 

  2. Get the private IP of the existing private endpoint and delete the private zone
    a. In the Overview blade (default) you'll see a table with the DNS records
    b. Look for the record named 'adf' with Type 'A' and write down the IP under 'Value' for the next steps
    c. Click on 'Virtual network links' on the left panel, write down all the virtual networks for the next steps, and then delete all the virtual network links
    d. Go back to 'Overview' and delete the private zone

 

 

Sachin215_2-1681851013056.png

 

 

 

 

  3. Create a new Private DNS zone with the name 'privatelink.adf.azure.com'
    a. In the main Private DNS zones page, click 'Add' on the toolbar
    b. Select the subscription and resource group, and enter 'privatelink.adf.azure.com' as the name

 
 

 

 

  4. Add virtual network links and a DNS 'A' record
    a. In the privatelink.adf.azure.com private zone, click on 'Virtual network links' on the left panel and then add a network link for each of the virtual networks from step 2c

 


 
 

 

 

 

 

 

  5. Add the DNS 'A' record
    a. Go back to the Overview panel and click '+ Record set', type 'adf' as the name, set TTL to 10 and TTL unit to seconds, and enter the IP from step 2b

 


 
 

 

 

Azure Database for PostgreSQL: Logical Replication

With Azure Database for PostgreSQL Flexible Server (shortened to Flex for this article), customers now have the option of using the standard PostgreSQL Native Logical Replication feature to implement data replication solutions. These solutions can work in either direction, for example on-prem to Azure or vice versa. There are many scenarios that can take advantage of this feature, for example:

 

  • One-time data migration from practically anywhere to Azure
  • Ongoing Change Data Capture
  • Replication between two separate Flex instances in different Azure regions for reporting, analytics or even disaster recovery purposes
  • Customer-managed multi-master replication between separate Flex instances, or even on-prem to Azure or vice versa. Please note that data conflicts can happen, and you will need to manage these – be sure to read the docs (link below)

 

To get the most from this article, you should have an intermediate to advanced understanding of PostgreSQL as well as some knowledge of networking and security concepts. Here is a high-level logical diagram showing the PostgreSQL Logical Replication components:

 

bmckerrMSFT_0-1681852999538.png

 

One or more ‘Publications’ can be set up on a source database that acts as a publisher for tables. These can then be consumed by a subscriber database by configuring a ‘Subscription’.

 

The official documentation link PostgreSQL Native Logical Replication provides details on the following:

  • Architecture
  • Configuration Settings
  • Publications and Subscriptions
  • Replication Slot Management
  • Conflicts
  • Restrictions
  • Monitoring
  • Security

 

This article will hopefully provide you with enough information on the core concepts involved in logical replication to allow you to adapt those learnings to your own environment, where you may already have the source database and existing tables with data. In this article I will focus on using logical replication to set up a selective one-way replication from an on-prem instance of a PostgreSQL database to a Flex instance in Azure, in line with the diagram below:

 

bmckerrMSFT_1-1681852999544.png

 

 

Our Flex instance has been configured to use the Public Access networking option. We won’t cover the Private Access option; however, the same principles would apply to the database configuration for PostgreSQL Logical Replication, and only the underlying networking configuration would change.

 

We are going to set this up so that all 4 tables in the public schema of the on-premises server are replicated to the Flex instance, but into the public schema of the postgres database. The source tables to be replicated are:

 

  • public.pgbench_accounts
  • public.pgbench_branches
  • public.pgbench_history
  • public.pgbench_tellers

You might notice these table names if you have ever used the standard PostgreSQL benchmarking utility pgbench. I am using that tool here to simplify the creation of tables and data.

 

 

At a high level these are the configuration steps necessary to tie this all together to produce a working example. Configuration of:

 

  • Required source database settings
  • Firewall – allow access to and from both source and target databases
  • Schema objects and data
  • Publications
  • Subscriptions

 

Let’s get started….

 

PostgreSQL Settings

For logical replication to work, the source PostgreSQL instance must have the parameter ‘wal_level’ set to ‘logical’. You can check this by issuing the following command in psql or pgadmin:

 

show wal_level;

 

If it responds with anything other than ‘logical’, you will need to make that change in your PostgreSQL instance’s configuration file, typically postgresql.conf. On my Debian source system that file is located in the /etc/postgresql/14/main directory. Other items to check while you are there are listed below, with illustrative example entries after the list:

 

  • listen_addresses (defaults to ‘localhost’ but you will need to change that and update pg_hba.conf to allow connections from Azure)
  • port (defaults to 5432, but on hosts where you have many PostgreSQL clusters your specific database might be listening on a non-default port)
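
As an illustration only (the interface, port and address range below are placeholders for this example environment, not values you should copy verbatim), the relevant entries might look like this:

# postgresql.conf
wal_level = logical
listen_addresses = '*'        # or a specific interface address
port = 5436                   # the non-default port used in this article

# pg_hba.conf - allow the replication user to connect from the Azure side
host    mydb    postgres    203.0.113.0/24    scram-sha-256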

After making any necessary changes, you will need to restart your PostgreSQL cluster. On Debian I use the helper script like so:

  • pg_ctlcluster 14 main restart

Once you have confirmed that your wal_level is now set to ‘logical’ you can proceed to the next step.

 

Firewall Settings

As mentioned above, I’m using publicly addressable networking for my Flex instance and not private networking with a VNet. With either access method, you’ll need to allow traffic to flow in both directions between your databases:

 

  • Azure to On-prem
  • On-prem to Azure

There are certain times when configuring or modifying logical replication that the target database will initiate a connection to the source. You’ll have to ensure that your ‘on-premises’ PostgreSQL database is reachable from your Flex instance, and vice versa, when you complete the logical replication setup. Typically, this will involve setting up NAT and a firewall rule; note that your on-prem PostgreSQL port number (default 5432) will need to be known to set this up (see the section below on logical replication ‘subscriptions’).

 

Setup of Database Objects & Data

As I mentioned earlier, I’m using pgbench to create both the schema objects and the data for this article, to explain the whole process. If you already have your source tables with data, you can skip this step and simply create the same objects in the target database, either by using pg_dump and pg_restore or by running the DDL on the target database. I have pgbench installed on my on-prem database server. The steps to create the pgbench schema, which contains 4 tables, are straightforward for both source and destination databases.

 

On the source I ran this command:

 

postgres@deb:~$ pgbench -h 127.0.0.1 -s 10 -i -I dtpg -p 5436 -d mydb -U postgres

 

This tells pgbench to connect to the local PostgreSQL instance listening on port 5436, specifically the database ‘mydb’, and to drop the tables, create the tables with primary keys, and generate data. The output should be similar to this:

 

dropping old tables...

creating tables...

creating primary keys...

generating data (client-side)...

1000000 of 1000000 tuples (100%) done (elapsed 0.03 s, remaining 0.00 s)

done in 0.29 s (drop tables 0.01 s, create tables 0.00 s, primary keys 0.01 s, client-side generate 0.27 s).

 

Moving on to my Azure Flex instance, it already allows access from my on-prem IP address as I added a rule for that:

 

bmckerrMSFT_2-1681852999547.png

 

This allows me to run this command from my on-prem server:

 

postgres@deb:~$ pgbench -h aue-flex13.postgres.database.azure.com -i -I dtp -p 5432 -d postgres -U dbadmin

 

This tells pgbench to connect to my Flex instance, specifically the database ‘postgres’, and to drop the tables and create the tables and primary keys but, importantly, not create any data for the tables. The output should be like this:

 

dropping old tables...

creating tables...

creating primary keys...

done in 0.20 s (drop tables 0.03 s, create tables 0.08 s, primary keys 0.09 s).

 

Now we have the tables matching between on-prem and Flex but there is data only in the on-prem ‘mydb’ database.

 

PostgreSQL Logical Replication - Publications

This step involves setting up a Logical Replication ‘Publication’ on your source database for the tables you want to publish. See the official doc PostgreSQL: Documentation: 30.1. Publication for more details on the options available. The creation of the publication can be performed with the individual commands below or it can be scripted if, for example, you want to publish a larger number of tables, or perhaps all tables from one or more schemas:

 

create publication mydbpublic;

alter publication mydbpublic add table public.pgbench_accounts;

alter publication mydbpublic add table public.pgbench_history;

alter publication mydbpublic add table public.pgbench_branches;

alter publication mydbpublic add table public.pgbench_tellers;
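
As an aside, if you would rather not list tables individually, PostgreSQL can publish every table in the database with a single statement (note that this form requires superuser privileges and automatically includes tables created later):

create publication mydbpublic for all tables;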

 

Note that setting up a publication does not start any replication.

You can check on the publication configuration by using the following commands:

 

select * from pg_publication;

select * from pg_publication_tables;

 

The output should be similar to the screenshots below:

 

bmckerrMSFT_3-1681852999550.png

 

bmckerrMSFT_4-1681852999554.png

 

 

PostgreSQL Logical Replication - Subscriptions

This step involves setting up a Logical Replication ‘Subscription’ on your target database for the tables you want to subscribe to. Here is the link to the official documentation, PostgreSQL: Documentation: 30.2. Subscription, which explains the details and the options available.

Running this command (from psql, pgadmin or your favourite tool) will finalize the setup and, by default, start the replication (it is possible to disable the initial sync):

 

create subscription sub_onprem_mydb_public connection 'host=dns.or.ip.of.yourserver port=yourport user=youruser dbname=postgres password=yourpassword' publication mydbpublic;

 

Note that executing the above command will lead to the target database (Flex in Azure) connecting to your source database with the details you provided. This means that network traffic will originate from an Azure managed IP address as described above.

You can check on the subscription configuration by using the following command:

 

select * from pg_subscription;

 

And you should see output like this for your subscription:

bmckerrMSFT_5-1681852999556.png

 

 

Now for the final test: did my data replicate from mydb on-prem to postgres in Azure? Give it a couple of minutes and try running something like this on your target database:

 

select count(1) from public.pgbench_accounts;

 

bmckerrMSFT_6-1681852999558.png

 

Yes, all 1 million rows have now been copied from source to target as part of the initial sync that happens by default when you create the subscription.
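
If the count is still zero, the initial table synchronization may simply not have finished; as a quick check (using the standard catalog described in the PostgreSQL docs) you can run this on the subscriber:

select srrelid::regclass as table_name, srsubstate from pg_subscription_rel;

A state of ‘r’ means the table is ready and streaming changes; other values indicate the initial copy is still in progress.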

 

Now that we have set it up and can see it working, how about we run some pgbench benchmarks and prove that the solution works under a bit of pressure? Running the command below on my on-prem server will execute a read/write benchmark on the source database:

 

postgres@deb:~$ pgbench -h 127.0.0.1 -p 5436 -j 8 -c 50  -d mydb -U postgres

…. SNIP ….

pgbench: client 12 receiving

transaction type: <builtin: TPC-B (sort of)>

scaling factor: 10

query mode: simple

number of clients: 50

number of threads: 8

number of transactions per client: 10

number of transactions actually processed: 500/500

latency average = 40.184 ms

initial connection time = 180.443 ms

tps = 1244.288715 (without initial connection time)

 

How will I know the replication has worked? Well, pgbench by default keeps a history of changes in the pgbench_history table, with a ‘delta’ column keeping a sort of transaction history. If you run something like this on both the source and target databases, the results should be identical:

 

select sum(delta) from public.pgbench_history;

 

First, on the source:

bmckerrMSFT_7-1681852999561.png

 

And then on the target:

bmckerrMSFT_8-1681852999563.png

 

Now is a good time to highlight some very important restrictions to consider. These are detailed in the PostgreSQL community docs here PostgreSQL: Documentation: 31.4. Restrictions and you should make yourself familiar with them.

 

Managing DDL Changes

It is probably a fairly common requirement to change some table structures at some point during an application’s lifecycle. It is important to note that logical replication will not replicate any DDL changes in either direction. For example, if you make a table definition change such as adding a column on the source without making the same change on the target, replication will break. Perhaps the best way to manage this is to first stop the replication by using the ALTER SUBSCRIPTION command, e.g.:

 

alter subscription sub_onprem_mydb_public disable;

 

This has the effect of pausing replication. Once it is disabled, you can alter the target table to add the column, then alter the source table to add the column. After that, you can re-enable replication:

 

alter subscription sub_onprem_mydb_public enable;
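
Putting those steps together, a worked example for an assumed new column (the column name here is purely illustrative) would look like this:

-- on the subscriber (Flex)
alter subscription sub_onprem_mydb_public disable;
alter table public.pgbench_accounts add column note text;

-- on the publisher (on-prem)
alter table public.pgbench_accounts add column note text;

-- back on the subscriber
alter subscription sub_onprem_mydb_public enable;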

 

Managing Conflicts

I mentioned multi-master replication in the intro but would like to touch on it a little more. There is nothing to stop you from designing, implementing and testing a setup where your application uses and writes to both databases, even to the same tables. It is important to note that PostgreSQL will not stop you from doing that, but should any conflict arise, replication will stop until you resolve the conflict. Details here: PostgreSQL: Documentation: 31.3. Conflicts

 

Summary

Hopefully this article has helped you understand what PostgreSQL Logical Replication is and what it can and cannot do, and provided some food for thought when either migrating to Azure Flexible Server or designing a new multi-master database for a business-critical application.

 

References

Here is the link to the Azure documentation for Logical replication and logical decoding - Azure Database for PostgreSQL - Flexible Server | Microsoft Learn

 

Enable App Volume Replication for Horizon VDI on Azure VMware Solution using Azure NetApp Files

Overview

Cloud bursting is one of the common use cases for Azure VMware Solution (AVS). It could take the form of extending a VDI solution, such as VMware Horizon, to AVS. This article explains a solution that addresses one of the concerns related to App Volume replication when implementing a multi-site Horizon deployment where an AVS Private Cloud is one of the sites.

 

Background

When customers decide to expand their existing Horizon infrastructure to the cloud, VMware recommends a multi-site Horizon implementation using a multi-instance model with separate instances per site. Each instance works independently, with its own set of App Volumes Managers and its own database instance. Using separate instances per site is simple to implement and allows for easy scaling. Moreover, it provides resiliency: in case of an outage, the remaining instances in the running sites can provide access to Packages and AppStacks with no intervention required.

VMware Horizon leverages App Volumes to provide real-time application delivery and lifecycle management. In a multi-site implementation, App Volume replication uses Storage Groups to synchronize Packages and AppStacks across sites. Storage Groups, which define logical groupings of datastores, make this possible by looking at anything that resides in one datastore (for example, on-premises) within the Storage Group and making sure that it exists in all the datastores within that Storage Group. Thus, if the same datastore is part of two different Storage Groups at two different sites (on-premises and AVS), Packages and AppStacks are guaranteed to be replicated across both sites.

Storage groups containing a shared, non-attachable datastore can be used to automatically replicate packages from one instance of App Volumes to another. As this shared datastore is configured to be non-attachable, it will not be used to deliver attachments to user machines, so does not need to be overly performant, and often a low-cost NFS share is used to provide this.

 

Problem

One of the questions that needs an answer when configuring Storage Groups for App Volume replication in a multi-site Horizon deployment is: what datastore can be part of the Storage Group at AVS in Azure (Site 2) and, at the same time, part of the on-premises (Site 1) Storage Group?

 

Solution

First, let’s understand the requirements; notice that the datastore needs to be part of two separate Storage Groups, one configured at each Horizon infrastructure site. Thus, technically we need a datastore that we can mount to the VMware vSphere stack in Azure (AVS) and, at the same time, mount to the VMware vSphere stack on-premises.

This is where an Azure NetApp Files (ANF) volume used as a datastore comes in (see the diagram below). ANF is a fully managed cloud service that provides high-performance file storage. With Azure NetApp Files, you can create a high-performance NFS datastore that can be used as a storage solution for your AVS Private Cloud. Currently, an ANF volume is the only feasible and economical way to add an additional datastore to expand AVS Private Cloud cluster storage without scaling up with additional hosts.

 

Main.png

Now, keep in mind that an ANF account uses a concept in Azure networking known as a delegated subnet. To simplify things a bit, you can imagine that the ANF account is connected to a network subnet in an Azure Virtual Network (vNet). This allows customers to get a predictable private address for the ANF volume once it is created. Thus, in addition to mounting that NFS datastore to the AVS cluster, which is a built-in feature, you can also mount it to the on-premises VMware vSphere environment as long as connectivity is established through either Azure ExpressRoute or a Site-to-Site VPN.

 

 

Implementation

The instructions below help you achieve the solution discussed above. Also, make sure to follow the Best Practices section in this article:

  1. Create Azure NetApp Files account. More details here.
  2. After creating an ANF account in Azure, create a Capacity Pool with a minimum of 2 TB capacity and the Standard storage tier. You may keep QoS as Auto.
  3. Create a Volume inside the Capacity Pool created in the previous step. You may set the Quota to 2048 GB. If it was not done before, you may need to create a delegated subnet in an existing Azure vNet that is connected to the AVS Private Cloud through an Azure ExpressRoute Connection. Make sure the selected Protocol Type is NFS and use version 3. Last, and most important, check the Azure VMware Solution Datastore checkbox (Enabled).
  4. Mount the Volume to AVS Private Cloud, as explained here.
    HusamHilal_1-1681785600023.png
  5. Mount the Volume to the on-premises vCenter, assuming you have line of sight (connectivity) from on-premises to the Azure vNet where the ANF subnet delegation is configured. You’ll need the private IP address of the Volume along with the mount path (an illustrative command-line alternative is shown after this list).
    HusamHilal_2-1681785600024.png

    Mount-to-on-premises.gif
    New-Datastore-Task.PNG
  6. Test by uploading a file to the datastore at one of the sites (i.e., on-premises). Now, you may notice that the file is immediately shown on the other site (AVS). Technically, that is because it is the same datastore.
    File-Uploaded-Onprem.PNG
    File-Uploaded-AVS.PNG
  7. The last step is configuring VMware App Volumes by creating a Storage Group at both sites and selecting the ANF volume datastore in each of the Storage Groups, so it becomes the common ground for replication.
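
For step 5, if you prefer the command line over the vSphere Client wizard, an NFSv3 datastore can be mounted on an on-premises ESXi host with a command along these lines (the IP address, export path and datastore name below are placeholders for your ANF volume):

esxcli storage nfs add --host=10.1.2.4 --share=/anf-avs-datastore --volume-name=ANF-AppVolumes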

 

Best Practices

To achieve the best results, make sure you follow the best practices when configuring AVS and creating the ANF account, as documented here. For example:

  • Make sure to use UltraPerformance or ErGw3Az SKU for the Azure ExpressRoute Gateway when connecting between AVS Private Cloud and Azure Virtual Network (vNet).

    ER-GW-SKU.png
  • Make sure to enable FastPath on the Azure ExpressRoute Connection.
    ER-Connection-FastPath.png
  • Make sure to place ANF account in the same region and same availability zone (AZ) of AVS Private Cloud.
  • Make sure to choose the appropriate storage tier for ANF Capacity Pool. For this specific use-case (Horizon/App Volume Replication), Standard storage tier should be fine.
  • Make sure to choose Standard network features when creating the ANF volume to enable optimized connectivity from AVS Private Cloud.
  • Make sure to maximize the Volume size to the Capacity Pool size. Keep in mind that customers pay for the Capacity Pool size, not the Volume capacity. Moreover, capacity and performance in ANF accounts are proportional; in other words, the higher the capacity, the higher the performance. For example, if you create a 2 TB Capacity Pool, then create a 2 TB volume as well if you are not using that Capacity Pool for other purposes.
    ANF-Volume-Prop.png

 

Summary

In this article, we addressed a concern with configuring App Volume replication using Storage Groups when implementing a VMware Horizon multi-site design where AVS is one of the sites. We also explained how the Azure NetApp Files service can be used to provide a datastore that is mounted both to the AVS Private Cloud cluster and to the on-premises environment, along with the best practices associated with that.

 

Resources

Here are some resources that include supportive materials to this article:

Horizon on Azure VMware Solution (Microsoft - Learn)

Horizon on Azure VMware Solution (VMware - TechZone)

App Volume Architecture – Multi-site Design

Attach Azure NetApp Files datastores to Azure VMware Solution hosts

How-to: Deploy Azure VMware Solution with Azure NetApp Files datastore

Simulator: Deploy Azure VMware Solution with Azure NetApp Files datastore

Azure VMware Solution Learning Resources

Azure VMware Solution Landing Zone Accelerator (Enterprise Scale Landing Zone)

Performing offline backups and offline restores with SQL Server can cause problems

Hello, this is the Microsoft Japan SQL Server support team.

 

We are publishing, as a blog post, content that was previously available as public documentation.

 

Symptom
With Microsoft SQL Server, if you back up the database files while the SQL Server service is stopped (offline backup), or restore such files (offline restore), problems can occur.

 

Workaround
Offline backup and offline restore are not features provided by SQL Server.

Use the backup and restore methods that SQL Server provides as product features.
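
For example, a supported backup and restore using SQL Server's native commands would look like the following (the database name and file path are placeholders):

BACKUP DATABASE [YourDatabase]
    TO DISK = N'D:\Backup\YourDatabase.bak'
    WITH CHECKSUM, INIT;

RESTORE DATABASE [YourDatabase]
    FROM DISK = N'D:\Backup\YourDatabase.bak'
    WITH RECOVERY;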

 

Details
When you perform an offline backup or offline restore, SQL Server is not involved, so it cannot guarantee that no problems will occur. No testing or validation of such operations has been performed either. Therefore, offline backups and offline restores must be carried out at the customer's own responsibility.

 

When all databases and all files are included:
If all files of all system and user databases are backed up offline at the same point in time, and all of those files taken at that same point in time are later restored offline, then all the database files SQL Server needs to resume from that point are present. Provided that no problems such as operational mistakes or file corruption have occurred after the files were backed up, SQL Server can start, and offline backup and offline restore may be usable without subsequent database consistency problems.
For example, when all databases and all files are backed up or restored offline as part of a disk or volume backup and restore, failures such as database consistency problems do not normally occur. However, because offline backup and offline restore are not SQL Server features, please note that SQL Server cannot guarantee database consistency.
 

When files backed up offline at different points in time are restored offline and used:
Under the following conditions, database consistency errors may occur. If this problem occurs in a system database, SQL Server may end up unable to start.

 - Files that were backed up at different points in time for each database are restored offline.

 - For a single database, only some of its constituent files (data files or transaction log files) are restored from an offline backup taken at a different point in time.

 

When backing up files after detaching:

Because SQL Server performs the work needed to shut down (detach) the database safely before the database files are detached from SQL Server, SQL Server does guarantee that backups of files taken after a detach have no problems such as database consistency issues. However, this is on the condition that the files have not been corrupted by some factor such as an operational mistake or a hardware problem.

Productivity and Training Acceleration with Azure Container for PyTorch

Introduction

 

The Azure Container for PyTorch (ACPT) is a curated environment in the Azure Machine Learning (AzureML) service with the latest PyTorch version and optimization software such as DeepSpeed and ONNX Runtime for training. The image is tested and optimized for the Azure AI infrastructure and is ready for model developers to use in their training scenarios. The General Availability of ACPT in AzureML was launched with PyTorch 2.0 and Nebula for performance-efficient training.

 

The Microsoft Ads team serves the Microsoft Search Network, which powers 37.5% of U.S. desktop searches and 7.2 billion monthly searches around the globe. In the U.S., the Microsoft Search Network has 117 million unique searchers. (source: Microsoft Advertising | Search Engine Marketing (SEM) & more) The team’s mission is to empower every user and advertiser in the network to achieve more. As part of that mission, the Ads team builds state-of-the-art Ads relevance quality models to achieve the most satisfying experience for both users and advertisers, including TwinBERT, which continually contributes to Ads quality improvements.

 

TwinBERT is a SOTA distilled model with twin-structured BERT-like encoders to represent query and document respectively and a crossing layer to combine the embeddings. It is a highly efficient transformer-based model with CPU latency for inference reduced to milliseconds. This allows TwinBERT to be servable in large-scale systems, while keeping the performance advances comparable to BERT. Due to its high effectiveness and efficiency, TwinBERT is widely adopted in the Microsoft Ads stack across different components with significant business impact.  

 

This blog presents a TwinBERT model trained with ACPT, demonstrating significant improvements in developer productivity and training efficiency.

 

Developer Productivity 

 

ACPT’s support for various acceleration techniques makes it effortless for users to optimize their training process and bring their projects to fruition in a timely manner. By taking advantage of ACPT, users can focus on their own research, development, and experimentation instead of worrying about the compatibility and stability of their development environment, which leads to faster model development and a much quicker path to market.

 

Incorporating ACPT into our machine learning workflow has never been easier. By selecting the latest version of the ACPT docker image and adding the necessary libraries for our specific training task, we were able to quickly create a customized docker image for our development tasks. The latest versions of ONNX and DeepSpeed frameworks have significantly simplified the integration of ACPT into our workflow, requiring only minor code alterations.  

 

Peng_Wang_0-1681766100339.png

Figure 1: Selecting the latest ACPT image in AzureML 

 

 

Peng_Wang_1-1681766100342.png

Figure 2: ACPT Image details 

 

Our best practice is to store customized Docker images in a cloud registry such as Microsoft Azure Container Registry. By sharing these images via links, TwinBERT users can work in the same containerized environment without having to set up their own separate instances. This practice is very useful for TwinBERT users working on their own tasks with the provided code and development frameworks.

 

Efficiency Improvement 

 

Training state-of-the-art deep learning models such as TwinBERT can be a challenging and time-intensive task, especially when it comes to large amounts of data. Using traditional deep learning frameworks, training a TwinBERT model can take several days, making it difficult for developers to experiment with new ideas and iterate quickly. Fortunately, by leveraging cutting-edge advancements in parallel computing and hardware acceleration through ACPT, the development cycle for us can be significantly accelerated, allowing us to spend more time focusing on innovation and experimentation, instead of waiting for models to train. So, if you are a developer looking to accelerate your deep learning workflow, ACPT is a great option for you. 

 

We would like to highlight the advancements in training machine learning models made possible through the ACPT image. The table below compares the acceleration techniques used during the training process, and it is evident that the combination of DeepSpeed and ORT results in impressive improvements. With a +25% training speedup and a 37% reduction in GPU memory usage, users are able to train their models faster and more efficiently. This combination also maintains comparable performance in terms of AUC, demonstrating that these acceleration techniques yield both improved efficiency and accuracy for users training their machine learning models. The configuration used for training was V100-32G machines, batch size = 4096, 5 epochs. The chart below visualizes the data summarized in Table 1.

 

Peng_Wang_2-1681766100345.png

Figure 3: Throughput and Memory improvements with ACPT Technologies 

 

Setting                         | Normalized Training Speedup | Memory usage (%)
PyTorch                         | 1                           | 84
PyTorch + DeepSpeed             | 1.09                        | 63
PyTorch + ORTModule             | 1.22                        | 65
PyTorch + DeepSpeed + ORTModule | 1.25                        | 53

Table 1: Throughput and Memory Usage using ACPT
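
As a rough illustration of how these accelerators compose (a minimal sketch, not the Ads team's training code; it assumes the torch-ort and deepspeed packages available in the ACPT image and is typically launched with the deepspeed launcher):

import deepspeed
import torch
from torch import nn
from torch_ort import ORTModule

class TinyEncoder(nn.Module):
    """Stand-in for a real model such as TwinBERT."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))
    def forward(self, x):
        return self.net(x)

model = ORTModule(TinyEncoder())  # ONNX Runtime training acceleration

ds_config = {  # minimal, illustrative DeepSpeed settings
    "train_micro_batch_size_per_gpu": 32,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

criterion = nn.CrossEntropyLoss()
for _ in range(10):  # dummy loop over random data
    x = torch.randn(32, 128, device=model_engine.device)
    y = torch.randint(0, 2, (32,), device=model_engine.device)
    loss = criterion(model_engine(x), y)
    model_engine.backward(loss)
    model_engine.step()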

 

Looking Forward 

 

For the Bing Ads team, working with the ACPT image has greatly simplified the training process and opened new opportunities for model development. We invite model developers to try out the new ACPT image to accelerate your model training tasks. Along with the efficiencies from the latest version of PyTorch, you can also leverage ORT and DeepSpeed for training. ORT Training is available as the backend for training acceleration in the Hugging Face Optimum library. It is also interoperable across multiple hardware platforms (Nvidia and AMD) and composes well with the optimizations in both torch.compile mode and DeepSpeed to provide accelerated training performance for large-scale models. DeepSpeed has trained powerful language models like Megatron (530B) and Bloom and continues to bring the latest advancements in model training to its users. Large language model developers can use the ACPT image as an easy and efficient way to fine-tune and develop models on their domain-specific data.
