Microsoft Flow, SharePoint, 429, and Throttling a Workflow to Death

We’re here to mourn the death of many a workflow instance at the hands of SharePoint’s HTTP throttling. Except it’s not SharePoint’s throttling that is the true killer. SharePoint’s just the accomplice in this crazy dance that will get your workflows killed. Though it’s possible to protect your workflow instances from being throttled to death, it isn’t as easy as it might seem. In this post, we’ll talk about what happens every day and then what strategies you can use to protect your workflow.

Request Throttling

Before we can explain how a flow gets throttled to death, we first must understand a bit about throttling. Web servers are under constant assault from well-intended users, the code written by bumbling idiot developers (of which occasionally I am one), and malicious people. One of their defenses is to respond with “I’m too busy right now, come back later.” This comes back as the code HTTP status code 429. Sometimes these responses are kinder and will indicate when the person should come back. “Hey, I should be able to take care of that request for you after seven seconds” tells the program when it should retry the request – and here’s the kicker – expect that it will be able to be serviced.

The problem is that, when you’re dealing with so many different users and so many variables, it’s sometimes difficult to predict when the server will be willing or able to service a request.

Automatic Retries

Because a server may respond with a 429, most programs know to retry the request. The strategies vary but the default answer is an exponential interval. The first time you get a 429, you stop for a period of time – say four seconds – and then each time you retry, you square the interval. So, four seconds becomes sixteen on the second retry, and 256 seconds on the second interval. Flow allows you to follow the default policy, which is described as exponential, or explicitly set the interval to exponential. To do this, click the ellipsis at the right of the action and select Settings.

The action’s space in the flow will change to the settings view, where you can explicitly set the retry policy to exponential – which will further change the view to provide spaces for a maximum retry count, an interval, minimum interval, and maximum interval.

The default settings, however, are supposed to do exponential waiting on retries and four retries. So that seems like a good place to start. The default is implemented as one try plus three retries. That is how they get to four retries.

Retry Schedule Conflicts

What happens when there is a conflict between the provided retry schedule and what the server responded with as a part of its 429 response? The good news is that Flow will use the larger of the two numbers. In theory, at least, you’ll never hit a retry more than once or twice. Sure, the server could make a mistake with its guidance once but surely not twice. Unfortunately, that’s not always the case. Sometimes the response from the server – and the default exponential interval – will be too small, and you’ll exhaust the three retries and end up failing your request. This typically happens when you have multiple flows running at the same time.

Parallelism

Any given Flow may not be making that many requests, but what happens when there are many Flow instances running at the same time? If you do work on a queue that information is dropped into, you can’t necessarily control how many items will come in at the same time. With the scalability of the Flow platform, what’s to stop you from running hundreds or thousands of Flows at the same time – against the same poor server that’s just trying to cope?

If you have 100 Flows all starting in a relatively short time talking to the same back end server, it may be getting thousands of requests from Flow every minute. Even if each Flow is told to retry later, even the server telling the consumer to retry later may cause the server to need to push off work even further. The result is that every Flow gets throttled to death – until so few remain that they can be handled inside of the capacity of the server.

Luckily, Flow offers a technique for limiting the number of active flows at any time. This can be done by going to the ellipsis and then Settings for the trigger. This changes the trigger to its settings display, which allows you to limit the number of concurrent Flow instances – or the degree of parallelism.

SharePoint Request Throttling

Generic request throttling is fine but how does my favorite server – SharePoint – do it? Well, the answer isn’t clear. Microsoft published an article “Avoid getting throttled or blocked in SharePoint Online“, which says that they won’t publish the rules – because they’re changing them.

At the same time, they make clear that some of the criteria being used to manage the workflow are things like user agents, accounts, and App IDs. However, when it comes to Flow, we have some limitations.

Flows always run as the user that created the Flow – not as the user initiating the request. So, from the point of view of SharePoint, one user is making all the requests – and they’re making them from the same application, Flow. This makes Flows a high target for throttling, even when you consider it’s a well-known and well-trusted application.

It turns out there’s more to it than that. There’s an interface layer that the connectors – including the connector to SharePoint – use.

Connectors and Infrastructure

Many of Microsoft’s new offerings like Flow, PowerApps, and PowerBI need access to the same service data inside of multiple tenants. As a result, the connectors use a common architecture that allows multiple services to interact with Microsoft’s online service offerings. This architecture has its own throttling built in. It’s designed to protect the back-end services and has its own rules for throttling requests that’s more aware of the uniqueness of each of the fixed number of Microsoft internal consumers.

One of the things that this infrastructure can use to manage throttling is the connection to the service. In the first figure, you’ll notice two connections in the settings menu – with the same identity. This is one way that you can help the infrastructure avoid throttling you.

Multiple Connections

When there aren’t many things to differentiate requests on, the connection is one. It’s got its own identifier. Because of that, it’s easy for the back end to see which connection a request is associated with – and throttle too many requests from a single connection. Thus, if you want to help your Flow avoid getting throttled, you can make multiple connections to the same data source. This allows your requests to get spread across different thresholds and for more to get through.

The Fingerprints Match

For my case, the Flows were getting throttled not by SharePoint directly but through the infrastructure hub. I set the maximum degree of parallelism and assigned every action in the flow to a different connection, but it wasn’t enough. I didn’t set the retry settings manually, and the default settings continued to allow my poor Flows to be throttled to death by the infrastructure.

In the end, to spare more Flows from being throttled, we moved some of the data to Azure SQL. However, we saved many Flows just by adjusting the retry strategy and concurrency and creating multiple connections.

water flowing

Microsoft Flow and Azure Logic Apps Quick Formula Expression Guide

Microsoft Flow and Azure Logic Apps are powerful tools for automation, with tons of connectors and the things that you need to do work. Microsoft Flow in particular makes the design of a workflow easy with a rich design experience – except when it comes to expressions. You’re expected to know some relatively arcane formula expression rules to make your steps work together. I’ve gathered up the things that I felt like are the most important for writing expressions into this quick guide. Let’s get started with some basic math.

Arithmetic

If you want to get tripped up in a hurry, try to write your basic math statements with the operators that we’ve come to know and love: + – * /. They’ll make your expression fail – unless you swap them for their function call equivalents:

+ add()
sub()
* mul()
/ div()

The parameters to each of the functions occur in the order you would expect. Thus, you can swap:

A*B/C+D with…

add(div(mul(variables(‘A’), variables(‘B’)), variables(‘C’), variables(‘D’))

It’s not super easy to read, but once you get the pattern, you can create even more complex statements. For now, ignore the variables() function, we’ll come back to that in a bit.

Nulls

With basic math out of the way, next it’s important to know how to deal with nulls. Dealing with nulls is two pieces. First, separating a property with a question mark (?) instead of a period (.) will automatically handle nulls for you – return a null in the end. This greatly reduces the amount of testing that you must do to ensure that you don’t accidentally reference a null value.

The corollary to this is what do you do to get a default when a value is null. Here we use the manipulation function coalesce(), which returns the first non-null object. So, if I were to do:

coalesce(variables(‘A’), variables(‘B’), ‘Default’)

I’d get A if it is non-null. If A is null, I’ll get B if it’s non-null. If both A and B are null, this will return the string ‘Default’. If you’re wanting to get a zero instead of a null, you can wrap the result in a float() or int() conversion, which will return zero when the value is null.

One Thing About Strings

There are many string functions, but one that’s particularly useful and necessary. That’s the concat() – short for “concatenate” – function. It takes two or more parameters and concatenates them into one large string. If I wanted to create an ODATA compatible filter, I might write:

concat(‘SalesProID eq ‘, variables(‘A’))

If I needed to put a single quote around the values, I’d use the escape syntax, which is two single quotes together. If I wanted to filter by OrderType, the command might look something like:

Concat(‘OrderType eq ”’, variables(‘A’), ””)

That provides a string that has the quotes needed around the value.

Getting Values

That’s good basics, but how do you get the values from other parts of your Flow into your equation? You’ve already seen one approach. That is to use variables() to fetch one of the variables you’ve defined in your flow. To do this, you simply include the variable you want to use in single quotes like this:

variables(‘MyVariableNameHere’)

This will return the value of the variable. However, it’s more likely that you want to get a variable from another step. Here, we start with the values from the trigger that are fetched with triggerBody(). If I were fetching the ID of the SharePoint item that triggered the flow (assuming that’s our trigger), I’d write:

triggerBody()?[‘ID’]

This would return the ID of the item or null. If I had another step that fetched a different item, or had some other output, I could use the body() function with the parameter, which is the name of the step. So, if the step was named “Quote” and I wanted the symbol property, I could write:

body(‘Quote’)?[‘Symbol’]

There’s one caveat here. If the step has spaces in its name, then you must replace those with underscores. To get the Quote property from the Quote of the Day step, you’d write:

body(‘Quote_of_the_Day’)?[‘Quote’]

One other common condition is when you’re working on a set of items in a For Each step. If you’re in a For Each step, you can get the values from the current item with item(). If I wanted to get the ID of the current item, I’d write:

item()?[‘ID’]

If you’re inside nested For Each you can refer to the specific item, you want to pull from by using the function items() and specifying the name of the For Each. If I had a name of a For Each Step called ‘Each Item’ inside of a For Each Step called ‘Each Order’, I could write an expression to get the order number and item number in a string like this:

concat(‘Order Number :’, items(‘Each_Order’)?[‘OrderNumber’], ‘ Item Number :’, items(‘Each_Item’)?[‘ItemNumber’])

If you want to see the full list of functions, you can go to https://docs.microsoft.com/en-us/azure/logic-apps/logic-apps-workflow-definition-language — for warning, there are lots of methods listed there, but you’ll have to search for exactly what you’re looking for.