In the previous post we briefly touched on keeping the Main method from exiting, and as a trick we used the blocking Console.ReadLine() call. But as we noted there, it's best to avoid thread-blocking calls. Furthermore, process termination and recycling is a more complicated story, especially for code running on the edge, perhaps on a remote device with no keyboard, mouse, or monitor to interact with. A better strategy in these cases is to control process termination and use it as a recovery mechanism from unexpected situations, commonly called fatal exceptions.

Application termination and recycling

The best troubleshooting strategy

Recycling

The Azure IoT Edge runtime can be configured to restart a stopped module through the restartPolicy setting in the deployment manifest. The main point here is that the hosting runtime assumes that the application might crash and exit, and letting the application restart is a very appealing crash recovery mechanism that we should definitely leverage.

The restartPolicy configuration dictates how the IoT Edge agent restarts a module. Possible values include:

  • never - The IoT Edge agent never restarts the module.
  • on-failure - If the module crashes, the IoT Edge agent restarts it. If the module shuts down cleanly, the IoT Edge agent doesn't restart it.
  • on-unhealthy - If the module crashes or is considered unhealthy, the IoT Edge agent restarts it.
  • always - If the module crashes, is considered unhealthy, or shuts down in any way, the IoT Edge agent restarts it.

The product has other ways to remotely stop and start any module in an ad-hoc manner, so setting the restartPolicy to always is, generally speaking, a good strategy.
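
For illustration, a module entry with this policy might look like the fragment below. The module name telemetryModule and the image URI are placeholders, and the fragment leaves out the other sections that a complete deployment manifest needs (schema version, runtime, and system modules).

```json
{
  "modulesContent": {
    "$edgeAgent": {
      "properties.desired": {
        "modules": {
          "telemetryModule": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "myregistry.azurecr.io/telemetry-module:1.0",
              "createOptions": "{}"
            }
          }
        }
      }
    }
  }
}
```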

Cancellation

In the second code sample of the previous post, we saw how to use the async/await pattern to implement non-blocking parallel code. But in that approach there was no way to signal the application to exit gracefully, and this can become particularly challenging in highly parallel code. The .NET Core runtime provides a cooperative cancellation mechanism for parallel operations through the CancellationTokenSource class.

When a fatal exception occurs, a graceful exit is preferable to an application crash, because it allows debugging information to be written to the application logs for post-mortem analysis.

Now, let's see how our code evolves with the usage of a CancellationTokenSource:

using System;
using System.Runtime.Loader;
using System.Threading;
using System.Threading.Tasks;

namespace Example3
{
    class Program
    {
        // This is the program entry point
        static async Task Main(string[] args)
        {
            // Create the cancellation token source
            var cancellationTokenSource = new CancellationTokenSource();
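            // Signal cancellation when the runtime starts unloading the application
            AssemblyLoadContext.Default.Unloading += 
                (context) => cancellationTokenSource.Cancel();
            
            // Signal cancellation on Ctrl+C (useful during development)
            Console.CancelKeyPress +=
                (sender, e) =>
                {
                    Console.WriteLine("Ctrl+C detected.");
                    // Keep the process alive so that it can shut down gracefully
                    e.Cancel = true;
                    cancellationTokenSource.Cancel();
                };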

            // Register the Reset command callback 
            await RegisterCommandCallbackAsync("Reset", 
                OnReset, 
                cancellationTokenSource.Token);

            // Start the telemetry pump without awaiting it here;
            // it keeps running until cancellation is requested
            Task telemetryTask = 
                EmitTelemetryMessagesAsync(cancellationTokenSource.Token);

            // Wait until the app unloads or gets cancelled
            await WhenCancelled(cancellationTokenSource.Token);

            // Let the other tasks drain, waiting at most 2 seconds
            Console.WriteLine("Waiting for up to 2 seconds..");
            await Task.WhenAny(telemetryTask, 
                Task.Delay(TimeSpan.FromSeconds(2)));

            Console.WriteLine("Exiting..");
        }

        public static Task WhenCancelled(CancellationToken cancellationToken)
        {
            var taskCompletionSource = new TaskCompletionSource<bool>();
            cancellationToken.Register(
                s => ((TaskCompletionSource<bool>)s).SetResult(true), 
                taskCompletionSource);
            return taskCompletionSource.Task;
        }

        private static async Task RegisterCommandCallbackAsync(string command,
            Action callback,
            CancellationToken cancellationToken)
        {
            // Perform the command registration
            // Code omitted
            return;
        }

        // A method exposed for RPC
        static void OnReset()
        {
            // Perform a temperature sensor reset
            // Code omitted
        }

        // Emit telemetry messages
        static async Task EmitTelemetryMessagesAsync(
            CancellationToken cancellationToken)
        {
            while(true)
            {
                if (cancellationToken.IsCancellationRequested)
                {
                    Console.WriteLine("Exiting telemetry pump..");
                    break;
                }
                Console.WriteLine("Sending telemetry message..");
                await Task.Delay(TimeSpan.FromSeconds(1));
            }
        }
    }
}

The cancellation token source lets us signal a cancellation by invoking its Cancel() method. We do so in two places: when the AssemblyLoadContext.Default.Unloading event fires, which generally means the application is already exiting, and when the Console.CancelKeyPress event fires (Ctrl+C), which is useful for testing the cancellation mechanism in our development environment. Any code that holds the token can detect a requested cancellation through the cancellationToken.IsCancellationRequested property and then gracefully terminate its running parallel operation, as we did in EmitTelemetryMessagesAsync.

Now in EmitTelemetryMessagesAsync we can loop forever and break out of the loop based on the cancellation token signal.
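
One possible refinement, shown here only as a sketch: because the loop checks the token once per iteration, it may take up to a full delay interval to notice a cancellation. Passing the token to Task.Delay ends the wait as soon as cancellation is requested, at the cost of having to catch OperationCanceledException. The TelemetryPump class, the RunAsync method, and the returned message count are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TelemetryPump
{
    // Emits messages until cancellation is requested.
    // Returns the number of messages sent (for illustration only).
    public static async Task<int> RunAsync(
        TimeSpan interval, CancellationToken cancellationToken)
    {
        int sent = 0;
        try
        {
            while (!cancellationToken.IsCancellationRequested)
            {
                Console.WriteLine("Sending telemetry message..");
                sent++;

                // The token interrupts the delay when cancellation is requested
                await Task.Delay(interval, cancellationToken);
            }
        }
        catch (OperationCanceledException)
        {
            // Expected: the delay was interrupted by the cancellation signal
        }

        Console.WriteLine("Exiting telemetry pump..");
        return sent;
    }

    public static async Task Main()
    {
        // Cancel automatically after 250 ms to demonstrate the early exit
        var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(250));
        int sent = await RunAsync(TimeSpan.FromSeconds(1), cts.Token);
        Console.WriteLine($"Messages sent: {sent}");
    }
}
```

The trade-off is that cancellation now surfaces as an exception inside the loop, so any cleanup that must always run belongs after the catch block or in a finally block.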

Finally, we can register a callback for this cancellation event and, when it fires, let the Main method return by awaiting the WhenCancelled function. In other words, we now have an event-driven mechanism to gracefully exit from all active tasks of our program.
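
To make the idea concrete for more than one parallel operation, here is a minimal sketch in which several workers share one token and the caller waits for all of them within a bounded grace period. The GracefulShutdown class, WorkerAsync, RunAsync, and the timings are assumptions made for this example and are not part of the sample above.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class GracefulShutdown
{
    // Completes when the token is cancelled
    public static Task WhenCancelled(CancellationToken token)
    {
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        token.Register(() => tcs.TrySetResult(true));
        return tcs.Task;
    }

    // Hypothetical worker: polls the token between short delays
    private static async Task WorkerAsync(int id, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromMilliseconds(50));
        }
        Console.WriteLine($"Worker {id} stopped.");
    }

    // Returns true when every worker stopped within the grace period
    public static async Task<bool> RunAsync(
        int workerCount, CancellationToken token, TimeSpan gracePeriod)
    {
        Task[] workers = Enumerable.Range(1, workerCount)
            .Select(id => WorkerAsync(id, token))
            .ToArray();

        // Event-driven wait: nothing is blocked while the workers run
        await WhenCancelled(token);

        // Wait for the workers, but never longer than the grace period
        Task allStopped = Task.WhenAll(workers);
        Task first = await Task.WhenAny(allStopped, Task.Delay(gracePeriod));
        return first == allStopped;
    }

    public static async Task Main()
    {
        var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));
        bool clean = await RunAsync(3, cts.Token, TimeSpan.FromSeconds(2));
        Console.WriteLine(clean ? "All workers stopped." : "Grace period expired.");
    }
}
```

Waiting on Task.WhenAll with a timeout, rather than on a fixed delay alone, lets the program exit as soon as the workers are done while still bounding the shutdown time.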

The rules of thumb here are:

  1. Pass the CancellationToken to every async function and to every long-running synchronous function. Use the token's signal to return gracefully from these functions.
  2. Pass a reference to the CancellationTokenSource to every function that performs critical operations that cannot recover from possible exceptions, e.g. initialization of used dependencies. Use the source to cancel the execution of the program.
  3. Let the main function return when a cancellation is signaled. Allow some time for the other threads to complete their shutdown process.
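
The second rule can be sketched as follows. This is only an illustration under stated assumptions: CriticalInit, InitializeAsync, and the initStep delegate are hypothetical names, and the failing step simulates a dependency that cannot be initialized.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CriticalInit
{
    // Runs a critical initialization step. If the step fails, the error is
    // logged and the whole program is asked to shut down through the source.
    public static async Task<bool> InitializeAsync(
        Func<Task> initStep, CancellationTokenSource cancellationTokenSource)
    {
        try
        {
            await initStep();
            return true;
        }
        catch (Exception ex)
        {
            // Write the details to the log for post-mortem analysis
            Console.WriteLine($"Fatal error during initialization: {ex.Message}");

            // Signal every holder of the token to exit gracefully
            cancellationTokenSource.Cancel();
            return false;
        }
    }

    public static async Task Main()
    {
        var cts = new CancellationTokenSource();

        // Simulated failure of a required dependency
        bool ok = await InitializeAsync(
            () => Task.FromException(
                new InvalidOperationException("Sensor not found")),
            cts);

        Console.WriteLine($"Initialized: {ok}");
        Console.WriteLine($"Cancellation requested: {cts.IsCancellationRequested}");
    }
}
```

With the restartPolicy set to always, the graceful exit that follows such a cancellation gives the module a fresh start instead of leaving it running in a broken state.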

In the next post we will examine more elegant ways to implement telemetry pumps without the usage of while loops.