Application Insights for App Services

Last updated: 2023-08-07

Application Insights has been a stalwart of the Azure observability stack for many years. It is a powerful and user-friendly tool for manual debugging and investigation of application issues and for gaining an overview of application performance. One of the main objections that tends to be levelled against Application Insights is that, as the name suggests, it is application-specific. In most of the documentation and samples you will see a one-to-one relationship between an application and an Application Insights instance. Each of these is an individual entity; there is not really a concept of the applications existing within a particular domain. In this pattern, Application Insights does not provide you with a single overview. Instead, each app service is siloed into its own space with its own dashboards and metrics views. As we will see, the one-to-one mapping is not the only possible model for running Application Insights and we can configure it to take a more holistic view of the landscape.

In the wild, the primary use case for Application Insights tends to be monitoring Web Apps and Function Apps. Technically though, it can also be wired up to .NET apps running in containers. The main strength of Application Insights is in collecting and visualising metrics and telemetry, although it can also be used for custom logging.

For the purposes of this exercise we will be running Application Insights with an ASP.NET Core Web App running .NET 7.0 on a Linux host. In order to test features like tracing and dependency mapping, this application connects with:

  • another web app
  • an Azure SQL database
  • an Azure Storage Container
  • an Azure Service Bus

Our application is a trivial app called 'daisychain'. We have created three instances of the app, designed so that one instance passes calls to the next instance. This provides us with a simple framework for evaluating observability features such as distributed tracing.
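To make the pattern concrete, here is a minimal sketch of the forwarding logic - the controller, route and URL are illustrative rather than the actual daisychain source:

    using Microsoft.AspNetCore.Mvc;

    // Sketch only: each daisychain instance exposes an endpoint and
    // forwards the call to the next instance in the chain.
    public class ChainController : Controller
    {
        private readonly HttpClient _httpClient;

        public ChainController(HttpClient httpClient) => _httpClient = httpClient;

        [HttpGet("chain")]
        public async Task<string> Chain()
        {
            // The outbound HTTP call is what Application Insights correlates
            // into a distributed trace spanning the instances
            var downstream = await _httpClient.GetStringAsync(
                "https://daisychain2.azurewebsites.net/chain");
            return $"daisychain1 -> {downstream}";
        }
    }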

App Service Tiers

A number of the features we cover in this discussion are not available if you are using the free tier of web apps. This is because they rely on the Always On setting, which cannot be enabled in the free tier. The Basic tier works just fine.

AutoInstrumentation

Application Insights can essentially be run either in a loosely coupled mode, which requires almost zero configuration, or in more tightly coupled modes, which may require installation of packages at the solution level as well as creating explicit dependencies at the source code level. If you are following the 12 Factor App philosophy, then the former approach may be preferable. AutoInstrumentation is applied at the platform level and is transparent to the application itself. The Application Insights agent is automatically installed for the Web App and runs alongside it in Azure, but you don't need to make any code changes or install any packages or Connected Services in Visual Studio. At this time, AutoInstrumentation is not supported for apps running on AKS. The table below shows the support for AutoInstrumentation across different application types and platforms as of July 2023:

Source: https://learn.microsoft.com/en-gb/azure/azure-monitor/app/codeless-overview

According to the table, a Web App running on a Linux host and published as code has App Insights "on by default and enabled automatically". Even though the instrumentation is auto-enabled for this scenario, you still need to enable Monitoring in the Azure portal (or in code) to view your telemetry in Application Insights.

Set Up Application Insights

  1. Select Application Insights in the left pane for your app service. Then select Enable.

  2. Create a new resource or select an existing Application Insights resource for this application.

    Note

    When you select OK to create a new resource, you're prompted to Apply monitoring settings. Selecting Continue links your new Application Insights resource to your app service. Your app service then restarts.

It is worth paying careful attention to the contextual information when you are creating a new Application Insights instance. Firstly, an Instrumentation Key will be generated. This is used to identify your application when it sends its telemetry to the Application Insights ingestion endpoint.

Equally, Application Insights is not a free lunch. The telemetry it collects needs to be stored somewhere - in a Log Analytics Workspace - and this will attract a storage cost. As we will see later - these costs may be minimal, but given large enough data volumes they can also start to really hit your wallet.

  3. After you specify which resource to use, you can choose how you want Application Insights to collect data for your application. These options are grouped according to the runtime your application is using - e.g. .NET, .NET Core, Node.js, Java or Python. In our case, this is ASP.NET Core.

There are a few things to bear in mind here:

  1. This process will default to selecting an existing workspace - which is most likely the behaviour you will want. If you want to run queries or create dashboards spanning all of your production applications then it makes sense to consolidate your logging into a single workspace. So, when and why might you want multiple Log Analytics Workspaces? The usual considerations for cloud architectural decisions are also applicable here:

    1. Separation of environments. Most companies will have separate subscriptions mapping to each of their environments - e.g. Dev, QA, Prod. If you run Dev and QA in the same subscription, you may want a workspace for each environment.
    2. Security - you may wish to protect access to certain logs, and can therefore separate out more sensitive app logs into their own workspaces and apply RBAC to restrict access.
    3. Resource limits - if you are working at very large scale you may come up against physical limits for log ingestion - for example, the number of records that can be ingested per second for a workspace. In this case you will need to split your ingestion streams across multiple workspaces.
  2. You will be presented with this rather cryptic option:

This determines the behaviour of (AutoInstrumented) Application Insights if it finds that the Application Insights SDK has already been installed in your application. Unless you have a compelling reason to switch this on, it is best to leave it switched off.

  3. Snapshot Debugger:
    This can potentially be a very powerful feature - as it will give visibility of the values of variables at the point at which an exception occurred. In order to take advantage of this feature though you will need to carry out some further configuration - as well as inserting the TrackException call into your code. The up-front effort is not too burdensome in relation to the rewards you will reap.

We are running a .NET Core Web App targeting the .NET 7.0 framework. We are going to send telemetry to an existing Log Analytics Workspace rather than creating a new one:

We are going to enable Application Insights with the following settings:

NB - when you apply these changes they will kick off a restart of your site!

Viewing Our App Insights

We have created our Application Insights resource. Now it is time to look into the analytics it provides. As with most Microsoft tooling, there are a few different paths we can take through the UI and different approaches to viewing the same data.

If we look at the Overview page for our app we will see that an Application Insights instance has now been created:

If we click on the link for our instance we are taken to the overview page where we see some key metrics:

Even though we have not added any internal instrumentation to our app, we are nevertheless provided with a rich array of data and insights. The Application Map successfully identifies the calls being made to each of the other Azure services we are connecting to:

If we click on the Azure Service Bus node on the Application Map we can drill down and get further insights into our calls into the service:

This is a summary view but we can also drill down further to see details of each individual call by clicking on the 'Drill into...' option on the right-hand pane:

This gives us timings for the overall method call as well as each of the messaging interactions:

Service Bus Analytics

Application Insights will display the lines of communication between your application and the Service Bus instance. If you need any diagnostics on messaging errors or the health of the Service Bus endpoint you will need to navigate to the home page for the Service Bus instance in the Azure portal.

The Transaction Search view gives us a listing of web requests:

The options in the Investigate section of the menu sidebar present us with some excellent insights into our Web App. We have immediate insight into key indicators such as Availability, Failures and Performance:

We can also see that tracing works successfully - we have a trace of a method call from daisychain1 to daisychain2:

This has not required any specific configuration on our part. The key at the top explains the meaning of the different colours and icons being used in the chart:

This icon:

informs us that traces are available. If we click on it we see a timeline of process calls as well as 'trace' entries.

These trace entries actually correspond to log entries we have created in our application using the built-in .NET Core ILogger interface. As a rough illustration - the class and message here are ours - such an entry is produced like this:
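    using Microsoft.Extensions.Logging;

    public class ChainService
    {
        private readonly ILogger<ChainService> _logger;

        public ChainService(ILogger<ChainService> logger) => _logger = logger;

        public void Process()
        {
            // Captured automatically by Application Insights and surfaced
            // as a 'trace' entry in the transaction timeline
            _logger.LogInformation("daisychain1: passing the call along the chain");
        }
    }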

Let us not forget that we get all of this out of the box without having to do any customisation of our app configuration or our code and without having to explicitly install any libraries or agents.

Profiling

The Microsoft documentation describes the Profiler as follows:

https://learn.microsoft.com/en-us/azure/azure-monitor/profiler/profiler-overview


Diagnosing performance issues can be difficult, especially when your application is running on a production environment in the cloud. The cloud is dynamic. Machines come and go, and user input and other conditions are constantly changing. There's also potential for high scale. Slow responses in your application could be caused by infrastructure, framework, or application code handling the request in the pipeline.

With Application Insights Profiler, you can capture and view performance traces for your application in all these dynamic situations. The process occurs automatically at scale and doesn't negatively affect your users. Profiler captures the following information so that you can easily identify performance issues while your app is running in Azure:

  • Identifies the median, fastest, and slowest response times for each web request made by your customers.
  • Helps you identify the "hot" code path spending the most time handling a particular web request.

Enable the Profiler on all your Azure applications to catch issues early and prevent your customers from being widely affected. When you enable Profiler, it gathers data with these triggers:

  • Sampling trigger: Starts Profiler randomly about once an hour for two minutes.
  • CPU trigger: Starts Profiler when the CPU usage percentage is over 80 percent.
  • Memory trigger: Starts Profiler when memory usage is above 80 percent.

Each of these triggers can be configured, enabled, or disabled on the Configure Profiler page.

To access the Profiler, click on the Performance option in the side menu:

Within the Performance view, you will also see a Profiler option:

If you click on this option you can see that some profiling sessions have automatically been created:

This gives us some rich metrics on CPU usage at a very granular level:

Live Metrics

This is a really powerful feature - and, as the name suggests, it gives us a real-time view of some key metrics for our application. This can be especially valuable if you have just deployed an update to your application and want visual confirmation that it appears to be running correctly. The feature really lives up to its name, as there is a latency of around one second for displaying your application metrics. The metrics are streamed and then discarded - so using Live Metrics does not incur any storage charge; equally, there is no charge for streaming the metrics. If you need to, you can also use the SDK to fine-tune or extend the Live Metrics configuration.
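As an example - and this is a sketch that assumes you have installed the Application Insights SDK, rather than something the out-of-the-box experience requires - you can secure the Live Metrics channel with an API key via the QuickPulse telemetry module:

    using Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.QuickPulse;

    // Sketch: secures the Live Metrics (QuickPulse) stream with an API key
    // generated in the portal; the key value here is a placeholder
    builder.Services.ConfigureTelemetryModule<QuickPulseTelemetryModule>(
        (module, _) => module.AuthenticationApiKey = "<your-live-metrics-api-key>");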

SQL Commands

Our application makes back-end calls to a SQL Server database hosted in Azure. We have already seen from the Application Map that Application Insights has identified that we are connecting to a SQL database:

If we have the SQL Commands option enabled we can drill down and see the actual query being sent to the SQL server. To be able to achieve this level of end-to-end visibility with very light-touch configuration and no up-front cost is quite a persuasive selling point for Application Insights. Naturally, Microsoft have an inherent advantage here: Azure is their platform, so they can enjoy the benefit of directly accessing the internal plumbing whilst third parties have to build agents dependent on APIs.

We can also drill down and see transaction details:

Snapshot Debugger

The thought of having a 'debugger' running on a production system is enough to bring most sysadmins out in a cold sweat. Luckily, the Snapshot Debugger is not a tool that enables devs to remotely step through their code as it is running on a live system. Instead, it collects a snapshot to give a fuller picture of your application state when an exception is thrown.

Enabling the debugger is simple:

Next we need to install the Application Insights SDK NuGet package:

Then install the Application Insights Snapshot Collector NuGet package:

Then we need to enable telemetry in our startup configuration in Program.cs by inserting the following code prior to the builder.Services.AddControllersWithViews() statement. This code automatically reads the Application Insights connection string value from our configuration. The AddApplicationInsightsTelemetry method registers the ApplicationInsightsLoggerProvider with the built-in dependency injection container, which will then be used to fulfil ILogger implementation requests.

    
    builder.Services.AddApplicationInsightsTelemetry();

so that your Program.cs looks something like this (a minimal sketch of the relevant section - the exact surrounding code will vary):
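    var builder = WebApplication.CreateBuilder(args);

    // Reads the Application Insights connection string from configuration
    // and registers telemetry services, including the logger provider
    builder.Services.AddApplicationInsightsTelemetry();

    builder.Services.AddControllersWithViews();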

You also need to add this line to enable the Snapshot Collector:

                    
    builder.Services.AddSnapshotCollector(config =>
        builder.Configuration.Bind(nameof(SnapshotCollectorConfiguration), config));


In your application code, use dependency injection in your classes to obtain an instance of the TelemetryClient:


                         
    // TelemetryClient lives in the Microsoft.ApplicationInsights namespace
    private readonly TelemetryClient _telemetryClient;

    public ImageService(TelemetryClient telemetryClient)
    {
        _telemetryClient = telemetryClient;
    }
    

Then add a call to the TrackException method in your error handler. In the code below we are calling the Application Insights TrackException method. One of the benefits of this method is that it allows us to pass in dictionary objects containing any additional information that might be useful when analysing our error. The real beauty of the Snapshot Debugger is that it will also capture the values of local variables in our method. We have created two dummy variables - x and y - and populated them with values, with the aim of seeing those values picked up in our snapshot. We are also deliberately generating a divide-by-zero error which we hope to be able to track later on.

                 
    int x = 0;
    int y = 1;
    try
    {
        // x is zero, so this deliberately raises a DivideByZeroException
        int z = y / x;
    }
    catch (Exception ex)
    {
        var properties = new Dictionary<string, string>
        {
            ["Application"] = "App1"
        };

        var measurements = new Dictionary<string, double>
        {
            ["Level"] = 1
        };

        // _telemetryClient is the TelemetryClient injected above
        _telemetryClient.TrackException(ex, properties, measurements);
    }
    

Unfortunately, at the moment the Snapshot Debugger is only supported for applications running on a Windows host. Developers have made requests for the functionality to be extended to Linux hosts but the official response on this ticket suggests that Linux support will not be added any time soon. From a technological point of view this should not be a deal-breaker if you are developing platform-independent Web Apps. From a cost point of view though, running on a Windows Service Plan can be significantly more expensive than the Linux equivalent; the differential is especially pronounced at the Basic tier. Snapshot Debugging is a great feature but a lot of customers may baulk at the prospect of shelling out $55 per month for a Windows plan vs $12 per month for a Linux one.

To view our snapshot we can start by clicking on the Failures option in the Investigate section of the sidebar:

If we click on the right hand column we can select an exception to inspect:

We have added code to our method to generate a DivideByZero exception and we can see these errors listed here. If we click on one of the exceptions, we see a link to open a Debug Snapshot. When we open the snapshot we can see the call stack as well as a Locals pane, where we can see the values of all the local variables at the time the exception occurred. This is a fantastic aid for debugging and diagnostics.

Logging

In our sample application we are using the ILogger interface to record custom log entries. The good news is that Application Insights captures these automatically. When you come to view the logs, though, there are two different routes you can take - and this can be the cause of some confusion and ambiguity. The first route is to use the Logs option in the Monitoring section of our Application Insights side menu:

This opens up a window where we can run queries against a number of Application Insights tables. If we want to see our log entries, we need to query the traces table. Here, we are just running the simplest query possible - one which retrieves all columns from the traces table for a specified time range, in this case 24 hours. In KQL terms, this amounts to something like:
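    traces
    | where timestamp > ago(24h)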

Alternatively, we can query tables in the Log Analytics Workspace that we attached our Application Insights instance to when we created it. To do this, we can select the workspace from our Application Insights overview page:

If we navigate to the workspace and then click on Logs and expand the Log Management folder we will see a listing of tables in our Log Analytics Workspace. This workspace can be the target data store for multiple different ingestion sources. Therefore the topology and naming conventions are slightly different to those for the dedicated Application Insights Logs instance.

On expanding the Log Management folder, we will see the following tables:

The AppTraces table is analogous to the traces table we queried earlier in our Application Insights instance. One fundamental difference though is that this table can be a repository for logs from any number of different Application Insights instances - it is an aggregation. Therefore there are additional metadata fields in the table. Equally, if we wish to view data just for a specific application we will need to apply a filter:

In the above case we are filtering on the AppRoleName field to retrieve log entries for a specific application.
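Such a filter looks something like this (the role name is simply the name of our app):

    AppTraces
    | where AppRoleName == "daisychain1"
    | project TimeGenerated, Message, SeverityLevel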

Tables Architecture

You may assume that the App Insights traces table is simply a view pointing to the Log Analytics AppTraces table (or vice versa). In fact these are two separate physical data sources, so the logging data is actually being duplicated. The Application Insights tables are now deprecated and will be retired from January 2024. You can find out more about this in this article. See Appendix 1 of this document for a mapping of legacy tables to the new workspace tables.

Viewing Metrics in Log Analytics

We have seen that in our App Insights portal we could view the following metrics in the Live Metrics pane:

  • Request Rate (per second)
  • Request Duration
  • Failure Rate
  • Dependency Call Rate
  • Dependency Call Duration
  • Dependency Call Failure Rate
  • Committed Memory
  • CPU Total
  • Exception Rate

You might think that, since our Log Analytics workspace captures logs from multiple Application Insights instances, we could use it as a portal for viewing these metrics across multiple applications simultaneously. Unfortunately, Log Analytics does not have a direct equivalent of the Application Insights Live Metrics view.

If we click on the Metrics option in the Monitoring section of the side menu we can only select metrics for a single source:

The 'Upvote' option in the above screenshot would seem to suggest that Microsoft are aware that users would like the option to multi-select on this screen. Once we select our App Service, the Metric Namespace is assigned as "App Service Standard Metrics" and the following metrics are made available from the drop-down list.

The full set of metrics available for Web Apps and App Service Plans is listed in Appendix 2.

By using the 'Add metric' option

we can combine multiple metrics in the same chart:

There are not really any useful charts or queries for viewing metrics for multiple applications simultaneously. We can run queries against the AppPerformanceCounters table but, out of the box, this only captures a small set of metrics:

  • % Processor Time
  • Private Bytes
  • % Processor Time Normalised
  • Available Bytes
  • IO Data Bytes/Sec

We can see this by running a simple query on the table itself - for example, listing the distinct counter names that have been captured:
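    AppPerformanceCounters
    | distinct Name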

A query such as the following:


    AppPerformanceCounters
    | where Name == "Available Bytes"
    | summarize avg(Value) by AppRoleName

will render a chart spanning multiple applications:

However, this leaves us having to manually craft a number of queries and attach the resulting charts to custom Dashboards.

Adding further performance counters
It is possible to capture metrics for additional performance counters. The bad news is that you can only do this

"if your application is running under IIS on an on-premises host or is a virtual machine to which you have administrative access.""

Obviously, this does not have much mileage in a cloud environment (unless you are running web applications on Windows VMs).

What about System.Diagnostics.Metrics?
You may have come across the System.Diagnostics.Metrics namespace, which is part of .NET. Could we somehow use this to surface additional resource and performance metrics? Unfortunately, it will not help with retrieving physical resource metrics such as disk i/o or memory usage. Instead, the library provides constructs for creating, collecting and forwarding custom metrics to other observability tools. The documentation envisages that you would use it for defining business-related metrics - e.g. "number of widgets sold per minute". The classes in the library help to ensure that your counters and other constructs are compatible with metrics-scraping tools such as Prometheus.
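To make the distinction concrete, here is a minimal sketch - the meter and counter names are invented - of the kind of business metric the library is designed for:

    using System.Diagnostics.Metrics;

    // System.Diagnostics.Metrics targets custom business metrics,
    // not host-level resource metrics such as disk i/o or memory
    var meter = new Meter("OcShop.Sales");
    var widgetsSold = meter.CreateCounter<int>("widgets-sold");

    // Recorded at the point of sale; a MeterListener or an OpenTelemetry
    // exporter can then collect and forward the measurements
    widgetsSold.Add(1);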

Application Insights SDK
In this investigation we have limited ourselves to using the AutoInstrumentation capabilities of Application Insights. The Application Insights SDK is a sophisticated library offering a wealth of programmable features. This is a topic in its own right and we will take a deep dive into it in a separate article.

Data Retention

As we have noted earlier, Application Insights telemetry is sent to a Log Analytics workspace - and this means that you will incur storage costs. Over time, as you bring more services on stream and logs accumulate, these can put quite a dent in your budget. Apart from burning a hole in your pocket, storage costs can also be difficult to pin down.

You therefore need to be proactive and stay on top of data sources, ingestion volumes and retention periods. As your application and monitoring landscape becomes more populous and heterogeneous, this can present a formidable challenge.

Fortunately, Log Analytics is pretty strong on cost and retention transparency. The Tables view (available from the Settings/Table menu option) provides a listing of workspace tables as well as their configured retention periods:

You can use the Manage Table option to adjust the retention settings:

You can see here that the default setting for interactive retention for the AppMetrics table is 90 days. Interactive retention means that you can retrieve the data by running interactive (Kusto) queries on the table. Depending on your needs you can reduce this to 30 days or extend it to up to two years. As metrics are ephemeral, you might want to reset this to 30 days to trim your storage costs. Unfortunately, this is the minimum - you have no choice but to store the data for at least 30 days. Depending on your regulatory needs you can also set an archive period of up to seven years.

Archival

Archival means that the data remains in the table - which may be necessary for regulatory purposes - but cannot be queried. If you need access to the data once it has been archived you have two options:

  • run a Search job
  • run a Restore Logs job

Search jobs are asynchronous queries that fetch records into a new search table within your workspace for further analytics. The search job uses parallel processing and can run for hours across large datasets.

The restore operation makes a specific time range of data in a table available in the hot cache for high-performance queries. You can also use the restore operation to run powerful queries within a specific time range on any Analytics table. The main use case for this would be when the standard Kusto queries you run on the source table can't complete within the 10-minute timeout period.

When you restore data, you specify the source table that contains the data you want to query and the name of the new destination table to be created.

The restore operation creates the restore table and allocates additional compute resources for querying the restored data using high-performance queries that support full KQL.

The destination table provides a view of the underlying source data, but doesn't affect it in any way. The table has no retention setting, and you must explicitly dismiss the restored data when you no longer need it.

Ingestion and Costs

The Monitoring/Insights pages provide both an overview as well as a detailed breakdown of ingestion sources and volumes. This can be a good starting point for keeping tabs on your log storage costs. You can see breakdowns both by table as well as by resource:

There are also some useful ready-to-run queries to help you slice and dice your ingestion stats:

The Settings/Usage and estimated costs section does what it says on the tin. It will show you a table of your monthly costs for your current tier and enable you to compare that with other pricing tiers:

The screenshot above shows a very minimal monthly cost for our test subscription. If you are ingesting large volumes, however, costs can really soar. The 100GB per day tier will see you forking out the best part of $100k per year - or more depending on your usage.

Luckily, Azure has tools for creating budgets and alerts so you can control costs and avoid nasty surprises. You can also set daily ingestion caps:

and set your retention policy at a global level:

Running a global instance

So far, we have looked at the typical pattern for running Application Insights - where each Web App has its own Application Insights instance. This is the most commonly used pattern. The downside is that, whilst we get rich insights and an intuitive UI, we don't have a single place where we can get an overview of either all our applications or a set of applications within a certain domain. Even though the one-to-one mapping is the dominant approach, this does not mean that we cannot view multiple different apps within the same instance. As we have seen earlier, a Web App is linked to an Application Insights resource via an Instrumentation Key. There is no physical or technical impediment to assigning the same instrumentation key to multiple applications so that they all send their telemetry to the same endpoint. The question is, how will Application Insights make sense of this? Will it be able to differentiate between the different sources in the UI when we look at metrics and graphs? Let us see!

Set Up

We will start off by creating a standalone Application Insights instance. Next, we will remove all of the App Insights configuration from our existing applications and then connect each of them to our new global instance. We will also create a number of new apps to create a more realistic simulation of a microservice system. We will add the following apps:

  • oc-portal
  • oc-pricing
  • oc-sales
  • oc-products
  • oc-customer
  • oc-orders
  • oc-logistics

so we will have a total of 10 applications.

Viewing metrics

We will start by clicking on the Logs option of the Monitoring sidebar section:

So let's imagine we are interested in viewing average response times for our Web Apps. We could run a very simple query like this:

This is not very useful - it has just averaged out response times for all of the applications sending telemetry to this instance. Is there a telemetry attribute we can use for grouping our data by application? Well, it turns out that there is - we can use the cloud_RoleName attribute for grouping, and suddenly our chart becomes much more informative. The grouped query is along these lines:
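    requests
    | summarize avg(duration) by cloud_RoleName
    | render timechart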

This opens up all sorts of possibilities because it shows that the query engine is not tied to one-to-one mappings and will allow us to group our data flexibly, so that we can run reports and diagnostics across multiple app services within the context of a single Application Insights workspace.

We can see the same principle in action below, where once again we are running a query and using the Apply Splitting option to group results by cloud_RoleName - which essentially resolves to the name that you have given to an App Service:

In this instance we are looking at CPU usage across a number of different apps. Whilst this is helpful in showing us the most CPU-hungry applications, it also has its limitations. Unfortunately it is not possible to click on a line in the graph and drill down deeper into more metrics for the particular service. For most of us, a principal concern will be to gain an overview of errors in our production systems and then to be able to drill down into those errors. So, how does a global Application Insights instance deliver on this score? Well, let's start by clicking on the Failures menu option in the Investigate section of the menu sidebar. The default view shows an overview across all of our applications:

Where things get interesting is when we look at the Roles view. As we have said earlier, the cloud role refers to the logical application or service. Each cloud role maps to one or more instances - i.e. the physical server(s) on which the app is running. For a Web App, the instance is ephemeral and beyond our control. The instance name may be more valuable if you host your app services on Azure VMs or on your own infrastructure. The cloud role name will default to the name you have given to your Web App. If you are using the Application Insights SDK you can override this using a TelemetryInitializer:

                
    using Microsoft.ApplicationInsights.Channel;
    using Microsoft.ApplicationInsights.Extensibility;

    public class CloudRoleNameTelemetryInitializer : ITelemetryInitializer
    {
        public void Initialize(ITelemetry telemetry)
        {
            // Set a custom cloud role name for all telemetry from this app
            telemetry.Context.Cloud.RoleName = "Your Role Name";
        }
    }
                    
                
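If you do take this route, the initializer also needs to be registered with the dependency injection container - something along these lines:

    // Register the initializer so that it is applied to all outgoing telemetry
    builder.Services.AddSingleton<ITelemetryInitializer, CloudRoleNameTelemetryInitializer>();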

The Roles view gives us a breakdown of Dependency Failures, Failed Requests and Total Exceptions for each application:

We can then expand the Cloud Role to view figures across each instance - but there is little value in this if you are running an app service.

The same reservation would apply to the Metric deviations section on the upper right-hand side of the page.

The figures here represent the average and standard deviation across the instances of individual web apps. They do not represent averages across Web Apps. Again, this is really only of value if you have access to the server instance. Probably the main shortcoming of this view is that it only gives us totals. We can't drill down from those to see the detail of the actual exceptions or request failures. To understand the problem, let us look at our 'app-ocportal-uks' application. We can see that there are 35 failed requests and 30 exceptions. Clicking on that row, however, just expands the individual instances. If I click on the link for one of those instances I just get taken to the overview page for the Web App. If I click on the Application Insights link for that Web App I get a link to go to our global instance. So we end up going round in a circle. We can't drill down directly from this view. If I wanted to view errors for a specific application, I would need to go back to the Exceptions view and filter accordingly:

This kind of works but it is not the most elegant solution in terms of user experience. Until a time comes when we can create our own custom pages in Azure and pass in variables via our HTTP request, it does not seem possible for us to create our own custom drill-down functionality.

Conclusion

The Application Insights/Azure Monitor offering is strong on ease of implementation and supports the three observability pillars of tracing, metrics and logging (although the scope of metrics is limited when querying across applications in a Log Analytics workspace). Its major strength lies in being very developer-friendly for a specific range of needs which are supported in the Application Insights portal. That is, for a team developing a particular service it is very easy to obtain visibility on performance, exceptions and usage without requiring production access or a high level of knowledge of observability technologies.

That is the good news. The downside though, is that this model does not lend itself to a holistic view across your applications. There is not a single pane of glass and the range of metrics and performance counters shrinks dramatically when you need to scale your analysis beyond the scope of a single application.

At times, using the App Insights/Azure Monitor combination does not really confer the feeling of using a coherent product stack. Often, the experience is one of using a disparate set of features, tools and agents. Furthermore, coverage can be something of a guessing game. There are myriad permutations depending on runtime, platform, version, tier and so on. Are you using Java? Are you using a paid tier? Are you using the SDK? Are you using AutoInstrumentation? Are you using .NET Core? Which version are you using? The feature matrix can get overwhelming and hard to navigate. Sometimes, the Azure Monitor landscape seems to work as a reminder that less is more.

Aside from the technical quibbles there are also questions of branding and identity which perhaps inhibit customer engagement with the stack. Systems such as DataDog or Dynatrace seem to have a much more tangible and visible public persona. They have online communities and an active social media presence. Azure Monitor, by comparison, feels somewhat anonymous. You don't really get a sense of a rapport with a tangible entity. In fact Microsoft are somewhat under-selling themselves here because if you do contact them via the support email address you will often get a fast and helpful response straight from the relevant development team.

The identity issue is further compounded by the subtle tensions between being a platform vendor - and therefore being notionally product agnostic, whilst also being a product developer. Microsoft provides the Azure Monitor product, but it also embraces Managed Prometheus and Managed Grafana on the platform. This is great in terms of flexibility as customers can mix and match but it can result in uncertainty - or outright confusion - as well as diluting the Azure Monitor brand identity. You get the feeling that this is a product still in the process of defining itself and working out its place in the world.


Appendix 1

Legacy Table Mapping

Legacy table name | New table name | Description
availabilityResults | AppAvailabilityResults | Summary data from availability tests.
browserTimings | AppBrowserTimings | Data about client performance, such as the time taken to process the incoming data.
dependencies | AppDependencies | Calls from the application to other components (including external components) recorded via TrackDependency(). Examples are calls to the REST API or database or a file system.
customEvents | AppEvents | Custom events created by your application.
customMetrics | AppMetrics | Custom metrics created by your application.
pageViews | AppPageViews | Data about each website view with browser information.
performanceCounters | AppPerformanceCounters | Performance measurements from the compute resources that support the application. An example is Windows performance counters.
requests | AppRequests | Requests received by your application. For example, a separate request record is logged for each HTTP request that your web app receives.
exceptions | AppExceptions | Exceptions thrown by the application runtime. Captures both server side and client-side (browsers) exceptions.
traces | AppTraces | Detailed logs (traces) emitted through application code/logging frameworks recorded via TrackTrace().


Appendix 2

App Service Metrics

Metric | Description
Response Time | The time taken for the app to serve requests, in seconds.
Average Response Time (deprecated) | The average time taken for the app to serve requests, in seconds.
Average memory working set | The average amount of memory used by the app, in megabytes (MiB).
Connections | The number of bound sockets existing in the sandbox (w3wp.exe and its child processes). A bound socket is created by calling bind()/connect() APIs and remains until said socket is closed with CloseHandle()/closesocket().
CPU Time | The amount of CPU consumed by the app, in seconds. For more information about this metric, see CPU time vs CPU percentage.
Current Assemblies | The current number of Assemblies loaded across all AppDomains in this application.
Data In | The amount of incoming bandwidth consumed by the app, in MiB.
Data Out | The amount of outgoing bandwidth consumed by the app, in MiB.
File System Usage | The amount of usage in bytes by storage share.
Gen 0 Garbage Collections | The number of times the generation 0 objects are garbage collected since the start of the app process. Higher generation GCs include all lower generation GCs.
Gen 1 Garbage Collections | The number of times the generation 1 objects are garbage collected since the start of the app process. Higher generation GCs include all lower generation GCs.
Gen 2 Garbage Collections | The number of times the generation 2 objects are garbage collected since the start of the app process.
Handle Count | The total number of handles currently open by the app process.
Health Check Status | The average health status across the application's instances in the App Service Plan.
Http 2xx | The count of requests resulting in an HTTP status code ≥ 200 but < 300.
Http 3xx | The count of requests resulting in an HTTP status code ≥ 300 but < 400.
Http 401 | The count of requests resulting in HTTP 401 status code.
Http 403 | The count of requests resulting in HTTP 403 status code.
Http 404 | The count of requests resulting in HTTP 404 status code.
Http 406 | The count of requests resulting in HTTP 406 status code.
Http 4xx | The count of requests resulting in an HTTP status code ≥ 400 but < 500.
Http Server Errors | The count of requests resulting in an HTTP status code ≥ 500 but < 600.
IO Other Bytes Per Second | The rate at which the app process is issuing bytes to I/O operations that don't involve data, such as control operations.
IO Other Operations Per Second | The rate at which the app process is issuing I/O operations that aren't read or write operations.
IO Read Bytes Per Second | The rate at which the app process is reading bytes from I/O operations.
IO Read Operations Per Second | The rate at which the app process is issuing read I/O operations.
IO Write Bytes Per Second | The rate at which the app process is writing bytes to I/O operations.
IO Write Operations Per Second | The rate at which the app process is issuing write I/O operations.
Memory working set | The current amount of memory used by the app, in MiB.
Private Bytes | Private Bytes is the current size, in bytes, of memory that the app process has allocated that can't be shared with other processes.
Requests | The total number of requests regardless of their resulting HTTP status code.
Requests In Application Queue | The number of requests in the application request queue.
Thread Count | The number of threads currently active in the app process.
Total App Domains | The current number of AppDomains loaded in this application.
Total App Domains Unloaded | The total number of AppDomains unloaded since the start of the application.

Note

App Service plan metrics are available only for plans in Basic, Standard, and Premium tiers.

Metric | Description
CPU Percentage | The average CPU used across all instances of the plan.
Memory Percentage | The average memory used across all instances of the plan.
Data In | The average incoming bandwidth used across all instances of the plan.
Data Out | The average outgoing bandwidth used across all instances of the plan.
Disk Queue Length | The average number of both read and write requests that were queued on storage. A high disk queue length is an indication of an app that might be slowing down because of excessive disk I/O.
Http Queue Length | The average number of HTTP requests that had to sit on the queue before being fulfilled. A high or increasing HTTP Queue length is a symptom of a plan under heavy load.

Source: https://learn.microsoft.com/en-us/azure/app-service/web-sites-monitor


Like this article?

If you enjoyed reading this article, why not sign up for the fortnightly Observability 360 newsletter. A wholly independent newsletter dedicated exclusively to observability and read by professionals at many of the world's leading companies.

Get coverage of observability news, products, events and more straight to your inbox in a beautifully crafted and carefully curated email.