Service Discovery And Health Checks In ASP.NET Core With Consul

In this post we'll take a quick look at what service discovery is, play with Consul, and implement a basic service infrastructure in C# using the ASP.NET Core MVC framework. We'll then use DnsClient.NET to implement DNS-based client-side service discovery.

All the source code of this post is available as a working demo project here on GitHub.

Service Discovery

In a modern microservice architecture, services may run in containers and can be started, stopped and scaled dynamically. This results in a very dynamic hosting landscape with possibly hundreds of actual endpoints, which makes it impossible to configure them manually or to find the right one by hand.

That being said, I believe that service discovery is not only for granular microservices living in containers. It can be used by any kind of application which has to access other resources. Resources can be databases, other web services or just parts of a website which are hosted somewhere else. Service discovery helps to get rid of environment-specific configuration files!

Service discovery can be used to solve this problem but, as usual, there are many different ways to implement it:

  • Client Side Service Discovery

    One solution is to have a central service registry where all service instances are registered. Clients have to implement logic to query for a service they need, possibly validate whether the endpoints are still alive, and maybe distribute requests across multiple endpoints.

  • Server Side / Load balancing

    All traffic goes through a load balancer which knows all the actual, dynamically changing endpoints and redirects all requests accordingly.

Consul is a service registry and can be used to implement client side service discovery.

Apart from the many great features and advantages of this approach, it has the disadvantage that each client application needs to implement some logic to use the central registry. This logic might get very specific, because Consul, like any other technology, has its own APIs and its own way of working.

Load balancing might also not be done automatically. The client can query for all available/registered endpoints of a service and then decide which one to pick.

The good thing is Consul not only comes with a REST API to query the service registry. It also provides a DNS endpoint which returns standard SRV and TXT records.

The DNS endpoint does care about service health, as it will not return unhealthy service instances. It also does load balancing by returning the records in alternating order! In addition, it might give a higher priority to services which are closer to the client.

Now, let's get started...

Consul Setup

Consul is software developed by HashiCorp which not only does service discovery (as mentioned above), but also "Health Checking", and it provides a distributed "Key Value Store".

Consul is meant to run as a cluster with at least three server instances that coordinate the cluster, plus an "agent" on each node in the hosting environment. Applications always communicate with the local agent only, which makes the communication really fast and reduces network latency to a minimum.

For local development though, you can run Consul in -dev mode instead of setting up a full cluster. Just keep in mind that, for production use, there will be some work needed to set Consul up correctly.

Download and Run Consul

The official documentation has a lot of examples and explains very well how to set up Consul in general. I will not go into too much detail, and we'll just run it as a local development agent.

To get started, download Consul.

Run Consul with the agent -dev arguments (consul agent -dev). This boots Consul in local development mode without the need for a configuration file, and it will be accessible on localhost only.

Go to http://localhost:8500, this should open the Consul UI.

[Screenshot: Consul UI]

Register the First Service

Consul offers different ways to add to or modify the service registry. One option is to throw JSON configuration files into Consul's config directory. The following would register a Redis endpoint for example:

{
	"service":{
		"name": "redis",
		"tags":[],
		"port": 6379
	}
}

The other, more interesting option is via the REST API. Fortunately, there are already client libraries available in many languages for this REST API, and we'll use https://github.com/PlayFab/consuldotnet, which works with .NET Core, too.

To register a new service via code, create a new ConsulClient instance and register a new service registration:

var client = new ConsulClient(); // uses default host:port which is localhost:8500

var agentReg = new AgentServiceRegistration()
{
    Address = "127.0.0.1",
    ID = "uniqueid",
    Name = "serviceName",
    Port = 5200
};

await client.Agent.ServiceRegister(agentReg);

It is important to note that this registration will, in theory, live in the Consul cluster forever, even if the service is not running anymore. That's why we should at least de-register it the moment the service stops:

await client.Agent.ServiceDeregister("uniqueid");

In case the service crashes though, it might not always be possible to manually deregister the service. That's where another feature of Consul comes into play: health checks.

Health Checks

Health checks in Consul can be used to monitor the state of all services within a cluster, but also to automatically remove unhealthy service endpoint registrations from the Consul registry. Consul can be configured to periodically run as many health checks per registered service as you want.

The most basic health check instructs Consul to just try to connect to the service via TCP:

var tcpCheck = new AgentServiceCheck()
{
    DeregisterCriticalServiceAfter = TimeSpan.FromMinutes(1),
    Interval = TimeSpan.FromSeconds(30),
    TCP = $"127.0.0.1:{port}"
};

Consul can also check HTTP endpoints. In this case, a service is healthy as long as the endpoint returns the HTTP status code 200.

A very simple health check controller could be implemented like this:

[Route("[Controller]")]
public class HealthCheckController : Controller
{
    [HttpGet("")]
    [HttpHead("")]
    public IActionResult Ping()
    {
        return Ok();
    }
}

In the registration, we now have to point Consul to that endpoint by specifying the HTTP property of the AgentServiceCheck instead of the TCP property:

var httpCheck = new AgentServiceCheck()
{
    DeregisterCriticalServiceAfter = TimeSpan.FromMinutes(1),
    Interval = TimeSpan.FromSeconds(30),
    HTTP = $"http://127.0.0.1:{port}/HealthCheck"
};

Updating the registration to include the checks should be enough to have Consul run both health checks every 30 seconds. Note that I also configured the check to automatically deregister the service instance in case it is marked unhealthy for more than a minute.

var registration = new AgentServiceRegistration()
{
    Checks = new[] { tcpCheck, httpCheck },
    Address = "127.0.0.1",
    ID = id,
    Name = name,
    Port = port
};

await client.Agent.ServiceRegister(registration);

Those basic examples should be enough to get started. But health checks can do much more complex things and Consul supports running small scripts to validate the response.

Endpoint Name, ID and Port

As you might have noticed, to register a service we have to know the actual endpoint the service is running on, and we have to give it a name and an ID. But what does all that actually mean?

The ID should be a unique enough string to identify the service instance, whereas the Name should be common for all instances of the same service.

Other clients will use the Name to query the service registry; the ID is used only to refer to the exact instance, e.g. when de-registering the service instance.

But how do we define the name and port and IP address?

It is simple if we host an ASP.NET Core application ourselves with Kestrel, because we also tell Kestrel which port and address to listen on. When hosting a service with IIS (or any other reverse proxy), this approach falls apart because, in reverse proxy mode, Kestrel gets a dynamic configuration and the actual hosting information is not available in the application code.

To see how it works when hosting with Kestrel, let's create an empty ASP.NET Core Web API project.

Run dotnet new webapi or use the WebAPI template in Visual Studio.

This will create a Program.cs and Startup.cs. Modify the Program.cs to create the host. Instead of host.Run we'll use host.Start, which does not block the thread. After that, we'll register the service and deregister it when the service stops:

var host = new WebHostBuilder()
    .UseKestrel()
    .UseUrls("http://localhost:5200")
    .UseContentRoot(Directory.GetCurrentDirectory())
    .UseStartup<Startup>()
    .Build();

host.Start();

var client = new ConsulClient();

var name = Assembly.GetEntryAssembly().GetName().Name;
var port = 5200;
var id = $"{name}:{port}";

var tcpCheck = new AgentServiceCheck()
{
    DeregisterCriticalServiceAfter = TimeSpan.FromMinutes(1),
    Interval = TimeSpan.FromSeconds(30),
    TCP = $"127.0.0.1:{port}"
};

var httpCheck = new AgentServiceCheck()
{
    DeregisterCriticalServiceAfter = TimeSpan.FromMinutes(1),
    Interval = TimeSpan.FromSeconds(30),
    HTTP = $"http://127.0.0.1:{port}/HealthCheck"
};

var registration = new AgentServiceRegistration()
{
    Checks = new[] { tcpCheck, httpCheck },
    Address = "127.0.0.1",
    ID = id,
    Name = name,
    Port = port
};

client.Agent.ServiceRegister(registration).GetAwaiter().GetResult();

Console.WriteLine("DataService started...");
Console.WriteLine("Press ESC to exit");

while (Console.ReadKey().Key != ConsoleKey.Escape)
{
}

client.Agent.ServiceDeregister(id).GetAwaiter().GetResult();

Running this will register the service in Consul:

[Screenshot: first service registered]

And (if you've added the health check controller) it will successfully run the two health checks:

[Screenshot: first service registered]

I'm using the assembly name as the service name, and I'm hardcoding the port and IP address. Clearly, this needs to be configurable, and blocking the console thread is also not really a nice solution.

More Sophisticated Implementation

Knowing the basics and how the registration process works, let's improve the implementation a little bit.

Goals:

  • Have the service name be configurable via appsettings.json
  • The host and port should not be hardcoded
  • Use Microsoft.Extensions.Configuration and Options to configure all we need properly
  • Set up the registration as part of the Startup pipeline

Configuration

I'm defining a few configuration POCOs to map with the following addition to appsettings.json:

{
...
  "ServiceDiscovery": {
    "ServiceName": "DataService",
    "Consul": {
      "HttpEndpoint": "http://127.0.0.1:8500",
      "DnsEndpoint": {
        "Address": "127.0.0.1",
        "Port": 8600
      }
    }
  }
}

The matching option classes:

public class ServiceDisvoveryOptions
{
    public string ServiceName { get; set; }

    public ConsulOptions Consul { get; set; }
}

public class ConsulOptions
{
    public string HttpEndpoint { get; set; }

    public DnsEndpoint DnsEndpoint { get; set; }
}

public class DnsEndpoint
{
    public string Address { get; set; }

    public int Port { get; set; }

    public IPEndPoint ToIPEndPoint()
    {
        return new IPEndPoint(IPAddress.Parse(Address), Port);
    }
}

Then configure it during Startup.ConfigureServices:

services.AddOptions();
services.Configure<ServiceDisvoveryOptions>(Configuration.GetSection("ServiceDiscovery"));

Use this configuration to set up the Consul client:

services.AddSingleton<IConsulClient>(p => new ConsulClient(cfg =>
{
    var serviceConfiguration = p.GetRequiredService<IOptions<ServiceDisvoveryOptions>>().Value;

    if (!string.IsNullOrEmpty(serviceConfiguration.Consul.HttpEndpoint))
    {
        // if not configured, the client will use the default value "127.0.0.1:8500"
        cfg.Address = new Uri(serviceConfiguration.Consul.HttpEndpoint);
    }
}));

The ConsulClient doesn't necessarily need configuration. If nothing is specified, it will fall back to the defaults (localhost:8500).

Dynamic Service Registration

As long as Kestrel is used for hosting the service on a certain port, app.Properties["server.Features"] can be used to figure out where the service is hosted. As mentioned above, if IIS integration or any other reverse proxy is used, this solution does not work anymore, and the actual endpoint the service is accessible at has to be used to register the service in Consul. There is no way to get to that information during startup though.

In case you want to use IIS integration with service discovery, don't use the following code. Instead, configure the endpoint via configuration, or register the service manually.

Anyway, for Kestrel, we can do the following: get the URIs Kestrel hosts the service on (this does not work with wildcards like UseUrls("*:5000")) and then iterate over the addresses to register all of them in Consul:

public void Configure(
        IApplicationBuilder app,
        IApplicationLifetime appLife,
        ILoggerFactory loggerFactory,
        IOptions<ServiceDisvoveryOptions> serviceOptions,
        IConsulClient consul)
    {
        ...

        var features = app.Properties["server.Features"] as FeatureCollection;
        var addresses = features.Get<IServerAddressesFeature>()
            .Addresses
            .Select(p => new Uri(p));

        foreach (var address in addresses)
        {
            var serviceId = $"{serviceOptions.Value.ServiceName}_{address.Host}:{address.Port}";

            var httpCheck = new AgentServiceCheck()
            {
                DeregisterCriticalServiceAfter = TimeSpan.FromMinutes(1),
                Interval = TimeSpan.FromSeconds(30),
                HTTP = new Uri(address, "HealthCheck").OriginalString
            };

            var registration = new AgentServiceRegistration()
            {
                Checks = new[] { httpCheck },
                Address = address.Host,
                ID = serviceId,
                Name = serviceOptions.Value.ServiceName,
                Port = address.Port
            };

            consul.Agent.ServiceRegister(registration).GetAwaiter().GetResult();

            appLife.ApplicationStopping.Register(() =>
            {
                consul.Agent.ServiceDeregister(serviceId).GetAwaiter().GetResult();
            });
        }

        ...

The serviceId must be unique enough to find the particular instance of the service again later, to de-register it. I'm using the host and port together with the actual service name, which should be good enough.

All goals accomplished I guess. Although, that's quite a bunch of code in the startup. To improve that even further, we could refactor the code and create extension methods.
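As a first step towards that refactoring, the pure part of the loop (turning a listen address into the ID, host, port and health check URL) could be pulled out into a small helper that is easy to unit test. This is only a sketch under assumptions: ServiceEndpointInfo and ServiceEndpointMapper are made-up names for illustration and are not part of Consul.NET or ASP.NET Core, and the sketch uses AbsoluteUri rather than OriginalString for the health check URL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical value type holding what a Consul registration needs; not a Consul.NET type.
public sealed class ServiceEndpointInfo
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Host { get; set; }
    public int Port { get; set; }
    public string HealthCheckUrl { get; set; }
}

// Hypothetical helper; maps Kestrel listen addresses to registration values.
public static class ServiceEndpointMapper
{
    public static IReadOnlyList<ServiceEndpointInfo> FromAddresses(
        string serviceName,
        IEnumerable<string> addresses,
        string healthPath = "HealthCheck")
    {
        return addresses
            .Select(a => new Uri(a)) // throws for addresses that are not valid absolute URIs
            .Select(uri => new ServiceEndpointInfo
            {
                // Same ID scheme as above: service name plus host and port.
                Id = $"{serviceName}_{uri.Host}:{uri.Port}",
                Name = serviceName,
                Host = uri.Host,
                Port = uri.Port,
                HealthCheckUrl = new Uri(uri, healthPath).AbsoluteUri
            })
            .ToList();
    }
}
```

Inside Configure, each returned item would then be copied into an AgentServiceRegistration and an AgentServiceCheck, exactly as in the loop above.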

Querying the Service Registry

The new service is running and registers itself in Consul, so now it should be easy to find it via the Consul APIs or DNS.

Query Via Consul Endpoints

Using the Consul client, there are two Consul endpoints we can use:

  • The catalog endpoint, which provides the raw information about a service. This one returns unfiltered results:

    var consulResult = await _consul.Catalog.Service(_options.Value.ServiceName);

  • The health endpoint, which can return already filtered results:

    var healthResult = await _consul.Health.Service(_options.Value.ServiceName, tag: null, passingOnly: true);

It is important to note here that the list of services returned by those endpoints (if multiple instances are running) will always be in the same order. You'd have to implement logic to avoid calling the same service endpoint all the time and to spread the traffic across all endpoints.
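To give an idea of what such logic could look like, here is a minimal round-robin sketch. RoundRobinSelector is a hypothetical helper, not something the Consul client provides, and it only rotates through whatever list it is given; it does not do any health checking of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not part of the Consul client library.
public sealed class RoundRobinSelector<T>
{
    private int _counter = -1;

    // Returns the next item in rotation; safe to call from multiple threads.
    public T Next(IReadOnlyList<T> items)
    {
        if (items == null || items.Count == 0)
        {
            throw new InvalidOperationException("No service endpoints available.");
        }

        // Interlocked.Increment wraps around on overflow;
        // the unchecked cast to uint keeps the index non-negative.
        var ticket = unchecked((uint)Interlocked.Increment(ref _counter));
        return items[(int)(ticket % (uint)items.Count)];
    }
}
```

A single selector instance would be kept per service (registered as a singleton, for example) and asked for the next entry of healthResult.Response before every call.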

Again, that's where we can use DNS. Apart from the built-in load balancing, another advantage is that we do not have to make another expensive HTTP call and worry about caching the results for a little while. With DNS, we get all of this with just a few lines of code.

Query Via DNS

Let's check the DNS endpoint with dig to understand what a response looks like:

The syntax of a domain name to ask for SRV records is <servicename>.service.consul, which means we can query for our DataService with dig @127.0.0.1 -p 8600 dataservice.service.consul SRV:

; <<>> DiG 9.11.0-P2 <<>> @127.0.0.1 -p 8600 dataservice.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25053
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;dataservice.service.consul.    IN      SRV

;; ANSWER SECTION:
dataservice.service.consul. 0   IN      SRV     1 1 5200 machinename.node.eu-west.consul.

;; ADDITIONAL SECTION:
machinename.node.eu-west.consul. 0 IN      CNAME   localhost.

;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Tue Apr 25 21:08:19 DST 2017
;; MSG SIZE  rcvd: 109

We get the port in the SRV record and the corresponding CNAME record contains the hostname or address we used to register our service with.

The Consul DNS endpoint also allows us to query for tags and to limit the query to one particular data center. To query for a tag, prepend it to the service name: <tag>.<servicename>.service.consul. Consul also understands the RFC 2782 style, where both labels are prefixed with _ and the tag takes the place of the protocol: _<servicename>._<tag>.service.consul. To query a specific data center, add it in front of the root domain: <servicename>.service.<datacenter>.consul.
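Composing these names by hand is easy to get wrong, so a tiny helper can be useful. The sketch below assumes Consul's standard lookup format [tag.]<service>.service[.datacenter].consul and the default consul root domain; ConsulDnsName is a made-up name for illustration.

```csharp
using System;

// Hypothetical helper for composing Consul DNS names; assumes the default "consul" domain.
public static class ConsulDnsName
{
    public static string ForService(string serviceName, string tag = null, string dataCenter = null)
    {
        if (string.IsNullOrWhiteSpace(serviceName))
        {
            throw new ArgumentException("A service name is required.", nameof(serviceName));
        }

        // Standard lookup format: [tag.]<service>.service[.datacenter].consul
        var name = string.IsNullOrEmpty(tag) ? serviceName : $"{tag}.{serviceName}";
        name += ".service";

        if (!string.IsNullOrEmpty(dataCenter))
        {
            name += "." + dataCenter;
        }

        // DNS names are case-insensitive; lower-casing is purely cosmetic.
        return (name + ".consul").ToLowerInvariant();
    }
}
```

For the service from this post, ConsulDnsName.ForService("DataService") produces the same dataservice.service.consul name that was passed to dig above.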

DNS Load Balancing

The DNS endpoint does load balancing by returning the results in alternating order. If I start another instance of the service on a different port, we get:

;; QUESTION SECTION:
;dataservice.service.consul.    IN      SRV

;; ANSWER SECTION:
dataservice.service.consul. 0   IN      SRV     1 1 5200 machinename.node.eu-west.consul.
dataservice.service.consul. 0   IN      SRV     1 1 5300 machinename.node.eu-west.consul.

;; ADDITIONAL SECTION:
machinename.node.eu-west.consul. 0 IN      CNAME   localhost.
machinename.node.eu-west.consul. 0 IN      CNAME   localhost.

And if you run the query a few times, you'll see that the answers are returned in different order.

Using DnsClient

To query DNS via C# code, I'll be using my DnsClient library. I added ResolveService extension methods into the library to make such SRV lookups really simple one-liners.

After installing the DnsClient NuGet package, I can simply register a LookupClient in DI:

services.AddSingleton<IDnsQuery>(p =>
{
    return new LookupClient(IPAddress.Parse("127.0.0.1"), 8600);
});

The client can then be injected wherever it is needed, for example into a controller:

private readonly IDnsQuery _dns;
private readonly IOptions<ServiceDisvoveryOptions> _options;

public SomeController(IDnsQuery dns, IOptions<ServiceDisvoveryOptions> options)
{
    _dns = dns ?? throw new ArgumentNullException(nameof(dns));
    _options = options ?? throw new ArgumentNullException(nameof(options));
}

[HttpGet("")]
[HttpHead("")]
public async Task<IActionResult> DoSomething()
{
    var result = await _dns.ResolveServiceAsync("service.consul", _options.Value.ServiceName);
    ...
}

The ResolveServiceAsync of DnsClient.NET does the DNS SRV lookup, matches the CNAME records and returns an object for each entry containing the hostname and port (and address if used).

Now, we can call the service with a simple HttpClient call (or a generated client):

var address = result.First().AddressList.FirstOrDefault();
var port = result.First().Port;

using (var client = new HttpClient())
{
    var serviceResult = await client.GetStringAsync($"http://{address}:{port}/Values");
}
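Note that result.First() throws if no healthy instance was returned, and AddressList can be empty. A slightly more defensive way to build the base URI is sketched below. ResolvedEndpoint is a hypothetical stand-in that only mirrors the three members used above (host name, address list and port); it is not DnsClient's own type, and EndpointUri is an illustrative name as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Stand-in for the entries returned by the lookup; a hypothetical type, not DnsClient's own.
public sealed class ResolvedEndpoint
{
    public string HostName { get; set; }
    public IPAddress[] AddressList { get; set; } = new IPAddress[0];
    public int Port { get; set; }
}

public static class EndpointUri
{
    // Returns null when no usable instance was resolved.
    public static Uri FirstOrNull(IEnumerable<ResolvedEndpoint> endpoints, string path = "/")
    {
        var entry = endpoints?.FirstOrDefault();
        if (entry == null)
        {
            return null;
        }

        // Prefer a resolved IP address and fall back to the host name.
        var host = entry.AddressList?.FirstOrDefault()?.ToString() ?? entry.HostName;
        if (string.IsNullOrEmpty(host))
        {
            return null;
        }

        // DNS answers end with a trailing dot ("localhost."), which is dropped here.
        return new UriBuilder("http", host.TrimEnd('.'), entry.Port, path).Uri;
    }
}
```

The caller can then return an error response (a 503, for example) when the helper yields null, instead of failing with an exception from First().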

Conclusions

Consul is a great, flexible and stable tool. I like that the API and usage patterns are not opinionated and you have a lot of choice how to use the service registry and other features. And, at the same time, it does just what it should and is really performant, too.

Using it in .NET is actually really easy with the tools we have today and does work really well if you have many different pieces in your application which have to talk to each other!

I put together a full demo project with more cleaned up code here on GitHub. Let me know what you think in the comments!