Balancing SQL Server Min and Max Memory on an Active\Active Cluster

Consider a 2-node cluster with 7 instances. Each node has 8CPU (logical) and 64GB of RAM. It’s a (so-called) active\active configuration.  Two of the instances are especially high-resource-use and are segregated to their preferred node. The other instances are relatively quiet, using minimal CPU and RAM, and are split-up 2 and 3 to their preferred nodes.

This is what I’m working with recently and I decided to write up a quick post on apportioning memory to these clustered instances.

To start, when I came on the scene, min and max memory for each instance was configured as if all the instances shared a single server (node).

Pref. Node Instance  Min Mem Max Mem
A SQL1  0 16384
B SQL2  0 12288
A SQL3  0 4096
A SQL4  0 4096
B SQL5  0 4096
B SQL6  0 2048
B SQL7  0 2048

Total RAM dedicated to SQL instances cluster-wide here is 45GB leaving plenty of head-room even when all instances are running on one node.

That is ok and it certainly ensures if (in the case of an unplanned outage) or when (in the case of a planned outage) all instances fail over to a single node, there will be enough RAM to go around for each instance. But, in the day-to-day preferred configuration, with one high-use instance on each node, that leaves many GB of RAM unused (and un-usable per the max memory setting).

So I want to re-apportion memory among the instances for the typical, day-to-day operating configuration and give more of that unused RAM to our two high-use instances.

I start by understanding there will be times when all instances will run on a single node. When that occurs, how much of the total RAM should SQL use? In this case I’d like to leave ~14 GB to the OS and other, non-sql engine processes. So that leaves me with 50GB for the SQL servers and a question of how to split it up among the instances.

This is where a multi-server query comes in handy. If you create a local server group in SSMS (or use a central server) to include all the clustered instances and run a query against them, you can begin to get a sense of which instances will need more memory and which will need less. Here is a query I use as a mulit-server query to capture a snapshot of each instance’s minimum and maximum memory configuration, total and target memory usage, and PLE.

;with cte as (
select @@SERVERNAME as 'Instancename',
max(case when name = 'min server memory (MB)' then value_in_use end) as 'MinServerMemoryMB',
max(case when name = 'max server memory (MB)' then value_in_use end) as 'MaxServerMemoryMB'
from sys.configurations)
select p.Instancename,
os.PhysicalCPUCount,
os.PhysicalMemoryMB,
c.MinServerMemoryMB,
c.MaxServerMemoryMB,
p.TargetServerMemoryKB/1024 as 'TargetServerMemoryMB',
p.TotalServerMemoryKB/1024 as 'TotalServerMemoryMB',
p.PLE ,
(p.TotalServerMemoryKB)/p.PLE as 'ChurnKB/sec'
from(
select @@SERVERNAME as 'Instancename',
max(case when counter_name = 'Target Server Memory (KB)' then cntr_value end) as 'TargetServerMemoryKB',
max(case when counter_name = 'Total Server Memory (KB)' then cntr_value end) as 'TotalServerMemoryKB',
max(case when counter_name = 'Page life expectancy' then cntr_value end) as 'PLE'
from sys.dm_os_performance_counters)
as p
join cte c on p.instancename = c.instancename
join
(SELECT @@SERVERNAME as 'Instancename',
cpu_count AS 'LogicalCPUCount',
hyperthread_ratio AS 'HyperthreadRatio',
cpu_count/hyperthread_ratio AS 'PhysicalCPUCount',
physical_memory_in_bytes/1048576 AS 'PhysicalMemoryMB'
FROM sys.dm_os_sys_info ) as os
on c.instancename=os.instancename

By analyzing total vs target and PLE counters I can start to generalize where the majority of the RAM should go and how to divvy it up. Once I get that figured out I can make the changes to the minimum memory configuration. Minimum because, remember, this is solving for the situation where all instances have failed over to one node. If we set minimum for each instance that leaves some RAM for the OS and etc. but also enforces the instances share the 50GB,

Turning now to the maximum memory configuration and considering day-to-day active\active usage, I can now give my high-use instances a big chunk of RAM since, once they are on their preferred node, they don’t have to share memory. Here is how it looks:

Pref. Node Instance  Min Mem Max Mem
A SQL1 20480 41984
B SQL2 22528 44032
A SQL3 2048 4096
A SQL4 2048 4096
B SQL5 1024 2048
B SQL6 1024 2048
B SQL7 1024 2048

if you add it up, total minimum memory is 50GB (50176MB). This is what each instance gets – guaranteed (some exceptions apply 😉 ) – when everything is running on one node. As for max memory, note that memory total for each node adds up to 50GB. So, in active\active mode the SQL servers are able to utilize close to the available RAM (50 of the 64GB) and thereby improve performance.

This all works out fine for any variation of a fail-over –  either the whole cluster, or just for specific sql services moving around. Of course, the only ideal situation is when each instance is on it’s preferred node since that is what we just configured for… Any other arrangement should be considered temporary.

If you have more than one NUMA node you may need to tweak the query above. Read-up on SQL Skills take on PLE per NUMA nodes here  (which also links to the source of the idea behind the ‘churn’ metric in the query). One last thing, changing max and min memory settings clears the procedure cache. Be careful out there.

Leave a Reply

Your email address will not be published.