
How to increase network resiliency?

Network design is not a fixed process. Every time we add or change something in the network, we should check whether the network is still as resilient as it was in the original design. Let's analyze the scenario below:

Firewall - Fortigate 5.x
Core switch - Nexus 5k NX-OS 7.X 
Routing between core and firewalls - static




With a direct connection between FW01-Core01 and FW02-Core02 we can detect link failure easily. The firewalls here are in HA Active-Passive mode, which means the secondary box doesn't process any traffic. If Port1, Port2 or the whole device fails, the secondary takes over the active role and sends ARP updates to the core switch. The same applies when Core01 or Core02 fails: FW01/02 notices it and triggers failover.

Let's imagine you are tasked with putting an IDS between the core switches and the perimeter firewalls, as in the diagram below:




What is wrong with this scenario? Let's check whether the following failure scenarios are covered:

1) FW01/Port1/Port2 failure - on a port failure FW01 triggers failover; on a device failure FW02 detects the lack of heartbeats, triggers failover and updates the MAC table on the core switch. If FW01 malfunctions, FW02 will not see heartbeats either, so we are covered too.




Status: PASS

2) Failure of IDS01 or of its external interface. FW01 can detect such an incident and trigger failover.





Status: PASS

3) IDS01 can't process traffic (device malfunction) but its physical interfaces are up. In this case nothing can trigger failover. Traffic will be dropped between FW01-IDS01 (ingress) and Core01-IDS01 (egress):



Status: FAIL

4) Core01 suffers a physical interface failure. There is no mechanism in place to trigger FW01 failover. Traffic will be dropped by Core01:



Status: FAIL

5) Core01 can't process traffic (device malfunction). There is no mechanism in place to trigger FW01 failover. Traffic will be dropped by Core01:  



Status: FAIL



Of the 5 failure scenarios above, 3 are not resilient. The problem is that the devices are no longer directly connected, so they can't detect each other's link or device failure (scenarios 3, 4 and 5).
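
The pass/fail pattern follows from one rule: a firewall sees a link go down only on a directly attached neighbour, and a device that stops forwarding while keeping its links up is invisible to it. Here is a minimal Python sketch of that rule, assuming the chain FW01 - IDS01 - Core01 and assuming the IDS does not propagate link state from one side to the other. The names `CHAIN` and `fw01_failover_triggered` are illustrative and do not come from any vendor tool.

```python
# Illustrative model only: device names come from the diagram,
# everything else is an assumption made for this sketch.
CHAIN = ["FW01", "IDS01", "Core01"]


def fw01_failover_triggered(failed_device: str, mode: str) -> bool:
    """Return True if the HA pair would fail over.

    mode is "down" (the device's interfaces lose link) or
    "silent" (links stay up but the device forwards nothing).
    """
    if failed_device == "FW01":
        # Port monitoring on FW01, or missing heartbeats seen by FW02.
        return True
    if mode == "silent":
        # Links stay up, so FW01 has nothing to react to.
        return False
    # A link-down event is visible only to a directly attached neighbour.
    return abs(CHAIN.index(failed_device) - CHAIN.index("FW01")) == 1


scenarios = [
    (1, "FW01", "down"),
    (2, "IDS01", "down"),
    (3, "IDS01", "silent"),
    (4, "Core01", "down"),
    (5, "Core01", "silent"),
]
results = {n: fw01_failover_triggered(dev, mode) for n, dev, mode in scenarios}
for n, ok in results.items():
    print(n, "PASS" if ok else "FAIL")
```

Scenarios 1 and 2 come out as PASS and scenarios 3, 4 and 5 as FAIL, which matches the statuses listed above.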

One way to fix the problem is the following change. We implement the SLA monitor feature on the core switches (available on Nexus 5k from version 7) to monitor the firewall interface (Port2) and tie it to a static route. On the firewall we can use Link Health Monitor to track the state of the remote interface (on the core switch).

http://help.fortinet.com/fgt/handbook/cli52_html/index.html#page/FortiOS%25205.2%2520CLI/config_system.23.040.html

https://www.cisco.com/c/en/us/td/docs/switches/datacenter/nexus5500/sw/unicast/7_x/cisco_n5500_layer3_ucast_cfg_rel_6x/l3_object.html#92401
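
Both features boil down to the same idea: probe a remote address and withdraw the static route that depends on it when the probe keeps failing. Below is a minimal Python sketch of that logic. It is not Fortigate or NX-OS configuration; the class `TrackedRoute`, the threshold of 3 failed probes, the distance values and the immediate recovery after one good probe are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TrackedRoute:
    """A static route that is withdrawn while its probe target is unreachable."""
    next_hop: str
    distance: int            # lower value wins, as with administrative distance
    fail_threshold: int = 3  # assumed: consecutive failed probes before withdrawal
    failures: int = 0

    def record_probe(self, ok: bool) -> None:
        # A successful probe resets the counter, a failed one increments it.
        self.failures = 0 if ok else self.failures + 1

    @property
    def installed(self) -> bool:
        return self.failures < self.fail_threshold


def best_route(routes: Iterable[TrackedRoute]) -> Optional[TrackedRoute]:
    """Pick the installed route with the lowest distance, if any."""
    live = [r for r in routes if r.installed]
    return min(live, key=lambda r: r.distance) if live else None


primary = TrackedRoute("Port2 via IDS01", distance=10)
backup = TrackedRoute("Port3 via IDS02", distance=20)

for _ in range(3):
    primary.record_probe(False)   # probe towards the far end keeps failing

print(best_route([primary, backup]).next_hop)
```

After three failed probes the primary route is no longer installed, so the lookup falls through to the backup. Nothing in this logic touches the HA state of the firewalls.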


Even with this improvement, we cannot trigger an HA failover based on the status of these features (object tracking on Nexus and Link Health Monitor on Fortigate). They can only remove a static route so that an alternate path is used. That means we need backup links:




There are four possible paths:

a) FW01 (port2) -> IDS01 -> Core01   -> preferred
b) FW01 (port3) -> IDS02 -> Core02
c) FW02 (port3) -> IDS01 -> Core01
d) FW02 (port2) -> IDS02 -> Core02

Let's analyze the last 3 scenarios, which weren't resilient in the previous design.


1) (former #3) - IDS01 can't process traffic (device malfunction) but its physical interfaces are up.

This is what happens:

- FW01 stays active because it can't detect the IDS malfunction
- The static route from FW01 via the preferred path is removed because the link monitor on the Fortigate can't reach Vlan15 on Core01. The next available path from FW01 is via Port3 to IDS02 and then to Vlan25 on Core02
- Core01 detects the loss of connectivity to Port2 on the active firewall (FW01), and its only available path is via Core02 and then IDS02 to Port3 on FW01. FW02 is in standby mode, which is why the paths via Vlan25 on Core01 and via Vlan15 on Core02 are not available



2) (former #4) - Core01 suffers a physical interface failure.

This is what happens:

- Core01 can't reach Port2 (via Vlan15) on FW01 and that static route is removed
- Core01 has only one available path: via Core02, then IDS02, to FW01 on Port3
- FW01 detects a problem reaching Vlan15 via Port2, and the next preferred path is via Port3 to IDS02




3) (former #5) - Core01 can't process traffic (device malfunction).

This is what happens:

- FW01 detects a problem reaching Vlan15 via Port2, and the next preferred path is via Port3 to IDS02
- Core01 is not available, and the only possible path is via Core02
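
The three walk-throughs can be replayed with a small path-selection sketch in Python. It assumes the four paths (a) to (d) listed earlier, in that order of preference, and assumes that only paths starting on the active firewall are usable. `PATHS` and `select_path` are illustrative names, not part of any product.

```python
# Paths (a)-(d) from the list above, in order of preference.
PATHS = {
    "a": ("FW01", "IDS01", "Core01"),
    "b": ("FW01", "IDS02", "Core02"),
    "c": ("FW02", "IDS01", "Core01"),
    "d": ("FW02", "IDS02", "Core02"),
}


def select_path(active_fw, failed):
    """Return the label of the first path that starts on the active
    firewall and crosses no failed device, or None if there is none."""
    for label, hops in PATHS.items():
        if hops[0] == active_fw and not failed.intersection(hops):
            return label
    return None


print(select_path("FW01", set()))        # no failure
print(select_path("FW01", {"IDS01"}))    # former #3
print(select_path("FW01", {"Core01"}))   # former #4 and #5
```

With no failure the sketch picks path (a). In all three failure cases FW01 stays active and the traffic moves to path (b), which is the behaviour described in the bullets.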


  
I think I went through all possible failure scenarios. If not, please let me know. You may think that all of this could be done by dynamic routing protocols, and Fortigate and Cisco Nexus support most of them. The problem is that many organizations don't accept dynamic routing on firewalls.
