Monday, June 18, 2012

Maximum TCP Throughput



So let's go through a scenario. I have 2 DCs (in 2 different cities) with a 1 Gig link between them. However, the maximum throughput I get between 2 servers is nowhere near the link capacity. So what could be the reason?

This is how the TCP throughput is calculated.

TCP window size (in bits) / round-trip latency (in seconds) = maximum TCP throughput (in bits per second)

So in the above example, if my TCP window size is the standard 64 KB (65,536 bytes, i.e. 524,288 bits) and the round-trip latency between the 2 DCs is 20 milliseconds, then the maximum throughput I will get is

524,288 / 0.02 = 26,214,400 bits per second, that is about 26.21 Mbps

Now, as we can see, the 2 factors that decide the maximum throughput on a link are
·         TCP window size
·         Round-trip latency
Now, what can we do to maximize the throughput? One solution is to increase the TCP window size on the servers (note that going beyond 64 KB requires the TCP window scaling option, RFC 1323, to be enabled on both ends). So in my case, what window size do my servers need to fill the 1 Gig link? Let's calculate:

X / 0.02 = 1 Gbps, so X = 20,000,000 bits, that is 2,500,000 bytes or about 2441.4 KB
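
If it helps, here is a quick Python sketch of the same arithmetic (my own illustration of the window/latency formula above, not anything from the actual environment):

# Maximum TCP throughput is one window per round trip.
def tcp_throughput_bps(window_bytes, rtt_seconds):
    return (window_bytes * 8) / rtt_seconds

# Window needed to fill a link: the bandwidth-delay product.
def window_needed_bytes(link_bps, rtt_seconds):
    return (link_bps * rtt_seconds) / 8

rtt = 0.02  # 20 ms round trip between the 2 DCs

print(tcp_throughput_bps(64 * 1024, rtt))  # 26214400.0 bps, i.e. ~26.21 Mbps
print(window_needed_bytes(1e9, rtt))       # 2500000.0 bytes, i.e. ~2441.4 KB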

But then there are issues with increasing the TCP window size: if a packet is dropped, the whole window may need to be retransmitted, and it also requires deep buffers on the servers to hold the data until it is acknowledged by the other side.
To solve the above issues, we can place a WAN optimization device at the edge. Such a device can do TCP selective retransmission and also locally acknowledge the server's segments. These devices also do Layer 7 optimization, which can further increase link utilization.


Tuesday, June 12, 2012

MSS Clamping

I was reading through my past documents and came across a scenario where I had to reduce the TCP MSS on the routers for the connection to work properly. So I am writing this post to note it down and to help anyone looking for a similar solution.

We were implementing a DMVPN network, and during the migration the network was very slow. The client did not accept the test and we had to roll back the migration. Then we set up a small POC mirroring the client environment (we had done a POC before the migration, but it did not reflect the specific client network) and started troubleshooting, and soon we realized that we had to reduce the TCP MSS (maximum segment size) on the router for DMVPN to work properly.

We faced a similar issue with another client, this time with a GET VPN implementation, and the same solution did the trick for us. This act of reducing the TCP MSS is called MSS clamping. On Cisco routers, we can use the following commands to reduce the TCP MSS (1360 is a common value for tunnel deployments, as it leaves headroom for the GRE and IPsec encapsulation overhead):

interface gi0/1
  ip address 10.10.10.1 255.255.255.0
  ip tcp adjust-mss 1360

Unicast flooding - Follow up


Unicast Flooding with Asymmetric routing

As a follow-up to the unicast flooding post, let's take the below scenario (diagram taken from the Cisco website). Here, server 1 (S1) is in Vlan 1 and server 2 (S2) is in Vlan 2, connected to 2 different switches. S1 is connected to SW1 on int f1/1 and S2 is connected to SW2 on int f1/1. The 2 switches are connected to each other over an 802.1Q trunk carrying both Vlans on int f1/24 on both switches. Router A and Router B are doing the routing for both Vlans (router-on-a-stick). Router A is connected to SW1 on int f1/23 and Router B is connected to SW2 on int f1/23. Router A is the gateway for server S1 and Router B is the gateway for server S2.



Now, when traffic passes from S1 to S2, the following path will be used

S1(Vlan 1) – SW1 – Router A – Vlan 2 – SW2 – S2 (Vlan 2)

The reverse path will be

S2 (Vlan 2) – SW2 – Router B – Vlan 1 – SW1 – S1 (Vlan 1)

Let us analyze why there will be unicast flooding each time a packet travels between S1 and S2.

Scenario 1: Traffic traverses from S1 to S2
Here, S1 knows that the destination address is not in its own subnet and hence ARPs for its gateway (Router A's Vlan 1 interface) and sends the packet to Router A on Vlan 1. Router A in turn needs to send the packet to S2, which sits in the network connected to its Vlan 2 sub-interface, so it will use its Vlan 2 MAC as the source and ARP for the MAC of S2; the ARP reply comes in over the trunk port. Router A then sends the packet directly to S2. This packet enters SW1 on interface f1/23, but since SW1 has no CAM entry for S2, the packet is flooded out every Vlan 2 port except the one it arrived on. SW2 receives this packet on the trunk port, learns the MAC of Router A's Vlan 2 interface, and stores it in the CAM against the trunk port. However, SW2 does not have a CAM entry for S2 either, so it also resorts to unicast flooding. So the following MAC learning has been done so far:

On SW1
For the S1 MAC on interface f1/1 (Vlan 1)
For Router A's Vlan 2 MAC on interface f1/23

On SW2
For Router A's Vlan 2 MAC on interface f1/24 (the trunk)

Scenario 2: Return traffic from S2 to S1
On the return side, S2 needs to send the traffic to S1, which is not in its subnet, so S2 ARPs for its own gateway, Router B's Vlan 2 interface, and sends the traffic to it. Router B receives the traffic and, having a connected route for S1's subnet, ARPs for S1 out of its Vlan 1 sub-interface and sends the packet to S1. This packet enters SW2 on interface f1/23. SW2 learns Router B's Vlan 1 MAC on interface f1/23; however, it has no CAM entry for S1, so SW2 resorts to unicast flooding. The packet is received on SW1 on interface f1/24, and SW1 learns the MAC of Router B's Vlan 1 interface there. This time around, SW1 does have a CAM entry for S1, so the packet is forwarded as unicast and not flooded. During this phase, the following MAC learning has been done:

On SW2
For Router B's Vlan 1 MAC on interface f1/23
For the S2 MAC on interface f1/1 (Vlan 2)

On SW1
For Router B's Vlan 1 MAC on interface f1/24 (the trunk)

As we can see, SW1 never learns the MAC of S2, and hence every forward packet will be unicast flooded into Vlan 2. Similarly, SW2 never learns the MAC of S1, and hence every return packet will be unicast flooded into Vlan 1.
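
To make that concrete, here is a small Python sketch of per-Vlan source-MAC learning (my own toy model; the port and MAC labels are just names matching the diagram):

class Switch:
    def __init__(self, name):
        self.name = name
        self.cam = {}  # (vlan, mac) -> port

    def frame(self, in_port, vlan, src, dst):
        self.cam[(vlan, src)] = in_port  # learn the source MAC
        out = self.cam.get((vlan, dst))
        print(self.name, "vlan", vlan, src, "->", dst, ":",
              "forward via " + out if out else "FLOOD")

sw1, sw2 = Switch("SW1"), Switch("SW2")

# The ARP exchanges have already taught each switch its local gateway port.
sw1.cam[(1, "RtrA-v1")] = "f1/23"
sw2.cam[(2, "RtrB-v2")] = "f1/23"

# Forward path: S1 -> Router A (Vlan 1), then Router A -> S2 (Vlan 2)
sw1.frame("f1/1",  1, "S1",      "RtrA-v1")  # unicast; SW1 learns S1
sw1.frame("f1/23", 2, "RtrA-v2", "S2")       # FLOOD: SW1 never hears from S2
sw2.frame("f1/24", 2, "RtrA-v2", "S2")       # FLOOD only until S2 replies

# Return path: S2 -> Router B (Vlan 2), then Router B -> S1 (Vlan 1)
sw2.frame("f1/1",  2, "S2",      "RtrB-v2")  # unicast; SW2 learns S2
sw2.frame("f1/23", 1, "RtrB-v1", "S1")       # FLOOD: SW2 never hears from S1
sw1.frame("f1/24", 1, "RtrB-v1", "S1")       # unicast; SW1 already knows S1

Run the last four calls in a loop and the two FLOOD lines repeat forever, because frames sourced by S2 never cross the trunk in Vlan 2 and frames sourced by S1 never cross it in Vlan 1.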

As always, I can be wrong, so any corrections/suggestions are welcome.

Monday, June 11, 2012

Why 53 bytes for the ATM cell?

Today I will post an interesting and somewhat shocking reason why the cell size of the ATM network was fixed at 53 bytes. The choice of a 48-byte payload was a political matter. When the ATM standard was being ratified, parties from the United States wanted a 64-byte payload because this was felt to be a good compromise between larger payloads optimized for data transmission and shorter payloads optimized for real-time applications like voice; parties from Europe wanted 32-byte payloads because the small size simplified voice applications with respect to echo cancellation. Most of the European parties eventually came around to the arguments made by the Americans, but France and a few others held out for the shorter cell length: with 32 bytes, France would have been able to implement an ATM-based voice network with calls from one end of France to the other requiring no echo cancellation. In the end, 48 bytes ((64 + 32) / 2) was chosen as a compromise between the two sides. A 5-byte header was then provisioned on the reasoning that roughly 10% of the payload was a fair price to pay for the routing information; hence 48 + 5 = 53 bytes was standardized as the ATM cell size.

Monday, June 4, 2012

Cisco switches and Microsoft NLB

Microsoft NLB can operate in 2 modes:

1. Unicast - where the actual MAC address of each server in the cluster is replaced by a common NLB unicast MAC address. This is fine if the servers in the NLB cluster are connected to different switches; however, when all the servers are connected to the same switch it does not work, since a switch cannot keep learning the same MAC address on multiple ports. To solve this, Microsoft uses a workaround whereby a bogus MAC address is created per server in the cluster and assigned to it. The difference is that this bogus MAC is used only as the source address in the Ethernet frames, not in the ARP replies; for ARP replies to the clients, the common NLB MAC is used. Hence the switch never gets a CAM entry for the common NLB MAC and has to resort to flooding. One solution here is to put a hub in front of the NLB server cluster.

2. Multicast - In this mode, the cluster members answer ARP requests for the cluster's unicast IP with a multicast cluster MAC address (which is technically illegal, by the way). The problem here is that Cisco switches do not accept ARP replies in which the requested IP address is unicast but the MAC in the reply is multicast. Hence we use static MAC entries to populate the CAM tables of the switches where the NLB cluster servers are connected.

The IOS command to add such a static entry is

mac-address-table static 0300.5f11.0011 vlan 10 interface fa1/1 fa1/2
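
Since the router (or the L3 switch doing the routing) will likewise refuse to learn the multicast MAC dynamically from the ARP reply, a static ARP entry mapping the cluster's virtual IP to the cluster MAC is typically needed as well. For illustration, assuming a made-up cluster VIP of 10.10.10.100:

arp 10.10.10.100 0300.5f11.0011 ARPA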