FIB Scale in Express5

By Chandrasekaran Venkatraman posted 19 days ago

  

Express5 has leapfrogged earlier generations in route scale, thanks to a novel approach to implementing the route table memory.

This article is part of a series of publications on Express5.

Introduction

Supporting a large FIB in an ASIC means trading off on-chip memory area against external memory bandwidth. Embedding the entire FIB on-chip becomes physically impossible at large FIB scales, while keeping the entire FIB only in external memory makes the design expensive, because route lookup then needs dedicated memory bandwidth. Express5 therefore takes a hybrid approach. The complete route table is stored in the external HBM main memory, while a ‘live’ working set of routes is held in a sufficiently large internal cache. Routes are promoted from main memory to the cache entirely in hardware, without any software intervention. This lets the design meet the twin goals of scale and performance.

FIB Scale Comparison

Below is a comparison of FIB scale supported by different Express generations.

| Chipset               | Entry Sizes Supported | FIB Scale without Compression | FIB Scale with Compression |
|-----------------------|-----------------------|-------------------------------|----------------------------|
| Express1 and Express2 | 8, 16 words           | 4M 8-word entries             | n/a                        |
| Express3              | 5, 10 words           | 512K 5-word entries           | n/a                        |
| Express4              | 2.5, 5, 10 words      | 2M 2.5-word entries           | Up to 4M 2.5-word entries  |
| Express5              | 2.5, 5, 10 words      | 10M 10-word entries           | Up to 16M 10-word entries  |

Table 1: FIB Scale Comparison
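To get a feel for what these scales mean in raw storage, the uncompressed figures in Table 1 can be converted to bytes. This is only a back-of-the-envelope sketch: it assumes one word is 4 bytes (as in the glossary), that "M" and "K" mean 10^6 and 10^3, and it ignores any per-entry overhead, hashing slack, or next-hop storage that real hardware would add. The function name is made up for illustration.

```python
# Assumptions: 4-byte words (per the glossary), decimal K/M, raw entries only.
WORD_BYTES = 4

# (chipset, entries without compression, entry size in words), from Table 1
TABLE_1 = [
    ("Express1 and Express2", 4_000_000, 8),
    ("Express3", 512_000, 5),
    ("Express4", 2_000_000, 2.5),
    ("Express5", 10_000_000, 10),
]


def raw_fib_bytes(entries, words_per_entry):
    """Raw storage for `entries` routes of `words_per_entry` words each."""
    return int(entries * words_per_entry * WORD_BYTES)


for chipset, entries, words in TABLE_1:
    megabytes = raw_fib_bytes(entries, words) / 1e6
    print(f"{chipset}: {megabytes:.2f} MB")
```

Under these assumptions, the Express5 row works out to 400 MB of raw entry storage, against 20 MB for the Express4 row.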

For more details on the FIB Compression principles, please have a look at this article: https://community.juniper.net/blogs/nicolas-fevrier/2022/09/19/ptx-fib-compression

FIB Cache

Express5 takes advantage of the fact that in real-life networks, a large percentage of the installed routes receive little traffic, while a small percentage carry almost all of the active traffic. According to a study of the Comcast network presented at NANOG (Ref 1), an Internet FIB table of 575K prefixes showed the following distribution:

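| Percentage of Total Installed Routes | Percentage of Traffic Carried by Those Routes |
|--------------------------------------|-----------------------------------------------|
| 0.5%                                 | 90%                                           |
| 4%                                   | 9%                                            |
| 23.5%                                | 0.9%                                          |
| 72%                                  | 0.1%                                          |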

Table 2: Internet FIB Distribution

As shown in Table 2 above, only a very small portion of the FIB gets frequently accessed. From this, we deduce the following: if a significant number of the active routes are continuously made available in an optimally sized high-bandwidth memory, then the input traffic can be sustained at its full throughput even if the read bandwidth to the main FIB table is limited.
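The distribution in Table 2 can be turned into cumulative numbers to see how many routes a cache would need to hold to cover a given share of traffic. The sketch below assumes the rows are ordered from the busiest group of routes to the quietest and takes the FIB size as exactly 575,000. The function name is hypothetical, and the figures describe one studied network, not a general rule.

```python
TOTAL_ROUTES = 575_000  # FIB size in the cited study, taken as exact

# (percent of installed routes, percent of traffic carried), from Table 2,
# assumed to be ordered from the busiest group of routes to the quietest.
TABLE_2 = [(0.5, 90.0), (4.0, 9.0), (23.5, 0.9), (72.0, 0.1)]


def cumulative_coverage(rows, total_routes):
    """Return (routes held, percent of traffic covered) after each group."""
    out = []
    route_pct = 0.0
    traffic_pct = 0.0
    for group_routes, group_traffic in rows:
        route_pct += group_routes
        traffic_pct += group_traffic
        routes = round(total_routes * route_pct / 100)
        out.append((routes, round(traffic_pct, 1)))
    return out


for routes, traffic in cumulative_coverage(TABLE_2, TOTAL_ROUTES):
    print(f"{routes:>7} hottest routes carry {traffic}% of traffic")
```

With these inputs, roughly 26K routes account for 99% of the traffic and 161K routes for 99.9%, both well within the 1M-route cache capacity described in this article.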

In Express5, the internal fungible shared memory is used as the L1 cache for the FIB database. The shared memory also contains partitions for the Nexthop and Encapsulation data structures. A dedicated hardware state machine in the route lookup function processes the input traffic to keep the cache populated with the most frequently accessed routes.

When the Control Plane adds, deletes, or modifies FIB entries in the main memory, FIB cache coherency is maintained.
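The article does not describe how coherency is implemented, so the following is only a toy model of one common policy: invalidating the cached copy whenever the main table changes. The class and method names are hypothetical, the lookup is exact-match for brevity, and the real mechanism runs in hardware and may work quite differently.

```python
class CachedFib:
    """Toy two-level FIB: `main` is the full table, `cache` a subset of it."""

    def __init__(self):
        self.main = {}   # prefix -> next hop (stands in for HBM)
        self.cache = {}  # prefix -> next hop (stands in for the on-chip cache)

    def lookup(self, prefix):
        """Exact-match lookup; a miss promotes the entry into the cache."""
        if prefix in self.cache:
            return self.cache[prefix]
        nexthop = self.main.get(prefix)
        if nexthop is not None:
            self.cache[prefix] = nexthop
        return nexthop

    def write(self, prefix, nexthop):
        """Control-plane add or modify: update main, drop any stale copy."""
        self.main[prefix] = nexthop
        self.cache.pop(prefix, None)

    def delete(self, prefix):
        """Control-plane delete: remove the entry from both levels."""
        self.main.pop(prefix, None)
        self.cache.pop(prefix, None)


fib = CachedFib()
fib.write("10.0.0.0/8", "nh-A")
print(fib.lookup("10.0.0.0/8"))  # nh-A, now promoted into the cache
fib.write("10.0.0.0/8", "nh-B")  # modify: the cached copy is invalidated
print(fib.lookup("10.0.0.0/8"))  # nh-B, fetched again from main
fib.delete("10.0.0.0/8")
print(fib.lookup("10.0.0.0/8"))  # None
```

Invalidating rather than updating in place keeps the sketch short: the next lookup simply misses and fetches the fresh entry from the main table.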

Figure 1: FIB Table Subsystem

To look up an IP address, a series of prefix masks is applied to the address value, based on results from the Bloom filter, and the masked prefixes are probed in the FIB. This entails a series of cache probes for different prefix lengths, of which the longest match is taken as the final result. On a cache miss, the entry is fetched from the main memory and installed in the cache.
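The lookup flow described above can be sketched in software as follows. This is an illustrative model, not the Express5 implementation: it handles IPv4 only, uses a single small Bloom filter keyed by (masked address, prefix length), probes lengths one at a time from longest to shortest, and never evicts from the cache. The names `BloomFilter`, `Fib`, `add_route`, and `main_reads` are all made up for this example.

```python
import hashlib
import ipaddress


class BloomFilter:
    """Small Bloom filter: may give false positives, never false negatives."""

    def __init__(self, bits=1 << 16, hashes=3):
        self.bits = bits
        self.hashes = hashes
        self.array = 0  # a Python int used as a bit array

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode()).digest()
        for i in range(self.hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array |= 1 << pos

    def might_contain(self, key):
        return all((self.array >> pos) & 1 for pos in self._positions(key))


class Fib:
    def __init__(self):
        self.main = {}       # (masked address, prefix length) -> next hop
        self.cache = {}      # subset of `main`, filled on demand
        self.bloom = BloomFilter()
        self.main_reads = 0  # counts probes that had to go to main memory

    def add_route(self, cidr, nexthop):
        net = ipaddress.ip_network(cidr)
        key = (int(net.network_address), net.prefixlen)
        self.main[key] = nexthop
        self.bloom.add(key)

    def lookup(self, address):
        addr = int(ipaddress.ip_address(address))
        for length in range(32, -1, -1):  # longest prefix first
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            key = (addr & mask, length)
            if not self.bloom.might_contain(key):
                continue  # no installed route can match at this length
            if key in self.cache:
                return self.cache[key]
            self.main_reads += 1  # cache miss: go to main memory
            if key in self.main:
                self.cache[key] = self.main[key]  # promote into the cache
                return self.main[key]
        return None


fib = Fib()
fib.add_route("10.0.0.0/8", "nh-A")
fib.add_route("10.1.0.0/16", "nh-B")
print(fib.lookup("10.1.2.3"))  # nh-B (the /16 is the longest match)
print(fib.lookup("10.9.9.9"))  # nh-A (only the /8 matches)
```

Because a Bloom filter never reports a false negative, skipping a prefix length cannot hide a real route. A false positive only costs an extra probe that finds nothing.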

The FIB cache size can be programmed to accommodate up to 1M routes. If the number of active routes at any given time stays within this range, the cache can serve the input traffic well enough to meet the performance goals while still providing high scale.

Glossary

  • FIB: Forwarding Information Base
  • NANOG: North American Network Operators Group
  • HBM: High Bandwidth Memory
  • IP: Internet Protocol (refers to IPv4 and IPv6)
  • L1: Level 1
  • Word: 32 bits or 4 bytes

Useful Links

Acknowledgements

  • Dmitry Shokarev
  • Nicolas Fevrier
  • Sharada Yeluri
  • Swamy SRK

Comments

If you want to reach out for comments, feedback or questions, drop us a mail at:

Revision History

| Version | Author(s)                   | Date       | Comments            |
|---------|-----------------------------|------------|---------------------|
| 1       | Chandrasekaran Venkatraman  | April 2024 | Initial Publication |

