High Performance Reconfigurable Computing System Engineering Group
Dr Noor Mahammad Sk - Sponsored & Consultancy Projects

Energy Efficient and High Throughput Multi-match Packet Classification Architectures for NIDS
Vegesna S M Srinivasavarma & Noor Mahammad Sk

Sponsored by Ministry of Electronics and Information Technology, Govt. of India

  1. Introduction:

  2. A Network Intrusion Detection System (NIDS) monitors the network traffic passing through an endpoint and responds to malicious traffic by matching it against a library of known attacks. Deep Packet Inspection (DPI) is a key component of this identification process. There are two types of detection methods: signature-based and anomaly-based. Widely deployed IDSs such as Snort [1] are signature-based.

    A signature-based IDS detects attacks by looking for specific patterns, known as signatures, in the packet payload. These systems maintain a collection of signatures, each representing a known security threat (for example, a DoS attack or a port scan). Incoming packet data streams are matched against these signatures; if there is a match, the packet is flagged as malicious and the network administrator is alerted. Otherwise, no action is taken and the packet is allowed to pass the network endpoint.

    Snort is a widely used NIDS and is often considered the de facto standard for rule-based NIDS implementation [2, 3, 4, 5, 6]. Each Snort rule has two parts: 1) the header rule, and 2) the payload rule (enclosed in parentheses). For an incoming packet, the header fields are compared with the header rule, and the packet payload is scanned for a match with the content and regular expression specified in the payload rule. For a rule to match, both its header and content conditions must match the incoming packet. The header rule specifies match conditions on the following five fields: source and destination addresses, source and destination ports, and protocol.
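    The two-part rule structure described above can be illustrated with a minimal Python sketch. The regular expression, the `HeaderRule` type, the `split_rule` helper, and the example rule are all hypothetical and cover only the simple single-line form; this is not Snort's own parser.

```python
import re
from typing import NamedTuple, Tuple


class HeaderRule(NamedTuple):
    """The five header fields named in the text."""
    protocol: str
    src_addr: str
    src_port: str
    dst_addr: str
    dst_port: str


# Assumed layout: action proto src_addr src_port direction dst_addr dst_port (options)
RULE_RE = re.compile(
    r"^(?P<action>\S+)\s+(?P<proto>\S+)\s+(?P<src_addr>\S+)\s+(?P<src_port>\S+)\s+"
    r"(?P<direction>->|<>)\s+(?P<dst_addr>\S+)\s+(?P<dst_port>\S+)\s+"
    r"\((?P<options>.*)\)\s*$"
)


def split_rule(rule: str) -> Tuple[HeaderRule, str]:
    """Split one rule string into its header rule and its payload rule text."""
    m = RULE_RE.match(rule.strip())
    if m is None:
        raise ValueError("rule does not follow the assumed simple layout")
    header = HeaderRule(
        m["proto"], m["src_addr"], m["src_port"], m["dst_addr"], m["dst_port"]
    )
    return header, m["options"].strip()


# Hypothetical rule, written only for illustration.
example = (
    'alert tcp $EXTERNAL_NET any -> $HOME_NET 80 '
    '(msg:"example"; content:"/etc/passwd"; sid:1000001;)'
)
header, payload = split_rule(example)
print(header)
print(payload)
```

    Here `header` carries the five fields used for header inspection, while `payload` holds the content and regular-expression conditions that are checked only if the header matches.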

    In Snort, header inspection is used as a pre-processing module for 1) quick filtering of benign traffic, and 2) pruning the number of patterns to be inspected [7, 8, 9]. Analysis of various Snort databases shows that the number of unique header rules in a given Snort rule database is less than 10% of the total rules. For example, in terms of the number of rules, rule database v2.9.17 is the largest Snort database, with 43,552 rules. However, it contains only 1711 unique header rules, which is just 3.92% of the total rules. This shows that many payload rules share common header rules. So, rather than inspecting all the pattern rules for a given packet, inspecting only those rules whose headers match the incoming packet significantly reduces the payload inspection overhead. Hence, header inspection improves the overall performance of NIDS systems.
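    The header-sharing observation above can be sketched as a simple grouping step. The rule list below is made up for illustration and `group_by_header` is a hypothetical helper; the point is only that payload patterns are indexed by their header rule, so a packet needs to be checked against the patterns of its matching headers alone.

```python
from collections import defaultdict

# Hypothetical rules: (header 5-tuple, payload pattern name).
rules = [
    (("tcp", "any", "any", "10.0.0.0/8", "80"), "pattern_a"),
    (("tcp", "any", "any", "10.0.0.0/8", "80"), "pattern_b"),
    (("udp", "any", "any", "any", "53"), "pattern_c"),
    (("tcp", "any", "any", "10.0.0.0/8", "80"), "pattern_d"),
]


def group_by_header(rules):
    """Map each unique header rule to the payload patterns that share it."""
    groups = defaultdict(list)
    for header, pattern in rules:
        groups[header].append(pattern)
    return dict(groups)


groups = group_by_header(rules)

# Fraction of unique header rules among all rules (the ratio discussed in the text).
unique_ratio = len(groups) / len(rules)
print(len(groups), "unique headers out of", len(rules), "rules")
```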

    In NIDS, header inspection and payload inspection are carried out separately, with header inspection used as a pre-filtering step. To capture all the threat scenarios, all the matching header rules need to be reported. This is often referred to as multi-match packet classification. However, with ever-growing link speeds, software-based inspection cannot meet the line-speed requirements. For instance, the maximum achievable throughput of the notable HyperSplit [10] packet classification algorithm on a multi-core networking platform is just 2.6 Gbps in the worst case, which is far below current line-rate requirements. Hence, hardware-based packet classification stands as an alternative for performing high-speed packet classification. Hardware-based multi-match packet classification can be performed using either 1) Ternary Content Addressable Memories (TCAMs), or 2) Field Programmable Gate Arrays (FPGAs).
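    As a reference point for what multi-match classification must report, the following is a minimal software sketch that scans every rule linearly. The `Rule` layout (prefixes for addresses, inclusive ranges for ports, `None` as a protocol wildcard) and the sample rules are assumptions made for illustration; this is the naive baseline, not any of the architectures developed in this work.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: int
    src: str                   # source address prefix (CIDR)
    dst: str                   # destination address prefix (CIDR)
    sport: Tuple[int, int]     # inclusive source port range
    dport: Tuple[int, int]     # inclusive destination port range
    proto: Optional[int]       # IP protocol number, None = wildcard


def matches(rule: Rule, pkt) -> bool:
    """Check all five header fields of one packet against one rule."""
    src, dst, sport, dport, proto = pkt
    return (
        ipaddress.ip_address(src) in ipaddress.ip_network(rule.src)
        and ipaddress.ip_address(dst) in ipaddress.ip_network(rule.dst)
        and rule.sport[0] <= sport <= rule.sport[1]
        and rule.dport[0] <= dport <= rule.dport[1]
        and (rule.proto is None or rule.proto == proto)
    )


def multi_match(rules: List[Rule], pkt) -> List[int]:
    """Return the ids of ALL matching rules, not just the first one."""
    return [r.rule_id for r in rules if matches(r, pkt)]


# Hypothetical header rules.
rules = [
    Rule(0, "0.0.0.0/0", "10.0.0.0/8", (0, 65535), (80, 80), 6),
    Rule(1, "192.168.1.0/24", "10.1.0.0/16", (1024, 65535), (0, 1023), 6),
    Rule(2, "0.0.0.0/0", "0.0.0.0/0", (0, 65535), (53, 53), 17),
]
pkt = ("192.168.1.7", "10.1.2.3", 40000, 80, 6)
matched = multi_match(rules, pkt)
print(matched)
```

    The linear scan touches every rule for every packet, which is the kind of per-packet cost that motivates the hardware approaches discussed above.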

  3. Objectives of the Research:

  4. The following are the major objectives of the work:

    • To improve the energy and memory (area) performance of the existing TCAM-based multi-match packet classification designs, without degrading the original search throughput.
    • To improve the resource efficiency and look-up throughput of the existing FPGA-based multi-match packet classification architectures which are primarily memory-based. The proposed FPGA-based design should also improve upon the search throughput of TCAM-based multi-match packet classification designs.
    • To propose an FPGA-based architecture that supports fast updates, unlike existing FPGA-based architectures that require offline hardware reconstruction for every rule update, while delivering higher throughput than the existing FPGA-based and TCAM-based designs.

  5. Deliverables of the research:

  6. The following are the key deliverables of this research:

    1. Novel rule algorithms, deployed at the level of the Network Processing Unit (NPU), for optimizing the key performance parameters of the TCAM co-processor.
    2. Novel FPGA-based hardware architectures for performing high throughput multi-match packet classification while delivering superior resource efficiency and support for fast rule updates.

  7. Technology Developed:

  8. Four designs are developed in this research for performing energy-efficient and high-throughput multi-match header inspection in NIDS. Each design is either 1) an algorithm to be deployed in the NPU of routers/security appliances for optimizing TCAM co-processor usage, or 2) an efficient FPGA-based multi-match packet classification architecture for achieving resource efficiency and high throughput. The designs developed in this research are as follows:

    1. Analysed the unique match condition properties of real-world NIDS header rule databases and developed an algorithm that performs a modified rule mapping on the TCAM to achieve optimal matchline activation during multi-match header inspection. The algorithm takes the rule database as input and outputs the fields sorted by the number of unique match conditions present in them. The NPU then maps these fields onto the TCAM by placing the field with the highest unique match condition count in the left-most segment and the field with the lowest count in the right-most segment. The proposed mapping can be seamlessly integrated with the existing TCAM-based multi-match designs. This reduces the overall TCAM search energy without any effect on the TCAM area or search throughput.

    2. Based on certain stable structural properties exhibited by real-world NIDS header rule databases, a rule-entry compression algorithm is developed. An efficient pre-lookup table construction method is presented that incurs negligible pre-lookup memory and delay penalty. Because the TCAM width is reduced, the compressed classifier entries cannot be directly integrated with the discriminator bits used in the bitmap technique [8] for databases with a larger number of intersections. To overcome this, a hybrid solution named hybrid bitmap is put forward. The proposed algorithm compresses each rule entry, which improves the TCAM memory (area) and TCAM search energy, with a noticeable improvement in the TCAM search delay as well.
    3. Designed and developed a resource-efficient and high-throughput FPGA-based multi-match packet classification architecture. The developed architecture design is optimized in accordance with the stable structural properties of the real-world NIDS header rule databases for achieving resource efficiency and higher look-up throughput. Developed a Python script which takes the rule database as an input and gives the synthesizable Verilog HDL code of the proposed architecture as an output. This code is then fed as an input to the Xilinx tool for FPGA implementation.
    4. Existing FPGA-based multi-match packet classification architectures cannot support fast updates because of certain rule specifications present in NIDS header rule databases. To overcome this limitation, a high-throughput FPGA architecture with fast update support is developed in this research. The range comparators, which are key to the architecture, are optimized for the best LUT-delay performance. Developed a Python script which takes the rule database as an input and gives the synthesizable Verilog HDL code of the proposed architecture as an output. This code is then fed as an input to the Xilinx tool for FPGA implementation.
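    The field-ordering step of the first design (sort the fields by their number of unique match conditions, then place them left to right on the TCAM) can be sketched as follows. The field names, the sample rules, and the `order_fields` and `map_rule` helpers are hypothetical; how ties are broken and how fields are encoded into TCAM segments are assumptions not taken from the work itself.

```python
FIELDS = ("src_addr", "dst_addr", "src_port", "dst_port", "protocol")


def order_fields(rules):
    """Return the fields sorted by descending unique match condition count."""
    unique = {f: len({rule[f] for rule in rules}) for f in FIELDS}
    # sorted() is stable, so fields with equal counts keep their FIELDS order
    # (an arbitrary tie-break chosen for this sketch).
    ordered = sorted(FIELDS, key=lambda f: unique[f], reverse=True)
    return ordered, unique


def map_rule(rule, ordered):
    """Lay out one rule's fields left to right in the given segment order."""
    return [rule[f] for f in ordered]


# Hypothetical header rule database.
rules = [
    {"src_addr": "any", "dst_addr": "10.0.0.1", "src_port": "any", "dst_port": "80", "protocol": "tcp"},
    {"src_addr": "any", "dst_addr": "10.0.0.2", "src_port": "any", "dst_port": "443", "protocol": "tcp"},
    {"src_addr": "any", "dst_addr": "10.0.0.3", "src_port": "any", "dst_port": "80", "protocol": "udp"},
    {"src_addr": "1.2.3.4", "dst_addr": "10.0.0.4", "src_port": "any", "dst_port": "53", "protocol": "udp"},
]

ordered, unique = order_fields(rules)
print(ordered)                      # left-most segment first
print(map_rule(rules[0], ordered))  # first rule in the new segment order
```

    In this toy database the destination address has the most unique conditions and would occupy the left-most segment, while the source port, which is identical in every rule, would go to the right-most segment.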

  9. Target Benefits of the Research:

  10. The two primary requirements of any networking appliance are 1) energy efficiency, and 2) high throughput. To achieve high-throughput packet inspection, dedicated hardware-based solutions employing TCAMs are required. However, TCAMs consume more energy and are also costlier. Two of the designs developed in this research help make TCAM-based NIDS header inspection more energy- and area-efficient without any impact on the search throughput.

    While TCAMs are undisputedly an effective solution for best-match packet classification, the same is not true for multi-match packet classification. This is because of the priority encoder, which makes extracting multiple rule match information from the TCAM far from straightforward. As a result of this structural mismatch, TCAM-based multi-match packet classification solutions suffer from a trade-off between memory and look-up performance that cannot be completely alleviated.
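    The structural mismatch described above can be seen in a toy software model of a TCAM. The entries, the key, and the helper names are hypothetical, and the model ignores all timing and energy aspects; it only shows that the priority encoder reports a single index even when several matchlines are active.

```python
def ternary_match(entry: str, key: str) -> bool:
    """Compare a key against a ternary entry; 'x' is a don't-care bit."""
    return len(entry) == len(key) and all(e in ("x", k) for e, k in zip(entry, key))


def matchlines(tcam, key):
    """All entries are compared with the key; one matchline per entry."""
    return [ternary_match(entry, key) for entry in tcam]


def priority_encode(lines):
    """Report only the lowest-index active matchline, as a priority encoder does."""
    for index, hit in enumerate(lines):
        if hit:
            return index
    return None


# Hypothetical 4-bit TCAM contents, highest priority first.
tcam = ["1010", "10xx", "xxxx", "0xxx"]
key = "1010"

lines = matchlines(tcam, key)
best = priority_encode(lines)                               # what the TCAM outputs
all_matches = [i for i, hit in enumerate(lines) if hit]     # what multi-match needs
print(best, all_matches)
```

    Three matchlines are active for this key, but only index 0 leaves the priority encoder; recovering the remaining matches is what costs TCAM-based multi-match designs extra memory or extra look-ups.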

    Field Programmable Gate Arrays (FPGAs) can be used to overcome this limitation of TCAMs. FPGAs are widely used in modern networking applications because they combine superior flexibility with high throughput. The key requirements of an FPGA-based packet classification design are 1) resource efficiency, and 2) high throughput. The FPGA-based design proposed in this research achieves superior resource efficiency and high throughput simultaneously. In the construction of a full-featured FPGA-based NIDS, the resource-efficient design developed in this research offers a unique advantage: it leaves a significant portion of the FPGA resources available for the payload inspection module. This provides more flexibility for optimizing the throughput of the payload inspection module (which is more computationally intensive than header inspection) at the cost of increased LUT consumption.

  11. Major Achievements of the Research:

  12. Extensive simulations are performed using real-world NIDS header rule databases to estimate the performance of the solutions developed as part of this research and to compare them with the state-of-the-art existing designs. Based on these simulations, the major achievements of the solutions are as follows.

    1. The energy per packet of the existing TCAM-based multi-match packet classification designs is significantly reduced when they are supplemented by the proposed modified rule mapping. The energy per packet is reduced on average by 24.07% and 28.17% for the Rule Discriminatory (RD)- and Rule Intersection (RI)-based designs, respectively, without any impact on the delay. The worst-case throughput of bitmap(e) is 22 Gbps and 5.49 Gbps for medium- and large-sized databases, respectively. For SSA-2(e), the worst-case throughput is 73.13 Gbps and 6.48 Gbps for medium- and large-sized databases, respectively. Note that bitmap [8] and SSA-2 [11] are the best existing representative designs in the RD and RI categories, respectively, while bitmap(e) and SSA-2(e) are the versions of these designs supplemented by the proposed rule mapping.
    2. The proposed rule-entry compression mechanism reduces the TCAM memory requirement by 50%, assuming commercially available TCAM width configurations. This reduction in TCAM memory improves the TCAM search energy by 45% on average. For RI-based designs, the delay per packet is reduced by 7.13% on average. For RD-based designs, including hybrid bitmap, the average reduction in the delay per packet ranges from 7.57% to 28.65%. For medium-sized databases, the worst-case throughput of the SSA-2+ and bitmap+ designs is 83.03 Gbps and 31.01 Gbps, respectively. For the larger databases, the worst-case throughput of the SSA-2+ and hybrid bitmap designs is 6.66 Gbps and 6.65 Gbps, respectively. Note that bitmap+ and SSA-2+ are the design versions supplemented by the proposed rule-entry compression mechanism.
    3. The proposed FPGA-based resource-efficient multi-match packet classification architecture sustains a high worst-case throughput of 203.84 Gbps and 38.72 Gbps for medium- and large-sized databases, respectively. The throughput achieved by the proposed architecture is 2.7x higher than that of FSBV [3], which has the highest throughput among the remaining FPGA-based designs. The resource efficiency (Gbps/slice) of the proposed design is at least 5 times higher than that of the existing designs. Also, the throughput of the proposed design is at least 5 times higher than that of the TCAM-based designs.
    4. The proposed FPGA-based multi-match packet classification architecture is the only FPGA-based design with support for fast updates. It also sustains a worst-case throughput of 98.7 Gbps and 21.21 Gbps for medium- and large-sized databases, respectively. The throughput offered by the proposed design is 29% higher than that of the FSBV design, which has the highest throughput among the remaining FPGA-based designs. Unlike existing FPGA-based designs that require offline hardware reconstruction for performing updates, the proposed architecture supports quick updates at a rate of 10K updates per second in the worst case for large-sized databases. The proposed FPGA architecture has superior throughput performance when compared to the TCAM-based designs. The memory efficiency (Gbps/bytes/rule) of the proposed design is 61.98% and 30.94% higher than that of bitmap [8] and bitmap+, respectively. This shows that the proposed design significantly alleviates the memory-throughput performance trade-off present in the TCAM-based designs while overcoming the lack of fast-update support in the existing FPGA-based designs.

  13. Ready-to-deploy solutions developed from the research:

  14. The solutions (both TCAM- and FPGA-based) developed as part of this work can be deployed for performing energy-, resource-, and throughput-efficient header inspection in the security appliances used for protecting a particular subnet. The most appealing aspect of the proposed solutions is that they can be seamlessly integrated with any of the existing TCAM-based multi-match packet classification designs, thereby making them more energy- and area-efficient. Also, the proposed solutions can be deployed using commercially available TCAM structures alone, which makes the proposed designs readily deployable (thereby avoiding the larger development cost and time involved in deploying custom hardware solutions).

  15. References:

    1. Snort: Network Intrusion Detection/Prevention System. https://www.snort.org. [Online; Accessed: Nov-2020].
    2. C. E. Graves, C. Li, X. Sheng, W. Ma, S. R. Chalamalasetti, D. Miller, J. S. Ignowski, B. Buchanan, L. Zheng, S.-T. Lam et al., "Memristor TCAMs accelerate regular expression matching for network intrusion detection," IEEE Transactions on Nanotechnology, vol. 18, pp. 963–970, 2019.
    3. W. Jiang and V. K. Prasanna, "Field-split parallel architecture for high performance multi-match packet classification using FPGAs," in Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures. ACM, 2009, pp. 188–196.
    4. K. Lee and S. Yun, "Hybrid memory-efficient multimatch packet classification for NIDS," Microprocessors and Microsystems, vol. 39, no. 2, pp. 113–121, 2015.
    5. S. Pontarelli, G. Bianchi, and S. Teofili, "Traffic-aware design of a high-speed FPGA network intrusion detection system," IEEE Transactions on Computers, vol. 62, no. 11, pp. 2322–2334, 2012.
    6. R. Shen, X. Li, and H. Li, "A space-and power-efficient multi-match packet classification technique combining TCAMs and SRAMs," The Journal of Supercomputing, vol. 69, no. 2, pp. 673–692, 2014.
    7. H. Song and J. W. Lockwood, "Efficient packet classification for network intrusion detection using FPGA," in Proceedings of the 2005 ACM/SIGDA 13th international symposium on Field-programmable gate arrays. ACM, 2005, pp. 238–245.
    8. D.-Y. Chang and P.-C. Wang, "TCAM-based multi-match packet classification using multidimensional rule layering," IEEE/ACM Transactions on Networking, vol. 24, no. 2, pp. 1125–1138, 2015.
    9. Y. Xu, Z. Liu, Z. Zhang, and H. J. Chao, "High-throughput and memory-efficient multimatch packet classification based on distributed and pipelined hash tables," IEEE/ACM Transactions on Networking, vol. 22, no. 3, pp. 982–995, 2013.
    10. Y. Qi, L. Xu, B. Yang, Y. Xue, and J. Li, "Packet classification algorithms: From theory to practice," in IEEE INFOCOM 2009. IEEE, 2009, pp. 648–656.
    11. F. Yu, T. Lakshman, M. A. Motoyama, and R. H. Katz, "Efficient Multimatch packet classification for network security applications," IEEE Journal on Selected Areas in Communications, vol. 24, no. 10, pp. 1805–1816, 2006.