Article
01/27/2016

January 2013

Volume 28 Number 01

Test Run - Artificial Immune Systems for Intrusion Detection

James McCaffrey An artificial immune system (AIS) for intrusion detection is a software system that models some parts of the behavior of the human immune system to protect computer networks from viruses and similar cyber attacks. The essential idea is that the human immune system—which is a complex system consisting of lymphocytes (white blood cells), antibodies and many other components—has evolved over time to provide powerful protection against harmful toxins and other pathogens. So, modeling the behavior of the human immune system may provide an effective architecture against cyber attacks.

In this article I describe some of the principles of artificial immune systems and present a program to demonstrate these principles. Work on AIS protection is still relatively new and, in my opinion, no commercial implementations are quite ready for prime time. The code presented in this article will not directly enable you to create a realistic network-intrusion system, but there are at least four reasons why you might find this article worth reading. First, the code will give you a starting point for hands-on experimentation with a simple AIS system. Second, the principles explained will get you over the rather difficult initial hurdle to this area and allow you to understand research papers on AIS. Third, several of the programming techniques used in this article, in particular r-chunks bit matching and negative selection, can be useful in other programming scenarios. And fourth, you may just find the idea of modeling a software system based on the behavior of the human immune system interesting in its own right.

The best way to get a feel for where I’m headed is to take a look at the screenshot of a demo run, shown in Figure 1. The demo program begins by creating a set of six normal patterns (patterns that are known not to be part of some cyber attack) that represent TCP/IP network packets in binary form. This is called the self-set in AIS terminology. Of course, in a real AIS system, the self-set would likely contain tens or hundreds of thousands of patterns, and each pattern would be much larger (typically 48-256 bits) than the 12 bits used here. Next, the demo program creates three artificial lymphocytes. Each lymphocyte has a simulated antibody that has four bits (again, artificially small), an age and a stimulation field. The antibody field is essentially a detector. As you’ll see shortly, the lymphocytes are created so that none of them detect any of the patterns in the self-set. For now, observe that each lymphocyte has three consecutive 0s or 1s but none of the patterns in the self-set has three consecutive equal bit values.

Figure 1 Artificial Immune System Demo

After the system has been initialized, the demo program begins a tiny simulation with six input patterns. The first input is detected by lymphocyte 1, but because each lymphocyte has an activation threshold, the lymphocyte doesn’t trigger an alarm. At time t = 3, lymphocyte 1 detects another suspicious input but, again, is not yet over the threshold. But at time t = 5, lymphocyte 1 detects a third suspicious input packet and a simulated alert is triggered.

In the sections that follow, I’ll first explain the key concepts of the human immune system that are used to model an AIS. Then I’ll walk you through the code that produced the screenshot in Figure 1. I’ll conclude by giving you some additional references and a few opinions about AIS. This article assumes you have at least intermediate-level programming skills with a modern programming language. I use C# for the demo program, but I’ll explain where you’ll need to makes changes if you want to refactor my code into another language such as Visual Basic .NET or Python. I present all the code that generated the screenshot in Figure 1; the code is also available at msdn.com/magazine/msdnmag0113.

The Human Immune System

The key elements of a highly simplified immune system are illustrated in Figure 2. Harmful items are proteins called antigens. In Figure 2 the antigens are colored red and have sharp corners. The human body also contains many non-harmful antigens called self-antigens, or self-items. These are naturally occurring proteins and in Figure 2 are colored green and have rounded sides.

Figure 2 Key Elements of a Simplified Immune System

Antigens are detected by lymphocytes. Each lymphocyte has several antibodies, which can be thought of as detectors. Each antibody is specific to a particular antigen. Typically, because antibody-antigen matching is only approximate, a lymphocyte will not trigger a reaction when a single antibody detects a single antigen. Only after several antibodies detect their corresponding antigens will a lymphocyte become stimulated and trigger some sort of defensive reaction.

Notice that no lymphocyte has antibodies that detect a self-item. Real antibodies are generated by the immune system in the thymus, but any antibodies that detect self-items are destroyed before being released into the blood stream, a process called apoptosis.

In terms of an intrusion-detection system, antigens correspond to TCP/IP network packets whose contents contain some sort of harmful data, such as a computer virus. Self-antigens correspond to normal, non-harmful network packets. An antibody corresponds to a bit pattern that approximately matches an unknown, potentially harmful network packet. A lymphocyte represents two or more antibodies/detectors. Apoptosis is modeled using a technique called negative selection.

Overall Program Structure

The demo program shown in Figure 1 is a single C# console application named ArtificialImmuneSystem. I used Visual Studio 2010, but any version of Visual Studio that has the Microsoft .NET Framework 2.0 or later should work. I renamed the Visual Studio template-generated file named Program.cs to the more descriptive ArtificialImmuneSystemProgram.cs and renamed the corresponding class as well. The overall program structure is listed in Figure 3.

Figure 3 Artificial Immune System Program Structure

using System;
using System.Collections.Generic;
using System.Collections; // for BitArray
namespace ArtificialImmuneSystem
{
  class ArtificialImmuneSystemProgram
  {
    static Random random;
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin Artificial Immune System for Intrusion" +
        " Detection demo\n");
      random = new Random(1);
      int numPatternBits = 12;
      int numAntibodyBits = 4;
      int numLymphocytes = 3;
      int stimulationThreshold = 3;
      Console.WriteLine("Loading self-antigen set ('normal' historical patterns)");
      List<BitArray> selfSet = LoadSelfSet(null);
      ShowSelfSet(selfSet);
      Console.WriteLine("\nCreating lymphocyte set using negative selection" +
        " and r-chunks detection");
      List<Lymphocyte> lymphocyteSet = CreateLymphocyteSet(selfSet, numAntibodyBits,
        numLymphocytes);
      ShowLymphocyteSet(lymphocyteSet);
      Console.WriteLine("\nBegin AIS intrusion detection simulation\n");
      int time = 0;
      int maxTime = 6;
      while (time < maxTime)
      {
        Console.WriteLine("============================================");
        BitArray incoming = RandomBitArray(numPatternBits);
        Console.WriteLine("Incoming pattern = " +
          BitArrayAsString(incoming) + "\n");
        for (int i = 0; i < lymphocyteSet.Count; ++i)
        {
          if (lymphocyteSet[i].Detects(incoming) == true)
          {
            Console.WriteLine("Incoming pattern detected by lymphocyte " + i);
            ++lymphocyteSet[i].stimulation;
            if (lymphocyteSet[i].stimulation >= stimulationThreshold)
              Console.WriteLine("Lymphocyte " + i + " stimulated!" +
                " Check incoming as possible intrusion!");
            else
              Console.WriteLine("Lymphocyte " + i + " not over stimulation" +
                " threshold");
          }
          else
            Console.WriteLine("Incoming pattern not detected by lymphocyte " + i);
        }
        ++time;
        Console.WriteLine("============================================");
      } // Simulation loop
        Console.WriteLine("\nEnd AIS IDS demo\n");
        Console.ReadLine();
    } // Main
    public static List<BitArray> LoadSelfSet(string dataSource) {..}
    public static void ShowSelfSet(List<BitArray> selfSet) {..}
    public static string BitArrayAsString(BitArray ba) {..}
    public static List<Lymphocyte> CreateLymphocyteSet(List<BitArray> selfSet,
      int numAntibodyBits, int numLymphocytes) {..}
    private static bool DetectsAny(List<BitArray>
      selfSet, Lymphocyte lymphocyte) {..}
    public static void ShowLymphocyteSet(List<Lymphocyte> lymphocyteySet) {..}
    public static BitArray RandomBitArray(int numBits) {..}
  } // Program
  public class Lymphocyte
  {
    public BitArray antibody;  // detector
    public int[] searchTable;  // for fast detection
    public int age;            // not used; could determine death
    public int stimulation;    // controls triggering
    public Lymphocyte(BitArray antibody) {..}
    private int[] BuildTable() {..}
    public bool Detects(BitArray pattern) {..}
    public override int GetHashCode() {..}
    public override string ToString() {..}
  }
} // ns

I deleted all the template-generated using statements except for references to the System and System.Collections.Generic namespaces. I added a reference to the System.Collections namespace so I could have access to the BitArray class. After a start-up message, I instantiated a static Random object using an arbitrary seed value of 1.

If you compare the code in the Main method in Figure 3 with the screenshot in Figure 1, you shouldn’t have too much trouble understanding the program logic. The key to the AIS demo is the definition of the class Lymphocyte. Note that in order to keep the size of the demo code small and the key ideas clear, I removed all the normal error checking you’d likely include during experimentation.

The Lymphocyte Class

The Lymphocyte class has four data fields:

public BitArray antibody;  // Detector
public int[] searchTable;  // For fast detection
public int age;            // Not used; could determine death
public int stimulation;    // Controls triggering

I declare all fields with public scope for simplicity. The antibody field is a BitArray. If you’re not familiar with BitArray, the idea is that using a normal array of int to represent a collection of bits is highly inefficient because each int requires 32 bits of storage. A bit array condenses 32 bits of information into a single int of storage, plus some overhead for the class. Programming languages that don’t have a BitArray-like class require you to do low-level bit manipulation with bit masking and bit operations.

The searchTable field is an array that’s used by the Detects method to greatly increase performance. The age field isn’t used in my demo program; many AIS systems track the age of a simulated lymphocyte and probabilistically kill off and generate new lymphocytes based on age. The stimulation field is a counter that tracks how many times a Lymphocyte object has detected a possible threat. In this demo, when a lymphocyte stimulation value exceeds the stimulationThreshold value of 3, an alert is triggered.

The Lymphocyte constructor is:

public Lymphocyte(BitArray antibody)
{
  this.antibody = new BitArray(antibody);
  this.searchTable = BuildTable();
  this.age = 0;
  this.stimulation = 0;
}

The constructor accepts a single BitArray parameter that represents an antigen/detector. The searchTable array is instantiated using a private helper method named BuildTable, which I’ll present shortly.

One of the key parts of any AIS system is the routine that determines whether an antigen detects a pattern. Requiring an exact match isn’t feasible (and doesn’t mimic real antigen behavior). Early work on AIS used a technique called r-contiguous bits matching, in which both an antigen and an input pattern have the same number of bits and detection occurs when antigen and pattern match in r-consecutive bits. Later research indicated that a better detection algorithm is r-chunks bit matching. The r-chunks bit matching is similar to r-contiguous bits matching except that the antigen detector is smaller than the pattern to check, and detection occurs when the antigen matches ny subset of the pattern. For example, if an antigen is 110 and a pattern is 000110111, the antigen detects the pattern starting at index 3.

If you think about r-chunks matching for a moment, you’ll realize that it’s almost the same as a string substring function. The only difference is that r-chunks matching matches bits and substring matches characters.

In the preceding example, a simple approach to r-chunks matching would be to examine the pattern starting at index 0, then at index 1, then 2 and so on. However, this approach is very slow in most situations. There are several sophisticated substring algorithms that preprocess the smaller detector string to create an array (usually called a table). This search table can be used to skip ahead when a mismatch is encountered and greatly improve performance. In situations where the smaller detector string is repeatedly used to check different patterns—as in AIS intrusion detection—the time and memory needed to create the search table is a small price to pay for dramatically improved performance.

The Lymphocyte class Detects method uses the Knuth-Morris-Pratt substring algorithm applied to a BitArray. The Detects method accepts an input pattern such as 000110111 and returns true if the current object’s antigen, such as 110, matches the pattern. The code for the Detects method is listed in Figure 4.

Figure 4 The Detects Method

public bool Detects(BitArray pattern)  // Adapted KMP algorithm
{
  int m = 0;
  int i = 0;
  while (m + i < pattern.Length)
  {
    if (this.antibody[i] == pattern[m + i])
    {
      if (i == antibody.Length - 1)
        return true;
      ++i;
    }
    else
    {
      m = m + i - this.searchTable[i];
      if (searchTable[i] > -1)
        i = searchTable[i];
      else
        i = 0;
    }
  }
  return false;  // Not found
}

The Detects method assumes the existence of the search table. Recall that the Lymphocyte constructor calls a helper method BuildTable to create the searchTable field. The code for BuildTable is listed in Figure 5.

Figure 5 The BuildTable Method

private int[] BuildTable()
{
  int[] result = new int[antibody.Length];
  int pos = 2;
  int cnd = 0;
  result[0] = -1;
  result[1] = 0;
  while (pos < antibody.Length)
  {
    if (antibody[pos - 1] == antibody[cnd])
    {
      ++cnd; result[pos] = cnd; ++pos;
    }
    else if (cnd > 0)
      cnd = result[cnd];
    else
    {
      result[pos] = 0; ++pos;
    }
  }
  return result;
}

The Lymphocyte class defines overridden GetHashCode and ToString merthods. The GetHashCode method is used to prevent duplicate Lymphocyte objects and is:

public override int GetHashCode()
{
  int[] singleInt = new int[1];
  antibody.CopyTo(singleInt, 0);
  return singleInt[0];
}

This implementation assumes that the BitArray antibody field has 32 bits or less. In realistic situations where the antibody field has more than 32 bits, dealing with duplicate Lymphocyte objects is not so simple. One approach would be to define a custom hash code method that returns a BigInteger type (available in the .NET Framework 4 and later).

The Lymphocyte ToString method used in the demo program is:

public override string ToString()
{
  string s = "antibody = ";
  for (int i = 0; i < antibody.Length; ++i)
    s += (antibody[i] == true) ? "1 " : "0 ";
  s += " age = " + age;
  s += "  stimulation = " + stimulation;
  return s;
}

Creating the Self-Set

You are certainly familiar with standard (non-AIS) computer virus-detection software such as Microsoft Security Essentials. These systems work by storing a local database of known computer virus definitions. When a known virus pattern is detected, an immediate alert is triggered. Such antivirus systems have trouble dealing with variations on existing viruses, and fail entirely in most situations when faced with a completely new virus. AIS intrusion detection works in the opposite way by maintaining a set of input patterns that are known to be non-harmful and then triggering an alert when an unknown pattern is detected. This allows AIS intrusion-detection systems—in principle, at least—to detect new viruses. However, dealing with false positives—that is, triggering an alert on a non-harmful input pattern—is a major challenge for AIS systems.

A real AIS intrusion-detection system would collect many thousands of normal input patterns over the course of days or weeks. These normal self-set patterns would be stored either on a local host or a server. Also, a real AIS system would continuously update the self-set (and the induced set of lymphocytes) of normal patterns to account for normal changes in network traffic over time. The demo program in this article creates an artificial, static self-set using method LoadSelfSet:

public static List<BitArray> LoadSelfSet(string dataSource)
{
  List<BitArray> result = new List<BitArray>();
  bool[] self0 = new bool[] { true, false, false, true, false, true,
                              true, false, true, false, false, true };
  // Etc.
  bool[] self5 = new bool[] { false, false, true, false, true, false,
                              true, false, false, true, false, false };
  result.Add(new BitArray(self0));
  // Etc.
  result.Add(new BitArray(self5));
  return result;
}

The method accepts a dummy not-used dataSource parameter to suggest that in a realistic scenario the self-set data would not be hardcoded. The BitArray constructor, somewhat surprisingly, accepts an array of bool values where true represents a 1 bit and false represents a 0 bit. Observe that I generated the self-set data in such a way that no self-item has more than two consecutive 0s or 1s.

The demo program uses utility method ShowSelfSet, which calls helper method BitArrayAsString, to display the self-set:

public static void ShowSelfSet(List<BitArray> selfSet)
{
  for (int i = 0; i < selfSet.Count; ++i)
    Console.WriteLine(i + ": " + BitArrayAsString(selfSet[i]));
}
public static string BitArrayAsString(BitArray ba)
{
  string s = "";
  for (int i = 0; i < ba.Length; ++i)
    s += (ba[i] == true) ? "1 " : "0 ";
  return s;
}

Creating the Lymphocyte Set

If you refer back to the explanation of how the human immune system works, you’ll note that the lymphocyte set should contain only Lymphocyte objects that do not detect any patterns in the self-set. Method CreateLymphocyteSet is listed in Figure 6.

Figure 6 The CreateLymphocyteSet Method

public static List<Lymphocyte> CreateLymphocyteSet(List<BitArray> selfSet,
  int numAntibodyBits, int numLymphocytes)
{
  List<Lymphocyte> result = new List<Lymphocyte>();
  Dictionary<int, bool> contents = new Dictionary<int, bool>();
  while (result.Count < numLymphocytes)
  {
    BitArray antibody = RandomBitArray(numAntibodyBits);
    Lymphocyte lymphocyte = new Lymphocyte(antibody);
    int hash = lymphocyte.GetHashCode();
    if (DetectsAny(selfSet, lymphocyte) == false &&
      contents.ContainsKey(hash) == false)
    {
      result.Add(lymphocyte);
      contents.Add(hash, true);
    }
  }
  return result;
}

In AIS terminology, method CreateLymphocyteSet uses negative selection. A random Lymphocyte object is generated and then tested to see if it does not detect any patterns in the self-set, and also that the lymphocyte is not already in the result set. This approach is rather crude, and there are other approaches that are more efficient. I use a Dictionary collection with a dummy bool value to track existing Lymphocyte objects; the HashSet in the .NET Framework 4.5 and later is a more efficient alternative.

A random Lymphocyte object is created by generating a random BitArray:

public static BitArray RandomBitArray(int numBits)
{
  bool[] bools = new bool[numBits];
  for (int i = 0; i < numBits; ++i)
  {
    int b = random.Next(0, 2);  // between [0,1] inclusive
    bools[i] = (b == 0) ? false : true;
  }
  return new BitArray(bools);
}

Helper method DetectsAny accepts a self-set, and a lymphocyte scans through a self-set and returns true if any pattern in the self-set is detected by the antigen in the lymphocyte:

private static bool DetectsAny(List<BitArray> selfSet,
  Lymphocyte lymphocyte)
{
  for (int i = 0; i < selfSet.Count; ++i)
    if (lymphocyte.Detects(selfSet[i]) == true) return true;
  return false;
}

The demo program uses a utility method ShowLymphocyteSet to display the generated Lymphocyte objects:

public static void ShowLymphocyteSet(List<Lymphocyte> lymphocyteySet)
{
  for (int i = 0; i < lymphocyteySet.Count; ++i)
    Console.WriteLine(i + ": " + lymphocyteySet[i].ToString());
}

Wrapping Up

The code and explanations I’ve presented in this article should give you a solid basis for hands-on experimentation with an AIS. Researchers have suggested many options. For example, the demo program in this article fires an alert when a single lymphocyte detects unknown input patterns more than some threshold number of times. The idea here is that real pathogens emit many antigens. Another possibility is for the AIS system to trigger an alert only after multiple different lymphocytes detect the same unknown pattern.

It’s important to point out that an AIS is not intended to be a single solution for intrusion detection. Rather, it’s meant to be part of a multilayer defense that includes traditional antivirus software. Additionally, because work with an AIS is still relatively young, there are many unanswered questions. If you wish to examine some of the research on AIS, I recommend searching online for articles by authors S. Forrest, P. Williams and U. Aickelin.

Dr. James McCaffrey works for Volt Information Sciences Inc., where he manages technical training for software engineers working at the Microsoft Redmond, Wash., campus. He has worked on several Microsoft products including Internet Explorer and MSN Search. McCaffrey is the author of “.NET Test Automation Recipes” (Apress, 2006). He can be reached at jammc@microsoft.com.

Thanks to the following technical expert for reviewing this article: Dan Liebling