June 2015

Volume 30 Number 6


Test Run - Firefly Algorithm Optimization

By James McCaffrey

James McCaffreyIn machine learning, a numerical optimization algorithm is often used to find a set of values for variables—usually called weights—that minimize some measure of error. For example, logistic regression classification uses a math equation where, if there are n predictor variables, there are n+1 weight values that must be determined. The process of determining the values of the weights is called training the model. The idea is to use a collection of training data that has known correct output values. An optimization algorithm is used to find the values of the weights that minimize the error, which is the difference between computed output values and correct output values.

There are many different optimization algorithms. This article explains a relatively new (first published in 2009) technique called firefly algorithm (FA) optimization. FA optimization loosely models the behavior of a swarm of fireflies.

A good way to get an idea of what FA optimization is and to see where this article is headed is to take a look at the demo program in Figure 1. The goal of the demo program is to use FA optimization to find the minimum value of the Michalewicz function with five input variables. The Michalewicz function is a standard benchmark function used to evaluate the effectiveness of numerical optimization algorithms. With five input values, the function has a minimum value of z = -4.6877 located at x = (2.2029 1.5707, 1.2850, 1.9231, 1.7205).

The Firefly Algorithm Optimization in Action
Figure 1 The Firefly Algorithm Optimization in Action

The Michalewicz function is difficult for most optimization algorithms because it has several local minimum values and several flat areas (where all z values are almost equal). It’s not possible to easily visualize the Michalewicz function with five input values, but you can get an idea of the function’s characteristics by exam­ining a graph of the function for two input values, shown in Figure 2. The function’s definition in math terms is shown at the bottom of the figure.

The Michalewicz Function of Two Variables
Figure 2 The Michalewicz Function of Two Variables

The demo program sets the number of fireflies to 40. Each firefly has a virtual position that represents a possible solution to the minimization problem. More fireflies increase the chance of finding the true optimal solution at the expense of performance. FA optimization typically uses 15 to 40 fireflies.

The demo sets the problem dimension to 5 because there are five input values. FA is an iterative process and requires a maximum loop counter value. A loop counter variable in machine learning optimization is often named epoch and the demo sets the maximum value to 1,000 iterations. The maximum number of iterations will vary from problem to problem, but 1,000 is a reasonable starting value. FA has an element of randomness and the demo sets the seed value for the random number generator to an arbitrary value of 0.

In the demo run in Figure 1, the best (smallest) error associated with the best position found so far was displayed every 100 epochs. After the algorithm finished, the best position found for any firefly was x = (2.2033, 1.5711, 1.2793, 1.1134, 2.2216). This solution is close to, but not quite equal to, the optimal solution of x = (2.2029 1.5707, 1.2850, 1.9231, 1.7205). The value of the Michalewicz function at the solution found by FA was -4.45, which is close to the true minimum value of -4.69. The error of the FA solution is 0.0561.

This article assumes you have at least intermediate programming skills, but does not assume you know anything about numerical optimization or the firefly algorithm. The demo program is coded using C# but you shouldn’t have too much difficulty refactoring the code to another language, such as JavaScript or Python.

The complete demo code, with a few minor edits to save space, is presented in this article. The demo is also available in the code download that accompanies this article. The demo code has all normal error checking removed to keep the main ideas as clear as possible and the size of the code small.

Overall Program Structure

The overall program structure is presented in Figure 3. To create the demo, I launched Visual Studio and created a new C# console application named FireflyAlgorithm. The demo has no significant Microsoft .NET Framework dependencies, so any recent version of Visual Studio will work.

Figure 3 Firefly Demo Program Structure

using System;
namespace FireflyAlgorithm
{
  class FireflyProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Begin firefly demo");
      // Code here
      Console.WriteLine("End firefly demo");
      Console.ReadLine();
    }
    static void ShowVector(double[] v, int dec, bool nl)
    {
      for (int i = 0; i < v.Length; ++i)
        Console.Write(v[i].ToString("F" + dec) + " ");
      if (nl == true)
        Console.WriteLine("");
    }
    static double[] Solve(int numFireflies, int dim,
      int seed, int maxEpochs) { . . }
    static double Distance(double[] posA,
      double[] posB) { . . }
    static double Michalewicz(double[] xValues) { . . }
    static double Error(double[] xValues) { . . }
  } // Program
  public class Firefly : IComparable<Firefly>
  {
    // Defined here
  }
}

After the template code loaded into the Visual Studio editor, in the Solution Explorer window I renamed file Program.cs to the more descriptive FireflyProgram.cs and Visual Studio automatically renamed class Program for me. At the top of the source code, I deleted all unnecessary using statements, leaving just the reference to System.

I coded the demo using a mostly static-method technique rather than using a full object-oriented programming approach. The demo has all the control logic in Main. The Main method begins by displaying the purpose of the demo:

Console.WriteLine("Goal is to solve the Michalewicz benchmark function");
Console.WriteLine("The function has a known minimum value of -4.687658");
Console.WriteLine("x = 2.2029 1.5707 1.2850 1.9231 1.7205");

Next, the parameters needed by FA are set:

int numFireflies = 40;
int dim = 5;
int maxEpochs = 1000;
int seed = 0;

The parameter values are displayed with these statements:

Console.WriteLine("Setting numFireflies = " + numFireflies);
Console.WriteLine("Setting problem dim = " + dim);
Console.WriteLine("Setting maxEpochs = " + maxEpochs);
Console.WriteLine("Setting initialization seed = " + seed);

The firefly algorithm is invoked like so:

Console.WriteLine("Starting firefly algorithm");
double[] bestPosition = Solve(numFireflies, dim, seed, maxEpochs);
Console.WriteLine("Finished");

The Main method concludes by displaying the FA results:

Console.WriteLine("Best solution found: ");
Console.Write("x = ");
ShowVector(bestPosition, 4, true);
double z = Michalewicz(bestPosition);
Console.Write("Value of function at best position = ");
Console.WriteLine(z.ToString("F6"));
double error = Error(bestPosition);
Console.Write("Error at best position = ");
Console.WriteLine(error.ToString("F4"));

The firefly algorithm is really more of a meta-heuristic than a prescriptive algorithm. By that I mean FA is a set of design guidelines that can be adapted for many different types of optimization problems.

Understanding the Firefly Algorithm

The firefly algorithm presented in this article is based on the 2009 research paper, “Firefly Algorithms for Multimodal Optimization,” by Xin-She Yang. The firefly algorithm process is illustrated in the graph in Figure 4. The graph represents a simplified dummy minimization problem in which there are just two input values, X and Y, and the global minimum value is at X = 0 and Y = 0. There are three fireflies. Firefly[0] is at (2, 1) and so is the closest to the correct solution. Firefly[1] is at (-4, -4). Firefly[2] is at (-8, 8) and is the farthest from the solution.

The Firefly Algorithm
Figure 4 The Firefly Algorithm

Real fireflies are flying insects that glow using bioluminescence, presumably to attract mates. Each firefly can glow with a different intensity. In FA, fireflies that are better, meaning a smaller error, have higher intensity. In Figure 4, then, firefly[0] has the highest intensity, firefly[1] has intermediate intensity, and firefly[2] has weak intensity.

The basic idea of FA is that a firefly will be attracted to any other firefly that has a higher intensity, and that attractiveness (the distance moved toward a more intense firefly) is stronger if the distance between the two fireflies is smaller. So, in Figure 4, firefly[0] has the highest intensity and will not move. Firefly[1] and firefly[2] will both be attracted to and move toward firefly[0]. Because firefly[1] is closer than firefly[2] to firefly[0], firefly[1] will move a greater distance than firefly[2].

Expressed in very high-level pseudo-code, the firefly algorithm is presented in Figure 5. At first glance, the algorithm seems very simple; however, it’s quite subtle, as you’ll see when the code implementation is presented.

The first major issue is to define the intensity of a firefly. Because FA is a meta-heuristic, you are free to define intensity however you like, as long as a higher intensity is associated with a better solution/position. The next major issue is to define attraction so that closer fireflies will move toward a more-intense target more than distant fireflies will move.

Figure 5 Firefly Algorithm

initialize n fireflies to random positions
loop maxEpochs times
  for i := 0 to n-1
    for j := 0 to n-1
      if intensity(i) < intensity(j)
        compute attractiveness
        move firefly(i) toward firefly(j)
        update firefly(i) intensity
      end for
    end for
  sort fireflies
end loop
return best position found

Implementing the Firefly Algorithm

The definition of method Solve begins as:

static double[] Solve(int numFireflies, int dim, int seed, int maxEpochs)
{
  Random rnd = new Random(seed);
  double minX = 0.0;
  double maxX = 3.2;
  double B0 = 1.0;
  double g = 1.0;
  double a = 0.20;
  int displayInterval = maxEpochs / 10;
...

Local variables minX and maxX establish boundaries for each firefly’s position. The values used here, 0.0 and 3.2 (approximately Pi) are specific to the Michalewicz function. For machine learning optimization with normalized data, values of -10.0 and +10.0 are common.

Local variables B0 (base beta), g (gamma) and a (alpha) control the attractiveness of one firefly to another. The values used (1.0, 1.0, and 0.20) were recommended by the source research paper. Local variable displayInterval controls how often to display a progress message.

Next, an empty swarm of fireflies is created:

double bestError = double.MaxValue;
double[] bestPosition = new double[dim]; // Best ever
Firefly[] swarm = new Firefly[numFireflies]; // All null

A Firefly object is program-defined and encapsulates a position, an associated error and the corresponding intensity. Initially, all fireflies are null objects. The Firefly class definition will be presented in the next section of this article. Next, the swarm is instantiated and placed at random positions. For each firefly, the Firefly constructor is called:

for (int i = 0; i < numFireflies; ++i)
{
  swarm[i] = new Firefly(dim);
  for (int k = 0; k < dim; ++k) // Random position
    swarm[i].position[k] = (maxX - minX) * rnd.NextDouble() + minX;
...

The constructor implicitly sets the position to (0.0, 0.0, 0.0, 0.0, 0.0) and the associated error and intensity to dummy values of 0.0. Then each component of the position array is set to a random value between minX and maxX (0.0 and 3.2). Next, the current firefly’s error and intensity are calculated:

swarm[i].error = Error(swarm[i].position);
  swarm[i].intensity = 1 / (swarm[i].error + 1);
...

The Error function will be presented shortly. Here, the intensity of the firefly is defined to be the inverse of the error so that small error values will have high intensity and large error values will have low intensity. The 1 in the denominator prevents division by zero when error is zero. Initialization concludes by checking the newly created firefly to see if it has the best position found:

...
  if (swarm[i].error < bestError)
  {
    bestError = swarm[i].error;
    for (int k = 0; k < dim; ++k)
      bestPosition[k] = swarm[i].position[k];
  }
} // For each firefly

The main processing loop begins with these statements:

int epoch = 0;
while (epoch < maxEpochs)
{
  if (epoch % displayInterval == 0 && epoch < maxEpochs)
  {
    string sEpoch = epoch.ToString().PadLeft(6);
    Console.Write("epoch = " + sEpoch);
    Console.WriteLine(" error = " + bestError.ToString("F14"));
  }
...

An alternative to a fixed number of iterations is to break when the value of bestError drops below some small threshold value (0.00001 is common). Each firefly is compared with all other fireflies using nested for loops:

for (int i = 0; i < numFireflies; ++i) // Each firefly
{
  for (int j = 0; j < numFireflies; ++j) // Others
  {
    if (swarm[i].intensity < swarm[j].intensity)
    {
      // Move firefly(i) toward firefly(j)
...

Notice that because each for loop index starts at 0, each pair of fireflies is compared twice in each iteration of the while loop. In order to move a firefly toward another firefly with higher intensity, first the attractiveness must be calculated:

double r = Distance(swarm[i].position, swarm[j].position);
double beta = B0 * Math.Exp(-g * r * r);
...

Variable beta defines attraction and will be used in a moment to move firefly[i]. Its value depends on the square of the distance between fireflies [i] and [j], which is calculated using helper method Distance. Method Distance returns the Euclidean distance between two positions. For example, if firefly[i] in two dimensions is at (3.0, 4.0) and firefly[j] is at (5.0, 9.0), the distance between them is sqrt((5 - 3)^2 + (9 - 4)^2) = sqrt(4 + 25) = sqrt(29) = 5.4. Notice that beta uses squared distance, which is the inverse of the square root operation, so the calculation of beta could be simplified, at the expense of flexibility, if you decided to use a different measure of distance.

The actual movement is accomplished with these statements:

for (int k = 0; k < dim; ++k)
{
  swarm[i].position[k] += beta *
    (swarm[j].position[k] - swarm[i].position[k]);
  swarm[i].position[k] += a * (rnd.NextDouble() - 0.5);
  if (swarm[i].position[k] < minX)
    swarm[i].position[k] = (maxX - minX) * rnd.NextDouble() + minX;
  if (swarm[i].position[k] > maxX)
    swarm[i].position[k] = (maxX - minX) * rnd.NextDouble() + minX;
}
...

The kth component of the position of firefly[i] is moved a beta-­fraction of the distance between firefly[i] and firefly[j] toward firefly[j]. Then a small random term is added to each kth position component. This helps prevent the algorithm from getting stuck in non-optimal solutions. Each position component is checked to see if it went out of range, and if so, a random in-range value is assigned.

The nested loops movement code finishes by updating the error and intensity of the just-moved firefly:

swarm[i].error = Error(swarm[i].position);
      swarm[i].intensity = 1 / (swarm[i].error + 1);
    } // If firefly(i) < firefly(j)
  } // j
} // i each firefly
...

Method Solve concludes with these statements:

...
   Array.Sort(swarm); // low error to high
    if (swarm[0].error < bestError)
    {
      bestError = swarm[0].error;
      for (int k = 0; k < dim; ++k)
        bestPosition[k] = swarm[0].position[k];
    }
    ++epoch;
  } // While
  return bestPosition;
} // Solve

After each pair of fireflies has been compared and less intense fireflies have moved toward more intense fireflies, the array of Firefly objects is sorted from low error to high error so that the best one is at swarm[0]. This object is checked to see if a new best solution has been found. Sorting the array of Firefly objects also has the important effect of changing their location within the array so that the objects are processed in a different order each time through the while loop.

The Helper Methods

Method Solve calls helper methods Distance and Error, which in turn calls helper method Michalewicz. Helper method Distance is defined as:

static double Distance(double[] posA, double[] posB)
{
  double ssd = 0.0; // sum squared diffrences
  for (int i = 0; i < posA.Length; ++i)
    ssd += (posA[i] - posB[i]) * (posA[i] - posB[i]);
  return Math.Sqrt(ssd);
}

Helper method Michalewicz is defined as:

static double Michalewicz(double[] xValues)
{
  double result = 0.0;
  for (int i = 0; i < xValues.Length; ++i) {
    double a = Math.Sin(xValues[i]);
    double b = Math.Sin(((i+1) * xValues[i] * xValues[i]) / Math.PI);
    double c = Math.Pow(b, 20);
    result += a * c;
  }
  return -1.0 * result;
}

If you refer to the math definition of the Michalewicz function at the bottom of Figure 2, you’ll see that the function has an exponent of 2m. However, the value of m is usually set to 10, so in the code, a constant value of 20 is used. Helper method Error is defined as:

static double Error(double[] xValues)
{
  int dim = xValues.Length;
  double trueMin = 0.0;
  if (dim == 2)
    trueMin = -1.8013; // Approx.
  else if (dim == 5)
    trueMin = -4.687658; // Approx.
  double calculated = Michalewicz(xValues);
  return (trueMin - calculated) * (trueMin - calculated);
}

The error method just returns the product of the squared difference between the known minimum value of the Michalewicz function, and the calculated value. This dummy error function can be calculated very quickly, but in most machine learning scenarios, the error function can be very time consuming.

The Firefly Class

The Firefly class definition begins with:

public class Firefly : IComparable<Firefly>
{
  public double[] position;
  public double error;
  public double intensity;
...

The class inherits from the IComparable interface so that arrays and lists containing the object can be automatically sorted. The data fields are defined using public scope for simplicity. Because there’s a one-to-one mapping between error and intensity, either of those two fields could be dropped. The class constructor is:

public Firefly(int dim)
{
  this.position = new double[dim];
  this.error = 0.0;
  this.intensity = 0.0;
}

There are many design alternatives you can consider. Here the constructor simply allocates space for the position array. The only other public method is CompareTo:

public int CompareTo(Firefly other)
  {
    if (this.error < other.error) return -1;
    else if (this.error > other.error) return +1;
    else return 0;
  }
} // Class Firefly

The CompareTo method orders Firefly objects from low error to high. An equivalent alternative is to order from high intensity to low.

A Few Comments

The implementation of the firefly algorithm presented in this article is based on the seed 2009 paper. The original algorithm has spawned several variations. The research paper presents some data that suggests FA is superior to particle swarm optimization, at least on some dummy benchmark optimization problems. I’m somewhat skeptical. However, in my opinion, a scenario in which FA is very useful is when the objective function to be minimized has multiple solutions. Although it’s not entirely obvious, as it turns out, FA automatically self-organizes into sub-swarms that can find multiple solutions simultaneously.


Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jammc@microsoft.com.

Thanks to the following Microsoft technical experts for reviewing this article: Todd Bello, Marciano Moreno Diaz Covarrubias and Alisson Sol