From the February 2002 issue of MSDN Magazine

02/26/2008

New information has been added to this article since publication.
Refer to the Editor's Update below.

Array Types in .NET

Jeffrey Richter

Arrays are mechanisms that allow you to treat several items as a single collection. The Microsoft® .NET Common Language Runtime (CLR) supports single-dimensional arrays, multidimensional arrays, and jagged arrays (arrays of arrays). All array types are implicitly derived from System.Array, which itself is derived from System.Object. This means that all arrays are always reference types, which are allocated on the managed heap, and your app's variable contains a reference to the array and not the array itself.
The following code makes this point clearer:

  Int32[] myIntegers;            // Declares a reference to an array
  myIntegers = new Int32[100];   // Creates an array of 100 Int32s

On the first line, myIntegers is a variable that is capable of pointing to a single-dimensioned array of Int32s. Initially, myIntegers will be set to null since I have not allocated an array. The second line shown above allocates an array of 100 Int32 values; all of the Int32s are initialized to 0. Even though Int32s are value types, the memory block large enough to hold these values is allocated from the managed heap. The memory block contains 100 unboxed Int32 values. The address of this memory block is returned and saved in the variable myIntegers.
You can also create arrays of reference types:

  Control[] myControls;         // Declares a reference to an array
  myControls = new Control[49]; // Creates an array of 49 Control references

On the first line, myControls is a variable that is capable of pointing to a single-dimensioned array of Control references. Initially, myControls will be set to null since I have not allocated an array. The second line allocates an array of 49 Control references; all of these references are initialized to null. Since Control is a reference type, creating the array only creates references; the actual objects are not created at this time. The address of this memory block is returned and saved in the variable myControls.
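The initial contents of both kinds of array can be checked directly. The following is a minimal sketch, not code from the article; the Widget class is a hypothetical stand-in for Control so that the example needs no Windows Forms reference, and the method names are likewise assumptions.

```csharp
using System;

// Hypothetical reference type standing in for Control (an assumption made
// so that this sketch has no Windows Forms dependency).
class Widget { }

static class ArrayDefaults {
   // Returns true if every Int32 in a freshly allocated array is 0.
   public static Boolean AllZero(Int32 length) {
      Int32[] values = new Int32[length];
      foreach (Int32 v in values)
         if (v != 0) return false;
      return true;
   }

   // Returns true if every reference in a freshly allocated array is null.
   public static Boolean AllNull(Int32 length) {
      Widget[] widgets = new Widget[length];
      foreach (Widget w in widgets)
         if (w != null) return false;
      return true;
   }

   static void Main() {
      Console.WriteLine(AllZero(100));   // True
      Console.WriteLine(AllNull(49));    // True
   }
}
```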

Figure 1 Arrays in the Managed Heap

Figure 1 shows how arrays of value types and arrays of reference types look in the managed heap. [Editor's Update - 7/20/2004: The text has been updated to match Figure 1.] The Controls array shows the result after the following lines have executed:

  myControls[1] = new Button();
  myControls[2] = new TextBox();
  myControls[3] = myControls[2]; // 2 elements refer to the same object
  myControls[46] = new DataGrid();
  myControls[47] = new ComboBox();
  myControls[48] = new Button();

Common Language Specification (CLS) compliance requires that all arrays be zero-based. This allows a method written in C# to create an array and pass the array's reference to code written in another language such as Visual Basic®. In addition, since zero-based arrays are by far the most common, Microsoft has spent a lot of time optimizing their performance. The CLR does support non-zero-based arrays, but their use is discouraged. For those of you who do not care about performance and cross-language portability, I will demonstrate how to create and use non-zero-based arrays later in this section.
You'll notice that Figure 1 shows that each array has some additional overhead information associated with it. This information contains the rank of the array (number of dimensions), the lower bounds for each dimension of the array (almost always 0), and the length of each dimension. The overhead also contains the type of each element in the array. Shortly, I'll mention the methods that allow you to query this overhead information.
So far, I've shown examples demonstrating how to create single-dimensional arrays. When possible, you should try to stick with single-dimensional, zero-based arrays (sometimes referred to as SZ arrays or vectors). This is because SZ arrays give the best performance since there are specific intermediate language (IL) instructions (such as newarr, ldelem, ldelema, ldlen, and stelem) for manipulating them. However, if you prefer to work with multidimensional arrays, you may do so. Here are some examples to choose from:

  // Create a 2-dimensional array of Doubles
  Double[,] myDoubles = new Double[10, 20];

  // Create a 3-dimensional array of Strings
  String[,,] myStrings = new String[5, 3, 10];
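The rank, bounds, and lengths stored in an array's overhead information can be read back through members of System.Array. The following is a minimal sketch using the two arrays above; the ArrayShape class and its Describe method are hypothetical helpers introduced only for illustration.

```csharp
using System;

static class ArrayShape {
   // Builds a description such as "rank=2: [0..9] [0..19]" from the
   // rank and per-dimension bounds that the array reports about itself.
   public static String Describe(Array a) {
      String s = "rank=" + a.Rank + ":";
      for (Int32 d = 0; d < a.Rank; d++)
         s += " [" + a.GetLowerBound(d) + ".." + a.GetUpperBound(d) + "]";
      return s;
   }

   static void Main() {
      Double[,] myDoubles = new Double[10, 20];
      String[,,] myStrings = new String[5, 3, 10];

      Console.WriteLine(Describe(myDoubles));     // rank=2: [0..9] [0..19]
      Console.WriteLine(Describe(myStrings));     // rank=3: [0..4] [0..2] [0..9]
      Console.WriteLine(myDoubles.Length);        // 200 (total element count)
      Console.WriteLine(myDoubles.GetLength(1));  // 20 (length of dimension 1)
   }
}
```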

As mentioned earlier, the CLR also supports jagged arrays. Zero-based, single-dimensional jagged arrays have the same performance as normal SZ arrays. However, accessing elements of a jagged array means that two or more array accesses must occur. Note that jagged arrays are not CLS-compliant (because the CLS doesn't allow a System.Array object to be an element of an array) and cannot be passed between code written in different programming languages. Fortunately, C# does support jagged arrays. Here are some examples of how to create an array of polygons, where each polygon consists of an array of Point instances:

  // Create a 1-dimensional array of Point-arrays
  Point[][] myPolygons = new Point[3][];

  // myPolygons[0] refers to an array of 10 Point instances
  myPolygons[0] = new Point[10];

  // myPolygons[1] refers to an array of 20 Point instances
  myPolygons[1] = new Point[20];

  // myPolygons[2] refers to an array of 30 Point instances
  myPolygons[2] = new Point[30];

  // Display the Points in the second polygon
  for (Int32 x = 0, l = myPolygons[1].Length; x < l; x++)
     Console.WriteLine(myPolygons[1][x]);

Note that the CLR always verifies that an index into an array is valid. In other words, you cannot create an array with 100 elements in it (numbered 0 through 99) and then try to access the element at index 100 or -5. Attempting to do so will cause a System.IndexOutOfRangeException to be thrown. Allowing access to memory outside the range of an array would be a breach of type safety and a potential security hole, and the CLR doesn't allow verifiable code to do this sort of access.
Usually, the performance cost associated with index checking is not substantial because the just-in-time (JIT) compiler normally checks array bounds once before a loop executes instead of once for each loop iteration. However, if you're still concerned about the performance hit of the CLR's index checks, then you can use unsafe code in C# to access the array.
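The index check is easy to observe. The following is a minimal sketch, assuming a hypothetical TryRead helper that is not part of the .NET Framework; it reads an element and reports whether the CLR rejected the index.

```csharp
using System;

static class BoundsCheck {
   // Attempts to read arr[index]. Returns false (and sets value to 0)
   // if the CLR throws IndexOutOfRangeException for the index.
   public static Boolean TryRead(Int32[] arr, Int32 index, out Int32 value) {
      try {
         value = arr[index];
         return true;
      }
      catch (IndexOutOfRangeException) {
         value = 0;
         return false;
      }
   }

   static void Main() {
      Int32[] arr = new Int32[100];   // valid indexes are 0 through 99
      Int32 v;
      Console.WriteLine(TryRead(arr, 99, out v));    // True
      Console.WriteLine(TryRead(arr, 100, out v));   // False
      Console.WriteLine(TryRead(arr, -5, out v));    // False
   }
}
```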

All Arrays are Implicitly Derived from System.Array

The System.Array type offers several static and instance members. Since all arrays are implicitly derived from System.Array, these members can be used to manipulate arrays of value types or reference types. You'll also note that Array implements several interfaces: ICloneable, IEnumerable, ICollection, and IList. These interfaces allow arrays to be conveniently used in many different scenarios. Figure 2 summarizes the methods offered by System.Array and the interfaces that it implements.
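As a rough illustration of these members, the following sketch exercises a few of System.Array's static methods and two of the interfaces it implements. It is not a reproduction of Figure 2; the ArrayMembers class and its SortedCopy method are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections;

static class ArrayMembers {
   // Returns a sorted copy, leaving the source array untouched.
   // Clone comes from ICloneable and returns a shallow copy typed as Object.
   public static Int32[] SortedCopy(Int32[] source) {
      Int32[] copy = (Int32[]) source.Clone();
      Array.Sort(copy);
      return copy;
   }

   static void Main() {
      Int32[] numbers = { 5, 3, 1, 4, 2 };
      Int32[] sorted = SortedCopy(numbers);           // 1 2 3 4 5

      Console.WriteLine(Array.IndexOf(sorted, 4));    // 3
      Console.WriteLine(numbers[0]);                  // 5 (original unchanged)

      Array.Reverse(sorted);                          // 5 4 3 2 1
      Console.WriteLine(sorted[0]);                   // 5

      // Because Array implements IList, an array can be passed to any
      // code that expects that interface.
      IList list = sorted;
      Console.WriteLine(list.Contains(3));            // True
      Console.WriteLine(list.IsFixedSize);            // True
   }
}
```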

Casting Arrays

For arrays with reference type elements, the CLR allows you to implicitly cast the source array's element type to a target type. For the cast to succeed, both array types must have the same number of dimensions and there must be an implicit or explicit conversion from the source element type to the target element type. The CLR does not allow the casting of arrays with value type elements to any other type. However, using the Array.Copy method, you can create a new array that has the desired effect (see Figure 3). The Array.Copy method is incredibly useful and is used frequently by the .NET Framework Class Library. Figure 4 shows another example of the usefulness of Copy.
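The rules above can be sketched as follows. This is an illustrative example rather than the code in Figure 3; the ArrayCasts class and its ToObjects method are hypothetical names.

```csharp
using System;

static class ArrayCasts {
   // Int32[] cannot be cast to Object[] because its elements are value
   // types, but Array.Copy can build an equivalent array by boxing each
   // element into the destination.
   public static Object[] ToObjects(Int32[] source) {
      Object[] result = new Object[source.Length];
      Array.Copy(source, result, source.Length);
      return result;
   }

   static void Main() {
      String[] names = { "a", "b" };

      // Implicit cast: String derives from Object and the rank matches.
      Object[] objects = names;

      // Explicit cast back; succeeds because objects really is a String[].
      String[] again = (String[]) objects;
      Console.WriteLine(Object.ReferenceEquals(names, again));   // True

      Int32[] numbers = { 1, 2, 3 };
      Object[] boxed = ToObjects(numbers);
      Console.WriteLine(boxed[2]);                               // 3
   }
}
```

Note that the casts never create a new array; names, objects, and again all refer to the same object. Only the Array.Copy path allocates a second array.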

Passing and Returning Arrays

Because arrays are reference types, a method always receives a reference to the array rather than a copy of it. Since the CLR doesn't support the notion of constant parameters, this means that the method is able to change the elements in the array. If you don't want to allow the method to modify the elements, then you must make a copy of the array and pass the copy into the method. Note that the Array.Copy method does a shallow copy, and therefore if the array's elements are reference types, the new array refers to the already existing objects.
To obtain a deep copy, you may want to clone the individual elements, but this requires that each object's type implements the ICloneable interface. Alternatively, you could serialize each object to a System.IO.MemoryStream and then immediately deserialize the memory stream to construct a new object. Depending on the objects' types, the performance cost of these operations can be prohibitive, and not all types are serializable.
Similarly, some methods return a reference to an array. If the method constructs and initializes the array, then returning a reference to the array is fine. But if the method wants to return a reference to an internal array maintained by a field, then you must decide if you want the method's caller to have direct access to this array. If you do want it to have access, then just return the array's reference. Most often you do not, and the method should construct a new array and call Array.Copy, returning a reference to the new array. Again, you may want to clone each of the objects before returning the array reference.
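The copy-before-returning pattern might look like the following sketch. The Schedule type, its m_entries field, and the GetEntries method are all hypothetical; a shallow copy is enough here only because String objects are immutable.

```csharp
using System;

// Hypothetical type that maintains an internal array in a field.
class Schedule {
   private String[] m_entries = { "Standup", "Review" };

   // Returns a shallow copy so that callers cannot modify the
   // internal array through the returned reference.
   public String[] GetEntries() {
      String[] copy = new String[m_entries.Length];
      Array.Copy(m_entries, copy, m_entries.Length);
      return copy;
   }
}

static class ScheduleDemo {
   static void Main() {
      Schedule s = new Schedule();
      String[] entries = s.GetEntries();
      entries[0] = "Changed";                 // modifies only the copy
      Console.WriteLine(s.GetEntries()[0]);   // Standup
   }
}
```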
If you define a method that is to return a reference to an array and that array has no elements in it, then your method can either return null or a reference to an array with zero elements in it. When you are implementing this kind of method, you are strongly encouraged to implement the method returning a zero-length array because it simplifies the code that a developer calling the method must write. For example, this easy-to-understand code runs correctly even if there are no appointments to iterate over:

  // This code is easier to write and understand
  Appointment[] appointments = GetAppointmentsForToday();
  for (Int32 a = 0, l = appointments.Length; a < l; a++) {
     •••
  }

In comparison to the previous code, the following code also runs correctly if there are no appointments to iterate over, but it's slightly more difficult to write and understand:

  // This code is harder to write and understand
  Appointment[] appointments = GetAppointmentsForToday();
  if (appointments != null) {
     for (Int32 a = 0, l = appointments.Length; a < l; a++) {
        •••
     }
  }

If you always design your methods so that they return arrays with zero elements instead of null, then callers of your methods will have an easier time working with them. By the way, you should do the same for fields as well. If your type has a field that is a reference to an array, you should always try to have the field refer to an array even if the array has no elements in it. Allowing the field to be null will just needlessly complicate the use of your type.
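On the implementing side, returning a zero-length array could be sketched as follows. The Appointment class, the Calendar class, and the count parameter are hypothetical; the article's GetAppointmentsForToday takes no arguments, and the parameter exists here only so the sketch has something to decide on.

```csharp
using System;

// Hypothetical Appointment type.
class Appointment { }

static class Calendar {
   // A zero-length array has no elements to modify, so one shared
   // instance can safely be returned to every caller.
   private static readonly Appointment[] s_none = new Appointment[0];

   // Returns count appointments, or a zero-length array (never null)
   // when there are none.
   public static Appointment[] GetAppointmentsForToday(Int32 count) {
      if (count <= 0) return s_none;
      Appointment[] result = new Appointment[count];
      for (Int32 i = 0; i < count; i++)
         result[i] = new Appointment();
      return result;
   }

   static void Main() {
      Appointment[] appointments = GetAppointmentsForToday(0);
      Int32 visited = 0;
      for (Int32 a = 0, l = appointments.Length; a < l; a++)
         visited++;
      Console.WriteLine(visited);   // 0, and no null check was needed
   }
}
```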

Creating Arrays with a Non-zero Lower Bound

Earlier, I mentioned that it is possible to create and work with arrays that have non-zero lower bounds. You can dynamically create your own arrays by calling Array's static CreateInstance method. There are several overloads of this method; between them, they allow you to specify the type of the elements in the array, the number of dimensions in the array, the lower bounds of each dimension, and the number of elements in each dimension. CreateInstance allocates memory for the array, saves the parameter information in the descriptor portion of the array's memory block, and returns a reference to the array. You can cast the reference returned from CreateInstance to a variable of the appropriate array type so that it is easier for you to access the elements in the array.
The following code demonstrates how to dynamically create a two-dimensional array of System.Decimal values.

  // We want a 2-dim array [1995..2004][1..4]
  Int32[] lowerBounds = { 1995, 1 };
  Int32[] lengths     = {   10, 4 };
  Decimal[,] quarterlyRevenue = (Decimal[,])
     Array.CreateInstance(typeof(Decimal), lengths, lowerBounds);

The first dimension represents calendar years and goes from 1995 to 2004, inclusive. The second dimension represents quarters and goes from 1 to 4, inclusive. The code in Figure 5 iterates over all the elements in the dynamic array. I could've hardcoded the array's bounds into the code, which would've improved performance. But I decided to use System.Array's GetLowerBound and GetUpperBound methods for demonstration purposes.
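Iterating over such an array with GetLowerBound and GetUpperBound might look like the following sketch. It is written in the spirit of Figure 5 but is not that figure's code; the RevenueDemo class, the Create method, and the sample value are assumptions made for illustration.

```csharp
using System;

static class RevenueDemo {
   // Creates the [1995..2004][1..4] array described in the text.
   public static Decimal[,] Create() {
      Int32[] lowerBounds = { 1995, 1 };
      Int32[] lengths     = {   10, 4 };
      return (Decimal[,])
         Array.CreateInstance(typeof(Decimal), lengths, lowerBounds);
   }

   static void Main() {
      Decimal[,] quarterlyRevenue = Create();
      quarterlyRevenue[2000, 3] = 12.5m;   // sample value, chosen arbitrarily

      Int32 firstYear    = quarterlyRevenue.GetLowerBound(0);
      Int32 lastYear     = quarterlyRevenue.GetUpperBound(0);
      Int32 firstQuarter = quarterlyRevenue.GetLowerBound(1);
      Int32 lastQuarter  = quarterlyRevenue.GetUpperBound(1);

      // Print one row per year, one column per quarter.
      for (Int32 year = firstYear; year <= lastYear; year++) {
         Console.Write(year + ":");
         for (Int32 quarter = firstQuarter; quarter <= lastQuarter; quarter++)
            Console.Write(" " + quarterlyRevenue[year, quarter]);
         Console.WriteLine();
      }
   }
}
```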

Fast Array Access

Every time an element of an array is accessed, the CLR ensures that the index is within the array's bounds. This prevents you from accessing memory that is outside of the array, potentially corrupting other objects. If ever an invalid index is used to access an array element, the CLR throws a System.IndexOutOfRangeException.
As you might expect, the CLR's index checking incurs a performance cost. If you have confidence in your code and if you don't mind resorting to non-verifiable (unsafe) code, then there is a way to access an array without having the CLR perform its index checking (see Figure 6). To compile this code, you should use the following command line:

  csc.exe /unsafe UnsafeArrayAccess.cs

After building this small application, running it produces the following results:

Note that only five values should appear, but six values appear instead. This is due to a bug in the source code. In the for loop, the test expression should be x < l, not x <= l. This demonstrates how careful you must be when using unsafe code.
By the way, if you use ILDasm.exe to examine the IL for Main, you'll see the code in Figure 7, which I've commented. For comparative purposes, here is a version that doesn't use unsafe code:

  using System;

  class App {
    static void Main() {

        Int32[] arr = new Int32[] { 1, 2, 3, 4, 5 };

        for (Int32 x = 0, l = arr.Length; x <= l; x++) {
          Console.WriteLine(arr[x]);
        }
    }
  }

If you build this and use ILDasm to examine the IL code, you'd see the code in Figure 8.
It's true that there is less IL code for the type-safe version. However, it is the type-safe version's ldelem instruction that causes the CLR to do index checking. The unsafe version uses the ldind.i4 instruction instead; this simply obtains a 4-byte value from a memory address. Note that you can only use unsafe techniques to access an array of unmanaged types, which include SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double, Decimal, Boolean, and enumerated types, or a value type structure whose fields are any of the aforementioned types.

Redimensioning an Array

Array's static CreateInstance method allows you to dynamically construct an array when you don't know at compile time the types of elements that the array is to maintain. It is also useful when you don't know at compile time how many dimensions the array is to have and the bounds of those dimensions. In the "Creating Arrays with a Non-zero Lower Bound" section, I already demonstrated how to dynamically construct an array using arbitrary bounds. The CreateInstance method can also be used to redimension an arbitrary array (see Figure 9). If you build and run this application, you'll see the following output:

  1 2 3
  1 2 3 0 0
  1 2
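One way to get this behavior is sketched below. It is not the code in Figure 9; the RedimDemo class and its Redim and Format methods are hypothetical, and the sketch handles only single-dimensional, zero-based arrays.

```csharp
using System;

static class RedimDemo {
   // Returns a new single-dimensional array of the requested length with
   // the same element type as the original, copying as many elements as
   // fit. Any extra elements keep their default value.
   public static Array Redim(Array original, Int32 newLength) {
      Type elementType = original.GetType().GetElementType();
      Array result = Array.CreateInstance(elementType, newLength);
      Array.Copy(original, result, Math.Min(original.Length, newLength));
      return result;
   }

   // Joins the elements with single spaces, for example "1 2 3".
   public static String Format(Int32[] a) {
      String s = "";
      for (Int32 i = 0; i < a.Length; i++)
         s += (i == 0 ? "" : " ") + a[i];
      return s;
   }

   static void Main() {
      Int32[] arr = { 1, 2, 3 };
      Console.WriteLine(Format(arr));   // 1 2 3

      arr = (Int32[]) Redim(arr, 5);    // grow: new elements are 0
      Console.WriteLine(Format(arr));   // 1 2 3 0 0

      arr = (Int32[]) Redim(arr, 2);    // shrink: extra elements dropped
      Console.WriteLine(Format(arr));   // 1 2
   }
}
```

Because Redim always allocates a new array, any other variable that still refers to the original array continues to see the old length and contents.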

Send questions and comments for Jeff to dot-net@microsoft.com.

Jeffrey Richter is the author of Programming Applications for Microsoft Windows (Microsoft Press, 1999), and is a cofounder of Wintellect (http://www.Wintellect.com), a software education, debugging, and consulting firm. He specializes in programming/design for .NET and Win32. Jeff is currently writing a Microsoft .NET Framework programming book and offers .NET seminars.