.NET Matters: StringStream, Methods with Timeouts

10/18/2019

Stephen Toub

Code download available at: NETMatters0507.exe (132 KB)

Q I'm trying to use a method that accepts a Stream as a parameter. However, the data I want to pass to that method is currently stored only as a string. Can you tell me what is the best way for me to pass the string to this method?

A There's not necessarily a best way, but there are a few solutions. And as with many scenarios in which you're provided with multiple approaches, you need to weigh space demands against time demands to choose the most appropriate solution. For example, is the string in question small? If so, consider copying that string into a System.IO.MemoryStream. With a small string, the duplicate data probably won't break the memory bank:

    MemoryStream memStream = new MemoryStream();
    byte[] data = Encoding.Unicode.GetBytes(theString);
    memStream.Write(data, 0, data.Length);
    memStream.Position = 0;

If you plan on doing this frequently, you might consider refactoring this functionality into its own class. The easiest way is to derive a class from MemoryStream, and then take advantage of MemoryStream's constructor that accepts the byte array of data to be contained within the stream. Moreover, you can use a related constructor that also accepts a Boolean value:

public MemoryStream(byte [] buffer, bool writable) { ... }

Setting the writable parameter to false forces the MemoryStream to return false from its CanWrite property and to prevent any write operations on the stream. This makes the stream immutable, which makes sense given that the source string is, by definition, also immutable. Such a class is shown here:

    public class StringStream : MemoryStream
    {
        public StringStream(string str) :
            base(Encoding.Unicode.GetBytes(str), false) {}
    }

To pass your string to the method that accepts a Stream, now all you have to do is create a StringStream (passing your string to its constructor) and send that stream into the method.
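As a rough illustration of that last step, the sketch below passes a StringStream to a method that accepts a Stream and reads the text back out. ReadAll and RoundTrip are hypothetical helper names invented for this example; the only assumption about the consumer is that it decodes the bytes with the same encoding the wrapper used (Encoding.Unicode).

```csharp
using System;
using System.IO;
using System.Text;

// The MemoryStream-derived wrapper described above.
public class StringStream : MemoryStream
{
    public StringStream(string str)
        : base(Encoding.Unicode.GetBytes(str), false) { }
}

public static class StringStreamDemo
{
    // Hypothetical stand-in for "a method that accepts a Stream".
    public static string ReadAll(Stream input)
    {
        // The stream holds UTF-16 bytes, so decode with the same encoding.
        using (StreamReader reader = new StreamReader(input, Encoding.Unicode))
        {
            return reader.ReadToEnd();
        }
    }

    // Wraps the string in a StringStream and hands it to the consumer.
    public static string RoundTrip(string text)
    {
        using (StringStream stream = new StringStream(text))
        {
            return ReadAll(stream);
        }
    }
}
```

Calling StringStreamDemo.RoundTrip with any string should hand back an equal string, and the stream's CanWrite property reports false throughout.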

Of course, this approach effectively doubles the amount of memory in use for the string's data, which might be prohibitive if the string is big. Moreover, if the string is sufficiently large, there's a good chance the allocated byte array could end up on the Large Object Heap. This means that even if the method call is brief and the array isn't used after the method call, the garbage collector won't clean up the array until it performs a full collection.

Figure 1 shows an alternative: a custom Stream-derived class. As opposed to the simple StringStream implementation that I showed previously, this implementation is meant to minimize the amount of overhead required to wrap a string as a stream.

Figure 1 Alternate StringStream Implementation

    public class StringStream : Stream
    {
        private readonly string _str;
        private readonly long _byteLength;
        private int _position;

        public StringStream(string str)
        {
            if (str == null) throw new ArgumentNullException("str");
            _str = str;
            _byteLength = _str.Length*2;
            _position = 0;
        }

        public override bool CanRead { get { return true; } }
        public override bool CanSeek { get { return true; } }
        public override bool CanWrite { get { return false; } }
        public override long Length { get { return _byteLength; } }

        public override long Position
        {
            get { return _position; }
            set
            {
                if (value < 0 || value > int.MaxValue)
                    throw new ArgumentOutOfRangeException("Position");
                _position = (int)value;
            }
        }

        public override long Seek(long offset, SeekOrigin origin)
        {
            switch(origin)
            {
                case SeekOrigin.Begin:
                    Position = offset;
                    break;
                case SeekOrigin.End:
                    Position = _byteLength + offset;
                    break;
                case SeekOrigin.Current:
                    Position = Position + offset;
                    break;
            }
            return Position;
        }

        public override int Read(byte[] buffer, int offset, int count)
        {
            if (_position < 0) throw new InvalidOperationException();

            int bytesRead = 0;
            while(bytesRead < count)
            {
                if (_position >= _byteLength) return bytesRead;
                char c = _str[_position / 2];
                buffer[offset + bytesRead] =
                    (byte)((_position % 2 == 0) ? c & 0xFF : (c >> 8) & 0xFF);
                Position++;
                bytesRead++;
            }
            return bytesRead;
        }

        public override void Write(byte[] buffer, int offset, int count)
        {
            throw new NotSupportedException();
        }

        public override void Flush()
        {
            throw new NotSupportedException();
        }

        public override void SetLength(long value)
        {
            throw new NotSupportedException();
        }
    }

Deriving from Stream requires the implementation of 10 abstract methods and properties: CanRead, CanSeek, CanWrite, Flush, Length, Position, Read, Seek, SetLength, and Write. For an immutable stream, CanWrite should return false, and the implementations of Flush, Write, and SetLength should all throw a NotSupportedException (they all deal with changes to the stream's data). CanRead should return true; otherwise, there would be no point to this wrapper.

The constructor accepts as a parameter the string to be wrapped and stores it. It then computes the length of the stream, which is the length of the string multiplied by two, since a System.String is a series of UTF-16 code units and each code unit is 2 bytes. This value is exposed by the Stream's Length property.
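The length calculation can be checked with a small sketch. Utf16LengthDemo and its two methods are names made up for this illustration; the comparison relies on Encoding.Unicode.GetByteCount, which reports how many bytes the UTF-16 encoder would produce for the string.

```csharp
using System;
using System.Text;

public static class Utf16LengthDemo
{
    // The byte length the constructor computes: two bytes per char.
    public static long ByteLength(string str)
    {
        return (long)str.Length * 2;
    }

    // True when the computed length agrees with the UTF-16 encoder's count.
    public static bool MatchesEncoder(string str)
    {
        return ByteLength(str) == Encoding.Unicode.GetByteCount(str);
    }
}
```

Note that a character outside the Basic Multilingual Plane occupies two chars in the string, so it contributes 4 bytes to the stream length under this calculation.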

The Position property is very straightforward, providing access to the current byte position within the stream. The Seek method then takes advantage of the Position property, using it to compute a new position based on the provided arguments and then storing that new value back to Position.

That just leaves the Read method, the core of the class. A client calls the Read method, passing in a byte array into which the read data should be stored, an offset into the array at which the data should start, and the number of bytes to be read. The method then returns the number of bytes read. To make the implementation simple, I loop until the requested number of bytes has been read (or until there are no more to read), storing 1 byte each time through the loop. This loop consists of picking up the next character from the string and extracting from it the appropriate byte by using some simple bit manipulation. I could have used System.BitConverter.GetBytes, but that would result in a 2-byte array being allocated each time I retrieved the bytes for a character, and since the whole point of this exercise is to avoid extraneous allocations for large strings, that would have been counterproductive.
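To make the bit manipulation concrete, here is a minimal sketch that isolates the per-byte arithmetic used by Read and compares it with what Encoding.Unicode (little-endian UTF-16) produces. ByteExtractionDemo, ByteAt, and MatchesEncoder are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text;

public static class ByteExtractionDemo
{
    // Returns the byte at the given stream position without allocating:
    // even positions yield the low byte of the char, odd positions the high byte.
    public static byte ByteAt(string str, int position)
    {
        char c = str[position / 2];
        return (byte)((position % 2 == 0) ? c & 0xFF : (c >> 8) & 0xFF);
    }

    // Checks every position against the array Encoding.Unicode would allocate.
    public static bool MatchesEncoder(string str)
    {
        byte[] expected = Encoding.Unicode.GetBytes(str);
        if (expected.Length != str.Length * 2) return false;
        for (int i = 0; i < expected.Length; i++)
        {
            if (ByteAt(str, i) != expected[i]) return false;
        }
        return true;
    }
}
```

For the euro sign (U+20AC), position 0 yields 0xAC and position 1 yields 0x20, which is the same byte order that Encoding.Unicode.GetBytes emits.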

Q I'd like to execute a method, but only allow it to run for a certain amount of time. Is there any way to do this?

A From a reliability perspective, a good way to do this is to incorporate knowledge of the time limit into the method you're calling. Internally, that method can poll for the amount of time it's used. Alternatively, if it blocks for a resource to become available, it can wait for only a specified period of time if that capability is available for the resource. One example of this is using Monitor.TryEnter instead of Monitor.Enter, or using WaitHandle.WaitOne(TimeSpan, bool) instead of WaitHandle.WaitOne(). Of course, if doing so is infeasible, if you don't want to mess with the method's logic to account for this, or if the code in the method to be executed is out of your control, you need an alternate approach.
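As a sketch of the cooperative approach, the following helper uses Monitor.TryEnter with a timeout so that the caller gives up rather than blocking indefinitely. TimedLockDemo and TryRun are hypothetical names, and the lock object is supplied by the caller purely for the sake of the example.

```csharp
using System;
using System.Threading;

public static class TimedLockDemo
{
    // Runs the work only if the lock can be acquired within the timeout.
    // Returns false when the time limit expires before the lock is available.
    public static bool TryRun(object gate, TimeSpan timeout, Action work)
    {
        if (!Monitor.TryEnter(gate, timeout)) return false;
        try
        {
            work();
            return true;
        }
        finally
        {
            // Always release the lock, even if the work throws.
            Monitor.Exit(gate);
        }
    }
}
```

The time limit here only bounds the wait for the lock; the work itself still runs to completion once the lock has been acquired.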

One solution comes in the form of the Abort method on System.Threading.Thread. Thread.Abort raises a ThreadAbortException in the thread corresponding to the Thread object on which it is invoked, beginning the process of terminating that thread. Thus, you can create a separate thread to run the method in question and then wait the amount of time allocated for that method to complete. If it doesn't complete in the time allotted, you can abort the thread, which in most situations will cause the method to stop executing. I've implemented this solution in the TimedOperation class, shown in Figure 2.

Figure 2 Implementing Timeouts for Method Calls

    public class TimedOperation
    {
        public static object Invoke(TimeSpan time, out bool aborted,
            Delegate method, params object[] parameters)
        {
            if (method == null) throw new ArgumentNullException("method");
            if (time.TotalSeconds <= 0)
                throw new ArgumentOutOfRangeException("time");

            Operation op = new Operation();
            op.Method = method;
            op.Parameters = parameters;

            Thread t = new Thread(new ThreadStart(op.Run));
            t.IsBackground = true;
            t.Start();

            if (!t.Join(time))
            {
                aborted = true;
                t.Abort(); // dangerous!
                return null;
            }
            else
            {
                aborted = false;
                if (op.Exception != null) throw op.Exception;
                return op.Result;
            }
        }

        private class Operation
        {
            public volatile Delegate Method;
            public volatile object [] Parameters;
            public volatile Exception Exception;
            public volatile object Result;

            public void Run()
            {
                try
                {
                    this.Result = Method.DynamicInvoke(Parameters);
                }
                catch(ThreadAbortException) { throw; }
                catch(Exception exc) { this.Exception = exc; }
            }
        }
    }

Let's say you want to invoke the method YourMethod, allowing it only five seconds to run. You could use TimedOperation to do this, as shown in the following code:

    bool timeExpired;
    object result = TimedOperation.Invoke(TimeSpan.FromSeconds(5),
        out timeExpired, new SomeDelegate(YourMethod), new object[]{...});
    if (timeExpired) Console.WriteLine("Ran out of time.");
    else Console.WriteLine("Result: {0}", result);

The first parameter is the amount of time for which you want to let the method run. When TimedOperation.Invoke returns, the second parameter, an out Boolean, indicates whether the called method returned of its own accord or because the thread was aborted when time ran out. The third and fourth parameters are the delegate to be executed and the parameters to be used for this delegate's invocation.

TimedOperation.Invoke first checks to make sure that the parameters are valid, namely that a delegate was supplied and that the amount of time allowed is positive (it wouldn't make much sense to allow the method negative time). The delegate and the parameters are then packaged into an instance of the internal Operation class. This class provides the method that will be run on the temporary thread. The user-supplied delegate can't be used directly as the ThreadStart method for several reasons. First and foremost, doing so would only allow TimedOperation to work with methods that accept no parameters and that return no value. Second, it would be more difficult to rethrow any exceptions thrown from this method to the caller of TimedOperation.Invoke.

With this Operation instance in hand, a new background thread is started with the Operation's Run method. The Run method simply executes the user-supplied delegate within a try block: any exception other than ThreadAbortException thrown from the delegate is caught and stored in the Operation instance; otherwise, the value returned by the delegate, if any, is stored there instead. After starting the thread, the Invoke method waits for the worker thread to finish processing by using the Thread.Join overload that accepts a timeout. If Thread.Join returns true, it means that the worker thread completed within the time specified. In this case, Invoke rethrows any exception that was thrown by the delegate, or it returns the value returned by the delegate (any out or ref parameters supplied to the delegate will be accessible to the caller of TimedOperation.Invoke through the parameters argument). If, however, Thread.Join returns false, it means that the worker thread did not complete in the time allotted. As a result, Invoke calls Abort on the thread object and returns null.

This technique is used by the Microsoft® .NET Framework in a couple of places, most prominently in ASP.NET. ASP.NET implements an execution timeout for all requests, which by default is 90 seconds. The HttpContext for each request, along with the Thread that's processing it, is stored on a list which is periodically culled by a worker thread in the ASP.NET runtime. If a request is found on this list past its execution timeout, the worker thread cancels the request using Thread.Abort to abort the relevant thread.

I mentioned earlier that TimedOperation.Invoke will in most situations cause the thread to stop executing, but there are, in fact, situations in which it will not. ThreadAbortException is a special exception in that it can be caught but, when the catch block finishes processing, the exception is rethrown by the runtime. However, if code has been granted the ControlThread security permission, it is able to explicitly ignore any ThreadAbortExceptions thrown, using the Thread.ResetAbort method to prevent the runtime from rethrowing the exception. In this scenario, the method you invoke could continue to run even after you try to stop it. Additionally, ThreadAbortException will still allow the relevant catch and finally blocks to run. Any one of these blocks could enter into an infinite loop, preventing the thread from being aborted in a graceful manner. As a third example, the common language runtime (CLR) is only able to abort threads when they are executing managed code; any thread a user tries to abort that is currently executing unmanaged code won't be aborted until it returns to the managed world.

Unfortunately, there are possibly worse problems with this approach. The point of Thread.Abort is to stop a thread running, regardless of what it's doing, and in general it's very good at its job. Consider the following code:

    IntPtr someMemory = Marshal.AllocHGlobal(1024);
    try
    {
        ...
    }
    finally { Marshal.FreeHGlobal(someMemory); }

If this code is executing on the thread you're aborting, and it just so happens that the abort happens after the call to AllocHGlobal and before the pointer to that allocated memory is stored to the someMemory local variable, or if it happens after the value is stored to someMemory but before the try block is entered, congratulations, you have a memory leak. Or what if the thread in question entered a lock (such as a Mutex) and was aborted before it could exit it? Congratulations, you've now orphaned the lock, deadlocking other threads that later attempt to acquire that lock. These types of constructs don't even need to be in your own code in order to run into these types of problems. Using any class from the .NET Framework that relies on creating Win32® objects and receiving handles to them, such as System.IO.FileStream, puts you at risk.

The .NET Framework 2.0 introduces new reliability features, including SafeHandles and Constrained Execution Regions, which will help to make your applications much more reliable in the face of asynchronous exceptions such as ThreadAbortException. However, you will still need to be careful when writing code that uses Thread.Abort, as it can cause unexpected failures.

Consider a type that makes use of a static constructor. If this static constructor throws an exception, any future attempts to use that type will result in the same exception being thrown, wrapped in a TypeInitializationException. So, if you abort a thread while it is running a static constructor, a ThreadAbortException will be raised within the constructor, thereby most likely preventing the use of that type in the current application domain. If you knew there was a chance that a specific type's static constructor might be aborted in this fashion, you could attempt a preemptive strike by using the RunClassConstructor method on the System.Runtime.CompilerServices.RuntimeHelpers class to run the type's static constructor before the thread is aborted, but that's nothing if not a hack.
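The lasting effect of a failed static constructor can be seen without aborting any threads. The sketch below substitutes an ordinary InvalidOperationException for the ThreadAbortException, which is an assumption made purely to keep the example simple; Fragile, InitCounter, and StaticCtorDemo are hypothetical names.

```csharp
using System;

// Counts how many times the static constructor actually runs.
public static class InitCounter
{
    public static int Attempts;
}

public class Fragile
{
    static Fragile()
    {
        InitCounter.Attempts++;
        // Stand-in for an exception raised while the constructor is running.
        throw new InvalidOperationException("static constructor failed");
    }

    public static void Touch() { }
}

public static class StaticCtorDemo
{
    // Returns the name of the exception type seen when using Fragile,
    // or null if the call succeeds.
    public static string TryUse()
    {
        try
        {
            Fragile.Touch();
            return null;
        }
        catch (Exception e)
        {
            return e.GetType().Name;
        }
    }
}
```

Calling StaticCtorDemo.TryUse repeatedly reports a TypeInitializationException every time, while InitCounter.Attempts stays at 1: the runtime does not give the constructor a second chance.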

If you have some knowledge of the method you want to run with a timeout, you might be able to avoid using Thread.Abort and instead opt for Thread.Interrupt, which is much safer from a reliability perspective but isn't as immediate in its results as Thread.Abort. Thread.Interrupt throws a ThreadInterruptedException on the target thread, but only when the thread is in a wait, sleep, or join state; if the target thread is in none of these states, an exception won't be thrown until the thread enters one of them. This is safer than Thread.Abort because a thread will not be in one of these states unless it is explicitly coded to be there (such as with a call to Monitor.Enter, Thread.Sleep, or Thread.Join), and thus the previous example using Marshal.AllocHGlobal would be more reliable, unless of course it was coded poorly, such as in the following (purposely silly) way:

    IntPtr someMemory = Marshal.AllocHGlobal(1024);
    Thread.Sleep(10);
    try
    {
        ...
    }
    finally { Marshal.FreeHGlobal(someMemory); }

The flip side to this is that the target thread may never exit. Consider the following method:

void LoopForever() { while(true); }

This method immediately enters an infinite loop, and while a Thread.Abort would cause its operation to cease, it never enters the WaitSleepJoin thread state, and thus a Thread.Interrupt will be unable to stop it.
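For contrast with LoopForever, here is a minimal sketch of a worker that does block, and can therefore be stopped with Thread.Interrupt. InterruptDemo and InterruptSleepingWorker are names invented for this illustration; the worker does nothing but sleep, which is an assumption chosen to guarantee that it reaches the WaitSleepJoin state.

```csharp
using System;
using System.Threading;

public static class InterruptDemo
{
    // Starts a worker that sleeps, interrupts it, and reports whether the
    // worker observed the interruption as an exception.
    public static bool InterruptSleepingWorker()
    {
        bool interrupted = false;

        Thread worker = new Thread(delegate()
        {
            try
            {
                // Blocking here puts the thread in the WaitSleepJoin state.
                Thread.Sleep(Timeout.Infinite);
            }
            catch (ThreadInterruptedException)
            {
                interrupted = true;
            }
        });
        worker.IsBackground = true;
        worker.Start();

        // If the worker has not blocked yet, the interrupt stays pending
        // and is delivered as soon as it does.
        worker.Interrupt();
        worker.Join();
        return interrupted;
    }
}
```

Replacing the body of the worker with a busy loop such as while(true); would leave the Join call waiting forever, which is exactly the limitation described above.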

The moral of the story is that it's best to avoid Thread.Abort whenever possible, but on the other hand there's no better way to stop a method's execution if you have no control over the implementation of the method in question. Just be aware that by using Thread.Abort without extreme care, you could possibly be putting the current application domain, or even worse the whole process, into an inconsistent state.

Send your questions and comments to netqa@microsoft.com.

Stephen Toub is the Technical Editor for MSDN Magazine.
