question

CharlesGardiner-8861 asked

Working with driver-created requests - WdfRequestCreate and a dedicated queue

First, the bird's-eye view. I am developing a USB driver for a semi-real-time application: it throws requests at the driver constantly, or at least for as long as its thread is running. On the USB side there is a microcontroller which can occasionally stall.

I would like to decouple the IRP handling from the USB side of things. My idea was to use WdfRequestCreate to create a new request containing a copy of the IRP input buffer, add this driver-created request to another queue in the driver, and complete the IRP as soon as the driver-created request has been queued, so that the application side is completed as quickly as possible. I don't know whether the IRPs coming in from the application are synchronous or asynchronous, and I don't really want to care.

I then wanted to register an EVT_WDF_IO_QUEUE_IO_WRITE callback with the USB-side queue, possibly concatenating buffers from multiple original IRP-side requests to optimise the USB-side communication, and send the result to USB with WdfUsbTargetPipeWriteSynchronously().

Unfortunately, it seems this won't work. Apparently I can only call WdfRequestForwardToIoQueue() on a request received from the framework, i.e. not on a driver-created request, right? That's a bit of a bummer: I want to complete the IRP-side request as quickly as possible so that the application can continue if it is issuing synchronous calls.

What I haven't been able to learn from the documentation is whether it is possible to just throw multiple calls to WdfRequestSend at the USB pipe, getting the associated IoTarget with WdfUsbTargetPipeGetIoTarget() and of course assigning a clean-up callback with the request. If this is possible, are these guaranteed to execute in the order in which they are sent to the pipe?

I have also tried queueing the IRP-side request copies in a collection and popping them off one by one, calling WdfUsbTargetPipeWriteSynchronously on the USB side. To decouple the USB side from the IRP side, the driver has a work item which just loops, checking the collection for content. I see two downsides here. Firstly, the collection does not seem to be thread-safe, so I need a semaphore, don't I? Secondly, the work item just loops, wasting CPU, when there is nothing in the collection to send down to USB. I gave up on this after a while because it felt like re-implementing a lot of framework functionality myself, but maybe I should revisit the approach. It introduces a dependency between the IRP side and the USB side through the WdfWaitLock (running at passive level), which I don't really like, but I suppose the framework essentially has the same dependency if I use a queue with a synchronisation scope.
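Both downsides named above (unsynchronised access and a consumer that spins) are usually addressed together by a queue whose consumer sleeps until the producer signals it. The following is only a minimal user-mode model of that idea, written with POSIX threads so that it can be compiled and tried outside a driver; the names `fifo_t`, `fifo_push`, `fifo_pop_blocking` and the capacity are hypothetical. In a KMDF driver the waiting would more likely be done with a kernel event (a KEVENT waited on with KeWaitForSingleObject) and the locking with a WDFWAITLOCK or a spin lock; that mapping is an assumption and is not something the post confirms.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FIFO_CAPACITY 8   /* hypothetical size, for illustration only */

typedef struct {
    void           *items[FIFO_CAPACITY];
    size_t          head;       /* index of the oldest element */
    size_t          count;      /* number of queued elements   */
    pthread_mutex_t lock;       /* protects head/count/items   */
    pthread_cond_t  not_empty;  /* signalled on every push     */
} fifo_t;

static void fifo_init(fifo_t *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Producer side (the "IRP side" in the post). Returns 0, or -1 if full. */
static int fifo_push(fifo_t *q, void *item)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < FIFO_CAPACITY) {
        q->items[(q->head + q->count) % FIFO_CAPACITY] = item;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);  /* wake a sleeping consumer */
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Consumer side (the "USB side"). Sleeps, rather than spins, while empty. */
static void *fifo_pop_blocking(fifo_t *q)
{
    void *item;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    item = q->items[q->head];
    q->head = (q->head + 1) % FIFO_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

The consumer holds the lock only while it manipulates the queue indices, so the slow USB write would happen outside the lock and would not block the producer.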

Grateful for any thoughts, clarifications or suggestions.
Charles


windows-hardware-wdk

1 Answer

CharlesGardiner-8861 answered

Well, to answer my own question: in the end I reworked the transmitting part of the USB driver, which now stores 4K blocks in a collection. A 4K block contains a copy of buffer portions from the original IRP. The filling side is the EVT_WDF_IO_QUEUE_IO_WRITE callback. The reading side is a loop in a work item which repeatedly calls WdfUsbTargetPipeWriteSynchronously, popping the elements off the collection as it progresses. Synchronisation is through a WaitLock.
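The filling side described above can be modelled in plain C as follows. This is a sketch of the packing logic only, under the assumption that incoming write buffers are copied back to back into fixed 4096-byte blocks; the post does not show the author's actual code, and `block_t`, `block_append` and `pack_write` are hypothetical names. The sketch does no locking and does not touch any WDF object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4096   /* the "4K block" from the post */

typedef struct {
    unsigned char data[BLOCK_SIZE];
    size_t        used;   /* bytes already filled */
} block_t;

/* Copy as much of src as fits into blk; return the number of bytes taken. */
static size_t block_append(block_t *blk, const unsigned char *src, size_t len)
{
    size_t room, n;
    assert(blk->used <= BLOCK_SIZE);
    room = BLOCK_SIZE - blk->used;
    n = len < room ? len : room;
    memcpy(blk->data + blk->used, src, n);
    blk->used += n;
    return n;
}

/*
 * Spread one write buffer over the blocks, starting at blocks[*cur] and
 * advancing *cur whenever a block becomes full.
 * Returns 0 on success, or -1 if the blocks ran out before all of src fit.
 */
static int pack_write(block_t *blocks, size_t nblocks, size_t *cur,
                      const unsigned char *src, size_t len)
{
    while (len > 0) {
        size_t n;
        if (*cur >= nblocks)
            return -1;                 /* no room left: caller must wait */
        n = block_append(&blocks[*cur], src, len);
        src += n;
        len -= n;
        if (blocks[*cur].used == BLOCK_SIZE)
            (*cur)++;                  /* block is full, move to the next */
    }
    return 0;
}
```

With this scheme a 6000-byte write fills one block completely and leaves 1904 bytes in the next, so consecutive small writes end up concatenated into full-size transfers.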

The 4K blocks come from a lookaside list, and I was very surprised to see that allocating from the lookaside list very quickly eats up all system memory, so much so that the system becomes inoperable in less than a minute. When testing over Remote Desktop, the connection drops when this happens. In the end I limited the collection to 4096 entries (at 4K each) and inserted KeDelayExecutionThread loops with a 5 ms delay per iteration, which delay the IRP completion and so put back-pressure on the application, which is busy throwing packets at the driver much faster than USB (2.0) can consume them. Without the limit, there were easily more than 300,000 elements in the collection (at 4K per element).
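The numbers above are easy to check: 4096 entries of 4096 bytes cap the queue at 16 MiB, whereas 300,000 entries of 4096 bytes come to 1,228,800,000 bytes, roughly 1.14 GiB. The sketch below shows that arithmetic together with a simple block budget that refuses new blocks once the limit is reached. It is a single-threaded model with hypothetical names (`block_budget_t`, `budget_try_acquire`, `budget_release`, `queued_bytes`); in the driver, the counter would have to be updated under the same lock as the collection, and a refused acquire corresponds to the delay loop the post describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 4096u   /* bytes per block, as in the post  */
#define MAX_BLOCKS 4096u   /* collection limit, as in the post */

typedef struct {
    size_t in_flight;      /* blocks currently queued for USB */
} block_budget_t;

/* Producer: try to reserve one block. false means "wait and retry". */
static bool budget_try_acquire(block_budget_t *b)
{
    if (b->in_flight >= MAX_BLOCKS)
        return false;      /* limit reached: apply back-pressure */
    b->in_flight++;
    return true;
}

/* Consumer: call after a block has been written to USB and freed. */
static void budget_release(block_budget_t *b)
{
    assert(b->in_flight > 0);
    b->in_flight--;
}

/* Memory held by a given number of queued blocks, in bytes. */
static unsigned long long queued_bytes(size_t blocks)
{
    return (unsigned long long)blocks * BLOCK_SIZE;
}
```

Polling with a fixed delay, as in the post, is the simplest way to react to a refused acquire; waiting on an event that the consumer signals from the release path would avoid the fixed latency, at the cost of a little more code.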

Anyway, this concept basically seems to work, so now on to a bit of optimisation ...
