Create multiple threads from within your application

When I posted this Sample program to create multiple threads, I knew the inevitable follow-up question was “can I run my VFP code in separate threads?”. Sure enough, several people asked, citing various valid usage scenarios.

Below is a class that you can use to run your VFP code in multiple separate threads. It can create as many threads as you like, each of which is running VFP code. Because it is a multithreaded sample, it requires the multithreaded runtime, which is just a few megabytes.

The sample code uses the class by creating a routine called MyThreadFunc, which is a CPU intensive task that sums the integers from 1 to some large fixed number and then inserts the result into a table. This task is repeated a few times. The time is measured to call MyThreadFunc from N+1 different threads (including the main thread). That time is compared with calling the same code the same number of times, but only from the main thread.

On my dual processor machine (with hyperthreading on), with 10 threads, the performance was almost double the performance of a single thread, as expected. On my single processor laptop, the performance is roughly the same, with the single threaded slightly faster than the multithreaded, due to thread overhead.

If I change the task to be less CPU intensive and more shared resource intensive by making it update a single shared table multiple times, the performance gain decreases, due to contention for a single shared resource as expected.

The code requires that you have a multithreaded COM DLL built from this code: Blogs get 300 hits per hour: Visual FoxPro can count. The Ic1 interface and the MyDoCmd method are defined in that DLL.

This code really maxes out your CPU when running with many threads. In fact, while I had task man open, I saw the CPU usage at very low numbers for an instant when I expected it to be very high because Taskman didn’t get enough CPU to update its display!

Observe from the results table that various threads complete at various times, interweaving their results, meaning that in the middle of computation, the threads are swapped out.

What kind of performance numbers do you see?

Here’s how ThreeadClass works: it allocates memory for various items, such as GUIDs, strings, and the generated code. It generates machine code into a string, and calls CreateThread, pointing to that string as the Thread procedure to execute. The threads are stored in an array. CoCreateInstance is called to create an instance of the VFP COM object. That object’s MyDoCmd method is called via early binding through it’s vTable.

The vTable of the Ic1 interface (which inherits from IDispatch, which inherits from IUnknown) is expected to be mapped out like this:

0 QueryInterface IUnknown

1 AddRef IUnknown

2 Release IUnknown

3 GetTypeInfoCount IDispatch

4 GetTypeInfo IDispatch

5 GetIDsOfNames IDispatch

6 Invoke IDispatch

7 MyDoCmd Ic1

8 MyEval Ic1

For more on vTable layout see my Paper on Visual Foxpro and Advanced COM

MyDoCmd is entry #7. 7 * 4 bytes per pointer = 28, which is 1c in hex.

            sCode = sCode + CHR(0xff) + CHR(0x50) + CHR(0x1c) && call [eax+1ch] && call indirect the function at 1ch in the vTable

Likewise, the Release is entry #2, so the offset is 8.

The preamble does a PUSH EBP, MOV EBP, ESP to set up a stack frame. At the end, there is a MOV ESP, EBP, POP EBP and RET 4 to release the frame. The stack frame means local variables can be accessed via a negative offset from EBP and any parameters with a positive offset. The RET 4 pops off the single 4 byte argument (bstrArg) that was passed in and returns, terminating the thread.

If your t1.dll is in a COM + application (as mine is on some machines), then you will need to specify your particular Guid for Ic1 (found in your t1.vbr file) instead of IDispatch. Then COM will know to marshall the rest of the vTable. In a COM+ application the ProcessIds will be different:


?_vfp.ProcessId, x.MyEval("_vfp.ProcessId")

A COM+ app also might not have rights to write to the disk, and may not shut down the server when you expect: it might keep an instance around for quick activation. It also will probably have a different Current Directory, so may not find MyThreadProc.prg

All threads in the sample run the same code, but you can certainly make them run different code various ways. Also, the threads can be kept alive in a thread pool, perhaps waiting for more tasks to execute.

As an interesting exercise, try making the thread procedure just Sleep 10 seconds

            IF .t. OR p2="Thread"

                  DECLARE integer Sleep IN WIN32API integer



With 10 threads, the 11 calls to ThreadProc (including the one in the main thread), the Sleeps occur in parallel and it finishes in 10 seconds. In a single threaded app, it takes 110 seconds to execute them in series!

The machine code generated is similar to this pseudo C++ code which has no error checking



      IDispatch *ptrCOMObj;

      VARIANT vtResult;

      CoInitialize(0); // initialize COM

      CoCreateInstance(cClsId,0, CLSCTX_INPROC_SERVER+CLSCTX_LOCAL_SERVER, &cIid, &ptrCOMObj);


      ptrComObj->MyDoCmd(bstrCmd("do d:\fox90\test\MyThreadFunc WITH p2"), dwParam as Variant,vtEmpty,vtEmpty,vtEmpty,&vtResult);

      VariantClear(vtResult); //unused, but needs to be freed

      ptrCOMObj->Release(); // release the COM server



I used a C++ project in Visual Studio with inline ASM code and Show Disassembly to get the machine language bytes. My Intel Programmers Reference manual was also useful. The thread proc isn’t optimized, but it doesn’t have to be. Almost no time is spent there, compared with calling the COM server.

See also: Windows Security and how it affects running generated code

The VFP code (with minimal error checking):






CREATE TABLE ThreadLog (threadid i, timestamp t,misc c(50)) && A table into which each thread will insert results

USE ThreadLog && open shared

TEXT TO cstrVFPCode TEXTMERGE NOSHOW && generate the task to run: MyThreadFunc

      PROCEDURE MyThreadFunc(p2) && p2 is the 2nd param to MyDoCmd

            TRY && use exception handling

                  DECLARE integer GetCurrentThreadId in WIN32API

                  LOCAL i,j,k,nSum

                  FOR k = 1 TO 3 && calculate result 3 times per thread


                        FOR i = 1 TO 500000



                        INSERT INTO <<DBF()>> VALUES (GetCurrentThreadId(), DATETIME(),TRANSFORM(p2)+":"+TRANSFORM(nSum))


            CATCH TO oex

                  INSERT INTO <<DBF()>> VALUES (GetCurrentThreadId(), DATETIME(),p2+" Error: "+oex.message)





COMPILE MyThreadFunc.prg

nThreads=10 && WaitForMultipleObjects MAXIMUM_WAIT_OBJECTS = 64



      cStrIid="{00020400-0000-0000-C000-000000000046}" && IID_IDispatch

* cStrIid="{3608114E-633A-44FF-8E51-1BBCF7225018}" && IID_Ic1 from your t1.vbr file.

      ox.StartThreads(nThreads,"do "+SYS(5)+CURDIR()+"MyThreadFunc WITH p2","Thread: %threadnum",cStrIid)

      ?TRANSFORM(nThreads)+" threads created. Main thread calculating..."

      MyThreadFunc("Main") && main thread will do calculation too, rather than just being idle

      ?"Main thread done: waiting for other threads"

      ox.WaitForThreads("?'main thread waiting'") && wait til threads finish

?"Using "+TRANSFORM(nThreads)+" threads takes "+TRANSFORM(SECONDS()-nStart)+" seconds"

?"Now try single threaded:"


      FOR i = 1 TO nThreads+1 && add one for the main thread



?"Single thread takes "+TRANSFORM(SECONDS()-nStart)+" seconds"

LOCATE && Go to the first record

BROWSE LAST NOWAIT && show the results

#define CREATE_SUSPENDED 0x00000004


#define WAIT_TIMEOUT 258



#define VT_BSTR 8

DEFINE CLASS ThreadClass as session

      hProcHeap =0


      DIMENSION hThreads[1] && Handle to each thread

      cThreadHandles="" && Handle to each thread as a string rep of an int array

      PROCEDURE Init

            DECLARE integer LoadLibrary IN WIN32API string

            DECLARE integer FreeLibrary IN WIN32API integer

            DECLARE integer GetProcAddress IN WIN32API integer hModule, string procname

            DECLARE integer CreateThread IN WIN32API integer lpThreadAttributes, ;

                  integer dwStackSize, integer lpStartAddress, integer lpParameter, integer dwCreationFlags, integer @ lpThreadId

            DECLARE integer ResumeThread IN WIN32API integer thrdHandle

            DECLARE integer CloseHandle IN WIN32API integer Handle

            DECLARE integer GetProcessHeap IN WIN32API

            DECLARE integer HeapAlloc IN WIN32API integer hHeap, integer dwFlags, integer dwBytes

            DECLARE integer HeapFree IN WIN32API integer hHeap, integer dwFlags, integer lpMem

            DECLARE integer WaitForSingleObject IN WIN32API integer hHandle, integer dwMilliseconds

            DECLARE integer WaitForMultipleObjects IN WIN32API integer nCount, string pHandles, integer bWaitAll, integer dwMsecs

            DECLARE integer CLSIDFromProgID IN ole32 string lpszProgID, string @ strClSID

            DECLARE integer CLSIDFromString IN ole32 string lpszProgID, string @ strClSID

            DECLARE integer SysAllocString IN oleaut32 string wstr

            DECLARE integer SysFreeString IN oleaut32 integer bstr

            CREATE CURSOR memAllocs (memPtr i, AllocType c(1)) && track mem allocs that need to be freed: H=Heap,B=BSTR,L=Library

            this.hProcHeap = GetProcessHeap()

      PROCEDURE StartThreads(nThreads as Integer, ThreadCmd as String, ThreadProcParam as String,cStrIid as String )

            this.nThreads = nThreads


            IF CLSIDFromProgID(STRCONV("t1.c1"+CHR(0),5),@cClsId)!= 0 && dual interface

                  ?"Error: class not found"





            nLocals = 30 && sufficiently large for local vars

            sCode="" && generate machine code for thread procedure into a string

* sCode = sCode + CHR(0xcc) && int 3 DebugBreak() to attach a debugger

            sCode = sCode + CHR(0x55) && push ebp

            sCode = sCode + CHR(0x8b) + CHR(0xec) && mov ebp, esp

            sCode = sCode + CHR(0x81) + CHR(0xec)+BINTOC(nLocals * 4, "4rs") && sub esp, nLocals

            sCode = sCode + CHR(0x6a) + CHR(0x00) && push 0

            sCode = sCode + this.CallDllFunction("CoInitialize", "ole32")


            sCode = sCode + this.GenCodeAtPoint("BeforeStart")

            sCode = sCode + CHR(0x8d) + CHR(0x45)+CHR(0xf0) && lea eax, [ebp-10h] && addr to put COM ptr

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0xb8) + BINTOC(this.MakeStr(cIid),"4rs") && mov eax, str

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0xb8) + BINTOC(CLSCTX_INPROC_SERVER+CLSCTX_LOCAL_SERVER,"4rs") && mov eax, val

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0x6a) + CHR(0x00) && push 0

            sCode = sCode + CHR(0xb8) + BINTOC(this.MakeStr(cClsId),"4rs") && mov eax, str

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + this.CallDllFunction("CoCreateInstance", "ole32")

            sCode = sCode + this.GenCodeAtPoint("Creating")

            sCode = sCode + CHR(0x8d) + CHR(0x45)+CHR(0xe0) && lea eax, [ebp-20h] && local var to get the vtResult of the COM call

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + this.CallDllFunction("VariantInit", "oleaut32") && Initialize the vtResult

            *call MyDoCmd via early binding. First push the parms

            sCode = sCode + CHR(0x8d) + CHR(0x45)+CHR(0xe0) && lea eax, [ebp-20h] && pass the address of vtResult for return value

            sCode = sCode + CHR(0x50) && push eax

            *Now we need to push 3 empty variants, each of which is 4 DWORDS

            sCode = sCode + CHR(0x33) + CHR(0xc0) && xor eax,eax

            sCode = sCode + REPLICATE(CHR(0x50),12) && push eax12 times


            *2nd param is P2:

            sCode = sCode + CHR(0x33) + CHR(0xc0) && xor eax,eax

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0x8b) + CHR(0x45) + CHR(0x8) && mov eax,[ebp+8] && Form the P2 param as a Variant from the BSTR arg from the parent thread

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0x33) + CHR(0xc0) && xor eax,eax

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0xb8) + BINTOC(VT_BSTR,"4rs") && mov eax, VT_BSTR

            sCode = sCode + CHR(0x50) && push eax


            *1st param is the expr for VFP to Exec.

            sCode = sCode + CHR(0xb8)+BINTOC(this.MakeStr(ThreadCmd,.t.,.t.),"4rs") && mov eax, cExpr (p2 is 2nd param to MyDoCmd)

            sCode = sCode + CHR(0x50) && push eax

            *Now make the call

            sCode = sCode + CHR(0x8b) + CHR(0x45) + CHR(0xf0) && mov eax, [ebp-10h] && the COM ptr

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + CHR(0x8b) + CHR(0) && mov eax, [eax]&& et the vTable

            sCode = sCode + CHR(0xff) + CHR(0x50) + CHR(0x1c) && call [eax+1ch] && call indirect the function at 1ch in the vTable

            sCode = sCode + this.GenCodeAtPoint("Calling")

            *Free the return value with VariantClear because it's ignored

            sCode = sCode + CHR(0x8d) + CHR(0x45)+CHR(0xe0) && lea eax, [ebp-20h]

            sCode = sCode + CHR(0x50) && push eax

            sCode = sCode + this.CallDllFunction("VariantClear", "oleaut32")

            sCode = sCode + this.GenEndCode(.t.)

            AdrCode=this.memAlloc(LEN(sCode),sCode) && allocate memory for the code

            DIMENSION this.hThreads[nThreads]


            FOR i = 1 TO nThreads


                  this.hThreads[i] = CreateThread(0,8192, AdrCode, bstrArg, 0, 0)

                  this.cThreadHandles = this.cThreadHandles+BINTOC(this.hThreads[i],"4rs") && put the handles into a string rep of an int array


      PROCEDURE GenCodeAtPoint(nPoint as String) as String && derived classes can override to gen code to exec at various points

            RETURN ""

      PROCEDURE GenEndCode(fRelease as Boolean) as String && generate code to end thread

            LOCAL sCode


            IF fRelease && do we also release COM obj?


                  sCode = sCode + CHR(0x8b) + CHR(0x45) + CHR(0xf0) && mov eax, [ebp-10h]

                  sCode = sCode + CHR(0x50) && push eax && push the THIS ptr

                  sCode = sCode + CHR(0x8b) + CHR(0) && mov eax, [eax]&& get the vTable

                  sCode = sCode + CHR(0xff) + CHR(0x50) + CHR(0x8) && call [eax+8h]



            sCode = sCode + this.GenCodeAtPoint("BeforeEnd")

            sCode = sCode + this.CallDllFunction("CoUninitialize", "ole32")

            sCode = sCode + CHR(0x33) + CHR(0xc0) && xor eax,eax && make ExitCodeThread= 0

            sCode = sCode + CHR(0x8b) + CHR(0xe5) && mov esp, ebp

            sCode = sCode + CHR(0x5d) && pop ebp

            sCode = sCode + CHR(0xc2)+CHR(0x04)+CHR(0x00) && ret 4

            RETURN sCode

      PROCEDURE WaitForThreads(cExpr as String)

            DO WHILE WaitForMultipleObjects(this.nThreads, this.cThreadHandles, 1, 500) = WAIT_TIMEOUT && wait msecs for the threads to finish

                  &cExpr && execute any passed in param while waiting


            FOR i = 1 TO this.nThreads



      PROCEDURE MemAlloc(nSize as Integer, cStr as String) as Integer

            LOCAL nAddr

            nAddr = HeapAlloc(this.hProcHeap, 0, nSize) && allocate memory

            ASSERT nAddr != 0 MESSAGE "Out of memory"

            INSERT INTO memAllocs VALUES (nAddr,"H") && track them for freeing later

            SYS(2600,nAddr, LEN(cStr),cStr) && copy the string into the mem

            RETURN nAddr

      PROCEDURE CallDllFunction(strExport as String, strDllName as String) as String

            *Create a string of machine code that calls a function in a DLL. Parms should already be pushed

            LOCAL nAddr as Integer, hModule as Integer

            hModule = LoadLibrary(strDllName)

            INSERT INTO memAllocs VALUES (hModule,"L") && track loads for freeing later


            ASSERT nAddr != 0 MESSAGE "Error: Export not found "+ strExport+" "+ strDllName

            RETURN CHR(0xb8)+BINTOC(nAddr,"4rs") + CHR(0xff) + CHR(0xd0) && mov eax, addr; call eax

      PROCEDURE MakeStr(str as String, fConvertToUnicode as Logical, fMakeBstr as Logical) as Integer

            * converts a string into a memory allocation and returns a pointer

            LOCAL nRetval as Integer

            IF fConvertToUnicode



                  str = str + CHR(0) && null terminate


            IF fMakeBstr

                  nRetval= SysAllocString(str)

                  ASSERT nRetval != 0 MESSAGE "Out of memory"

                  INSERT INTO memAllocs VALUES (nRetval,"B") && track them for freeing later


                  nRetval= this.MemAlloc(LEN(str),str)


            RETURN nRetval

      PROCEDURE Destroy

            SELECT memAllocs


                  DO CASE

                  CASE AllocType="B" && BSTR


                  CASE AllocType="H" && Heap


                  CASE AllocType="L" && LoadLibrary