Embedding Python in Multi-Threaded C/C++ Applications
Developers often make use of high-level scripting languages as a way of quickly writing flexible code. Various shell scripting languages have long been used to automate processes on UNIX systems. More recently, software applications have begun to provide scripting layers that allow the user to automate common tasks or even extend the feature set. Think of all the well-known applications you use: GIMP, Emacs, Word, Photoshop, etc. It seems as though all can be scripted in some way.
In this article, I will describe how you can embed the Python language within your C applications. There are many reasons you would want to do this. For instance, you may want to provide your more advanced users with the ability to alter or customize the program. Or maybe you want to take advantage of a Python capability, rather than implement it yourself. Python is a good choice for this task because it provides a clean, intuitive C API. Since many complex applications are written using threads, I will also show you how to create a thread-safe interface to the Python interpreter.
All the examples assume you are using Python version 1.5.2, which comes pre-installed on most recent Linux distributions. The API to access the Python interpreter is the same for both C and C++. There are no special C++ constructs used, and all functions are declared extern “C”. For this reason, the concepts described and the example code given here should work equally well when using either C or C++.
There are two ways that C and Python code can work together within the same process. Simply put, Python code can call C code or C code can call Python code. These two methods are called “extending” and “embedding”, respectively. When extending, you create a new Python module implemented in C/C++. This allows you to provide new functionality to the Python language that cannot be implemented in Python. For instance, several core Python modules such as “time” and “nis” are implemented as C extensions, while others are written in Python. You never notice the difference between C and Python modules, because the act of importing and using these modules is the same. If you look around in your /usr/lib/python1.5 directory, you may see some shared library files (extension .so). These are Python module extensions written in C. You will also see various Python files (extension .py) which are modules written in Python.
Typically, when you embed Python, you will develop a C/C++ application that has the ability to load and execute Python scripts. The application will be linked against the Python interpreter library, called libpython1.5.a, which provides all functionality related to evaluating Python code. There is no Python executable involved, only an API for your application to use.
Embedding Python is a relatively straightforward process. If your goal is merely to execute vanilla Python code from within a C program, it's actually quite easy. Listing 1 is the complete source to a program that embeds the Python interpreter. This illustrates one of the simplest programs you could write making use of the Python interpreter.
Listing 1 uses three Python-specific function calls. Py_Initialize starts up the Python interpreter library, causing it to allocate whatever internal resources it needs. You must call this function before calling most other functions in the Python API. PyRun_SimpleString provides a quick, no-frills way to execute arbitrary Python code. Interpretation of the code is immediate. In the above example, for instance, the import sys line causes Python to import the sys module before returning control to the C/C++ program. Each string passed to PyRun_SimpleString must be a complete Python statement of some kind. In other words, half statements are illegal, even if they are completed with another call to PyRun_SimpleString. For example, the following code will not work properly:
// Python will print first error here
PyRun_SimpleString("import ");
// Python will print second error here
PyRun_SimpleString("sys\n");
Py_Finalize is the last Python function which any application that embeds Python must call. This function shuts down the interpreter and frees any resources it allocated during its lifetime. You should call this when you are completely finished using the Python library. When you call Py_Finalize, Python will unload all imported modules one by one. Many modules must execute their own clean-up code when they are unloaded in order to free any global resources they may have allocated. For this reason, calling Py_Finalize can have the side effect of causing quite a bit of other code to run.
PyRun_SimpleString is just one way to execute Python code from within your C applications. In fact, there is a whole collection of similar high-level functions. PyRun_SimpleFile is just like PyRun_SimpleString, except it reads its input from a FILE pointer rather than a character buffer. See the Python documentation at www.python.org/docs/api/veryhigh.html for complete documentation on these high-level functions.
In addition to evaluating Python scripts, you can also manipulate Python objects and call Python functions directly from your C code. While this involves more complex C code than using PyRun_SimpleString, it also allows access to more detailed information. For example, you can access objects returned from Python functions or determine if an exception has been thrown.
Extending Python
When you embed Python within your application, it is often desirable to provide a small module that exposes an API related to your application so that scripts executing within the embedded interpreter have a way to call back into the application. This is done by providing your own Python module, written in C, and is exactly the same as writing normal Python modules. The only difference is your module will function properly only within the embedded interpreter.
Extending Python requires some understanding of how the Python interpreter manipulates objects from C. All function arguments and return values are pointers to PyObject structures, which are the C representation of real Python objects. You can make use of various function calls to manipulate PyObjects. Listing 2 is a simple example of a Python module extension written in C. This is the source to the Python crypt module, which provides one-way hashing used in password authentication.
All C implementations of Python-callable functions take two arguments, both pointers to PyObject. The first argument is always “self”, the object whose method is being called (similar to the infamous “this” pointer in C++). The second object contains all the arguments to the function. PyArg_Parse is used to extract values from a PyObject containing function arguments. You do this by passing in the PyObject which contains the values, a format string which represents the data types you expect to be there, and one or more pointers to data types to be filled in with values from the PyObject. In Listing 2, the function takes two strings, represented by "(ss)". PyArg_Parse is similar to the C function sscanf, except it operates on a PyObject rather than a character buffer. In order to return a string value from the function, call PyString_FromString. This helper function takes a char* value and converts it into a PyObject.
C programs can easily create new threads of execution. Under Linux, this is most commonly done using the POSIX Threads (pthreads) API and the function call pthread_create. For an overview of how to use pthreads, see “POSIX Thread Libraries” by Felix Garcia and Javier Fernandez at http://www.linuxjournal.com/lj-issues/issue70/3184.html in the “Strictly On-line” section of LJ, February 2000. In order to support multi-threading, Python uses a mutex to serialize access to its internal data structures. I will refer to this mutex as the “global interpreter lock”. Before a given thread can make use of the Python C API, it must hold the global interpreter lock. This avoids race conditions that could lead to corruption of the interpreter state.
The act of locking and releasing this mutex is abstracted by the Python functions PyEval_AcquireLock and PyEval_ReleaseLock. After calling PyEval_AcquireLock, you can safely assume your thread holds the lock; all other cooperating threads are either blocked or executing code unrelated to the internals of the Python interpreter, and you may now call arbitrary Python functions. Once you have acquired the lock, however, you must be certain to release it later by calling PyEval_ReleaseLock. Failure to do so will cause a thread deadlock and freeze all other Python threads.
To complicate matters further, each thread running Python maintains its own state information. This thread-specific data is stored in an object called PyThreadState. When calling Python API functions from C in a multi-threaded application, you must maintain your own PyThreadState objects in order to safely execute concurrent Python code.
If you are experienced in developing threaded applications, you might find the idea of a global interpreter lock rather unpleasant. Well, it's not as bad as it first appears. While Python is interpreting scripts, it periodically yields control to other threads by swapping out the current PyThreadState object and releasing the global interpreter lock. Threads previously blocked while attempting to lock the global interpreter lock will now be able to run. At some point, the original thread will regain control of the global interpreter lock and swap itself back in.
This means when you call PyRun_SimpleString, you are faced with the unavoidable side effect that other threads will have a chance to execute, even though you hold the global interpreter lock. In addition, making calls to Python modules written in C (including many of the built-in modules) opens the possibility of yielding control to other threads. For this reason, two C threads that execute computationally intensive Python scripts will indeed appear to share CPU time and run concurrently. The downside is that, due to the existence of the global interpreter lock, Python cannot fully utilize CPUs on multi-processor machines using threads.
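The periodic hand-off described above can be modelled without Python at all. What follows is only a toy sketch in plain pthreads, not the interpreter's actual code: toy_gil, toy_interpret, toy_run and TOY_CHECK_INTERVAL are hypothetical names, and the interval of 10 is an assumed value. Each worker holds the lock while it "executes" and gives it up briefly every few steps, so the shared counter stays consistent while the threads take turns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy stand-in for the global interpreter lock (hypothetical name). */
static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;
/* Shared "interpreter state"; only touched while toy_gil is held. */
static long toy_steps_executed = 0;

#define TOY_CHECK_INTERVAL 10 /* steps between voluntary releases (assumed) */

struct toy_job { long steps; };

/* Runs job->steps "instructions", briefly giving up the lock every
   TOY_CHECK_INTERVAL steps so that blocked threads get a chance to run. */
static void *toy_interpret(void *arg)
{
    struct toy_job *job = arg;
    long i;

    assert(job != NULL);
    pthread_mutex_lock(&toy_gil);
    for (i = 1; i <= job->steps; i++) {
        toy_steps_executed++;               /* protected by toy_gil */
        if (i % TOY_CHECK_INTERVAL == 0) {
            pthread_mutex_unlock(&toy_gil); /* let a waiting thread in */
            pthread_mutex_lock(&toy_gil);   /* then compete for it again */
        }
    }
    pthread_mutex_unlock(&toy_gil);
    return NULL;
}

/* Starts nthreads workers (1 to 16) that each run steps_each steps.
   Returns the total number of steps executed, or -1 on error. */
long toy_run(int nthreads, long steps_each)
{
    pthread_t tids[16];
    struct toy_job job;
    int i, started = 0;
    long total;

    if (nthreads < 1 || nthreads > 16)
        return -1;
    job.steps = steps_each;

    pthread_mutex_lock(&toy_gil);
    toy_steps_executed = 0;
    pthread_mutex_unlock(&toy_gil);

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, toy_interpret, &job) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_lock(&toy_gil);
    total = toy_steps_executed;
    pthread_mutex_unlock(&toy_gil);
    return started == nthreads ? total : -1;
}
```

Calling toy_run(4, 1000) should report 4000 steps however the threads happen to interleave. Only one worker makes progress at any instant, which is the same reason the real lock keeps threads from using several CPUs at once.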
Enabling Thread Support
Before your threaded C program is able to make use of the Python API, it must call some initialization routines. If the interpreter library is compiled with thread support enabled (as is usually the case), you have the runtime option of enabling threads or not. Do not enable runtime threading support unless you plan on using threads. If runtime support is not enabled, Python will be able to avoid the overhead associated with mutex locking its internal data structures. If you are using Python to extend a threaded application, you will need to enable thread support when you initialize the interpreter. I recommend initializing Python from within your main thread of execution, preferably during application startup, using the following two lines of code:
// initialize Python
Py_Initialize();
// initialize thread support
PyEval_InitThreads();
Both functions return void, so there are no error codes to check. You can now assume the Python interpreter is ready to execute Python code. Py_Initialize allocates global resources used by the interpreter library. Calling PyEval_InitThreads turns on the runtime thread support. This causes Python to enable its internal mutex lock mechanism, used to serialize access to critical sections of code within the interpreter. This function also has the side effect of locking the global interpreter lock. Once the function completes, you are responsible for releasing the lock. Before releasing the lock, however, you should grab a pointer to the current PyThreadState object. You will need this later in order to create new Python threads and to shut down the interpreter properly when you are finished using Python. Use the following bit of code to do this:
PyThreadState * mainThreadState = NULL;
// save a pointer to the main PyThreadState object
mainThreadState = PyThreadState_Get();
// release the lock
PyEval_ReleaseLock();
Python requires a PyThreadState object for each thread that is executing Python code. The interpreter uses this object to manage a separate interpreter data space for each thread. In theory, this means that actions taken in one thread should not interfere with the state of another thread. For instance, if you throw an exception in one thread, the other snippets of Python code keep running as if nothing happened. You must help Python to manage per-thread data. To do this, manually create a PyThreadState object for each C thread that will execute Python code. In order to create a new PyThreadState object, you need a pre-existing PyInterpreterState object. The PyInterpreterState object holds information that is shared across all cooperating threads. When you initialized Python, it created a PyInterpreterState object and attached it to the main PyThreadState object. You can use this interpreter object to create a new PyThreadState for your own C thread. Here's some example code which does just that (ignore line wrapping):
// get the global lock
PyEval_AcquireLock();
// get a reference to the PyInterpreterState
PyInterpreterState * mainInterpreterState = mainThreadState->interp;
// create a thread state object for this thread
PyThreadState * myThreadState = PyThreadState_New(mainInterpreterState);
// free the lock
PyEval_ReleaseLock();
Now that you have created a PyThreadState object, your C thread can begin to use the Python API to execute Python scripts. You must adhere to a few simple rules when executing Python code from a C thread. First, you must hold the global interpreter lock before doing anything that alters the state of the current thread state. Second, you must load your thread-specific PyThreadState object into the interpreter before executing any Python code. Once you have satisfied these constraints, you can execute arbitrary Python code by using functions such as PyRun_SimpleString. Remember to swap out your PyThreadState object and release the global interpreter lock when done. Note the symmetry of “lock, swap, execute, swap, unlock” in the code (ignore line wrapping):
// grab the global interpreter lock
PyEval_AcquireLock();
// swap in my thread state
PyThreadState_Swap(myThreadState);
// execute some python code
PyRun_SimpleString("import sys\n");
// note the doubled backslash: Python, not C, must see the \n escape
PyRun_SimpleString("sys.stdout.write('Hello from a C thread!\\n')\n");
// clear the thread state
PyThreadState_Swap(NULL);
// release our hold on the global interpreter
PyEval_ReleaseLock();
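To make the “lock, swap, execute, swap, unlock” discipline easier to see in isolation, here is a toy model in plain pthreads. It does not call the Python API; struct toy_thread_state, toy_swap, toy_execute and the other names are hypothetical stand-ins for PyThreadState, PyThreadState_Swap and the interpreter itself. The sketch assumes each worker runs 100 "statements" and checks that every thread's record reflects only its own work.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy per-thread record, standing in for PyThreadState (hypothetical). */
struct toy_thread_state {
    long statements_run;
};

static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;
/* Which thread state is currently "loaded"; guarded by toy_lock. */
static struct toy_thread_state *toy_current = NULL;

/* Loads a new state and returns the one that was loaded before,
   mirroring the calling pattern of PyThreadState_Swap. */
static struct toy_thread_state *toy_swap(struct toy_thread_state *next)
{
    struct toy_thread_state *prev = toy_current;
    toy_current = next;
    return prev;
}

/* "Executes" one statement on behalf of whichever state is loaded. */
static void toy_execute(void)
{
    assert(toy_current != NULL); /* running with no loaded state is a bug */
    toy_current->statements_run++;
}

/* Thread body: lock, swap in, execute, swap out, unlock. */
static void *toy_worker(void *arg)
{
    struct toy_thread_state *mine = arg;
    int i;

    for (i = 0; i < 100; i++) {
        pthread_mutex_lock(&toy_lock);
        toy_swap(mine);
        toy_execute();
        toy_swap(NULL);
        pthread_mutex_unlock(&toy_lock);
    }
    return NULL;
}

/* Runs two workers; returns 1 if each state recorded only its own work
   and no state was left loaded afterwards, 0 otherwise. */
int toy_two_threads(void)
{
    struct toy_thread_state a = { 0 }, b = { 0 };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, toy_worker, &a) != 0)
        return 0;
    if (pthread_create(&tb, NULL, toy_worker, &b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return a.statements_run == 100 && b.statements_run == 100
        && toy_current == NULL;
}
```

If a worker skipped the swap-in step, toy_execute would either trip its assertion or charge the work to another thread's record, which is the toy equivalent of running Python code under the wrong thread state.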
Once your C thread is no longer using the Python interpreter, you must dispose of its resources. To do this, delete your PyThreadState object. This is accomplished with the following code:
// grab the lock
PyEval_AcquireLock();
// swap my thread state out of the interpreter
PyThreadState_Swap(NULL);
// clear out any cruft from thread state object
PyThreadState_Clear(myThreadState);
// delete my thread state object
PyThreadState_Delete(myThreadState);
// release the lock
PyEval_ReleaseLock();
This thread is now effectively done using the Python API. You may safely call pthread_exit at this point to halt execution of the thread.
Once your application has finished using the Python interpreter, you can shut down Python support with the following code:
// shut down the interpreter
PyEval_AcquireLock();
Py_Finalize();
Note there is no reason to release the lock, because Python has been shut down. Be certain to delete all your thread-state objects with PyThreadState_Clear and PyThreadState_Delete before calling Py_Finalize.
Python is a good choice for use as an embedded language. The interpreter provides support for both embedding and extending, which allows two-way communication between C application code and embedded Python scripts. In addition, the threading support facilitates integration with multi-threaded applications without compromising performance.
You can download example source code at ftp.linuxjournal.com/pub/lj/listings/issue73/3641.tgz. This includes an example implementation of a multi-threaded HTTP server with an embedded Python interpreter. In order to learn more about the implementation details, I recommend reading the Python C API documentation at http://www.python.org/docs/api/. In addition, I have found the Python interpreter code itself to be an invaluable reference.
Using PyGILState_Ensure/PyGILState_Release

Constructor:
------------
PyGILState_Ensure ONLY ensures that one thread uses the same PyThreadState;
if two threads call PyGILState_Ensure,
one thread might invalidate the PyThreadState of the other WITHOUT locking!
(See the Python source, Python/pystate.c.)
PyGILState_Ensure:
tcur = (PyThreadState *)PyThread_get_key_value(autoTLSkey);
if (tcur == NULL) {
    /* Create a new thread state for this thread */
    tcur = PyThreadState_New(autoInterpreterState);
    if (tcur == NULL)
        Py_FatalError("Couldn't create thread-state for new thread");
    /* This is our thread state! We'll need to delete it in the
       matching call to PyGILState_Release(). */
    tcur->gilstate_counter = 0;
    current = 0; /* new thread state is never current */
}
else
    current = PyThreadState_IsCurrent(tcur);
if (current == 0)
    PyEval_RestoreThread(tcur);
Locking is done in PyEval_RestoreThread (see the source in Python/ceval.c),
which is called only if current == 0, i.e.,
there is no saved PyThreadState (_PyThreadState_Current in terms of pystate.c).
So one MUST call PyEval_SaveThread, not just PyEval_ReleaseLock!!!
mainThreadState = PyEval_SaveThread();
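The look-up-or-create step at the top of the PyGILState_Ensure excerpt can be illustrated with ordinary POSIX thread-specific data. This is a toy sketch only, not the interpreter's implementation: toy_ensure, struct toy_state and the other names are hypothetical, and the locking that PyEval_RestoreThread performs is left out entirely. It shows just the idea that a thread finds its own state through a key and gets a fresh one the first time it asks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy per-thread record; a stand-in for PyThreadState (hypothetical). */
struct toy_state {
    int ensure_count; /* how many times this thread asked for its state */
};

static pthread_key_t toy_key;
static pthread_once_t toy_key_once = PTHREAD_ONCE_INIT;
static int toy_key_ok = 0;

/* Destructor: runs when a thread that owns a state exits. */
static void toy_free_state(void *p) { free(p); }

static void toy_make_key(void)
{
    toy_key_ok = (pthread_key_create(&toy_key, toy_free_state) == 0);
}

/* Returns the calling thread's state, creating it on first use.
   Returns NULL if the key or the allocation could not be set up. */
struct toy_state *toy_ensure(void)
{
    struct toy_state *st;

    pthread_once(&toy_key_once, toy_make_key);
    if (!toy_key_ok)
        return NULL;
    st = pthread_getspecific(toy_key);
    if (st == NULL) {
        /* First call from this thread: create its state. */
        st = calloc(1, sizeof *st);
        if (st == NULL)
            return NULL;
        if (pthread_setspecific(toy_key, st) != 0) {
            free(st);
            return NULL;
        }
    }
    st->ensure_count++;
    return st;
}

struct toy_probe {
    struct toy_state *main_state; /* state seen by the creating thread */
    int differs;                  /* set to 1 if the new thread got its own */
};

static void *toy_other_thread(void *arg)
{
    struct toy_probe *probe = arg;
    struct toy_state *st = toy_ensure();

    assert(probe != NULL);
    probe->differs = (st != NULL && st != probe->main_state);
    return NULL;
}

/* Returns 1 if a second thread gets a different state from the caller's. */
int toy_states_are_per_thread(void)
{
    struct toy_probe probe = { NULL, 0 };
    pthread_t t;

    probe.main_state = toy_ensure();
    if (probe.main_state == NULL)
        return 0;
    if (pthread_create(&t, NULL, toy_other_thread, &probe) != 0)
        return 0;
    pthread_join(t, NULL);
    return probe.differs;
}
```

Repeated calls from one thread hand back the same record, while a new thread gets a record of its own. That is why a thread which already owns a thread state created by hand can run into trouble when PyGILState_Ensure later goes looking for one.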
Destructor:
-----------
Since there is no explicit PyThreadState in the main thread (see the constructor above),
one MUST restore the PyThreadState of the main thread with PyEval_RestoreThread.
Otherwise there is a segmentation fault, because Py_Finalize uses the current PyThreadState!
(See the source of Python; Python/pythonrun.c)
Py_Finalize:
tstate = PyThreadState_GET();
interp = tstate->interp; // <- At this point.
PyEval_RestoreThread(mainThreadState);
Note that there is one pair: one is PyEval_SaveThread in the constructor,
the other PyEval_RestoreThread in the destructor.
There is another pair in PyGILState_Ensure(PyEval_RestoreThread) and
PyGILState_Release(PyEval_SaveThread).
The overall structures for multi-threaded Python/C API calling look like:
Main thread:
// Constructor
Py_Initialize();
PyEval_InitThreads();
PyThreadState* mainThreadState = PyEval_SaveThread();
......
PyGILState_STATE gilState = PyGILState_Ensure(); // PyEval_RestoreThread
// Call Python/C API...
PyGILState_Release(gilState); // PyEval_SaveThread
......
// Create new thread...
......
PyGILState_STATE gilState = PyGILState_Ensure(); // PyEval_RestoreThread
// Call Python/C API...
PyGILState_Release(gilState); // PyEval_SaveThread
......
// Destructor
PyEval_RestoreThread(mainThreadState);
Py_Finalize();
New thread:
......
PyGILState_STATE gilState = PyGILState_Ensure(); // PyEval_RestoreThread
// Call Python/C API...
PyGILState_Release(gilState); // PyEval_SaveThread
......
How does this code look if you use PyGILState_Ensure/Release?

I wonder how you implement this example using the PyGILState API that was introduced in version 2.3? Does the PyGILState_Ensure replace this, for example:
...
#ifdef USE_GILSTATE
PyGILState_STATE state = PyGILState_Ensure();
#else
// get the global lock
PyEval_AcquireLock();
// get a reference to the PyInterpreterState
PyInterpreterState * mainInterpreterState = mainThreadState->interp;
// create a thread state object for this thread
PyThreadState * myThreadState = PyThreadState_New(mainInterpreterState);
// free the lock
PyEval_ReleaseLock();
#endif
and likewise
...
#ifdef USE_GILSTATE
PyGILState_Release(state);
#else
// grab the lock
PyEval_AcquireLock();
// swap my thread state out of the interpreter
PyThreadState_Swap(NULL);
// clear out any cruft from thread state object
PyThreadState_Clear(myThreadState);
// delete my thread state object
PyThreadState_Delete(myThreadState);
// release the lock
PyEval_ReleaseLock();
#endif // USE_GILSTATE
...
I also found that if you run the original example in version 2.4 and have python compiled with Py_DEBUG defined, you will get fatal errors in pystate.c. The reason is that we can't have more than one thread state per thread:
The exception is thrown from
pystate.c, line 306:
Py_FatalError("Invalid thread state for this thread");
Has anybody else tried it?
agree, (PyGILState_*) is much simpler

This much simpler locking model (PyGILState_*()) was introduced in Python 2.3.
In my app each call of the embedded Python code is locked by an object of this class:
class PythonThreadLocker
{
PyGILState_STATE state;
public:
PythonThreadLocker() : state(PyGILState_Ensure())
{}
~PythonThreadLocker() {
PyGILState_Release(state);
}
};
It works safely. I must confess that at first I wrote a special singleton, which stored interpreter states for each thread (with the API which is described in the article), and then I found this very handy PyGILState_(Ensure/Release).
I found this usage on koders.com, while querying "PyGILState_Ensure", thanks for the aiming:)
example.. missing?

Seems like the example code contains only code snippets from the article. Am I missing something? :) (i.e. no mentioned "http server with embedded python" thing :))
Done to perfection

Thanks for this useful article. We're embedding into a Win32 C++ multi-threaded app.
Ditto on the above comment -- needed to add a step to shutdown: swap the main thread state back in before shutting down the interpreter.
extending instead of embedding

With python 2.2, I am using an audio library (portaudio) that uses callbacks for
audio buffer filling. This is extending rather than embedding.
First of all:
PyInterpreterState * mis;
PyThreadState * mts;
mts = PyThreadState_Get();
mis = mts->interp;
ts = PyThreadState_New(mis); /* stored away somewhere */
Note: we don't need to PyEval_AcquireLock, as we already have the lock.
Inside the callback:
PyEval_AcquireLock();
PyThreadState_Swap(ts);
/* call python code here */
PyThreadState_Swap(NULL);
PyEval_ReleaseLock();
Finishing up:
PyThreadState_Swap(NULL);
PyThreadState_Clear(ts);
PyThreadState_Delete(ts);
Also, I found it necessary to do
PyEval_InitThreads();
before all the above.
Simon.
Re: extending instead of embedding

ok, i previewed OK this but it ignored pre markers in the final post...
doh!
Re: Embedding Python in Multi-Threaded C/C++ Applications

excellent resource!
Good tutorial, forgot swap to main before Finalize

Shutting down the interpreter should have
// shut down the interpreter
PyEval_AcquireLock();
PyThreadState_Swap(mainThreadState);
Py_Finalize();
otherwise you get this error message and segfault
Fatal Python error: PyThreadState_Get: no current thread
Thanks.
Re: Embedding Python in Multi-Threaded C/C++ Applications

Excellent!
Re: Embedding Python in Multi-Threaded C/C++ Applications

This article is so useful to make sense out of Python's involvement with threads that it should be added to the standard documentation shipping with the language.
It just helped me to solve a problem that I had been wrestling with for 24 hours.
Regards,
Fabien.
Re: Embedding Python in Multi-Threaded C/C++ Applications

thank you - it was straightforward to create an extension with a separate thread and a callback...
it saved quite some time.
Very good article. Helped me

Very good article. Helped me solve a problem I had been investigating for two days.
Still getting crashes...
Thanks for the article, helped to understand the GIL a little more.
Since Python 2.3 you can do the whole GIL locking thing with the PyGILState_Ensure and PyGILState_Release functions. Look at my code:
Okay, this was the code basically. So again the handler class saves a Python function pointer for a certain event. E.g. if I want to call a Python function when (let's suppose you coded a chat program) someone sends a message to others, you call the CHandler fitting "ChatMessage" with arguments built like Py_BuildValue("(ss)", playerName, message) and call ExecuteHandler(handler, args /* built with above BuildValue */). The problem is that if someone spams excessively and there are many, many threads which call the function, the program eventually crashes.
Full code can be seen at:
http://pyghost.googlecode.com