I'm using SWIG to glue some C++ code to Python (2.6), and part of that glue converts large fields of data (millions of values) from the C++ side into a Numpy array. The best method I can come up with implements an iterator for the class and then provides a Python method:
def __array__(self, dtype=float):
    return np.fromiter(self, dtype, self.size())
The problem is that each iterator next call is very costly, since it has to go through about three or four SWIG wrappers. It takes far too long. I can guarantee that the C++ data are stored contiguously (since they live in a std::vector), and it just feels like Numpy should be able to take a pointer to the beginning of that data alongside the number of values it contains, and read it directly.
Is there a way to pass a pointer to internal_data_[0] and the value internal_data_.size() to numpy so that it can directly access or copy the data without all the Python overhead?
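Conceptually I'm after something like the sketch below, where addr and n are hypothetical stand-ins for the address of internal_data_[0] and internal_data_.size(); if those were exposed to Python as plain integers, numpy could already wrap the memory without copying:

import ctypes
import numpy as np

# 'addr' and 'n' are hypothetical: the integer address of the first
# element of the std::vector<float> and its element count.
ptr = ctypes.cast(addr, ctypes.POINTER(ctypes.c_float))
view = np.ctypeslib.as_array(ptr, shape=(n,))  # zero-copy view of the data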
Solution
So it looks like the only real solution is to base something off pybuffer.i that can copy from C++ into an existing buffer. If you add this to a SWIG include file:
%insert("python") %{
import numpy as np
%}
/*! Templated function to copy contents of a container to an allocated memory
 * buffer
 */
%inline %{
//==== ADDED BY numpy.i
#include <algorithm>

template < typename Container_T >
void copy_to_buffer(
        const Container_T& field,
        typename Container_T::value_type* buffer,
        typename Container_T::size_type length
        )
{
    // ValidateUserInput( length == field.size(),
    //         "Destination buffer is the wrong size" );
    // put your own assertion here or BAD THINGS CAN HAPPEN
    if (length == field.size()) {
        std::copy( field.begin(), field.end(), buffer );
    }
}
//====
%}
%define TYPEMAP_COPY_TO_BUFFER(CLASS...)
%typemap(in) (CLASS::value_type* buffer, CLASS::size_type length)
             (int res = 0, Py_ssize_t size_ = 0, void *buffer_ = 0) {
    res = PyObject_AsWriteBuffer($input, &buffer_, &size_);
    if ( res < 0 ) {
        PyErr_Clear();
        %argument_fail(res, "(CLASS::value_type*, CLASS::size_type length)",
                       $symname, $argnum);
    }
    $1 = ($1_ltype) buffer_;
    $2 = ($2_ltype) (size_ / sizeof($*1_type));
}
%enddef
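To see what the typemap actually consumes: in Python 2 a numpy array supports the old-style buffer protocol that PyObject_AsWriteBuffer reads, and the byte count it reports is what gets divided by sizeof(value_type) above. A quick sanity check, assuming nothing beyond numpy:

import numpy as np

a = np.zeros(3, dtype=np.float32)
# buffer() is the read-only counterpart of the writable buffer the
# typemap obtains; its length is in bytes, not elements.
b = buffer(a)
print len(b)                # 12 == 3 elements * sizeof(float)
print len(b) // a.itemsize  # 3, the 'length' the typemap computes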
%define ADD_NUMPY_ARRAY_INTERFACE(PYVALUE, PYCLASS, CLASS...)

TYPEMAP_COPY_TO_BUFFER(CLASS)

%template(_copy_to_buffer_ ## PYCLASS) copy_to_buffer< CLASS >;

%extend CLASS {
%insert("python") %{
def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray( shape=( len(self), ), dtype=PYVALUE )
    _copy_to_buffer_ ## PYCLASS(self, a)
    return a
%}
}
%enddef
then you can make a container "Numpy"-able with:

%template(DumbVectorFloat) DumbVector<float>;
ADD_NUMPY_ARRAY_INTERFACE(np.float32, DumbVectorFloat, DumbVector<float>);

(Note that PYVALUE must match the size of the C++ value_type: with plain float, i.e. a float64 dtype, the destination buffer would be twice as long as the vector and the length check in copy_to_buffer would silently skip the copy.)
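With those arguments, the macro injects an __array__ method equivalent to this sketch (the wrapper name comes from the ## PYCLASS concatenation above):

def __array__(self):
    """Enable access to this data as a numpy array"""
    a = np.ndarray(shape=(len(self),), dtype=np.float32)
    _copy_to_buffer_DumbVectorFloat(self, a)
    return a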
Then in Python, just do:
# dvf is an instance of DumbVectorFloat
import numpy as np
my_numpy_array = np.asarray( dvf )
This has only the overhead of a single Python <-> C++ translation call, not the N calls that would result from iterating over a typical length-N array.
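If you want to verify the win on your own container, here is a minimal timing harness; it assumes dvf already exists with a wrapped len() and iterator, and implies no particular numbers:

import timeit
import numpy as np

# 'dvf' is a hypothetical wrapped DumbVectorFloat with many elements.
slow = timeit.timeit(lambda: np.fromiter(dvf, np.float32, len(dvf)), number=3)
fast = timeit.timeit(lambda: np.asarray(dvf), number=3)
print 'fromiter: %.3fs  asarray: %.3fs' % (slow, fast)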
A slightly more complete version of this code is part of my PyTRT project on GitHub.