Both Python and C++ have core types to represent text, and these are expected to be freely interchangeable. cppyy makes it easy to do just that for the most common cases, while allowing customization where necessary to cover the full range of diverse use cases (such as different codecs). In addition to these core types, there is a range of other character types, from const char* and std::wstring to bytes, that see much less use but are also fully supported.
The C++ core type std::string is considered the equivalent of Python’s str, even though, purely implementation-wise, it is more akin to bytes: as a practical matter, a C++ programmer would use std::string where a Python developer would use str (and vice versa), not bytes.
A Python str is unicode, however, whereas an std::string is character based, so conversions require encoding or decoding. To allow for different encodings, cppyy defers implicit conversions between the two types until forced, at which point it defaults to treating std::string as ASCII based and to using the UTF-8 codec for str. To support this, the bound std::string has been pythonized to allow it to be a drop-in for a range of uses as appropriate within the local context.
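Why a codec is needed at all can be seen without any C++ involved. The sketch below uses a plain Python bytes object as a stand-in for the raw char contents of a std::string; that stand-in is an assumption made for illustration, and the code does not call cppyy.

```python
# Stand-in for the raw char contents of a C++ std::string (illustration only).
raw = "caf\u00e9".encode("utf-8")      # 5 bytes: the accented 'e' takes two bytes

text = raw.decode("utf-8")             # decoding recovers the 4-character str
assert len(raw) == 5 and len(text) == 4

# Pure ASCII text is byte-for-byte identical in ASCII and in UTF-8,
# which is why the two defaults agree for plain English text.
assert "hello".encode("ascii") == "hello".encode("utf-8")

# Bytes outside the ASCII range cannot be interpreted as ASCII.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

For ASCII-only content the choice of codec makes no difference; it only starts to matter once non-ASCII characters appear.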
In particular, it is sometimes necessary (e.g. for function arguments that take a non-const reference or a pointer to non-const std::string variables) to use an actual std::string instance to allow in-place modifications. The pythonizations then allow such an instance to be used where str is expected. For example:
    >>> cppyy.cppexec("std::string gs;")
    True
    >>> cppyy.gbl.gs = "hello"
    >>> type(cppyy.gbl.gs)   # C++ std::string type
    <class cppyy.gbl.std.string at 0x7fbb02a89880>
    >>> d = {"hello": 42}    # dict filled with str
    >>> d[cppyy.gbl.gs]      # drop-in use of std::string -> str
    42
To handle codecs other than UTF-8, the std::string pythonization adds a decode method, with the same signature as the equivalent method of bytes. If it is known that a specific C++ function always returns an std::string representing unicode with a codec other than UTF-8, it can in turn be explicitly pythonized to do the conversion with that codec.
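As a reminder of the signature being mirrored, here is how decode behaves on bytes itself. This is a sketch only: the bytes object stands in for a std::string returned from C++, and the Latin-1 payload is a made-up example rather than output from any real C++ function.

```python
# bytes stands in for a std::string returned from C++ (illustration only).
latin1_payload = "Gr\u00fc\u00dfe".encode("latin-1")   # hypothetical Latin-1 data

# Signature: bytes.decode(encoding='utf-8', errors='strict')
greeting = latin1_payload.decode("latin-1")

# Decoding with the wrong codec either raises (errors='strict') or, with
# errors='replace', substitutes U+FFFD for the bytes it cannot interpret.
try:
    latin1_payload.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

lenient = latin1_payload.decode("utf-8", errors="replace")
```

Since the pythonized method is described as having the same signature, the same encoding and errors arguments would be expected to apply to a bound std::string.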
C++’s “wide” string, std::wstring, is based on wchar_t, a character type that is not particularly portable as it can be 2 or 4 bytes in size, depending on the platform. cppyy supports std::wstring directly, using the wchar_t array conversions provided by Python’s C-API.
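The platform dependence of wchar_t can be checked from Python itself. The following sketch uses the standard ctypes module rather than cppyy, purely to show the size in effect on the machine it runs on.

```python
import ctypes

# Size of the platform's wchar_t, as seen through ctypes.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
print(f"sizeof(wchar_t) = {wchar_size}")   # typically 2 on Windows, 4 on Linux/macOS

# Round-trip a str through a wchar_t buffer; the buffer holds the
# characters plus one terminating null character.
buf = ctypes.create_unicode_buffer("hello")
assert buf.value == "hello"
assert ctypes.sizeof(buf) == (len("hello") + 1) * wchar_size
```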
The C representation of text, const char*, is problematic for two reasons: it does not express ownership, and its length is implicit, namely up to the first occurrence of '\0'. The first can, to an extent, be ameliorated: there is a range of cases where ownership can be inferred. In particular, if the C string is set from a Python str, it is the latter that owns the memory and the bound proxy of the former that in turn owns the (unconverted) str instance. However, if the const char*’s memory is allocated in C/C++, memory management is by necessity fully manual. Length, on the other hand, can only be known in the case of a fixed array. Even then, however, the more common case is to use the fixed array as a buffer, with the actual string still only extending up to the '\0' char, so that is assumed. (C++’s std::string suffers from none of these issues and should always be preferred when you have a choice.)
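The difference between the size of a fixed array and the length of the string stored in it can be shown with the standard ctypes module. This is only an analogy for the convention described above; it does not go through cppyy, and the buffer size of 16 is arbitrary.

```python
import ctypes

# A fixed 16-byte char array used as a buffer for a shorter string.
buf = ctypes.create_string_buffer(b"hello", 16)

assert ctypes.sizeof(buf) == 16      # size of the array itself
assert buf.value == b"hello"         # string view: stops at the first '\0'
assert len(buf.raw) == 16            # raw view: all bytes, including padding

text = buf.value.decode("ascii")     # the usual "text" interpretation
```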
The C representation of a character array, char*, has all the problems of const char*, but in addition is often used as “data array of 8-bit int”.
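The ambiguity matters because binary data may legitimately contain zero bytes. In the sketch below, again using ctypes as a stand-in rather than cppyy and with made-up sample values, the same four bytes give different results depending on whether they are read as a C string or as 8-bit integers.

```python
import ctypes

# Four 8-bit values, one of which happens to be zero (hypothetical data).
samples = bytes([7, 0, 255, 42])
data = (ctypes.c_char * len(samples)).from_buffer_copy(samples)

as_text = data.value        # C-string reading: truncated at the zero byte
as_ints = list(data.raw)    # data reading: every byte as an integer
```

Read as text, everything after the first zero byte is lost; read as data, all four values survive.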
cppyy directly supports the following character types, both as single variables and in array form: char, signed char, unsigned char, wchar_t, char16_t, and char32_t.