As a dynamic language, Python encourages a programming style of considering classes and objects in terms of their methods and attributes, more than where they fit into the class hierarchy.
This can make Python a very relaxed and comfortable language for rapid development, but with a price - the ‘red tape’ of managing data types is dumped onto the interpreter. At run time, the interpreter does a lot of work searching namespaces, fetching attributes and parsing argument and keyword tuples. This run-time ‘late binding’ is a major cause of Python’s relative slowness compared to ‘early binding’ languages such as C++.
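As a small illustration of what ‘late binding’ means in practice, the following plain-Python sketch shows that a method name is resolved each time it is called. The `Square` class is a hypothetical stand-in invented for this example; it is not part of Cython or of the code discussed later.

```python
class Square:
    """Hypothetical class used only to illustrate run-time method lookup."""

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


sq = Square(3)
first = sq.area()   # the name "area" is looked up now, at call time

# Rebinding the class attribute changes what an existing instance calls.
# This only works because the lookup is deferred until run time.
Square.area = lambda self: -1
second = sq.area()

print(first, second)
```

The flexibility shown here is exactly what the interpreter pays for on every call: it cannot assume that `area` still refers to the same function as it did a moment ago.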
However, with Cython it is possible to gain significant speed-ups through the use of ‘early binding’ programming techniques.
For example, consider the following (silly) code:
    cdef class Rectangle:
        cdef int x0, y0
        cdef int x1, y1

        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0
            self.y0 = y0
            self.x1 = x1
            self.y1 = y1

        def area(self):
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
            return area

    def rectArea(x0, y0, x1, y1):
        rect = Rectangle(x0, y0, x1, y1)
        return rect.area()
In the rectArea() function, the call to rect.area() and the area() method itself both carry a lot of Python overhead.
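To make that overhead more concrete, here is a rough conceptual model, written in plain Python, of what a call such as `rect.area()` involves when `rect` is an ordinary untyped object. The `Rectangle` class below is a pure-Python stand-in assumed for this sketch, and the three steps are a simplification: real interpreters apply their own optimisations to this sequence.

```python
class Rectangle:
    """Pure-Python stand-in for the extension type (an assumption for this sketch)."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def area(self):
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        return -area if area < 0 else area


rect = Rectangle(1, 2, 4, 6)

# Conceptually, `rect.area()` on an untyped object breaks down into:
method = getattr(rect, "area")      # 1. look the name up at run time
args, kwargs = (), {}               # 2. pack positional and keyword arguments (both empty here)
result = method(*args, **kwargs)    # 3. call through the generic call protocol

print(result)
```

None of these steps contributes to the arithmetic itself, which is a single multiplication and a comparison.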
However, in Cython, it is possible to eliminate a lot of this overhead in cases where calls occur within Cython code. For example:
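    cdef class Rectangle:
        cdef int x0, y0
        cdef int x1, y1

        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0
            self.y0 = y0
            self.x1 = x1
            self.y1 = y1

        # C-level method: callable only from Cython code, with no Python call overhead
        cdef int _area(self):
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
            return area

        # Python-callable wrapper around the C method
        def area(self):
            return self._area()

    def rectArea(x0, y0, x1, y1):
        # Declaring the type of rect lets Cython call _area() directly
        cdef Rectangle rect = Rectangle(x0, y0, x1, y1)
        return rect._area()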
Here, in the Rectangle extension class, we have defined two different area calculation methods: the efficient _area() C method, and the Python-callable area() method, which serves as a thin wrapper around _area(). Note also how, in the function rectArea(), we ‘early bind’ by declaring the local variable rect with the explicit type Rectangle. By using this declaration, instead of just dynamically assigning to rect, we gain the ability to access the much more efficient C-callable _area() method.
Cython lets us simplify this further by allowing us to declare dual-access methods: methods that can be called efficiently at C level, but can also be accessed from pure Python code at the cost of the Python access overheads. Consider this code:
    cdef class Rectangle:
        cdef int x0, y0
        cdef int x1, y1

        def __init__(self, int x0, int y0, int x1, int y1):
            self.x0 = x0
            self.y0 = y0
            self.x1 = x1
            self.y1 = y1

        cpdef int area(self):
            area = (self.x1 - self.x0) * (self.y1 - self.y0)
            if area < 0:
                area = -area
            return area

    def rectArea(x0, y0, x1, y1):
        rect = Rectangle(x0, y0, x1, y1)
        return rect.area()
Here, we have just a single area method, declared as cpdef to make it efficiently callable as a C function, but still accessible from pure Python (or late-binding Cython) code.
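If, within Cython code, we have a variable that is already ‘early-bound’ (i.e. declared explicitly as type Rectangle, or cast to type Rectangle), then invoking its area method will use the efficient C code path and skip the Python overhead. But if, in Cython or regular Python code, we have a regular object variable storing a Rectangle object, then invoking the area method will require an attribute lookup for the area method, packing a tuple for the arguments and a dict for the keywords (both empty in this case), and using the Python API to call the method. Within the area method itself, it will further require parsing the tuple and keywords, executing the calculation code, and converting the result to a Python object before returning it.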
So within Cython, it is possible to achieve massive optimisations by using strong typing in declaration and casting of variables. For tight loops which use method calls, and where these methods are pure C, the difference can be huge.
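One way to get a feel for how much a method call costs relative to the work it does is to time a tight loop with and without the call. The sketch below is a plain-Python measurement aid only: it does not reproduce Cython's early binding, the `Rectangle` class is a pure-Python stand-in, and the helper names `total_area_via_method` and `total_area_inline` are hypothetical.

```python
import timeit


class Rectangle:
    """Pure-Python stand-in for the extension type (an assumption for this sketch)."""

    def __init__(self, x0, y0, x1, y1):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1

    def area(self):
        area = (self.x1 - self.x0) * (self.y1 - self.y0)
        return -area if area < 0 else area


def total_area_via_method(rects):
    # One late-bound method call per iteration.
    total = 0
    for r in rects:
        total += r.area()
    return total


def total_area_inline(rects):
    # Same arithmetic with the method call removed from the loop body.
    total = 0
    for r in rects:
        area = (r.x1 - r.x0) * (r.y1 - r.y0)
        total += -area if area < 0 else area
    return total


rects = [Rectangle(0, 0, i, i + 1) for i in range(1000)]

# Timings depend on the machine and interpreter, so none are quoted here.
t_method = timeit.timeit(lambda: total_area_via_method(rects), number=20)
t_inline = timeit.timeit(lambda: total_area_inline(rects), number=20)
print(f"method calls: {t_method:.4f}s  inline: {t_inline:.4f}s")
```

The same harness can be pointed at a compiled Cython module to compare a def, cdef-backed, or cpdef version of the method under identical conditions.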