Fast(er) pixel level access

Hi,

I've just started using wxPython. Working on a cellular automaton I did
some timings and got this (just the top):

···

--------------------------------------------------------------------------------
Wed Apr 23 22:22:11 2008 caprofile7

         1943247 function calls in 14.565 CPU seconds

   Ordered by: internal time

   ncalls tottime percall cumtime percall filename:lineno(function)
        1 3.544 3.544 14.169 14.169
/usr/lib/python2.5/site-packages/wx-2.8-gtk2-unicode/wx/_core.py:7219(MainLoop)
   158760 2.855 0.000 10.134 0.000
/home/folkert/workarea/cellularautomaton/dharana/wxCA/canumclass3.py:56(parity)
   120393 2.081 0.000 5.379 0.000
/home/folkert/workarea/cellularautomaton/dharana/wxCA/canumclass3.py:259(setADeadPixel)
   162181 1.311 0.000 1.948 0.000
/usr/lib/python2.5/site-packages/wx-2.8-gtk2-unicode/wx/_gdi.py:982(__init__)
   162181 0.888 0.000 1.168 0.000
/usr/lib/python2.5/site-packages/wx-2.8-gtk2-unicode/wx/_gdi.py:1047(__init__)
    41788 0.723 0.000 1.785 0.000
/home/folkert/workarea/cellularautomaton/dharana/wxCA/canumclass3.py:271(setALifePixel)
   162181 0.536 0.000 0.536 0.000
/usr/lib/python2.5/site-packages/wx-2.8-gtk2-unicode/wx/_gdi.py:1080(MoveTo)
       10 0.426 0.043 10.560 1.056
/home/folkert/workarea/cellularautomaton/dharana/wxCA/canumclass3.py:343(applyRule)
--------------------------------------------------------------------------------

My question is if there is a way to optimize things? I can't do anything
about the "MainLoop" time I guess?
For the "parity" and "setADeadPixel" functions the code is included below.
Is the bitmap access in "SetADeadPixel" the only and most efficient way
to manipulate pixel values?
What is the difference in using pda.Offset(...) vs pda.MoveTo(...)?
I couldn't find a direct way to retrieve pixel values directly so shadow
values are kept in a byte array img['b'].
This allows for direct comparison of pixel values (RGB and Alpha info).

Thanks in advance,
Folkert
--------------------------------------------------------------------------------
    def parity(self, x, y, img):
        sum = 0
        if img['b'][x,y-1] == lifeCell: # comparing 2 32-bit values
            sum += 1
        if img['b'][x-1,y] == lifeCell:
            sum += 1
        if img['b'][x,y] == lifeCell:
            sum += 1
        if img['b'][x+1,y] == lifeCell:
            sum += 1
        if img['b'][x,y+1] == lifeCell:
            sum += 1
        if sum == 1:
            self.setLife(img, x, y)
            return 1
        else:
            self.setDead(img, x, y) # actually a call to
setADeadPixel
            return 0
-------------------------------------------------------------------------------
    def setADeadPixel(self, img, x, y):
        # Set the pixel in the bitmap:
        pd = wx.AlphaPixelData(img['i'])
        if not pd:
            raise RuntimeError("Failed to gain raw access to bitmap data.")
        pda = wx.AlphaPixelData_Accessor(img['i'], pd)
        #pda.Offset(pd, x, y)
        pda.MoveTo(pd, x, y)
        pda.Set(DCR, DCG, DCB, DCA)
        # Set the pixel in the buffer:
        img['b'][x,y] = deadCell
-----------------------------------------------------------------------------------------------------------------

Christopher Barker schreef:

Folkert Boonstra wrote:

Hi,

I've just started using wxPython. Working on a cellular automaton I did
some timings and got this (just the top):

This looks like an application crying out for for numpy -- a real 2-d
array would help a lot, both in terms of ease of writing code, and
performance. I'm pretty sure you can have a numpy array and a
wx.Bitmap share a data buffer. See the Demo, under "Recent Additions"
: RawbitmapAccess and BitmapFromBuffer.

-Chris

Hi Chris,

Thanks for replying. Yes I use numpy (see code below) already. And ... I
borrowed some code from the examples you mentioned.
The img['b'] refers to the bitmap array (used as shadow buffer) created
via numpy.
I guess that's the same as what you are referring to as sharing.
As you can see from the code below the shadow buffer and the bitmap use
/ are created from the bitmap array.
So my original questions still stand.

Folkert

···

--------------------------------------------------------------------------------------------------------------
import numpy

def makeByteArray(shape):
    return numpy.empty(shape, (numpy.uint32, {'r':(numpy.uint8,0), \
                                              'g':(numpy.uint8,1), \
                                              'b':(numpy.uint8,2), \
                                              'a':(numpy.uint8,3)}))

deadCellAR = makeByteArray((1,1))
DCR = deadCellR = deadCellAR['r'][0,0] = 200
DCG = deadCellG = deadCellAR['g'][0,0] = 0
DCB = deadCellB = deadCellAR['b'][0,0] = 0
DCA = deadCellA = deadCellAR['a'][0,0] = 255 #128
deadCell = deadCellAR[0][0]
lifeCellAR = makeByteArray((1,1))
LCR = lifeCellR = lifeCellAR['r'][0,0] = 0
LCG = lifeCellG = lifeCellAR['g'][0,0] = 200
LCB = lifeCellB = lifeCellAR['b'][0,0] = 0
LCA = lifeCellA = lifeCellAR['a'][0,0] = 255 #128
lifeCell = lifeCellAR[0][0]

--------------------------------------------------------------------------------------------------------------
    def makeBitmapArray(self, red, green, blue, alpha=128):
        # Make an array of bytes that is DIM*DIM in size, with enough
        # slots for each pixel to have a RGB and A value
        arr = makeByteArray( (DIM,DIM) )
        # initialize all pixel values to the values passed in
        arr['r'][:,:] = red
        arr['g'][:,:] = green
        arr['b'][:,:] = blue
        arr['a'][:,:] = alpha
        # Set the alpha for the border pixels to be fully opaque
        arr['a'][0, 0:DIM] = wx.ALPHA_OPAQUE # first row
        arr['a'][DIM-1, 0:DIM] = wx.ALPHA_OPAQUE # last row
        arr['a'][0:DIM, 0] = wx.ALPHA_OPAQUE # first col
        arr['a'][0:DIM, DIM-1] = wx.ALPHA_OPAQUE # last col
        return arr
    
    def makeBitmap(self, red, green, blue, alpha=128):
        # make the array
        arr = self.makeBitmapArray(red, green, blue, alpha)
        # finally, use the array to create a bitmap
        bmp = wx.BitmapFromBufferRGBA(DIM, DIM, arr)
        d = {}
        d['i'] = bmp
        d['b'] = arr
        return d
--------------------------------------------------------------------------------------------------------------

Folkert Boonstra wrote:

Thanks for replying. Yes I use numpy (see code below) already.

a couple comments:

def makeByteArray(shape):
    return numpy.empty(shape, (numpy.uint32, {'r':(numpy.uint8,0), \
                                              'g':(numpy.uint8,1), \
                                              'b':(numpy.uint8,2), \
                                              'a':(numpy.uint8,3)}))

I haven't seen record arrays used like this -- pretty cool.

        # Set the alpha for the border pixels to be fully opaque
        arr['a'][0, 0:DIM] = wx.ALPHA_OPAQUE # first row

you can write this as: arr['a'][0,:] # just a bit cleaner

    def makeBitmap(self, red, green, blue, alpha=128):
        # make the array
        arr = self.makeBitmapArray(red, green, blue, alpha)
        # finally, use the array to create a bitmap
        bmp = wx.BitmapFromBufferRGBA(DIM, DIM, arr)
        d = {}
        d['i'] = bmp
        d['b'] = arr
        return d

This looks right. However, I was wrong, the wx.Bitmap does not share data with the array -- a copy is made:

"""
Unlike wx.ImageFromBuffer the bitmap created with this function does not share the memory buffer with the buffer object.
"""

So you may need to call wx.BitmapFromBufferRBGA whenever you want to update the screen. See the enclosed example.

You may be able to use wx.ImageFromBuffer, but then you'd still need to convert your wxImage into a wx.Bitmap when you want to display, which may not be any faster.

I'd do all your modeling and manipulation with numpy, then only update the Bitmap when you're done -- I'm betting it'll be fast enough.

Note that you can probably speed up the "parity" function by "vectorizing" it. See the numpy docs and mailing list for help with that.

One more comment -- how many states does your cellular automata have? you can probably model it by using a grid of, say, uint8 values, and then translate them to colors as the last step.

-Chris

BitmapFromBufferTest.py (2.73 KB)

···

from: wxPython API Documentation — wxPython Phoenix 4.2.2 documentation

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

Robin Dunn schreef:

[...]

For the "parity" and "setADeadPixel" functions the code is included
below.
Is the bitmap access in "SetADeadPixel" the only and most efficient way
to manipulate pixel values?

You could probably save a lot of overhear by not recreating the
pixeldata and accessor objects each time. Also the raw bitmap access
classes are optimized for sequential access. Always doing random
access (MoveTo's) is probably imposing some additional overhead.

Yes I tried before to create a accessor objects only once but that ended
in disaster when using pda.Offset(...).
Your comment below triggered me to use the pda.MoveTo(...) and create
the pixeldata / accessor objects only once.
That works.

What is the difference in using pda.Offset(...) vs pda.MoveTo(...)?

MoveTo does a Reset():

            void MoveTo(const PixelData& data, int x, int y)
            {
                Reset(data);
                Offset(data, x, y);
            }

I couldn't find a direct way to retrieve pixel values directly so shadow
values are kept in a byte array img['b'].

The iterator object has a Get method.

Thanks. I hadn't realized at first what the Get would do.

This allows for direct comparison of pixel values (RGB and Alpha info).

I agree with Chris. Using a numpy array makes a lot of sense here,
and then it would be a quick transformation (or maybe none depending
on your needs) to convert it into something that could be converted to
a bitmap. in other words, instead of using your bitmap as the data
store, use something else that is quick and easy to manipulate, and
then convert that to the bitmap for drawing.

As mentioned in a previous message, I did all of that. See an earlier
message that includes the complete source.

Thanks for your replies.

Folkert Boonstra wrote:

There was an example that triggered me. It took a little to realize that
in order to reference for example the 'a' value one has to use
arr['a'][x,y] and not arr[x,y][['a'].

which concerns me -- are you sure the bytes are in the right order?

Done that now. Everything is now done on the buffer first. Speed is much
better now but not yet fast enough :slight_smile:

darn -- have you profiled it yet? Where is the time being taken?

Note that you can probably speed up the "parity" function by
"vectorizing" it. See the numpy docs and mailing list for help with that.

I tried the following but it's much slower than my original so I guess
you mean something else:

    def ptest(self, p):
        return numpy.where(p == lifeCell, 1, 0)

p == lifeCell returns an boolean array: 1 where it's true, o where not, so the "where" here is entirely superfluous.

    def parityV1(self, x, y, img):
        self.m[0] = img['b'][x,y-1]
        self.m[1] = img['b'][x-1,y]
        self.m[2] = img['b'][x,y]
        self.m[3] = img['b'][x+1,y]
        self.m[4] = img['b'][x,y+1]
        if numpy.sum(self.ptest(self.m)) == 1:
            self.setLife(img, x, y)
            return 1
        else:
            self.setDead(img, x, y)
            return 0

Here is your slowdown:

    def applyRule(self, rule):
         # Foreach pixel apply the rule
         img = self.imgs
         for y in xrange(DIM):
             for x in xrange(DIM):
                 if x>0 and x<(DIM-1) and y>0 and y<(DIM-1):
                     r = rule(x, y, img)

looping through each element of an an array is going to be slow. You need to apply your rule to the entire array at once.

Something like this should get you started:

#!/usr/bin/env python

import numpy as np
import numpy.random as random

w = 4
h = 5

a = random.randint(0,2,(w,h))

print a

num_neighbors = np.zeros((w-2, h-2), np.int16) # leave out the boundaries

num_neighbors += a[1:-1, 1:-1]
print "added the center:"
print num_neighbors

num_neighbors += a[0:-2, 1:-1]
print "added the top:"
print num_neighbors

num_neighbors += a[2:, 1:-1]
print "added the bottom:"
print num_neighbors

num_neighbors += a[1:-1, 0:-2]
print "added the left:"
print num_neighbors

num_neighbors += a[1:-1, 2:]
print "added the right:"
print num_neighbors

You can get more hints from the numpy list.

One more comment -- how many states does your cellular automata have?
you can probably model it by using a grid of, say, uint8 values, and
then translate them to colors as the last step.

But that's what i'm doing using 32-bit values?

yes, indeed you are. The only issue is that you are using more bytes that required, but probably not a big issue at all, unless you have a really huge grid.

-Chris

···

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

Folkert,

We should move this to the numpy list, but one more...

Folkert Boonstra wrote:

Christopher Barker schreef:

which concerns me -- are you sure the bytes are in the right order?

I don't understand that syntax either and couldn't find anything on:

neither could I. I also looked in "The Numpy Book" (not free).

I cc'd you on my post to the numpy list -- I highly recommend you subscribe, you'll learn a lot that will help you with this sort of stuff.

Byte order:
Well, first of all, the colors displayed are the colors I expect.

well, that's all you need to know -- the byte order is right.

As I'm on a little-endian system offset 0 implies the rightmost byte,
offset 3 the leftmost byte.

I'm not sure you'd get anything different with a big-endian system -- you've defined the four bytes. What would be different is how those four bytes would be interpreted as an integer -- but if all you care about is equality, then it doesn't matter.

In fact, I did a test:

import numpy

shape = (2,2)

rgbaType = numpy.dtype({'r':(numpy.uint8,0),
                        'g':(numpy.uint8,1),
                        'b':(numpy.uint8,2),
                        'a':(numpy.uint8,3)})

A = numpy.zeros(shape, dtype=(numpy.uint32, rgbaType) )
#A = numpy.zeros(shape, dtype=rgbarec )

print "A zero array"
print A

# set the red value:
A['r'] = 255

print " A red array "
print A

when I run that in a PPC (bug endian?), I get:

A zero array
[[0 0]
  [0 0]]
  A red array
[[4278190080 4278190080]
  [4278190080 4278190080]]

But when I run it on an Intel machine:
[[0 0]
  [0 0]]
  A red array
[[255 255]
  [255 255]]

So the bytes are in the same order in either case, but are interpreted as uint32 differently. So be a careful if you want the results to be cross platform.

Here's an example

size = (1, 1) dtype = ('<u4', [('r', '|u1'),....

note that this is:

size = (1, 1) dtype = ('>u4', [('r', '|u1'),....

on the PPC. The ">" means "big endian", the "<" "little endian". You can specify the endianness in your dtype, and I think numpy will byteswap for you, but it gets a bit hard to keep track of.

-Chris

···

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

Folkert Boonstra wrote:

Just an update on how far i got with getting it faster.
Replacing loops with list comprehensions is a major enhancement:

I'm a bit surprised -- I thought list comprehensions gave you almost the same byte code. I learn something new every day.

Anyway, you can easily get a 10X or so speed up for this kind of thing with numpy -- it really is the tool for the job, it's well worth learning.

on further examination:
     def applyRule(self, rule):
         # Apply the rule
         xdim = ydim = [d for d in range(DIM) if d>0 and d<(DIM-1)]

if xdim = ydim, why make two? though they are references to the same thing anyway. (of course, you may be planning to generalize this to when x does not equal y)

and: [d for d in range(DIM) if d>0 and d<(DIM-1)] is the same as:

range(1,DIM-1)

Let Python work for you!

Still, if you can make your rule function just take the img array, and process it with numpy tools, it will be much faster (less code to get wrong, too) -- trust me on this.

-Chris

···

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Folkert Boonstra schreef:

Christopher Barker schreef:
  

Folkert Boonstra wrote:
    

Just an update on how far i got with getting it faster.
Replacing loops with list comprehensions is a major enhancement:
      

I'm a bit surprised -- I thought list comprehensions gave you almost
the same byte code. I learn something new every day.

Anyway, you can easily get a 10X or so speed up for this kind of
thing with numpy -- it really is the tool for the job, it's well worth
learning

Chris,

Thanks for your suggestion(s)!. It is now fast enough for 512x512 size
images.
Now I have to look at the drawing of bitmaps again.
The rules are now implemented solely with numpy. The lichenWithDeath
rule is ugly currently (although still fast). Is there a way to do the
convolve only once?

    def parity(self):
        kernel = NU.array([[0,1,0],[1,1,1],[0,1,0]]).astype(NU.uint8)
        mask = (NI.convolve(self.bufbw, kernel) == 1).astype(NU.uint8)
        self.bufbw = mask
        self.buf = NU.array(self.DA, copy=1)
        NU.putmask(self.buf, mask, self.LA)
                    
    def life(self):
        kernel = NU.array([[1,1,1],[1,0,1],[1,1,1]]).astype(NU.uint8)
        mask = (NI.convolve(self.bufbw, kernel) == 1).astype(NU.uint8)
        self.bufbw = mask
        self.buf = NU.array(self.DA, copy=1)
        NU.putmask(self.buf, mask, self.LA)

    def lichenWithDeath(self):
        kernel = NU.array([[1,1,1],[1,0,1],[1,1,1]]).astype(NU.uint8)
        mask3 = (NI.convolve(self.bufbw, kernel) == 3).astype(NU.uint8)
        mask4 = (NI.convolve(self.bufbw, kernel) == 4).astype(NU.uint8)
        mask7 = (NI.convolve(self.bufbw, kernel) == 7).astype(NU.uint8)
        mask8 = (NI.convolve(self.bufbw, kernel) == 8).astype(NU.uint8)
        maskl = mask3 + mask7 + mask8
        maskd = mask4
        self.bufbw = maskl
        self.buf = NU.array(self.DA, copy=1)
        NU.putmask(self.buf, maskl, self.LA)
        NU.putmask(self.buf, maskd, self.DA)

Folkert

Folkert Boonstra wrote:

The lichenWithDeath
rule is ugly currently (although still fast). Is there a way to do the
convolve only once?

This is really one for the numpy list, but since you asked:

    def lichenWithDeath(self):
        kernel = NU.array([[1,1,1],[1,0,1],[1,1,1]]).astype(NU.uint8)
        mask3 = (NI.convolve(self.bufbw, kernel) == 3).astype(NU.uint8)
        mask4 = (NI.convolve(self.bufbw, kernel) == 4).astype(NU.uint8)
        mask7 = (NI.convolve(self.bufbw, kernel) == 7).astype(NU.uint8)
        mask8 = (NI.convolve(self.bufbw, kernel) == 8).astype(NU.uint8)

sure:
Conv = NI.convolve(self.bufbw, kernel)
mask3 = (Conv == 3).astype(NU.uint8)
mask4 = (Conv == 4).astype(NU.uint8)
mask7 = (Conv == 7).astype(NU.uint8)
mask8 = (Conv == 8).astype(NU.uint8)

it might be faster to do:

maskl = (Conv == 3).astype(NU.uint8)
maskl += (Conv == 7).astype(NU.uint8)
maskl += (Conv == 8).astype(NU.uint8)

allocating fewer temporary arrays can help.

        self.buf = NU.array(self.DA, copy=1)

you can spell this:

self.buf = self.DA.copy()

I think it's a bit cleaner.

-Chris

···

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov