BitmapFromBuffer with iterators

Conrado_PLG · August 10, 2008, 8:02pm

Hello,

Suppose I have a PIL image and want to convert it to a wx.Bitmap. I
would do this:

img = Image.open(...)
w, h = img.size
s = img.tostring()
bmp = wx.BitmapFromBuffer(w, h, s)

This works great, but it does have a problem. If the image is very big
(say, 100MB), right after the bmp is created, I'll have 300MB
allocated (img, s, bmp) if I need to keep a reference to img. Surely,
s will soon be garbage collected, but what matters is that at some point
in the execution, there is 300MB allocated.

It would be great if there was something like BitmapFromIterator so I
could do something like this:

img = Image.open(...)
w, h = img.size
pix_access = img.load()

def iter():
    for y in xrange(h):
        for x in xrange(w):
            for i in xrange(3):
                yield pix_access[x, y][i]

bmp = wx.BitmapFromIterator(w, h, iter())

This way, only 200MB would be allocated. Also, this would solve other
problems, e.g.: when using the FreeImage library, if I need to convert
an image to a wx.Bitmap, I need to swap 'R' and 'B' values because the
library works with 'BGR' order. In order to do this, I need to
allocate a new buffer, copy the bytes and swap them there, so I run
into the same problem of having 300MB allocated. With an iterator I
could solve this.

What do you think? Is there another existing way to solve this? Or is
it a good (or stupid) idea to implement? I don't think it would be
difficult, but I'm not sure how the performance would be. I could try
to implement it myself, but I would need some time since I never tried
to build wxPython myself...

Thanks,
Conrado

Samwyse · August 11, 2008, 6:03am

Conrado PLG wrote:

Hello,

Suppose I have a PIL image and want to convert it to a wx.Bitmap. I
would do this:

img = Image.open(...)
w, h = img.size
s = img.tostring()
bmp = wx.BitmapFromBuffer(w, h, s)

This works great, but it does have a problem. If the image is very big
(say, 100MB), right after the bmp is created, I'll have 300MB
allocated (img, s, bmp) if I need to keep a reference to img. Surely,
s will soon be garbage collected, but what matters is that at some point
in the execution, there is 300MB allocated.

You do realize that img.tostring() encodes the image to the 'raw' format (or any other format you may specify)? So the data in img may be different from the data in s. That said, img.info is a dict that, IIRC, contains the data once it gets loaded.

It would be great if there was something like BitmapFromIterator so I
could do something like this:

img = Image.open(...)
w, h = img.size
pix_access = img.load()

def iter():
    for y in xrange(h):
        for x in xrange(w):
            for i in xrange(3):
                yield pix_access[x, y][i]

bmp = wx.BitmapFromIterator(w, h, iter())

While a pixel access object may be much faster than get/putpixel, I expect that this would run fairly slowly, especially if you're manipulating 100MB images. Try this and see how fast it runs:

for pixel in iter():
    x = pixel
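For a rough idea of the gap before involving wx at all, here is a minimal timing sketch in plain Python 3 (hence `range` rather than `xrange`). It is not from the thread: the `bytearray` stands in for the PIL pixel data, and `iter_channels` is a hypothetical stand-in for the proposed iterator. It compares draining a per-value generator with a single bulk copy.

```python
import time


def iter_channels(buf, width, height):
    """Yield every channel value one at a time, like the proposed iterator."""
    stride = width * 3  # 3 bytes per packed RGB pixel
    for y in range(height):
        for x in range(width):
            base = y * stride + x * 3
            for i in range(3):
                yield buf[base + i]


# Small synthetic "image"; a real 100MB image would scale the gap accordingly.
width, height = 200, 100
pixels = bytearray(i % 256 for i in range(width * height * 3))

start = time.perf_counter()
via_iterator = bytes(iter_channels(pixels, width, height))
iterator_seconds = time.perf_counter() - start

start = time.perf_counter()
via_copy = bytes(pixels)
copy_seconds = time.perf_counter() - start

print("iterator: %.6fs, bulk copy: %.6fs" % (iterator_seconds, copy_seconds))
```

Both paths produce the same bytes; only the time taken differs, and the numbers will vary by machine.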

Memory's cheap, I bought 1GB of memory for a laptop last week for $49. I'd say just stick a couple of GB into your workstation. How much is your time worth while you fret over this?

This way, only 200MB would be allocated. Also, this would solve other
problems, e.g.: when using the FreeImage library, if I need to convert
an image to a wx.Bitmap, I need to swap 'R' and 'B' values because the
library works with 'BGR' order. In order to do this, I need to
allocate a new buffer, copy the bytes and swap them there, so I run
into the same problem of having 300MB allocated. With an iterator I
could solve this.

No, this is where you use the pixel access object to swap the values in-place. Just because you tell PIL that the image is RGB, doesn't mean that you can't put BGR into it.

for y in xrange(h):
    for x in xrange(w):
        pix[x, y] = pix[x, y][-1::-1]

Conrado_PLG · August 11, 2008, 2:15pm

You do realize that img.tostring() encodes the image to the 'raw' format (or
any other format you may specify)? So the data in img may be different from
the data in s. That said, img.info is a dict that, IIRC, contains the data
once it gets loaded.

Sorry, I don't get your point... Isn't the tostring() result what wx
expects? (I forgot to add an img = img.convert('RGB') though)

While a pixel access object may be much faster than get/putpixel, I expect
that this would run fairly slowly, especially if you're manipulating 100MB
images. Try this and see how fast it runs:

for pixel in iter():
    x = pixel

I thought so. I'll do some more testing.

Memory's cheap, I bought 1GB of memory for a laptop last week for $49. I'd
say just stick a couple of GB into your workstation. How much is your time
worth while you fret over this?

Well, I surely can, but I don't know if all of my users are willing to.
If people didn't worry about memory usage there wouldn't be so many
complaining about Firefox memory usage, for example...
Anyway, it really isn't a critical issue, but I thought it was worth
searching for a better way.

No, this is where you use the pixel access object to swap the values
in-place. Just because you tell PIL that the image is RGB, doesn't mean
that you can't put BGR into it.

for y in xrange(h):
    for x in xrange(w):
        pix[x, y] = pix[x, y][-1::-1]

Yes, that would be an alternative... This happens actually with
another library, FreeImage. The problem is that I would "corrupt" the
original image so I couldn't, e.g. save it on disk (without swapping
bytes back). Guess I need some more testing with this too...

Thanks!

Chris_Barker1 · August 11, 2008, 5:50pm

Conrado PLG wrote:

Suppose I have a PIL image and want to convert it to a wx.Bitmap. I
would do this:

img = Image.open(...)
w, h = img.size
s = img.tostring()
bmp = wx.BitmapFromBuffer(w, h, s)

This works great, but it does have a problem. If the image is very big
(say, 100MB), right after the bmp is created, I'll have 300MB
allocated (img, s, bmp) if I need to keep a reference to img. Surely,
s will soon be garbage collected, but what matters is that at some point
in the execution, there is 300MB allocated.

That can be an issue, but you're not quite right -- "FromBuffer" means it is generating the Bitmap from a Python buffer object. The idea behind a buffer object is that it can be used to share the data buffer (the actual memory) between two (or more!) objects, so in the above, the wx.Bitmap and the string should be sharing the same data.

Try checking your memory use, and see what you get.
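One way to check, sketched here with only the Python 3 standard library: `tracemalloc` reports the peak of Python-level allocations, so it can tell a copy from a shared view. The `bytearray` below is a stand-in for image data and `peak_bytes` is a hypothetical helper. Allocations made inside the C/C++ code of wx or PIL may not show up in this kind of measurement, so treat it as an illustration of the idea rather than a measurement of `wx.BitmapFromBuffer` itself.

```python
import tracemalloc


def peak_bytes(func):
    """Return the peak number of traced bytes allocated while func() runs."""
    tracemalloc.start()
    try:
        result = func()  # keep the result alive until the peak is read
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


source = bytearray(1000000)  # stand-in for 1MB of pixel data

# bytes(source) duplicates the data; memoryview(source) only wraps it.
copy_peak = peak_bytes(lambda: bytes(source))
view_peak = peak_bytes(lambda: memoryview(source))

print("copy peak:", copy_peak, "view peak:", view_peak)
```

The copy shows a peak of at least the buffer size, while the view stays far below it, which is the difference between sharing a buffer and duplicating it.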

You can remove even that copy if you can find a way to make PIL provide a buffer object for you. One way is to use numpy:

import numpy as np

RGBarray = np.asarray(A_PIL_Image)

should create a numpy array that shares data with the PIL image. It's also a buffer object, so you can then:

bmp = wx.BitmapFromBuffer(w, h, RGBarray)

It would be great if there was something like BitmapFromIterator so I
could do something like this:

img = Image.open(...)
w, h = img.size
pix_access = img.load()

def iter():
    for y in xrange(h):
        for x in xrange(w):
            for i in xrange(3):
                yield pix_access[x, y][i]

bmp = wx.BitmapFromIterator(w, h, iter())

How would that help? A wx.Bitmap needs to have a data buffer, so you'd need a copy to do that anyway. If you're trying to get a copy, the above ways already do that. If the above did work, it would be dog slow, making Python function calls for each pixel.

Buffers are the right way to do this, and if what I've written above is right, they already work. With the new buffer protocol in py3k, it will be even easier.

When using the FreeImage library, if I need to convert
an image to a wx.Bitmap, I need to swap 'R' and 'B' values because the
library works with 'BGR' order. In order to do this, I need to
allocate a new buffer, copy the bytes and swap them there, so I run
into the same problem of having 300MB allocated. With an iterator I
could solve this.

in a dead-slow way.

numpy will help you here, though. You can make a numpy array with the PIL image, and do the swapping there. You could do it with varying levels of copying - all at once, with a copy, line by line, pixel by pixel, etc.

PIL may be able to do it for you with a built-in, too.
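As an illustration of the "line by line" option, here is a minimal sketch in plain Python 3 that uses a `bytearray` in place of a numpy array. The function name and the packed 3-bytes-per-pixel layout are assumptions made for the example, not anything PIL, numpy, or FreeImage guarantees. It swaps the first and third channel in place, and the only temporary copy is one channel of one row.

```python
def swap_red_blue_in_place(buf, width, height):
    """Swap channels 0 and 2 of packed 3-byte pixels, one row at a time."""
    stride = width * 3
    for y in range(height):
        start = y * stride
        end = start + stride
        # Temporary copy of a single channel of a single row (width bytes).
        first = buf[start:end:3]
        buf[start:end:3] = buf[start + 2:end:3]
        buf[start + 2:end:3] = first


# Two rows of two pixels each, stored as B, G, R.
data = bytearray([1, 2, 3, 4, 5, 6,
                  7, 8, 9, 10, 11, 12])
swap_red_blue_in_place(data, width=2, height=2)
print(list(data))
```

Running the same function a second time restores the original order, so the source image can be swapped back before saving it.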

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov

Conrado_PLG · August 11, 2008, 6:48pm

That can be an issue, but you're not quite right -- "FromBuffer" means it
is generating the Bitmap from a Python buffer object. The idea behind a
buffer object is that it can be used to share the data buffer (the actual
memory) between two (or more!) objects, so in the above, the wx.Bitmap and
the String should be sharing the same data.

Try checking your memory use, and see what you get.

Actually, it doesn't. From the docs: "Unlike wx.ImageFromBuffer the
bitmap created with this function does not share the memory buffer
with the buffer object."

You can remove even that copy if you can find a way to make PIL provide a
buffer object for you. One way is to use numpy:

import numpy as np

RGBarray = np.asarray(A_PIL_Image)

should create a numpy array that shares data with the PIL image. It's also a
buffer object, so you can then:

bmp = wx.BitmapFromBuffer(w, h, RGBarray)

I'll try this!

How would that help? a wx.Bitmap needs to have a data buffer, so you'd need
a copy to do that anyway. If you're trying to get a copy, the above ways
already do that. If the above did work, it would be dog slow, making python
function calls for each pixel.

That would help avoid having another copy in memory. But I guess
you're right, it would be too slow.

numpy will help you here, though. You can make a numpy array with the PIL
image, and do the swapping there. You could do it with varying levels of
copying - all at once, with a copy, line by line, pixel by pixel, etc.

I'll check NumPy, never thought of using it.

I guess this problem could be solved by some kind of adaptor which
would expose a buffer interface from a PIL image (which seems to be what
NumPy does) or from a C array (which I would need for FreeImage, so I
could swap bytes 'on the fly'). Or maybe an ImageFromFile which would
use the file protocol instead? Well, I need to do some more tests.
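A minimal sketch of the file-protocol idea, in plain Python 3 and without wx: read packed BGR bytes from a file-like object in small chunks, swap them on the fly, and write them into a destination buffer that is allocated once. Everything here is hypothetical, including the function name and the `chunk_pixels` parameter; `dest` merely stands in for whatever buffer a bitmap would own, and the stream is assumed to be buffered (such as `io.BytesIO` or a file opened with `'rb'`), so that `read(n)` returns `n` bytes unless the data runs out.

```python
import io


def fill_rgb_from_bgr_stream(stream, dest, chunk_pixels=4096):
    """Copy packed BGR bytes from stream into dest as RGB, chunk by chunk."""
    if len(dest) % 3:
        raise ValueError("destination size must be a multiple of 3")
    offset = 0
    while offset < len(dest):
        want = min(chunk_pixels * 3, len(dest) - offset)
        chunk = bytearray(stream.read(want))
        if len(chunk) != want:
            raise ValueError("stream ended before the destination was filled")
        # Swap the first and third channel within this chunk only.
        chunk[0::3], chunk[2::3] = chunk[2::3], chunk[0::3]
        dest[offset:offset + want] = chunk
        offset += want


bgr = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9])  # three BGR pixels
rgb = bytearray(len(bgr))                 # allocated once, filled in place
fill_rgb_from_bgr_stream(io.BytesIO(bgr), rgb, chunk_pixels=2)
print(list(rgb))
```

Beyond the source and the destination, the extra memory is bounded by the chunk size rather than by the image size.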

Thanks

Conrado_PLG · August 11, 2008, 6:49pm

Oops, I meant "BitmapFromFile"

On Mon, Aug 11, 2008 at 15:48, Conrado PLG <conradoplg@gmail.com> wrote:

could swap bytes 'on the fly'). Or maybe an ImageFromFile which would
use the file protocol instead?

Conrado_PLG · August 11, 2008, 7:11pm

Turns out that

RGBarray = np.asarray(A_PIL_Image)

also makes a new copy and doesn't share data (tested; and docs say
that it only shares when the given object is already an array). Back
to square one, unless I'm missing something here...

Robin · August 11, 2008, 7:20pm

Conrado PLG wrote:

Hello,

Suppose I have a PIL image and want to convert it to a wx.Bitmap. I
would do this:

img = Image.open(...)
w, h = img.size
s = img.tostring()
bmp = wx.BitmapFromBuffer(w, h, s)

This works great, but it does have a problem. If the image is very big
(say, 100MB), right after the bmp is created, I'll have 300MB
allocated (img, s, bmp) if I need to keep a reference to img. Surely,
s will soon be garbage collected, but what matters is that at some point
in the execution, there is 300MB allocated.

It would be great if there was something like BitmapFromIterator so I
could do something like this:

img = Image.open(...)
w, h = img.size
pix_access = img.load()

def iter():
    for y in xrange(h):
        for x in xrange(w):
            for i in xrange(3):
                yield pix_access[x, y][i]

bmp = wx.BitmapFromIterator(w, h, iter())

See the sample in the demo for RawBitmapAccess. As others have mentioned it will be slower, but it is possible to build a bitmap a pixel at a time.

--
Robin Dunn
Software Craftsman
http://wxPython.org Java give you jitters? Relax with wxPython!

Chris_Barker1 · August 11, 2008, 9:56pm

Conrado PLG wrote:

Actually, it doesn't. From the docs: "Unlike wx.ImageFromBuffer the
bitmap created with this function does not share the memory buffer
with the buffer object."

Darn, sorry, I forgot about that -- the problem is that a wx.Bitmap is in the "native" bitmap format, which is usually RGB or RGBA these days, but it could be other things, so you can't count on that. It works with wx.Image, 'cause we always know what binary format a wx.Image is (RGB).

So, you are going to have to make a copy, no matter how you slice it. But it would be nice if you could dump the bytes of the PIL image directly in a wx.Bitmap.

1) Are you sure PIL doesn't offer a buffer interface?

2) I'd check if you're right about numpy.asarray -- I'm pretty sure it is supposed to work without copying data.

3) Do you need to keep the PIL image around? You could keep the numpy array around instead; you can always generate a new PIL image out of it with Image.fromarray(a).

Turns out that

RGBarray = np.asarray(A_PIL_Image)

also makes a new copy and doesn't share data (tested; and docs say
that it only shares when the given object is already an array).

Actually, it shares when the object conforms to the array protocol, which PIL images are supposed to, at least with a recent version:

http://effbot.org/zone/pil-changes-116.htm

# Added “fromarray” function, which takes an object implementing the NumPy array interface and creates a PIL Image from it. (from Travis Oliphant).
# Added NumPy array interface support (__array_interface__) to the Image class (based on code by Travis Oliphant). This allows you to easily convert between PIL image memories and NumPy arrays:

import numpy, Image

i = Image.open('lena.jpg')
a = numpy.asarray(i) # a is readonly
i = Image.fromarray(a)

If you have tested this, and it does copy, then either you've found a bug, or maybe your PIL image isn't an RGB image in memory (I think PIL can store stuff in a number of ways).

-Chris

Conrado_PLG · August 13, 2008, 1:31am

Found out what was happening: PIL does support the array interface,
but supports it by returning an array object with its data from
img.tostring()... which copies the data. So it's PIL's "fault".

I tried with raw bitmap access, and it is indeed dog slow: 70s against 0.3s.

Anyway, I guess I'll leave it at that for now... I'll probably try to
code an extension in C or something in the future. Yes, I'm stubborn =)

But thanks for the answers!

On Mon, Aug 11, 2008 at 18:56, Christopher Barker <Chris.Barker@noaa.gov> wrote:

If you have tested this, and it does copy, then either you've found a bug,
or maybe your PIL image isn't an RGB image in memory ( I think PIL can store
stuff in a number of ways)

Chris_Barker1 · August 14, 2008, 5:20am

Conrado PLG wrote:

Found out what was happening: PIL does support the array interface,
but supports it by returning an array object with its data from
img.tostring()... which copies the data. So it's PIL's "fault".

I wonder why -- the point of the array interface (one of the points, anyway) is to be able to pass data buffers around without copying.

However, I think PIL may not necessarily store the data in a normal ol' RGB array, so it may not be possible. If you really want to address this, you might try asking on the PIL and/or numpy lists. The array interface code was contributed by one of the core numpy developers.

Anyway, I guess I'll leave it at that for now... I'll probably try to
code an extension in C or something in the future. Yes, I'm stubborn =)

Well, as I say -- it may not be possible to share the PIL data buffer at all -- C or no. If it is, write a patch to PIL, rather than custom code!

However, one option may be to keep a numpy array around with your data, rather than the PIL image -- you can re-create the PIL image from the array on demand if need be.

-Chris
