Patches for wx.lib.pdfviewer

Following recent discussions on this list, I have amended the pdf viewer to search in turn for, and work with, the modules python-fitz/mupdf, pypdf2 and pypdf. A patch to the current version of viewer.py is attached, together with one for consequential changes to demo/pdfviewer.py.
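A minimal sketch of what "search in turn" could look like, assuming the three packages import as fitz, PyPDF2 and pyPdf. The helper name first_available is made up for illustration and is not taken from the patch.

```python
import importlib


def first_available(candidates):
    """Return (name, module) for the first importable candidate.

    Returns (None, None) if none of the candidates can be imported.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This backend is not installed; try the next one.
            continue
    return None, None
```

Calling first_available(["fitz", "PyPDF2", "pyPdf"]) would then prefer the mupdf-based backend and fall back to the pure Python ones, in the order given.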

These have kindly been tested by Werner so I submit them for acceptance into wxPython Classic.

The following is a link to fitz.zip, which contains the win32, Python 2.7 build of python-fitz (it can be copied directly into site-packages). I am providing it because the setup.py and mupdf library names for Windows are not the same as those provided in the python-fitz-master download.

https://secure.logmein.com/f?00_uW54mugCyMFuchyl.U9asROONMt.ZTFCehWS.msA

PDFViewer.py.patch (3.9 KB)

viewer.py.patch (32.1 KB)

···

--
Regards

David Hughes
Forestfield Software

Hi David,

When I py2exe my app there are two more variables not initialized.

         self.xshift = 0
         self.yshift = 0

I also see a log entry of "PDF operator W* is not implemented", any idea why this might show?

Werner

Hi David,

When I py2exe my app there are two more variables not initialized.

     self.xshift = 0

     self.yshift = 0

I can of course make sure these are initialised, but are you able to send a trace-back of when these occur? I am curious why these, along with the other variables you reported to me earlier, raise errors for you but not in any of the cases where I use the viewer myself.

I also see a log entry of “PDF operator W* is not implemented”, any idea
why this might show?

W and W* are clipping path operators which modify the current clipping path in different ways. The closed subpaths of the clipping path itself define the areas that can be painted. Marks inside the path are applied to the page, marks outside are not. W and W* are not currently implemented using pyPdf or pyPDF2 but probably are with fitz/mupdf.
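For illustration, the difference between the two operators is the rule used to decide what counts as inside the path: W uses the nonzero winding number rule and W* uses the even-odd rule. The sketch below is not viewer.py code; it only handles straight-edged subpaths, and all function names are made up.

```python
def _edges(polygon):
    """Yield consecutive vertex pairs, closing the polygon."""
    n = len(polygon)
    for i in range(n):
        yield polygon[i], polygon[(i + 1) % n]


def winding_number(point, polygon):
    """Signed number of times the closed polygon winds around point."""
    px, py = point
    total = 0
    for (x0, y0), (x1, y1) in _edges(polygon):
        # cross > 0 means the point lies to the left of the directed edge.
        cross = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
        if y0 <= py < y1 and cross > 0:
            total += 1  # upward edge with the point on its left
        elif y1 <= py < y0 and cross < 0:
            total -= 1  # downward edge with the point on its right
    return total


def crossing_count(point, polygon):
    """Number of edges crossed by a ray from point towards +x."""
    px, py = point
    count = 0
    for (x0, y0), (x1, y1) in _edges(polygon):
        if (y0 <= py < y1) or (y1 <= py < y0):
            x_hit = x0 + (py - y0) * float(x1 - x0) / (y1 - y0)
            if x_hit > px:
                count += 1
    return count


def inside_nonzero(point, subpaths):
    """Rule used by W: inside if the total winding number is nonzero."""
    return sum(winding_number(point, p) for p in subpaths) != 0


def inside_evenodd(point, subpaths):
    """Rule used by W*: inside if the ray crosses an odd number of edges."""
    return sum(crossing_count(point, p) for p in subpaths) % 2 == 1


# Two nested squares, both drawn counter-clockwise.
OUTER = [(0, 0), (10, 0), (10, 10), (0, 10)]
INNER = [(3, 3), (7, 3), (7, 7), (3, 7)]
```

With these two subpaths, a point in the inner square such as (5, 5) is inside the clipping area under the nonzero rule but falls in a hole under the even-odd rule, while a point such as (1, 1) is inside under both.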

David Hughes

···

On Saturday, July 20, 2013 2:56:55 PM UTC+1, werner wrote:

Give me a day or so to get you the trace-back.
I think it has to do with me having the viewer on a “sized_control”
parent.
O.K., I am using pyPDF2, so that explains that.
See you
Werner

···

Hi David,

  On 22/07/2013 13:01, David Hughes wrote:
  On Saturday, July 20, 2013 2:56:55 PM UTC+1, werner wrote:

Hi David,

I'm finally getting around to reviewing these patches and I have run into some exceptions while testing. Using the pyPdf package I'm seeing this:

Traceback (most recent call last):
   File "/Users/robind/projects/wx/2.9/wxPython/demo/PDFViewer.py", line 49, in OnLoadButton
     self.viewer.LoadFile(dlg.GetPath())
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 221, in LoadFile
     self.pdfdoc.DrawFile(self.frompage, self.topage)
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 586, in DrawFile
     pdf_fonts = self.FetchFonts(self.page)
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 637, in FetchFonts
     fonts = currentobject["/Resources"].getObject()['/Font']
   File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyPdf/generic.py", line 480, in __getitem__
     return dict.__getitem__(self, key).getObject()
KeyError: '/Font'

and this:

Traceback (most recent call last):
   File "/Users/robind/projects/wx/2.9/wxPython/demo/PDFViewer.py", line 49, in OnLoadButton
     self.viewer.LoadFile(dlg.GetPath())
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 221, in LoadFile
     self.pdfdoc.DrawFile(self.frompage, self.topage)
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 588, in DrawFile
     self.page.extractOperators(), pdf_fonts)
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 720, in ProcessOperators
     drawlist.extend(self.InsertXObject(operand[0]))
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 893, in InsertXObject
     dlist.append(self.AddBitmap(stream._data, width, height, filters))
   File "/Users/robind/projects/wx/2.9/wxPython/wx/lib/pdfviewer/viewer.py", line 919, in AddBitmap
     bitmap = wx.BitmapFromBuffer(width, height, data)
   File "/Users/robind/projects/wx/2.9/wxPython/wx/_gdi.py", line 938, in BitmapFromBuffer
     return _gdi_._BitmapFromBuffer(width, height, dataBuffer)
TypeError: in method '_BitmapFromBuffer', expected argument 2 of type 'int'

I tried using int() at that line of code and apparently the parameter is an IndirectObject instead of something that is convertible to an int.

I got the same errors after upgrading to pyPdf2.
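One possible way to deal with this, sketched under the assumption that the value exposes the same getObject() method seen in the first traceback. FakeIndirect is a stand-in so the example runs without pyPdf, and resolve_int is a hypothetical helper, not part of viewer.py.

```python
class FakeIndirect(object):
    """Stand-in for an indirect reference (illustration only)."""

    def __init__(self, target):
        self._target = target

    def getObject(self):
        return self._target


def resolve_int(value, max_depth=16):
    """Follow getObject() links until a plain value is reached, then int() it."""
    for _ in range(max_depth):
        getter = getattr(value, "getObject", None)
        if getter is None:
            break
        resolved = getter()
        if resolved is value:
            # Direct objects may return themselves; nothing left to resolve.
            break
        value = resolved
    return int(value)
```

The width and height would be passed through such a helper before being handed to wx.BitmapFromBuffer.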

···

--
Robin Dunn
Software Craftsman

Robin,

Are these happening with any pdf file?

I don't get these exceptions with pyPDF2, and I have been using this latest code for a while now.

Werner

···

Would you be able to provide a sample pdf that raises these errors,
please? I probably won’t be able to implement any missing features
(that’s what the mupdf library is for) but I’ll try and make it fail
more gracefully if necessary.

···

On 11/08/2013 00:44, Robin Dunn wrote:

-- Regards
David Hughes
Forestfield Software

Hi Werner,

Do you have any more information about this yet? Also, if it is
possible, would you be able to send me an example (privately if you
prefer) of how your sized-control parent interacts with the viewer?

···

On 22/07/2013 12:15, werner wrote:

-- Regards
David Hughes
Forestfield Software

Attached is the pdfviewer code I use within my own application,
reduced so that it does not include all my database and report
generation stuff (I now use Dabo).
I have also attached the test file I use in that script; put it
somewhere and change line 47 in the script.
Werner

pdfviewerissue.py (1.69 KB)

Cellarbook listing - portrait.pdf (13.2 KB)

···

Hi David,

  On 11/08/2013 16:58, David Hughes wrote:

David Hughes wrote:

Would you be able to provide a sample pdf that raises these errors,
please? I probably won't be able to implement any missing features
(that's what the mupdf library is for) but I'll try and make it fail
more gracefully if necessary.

I can share one of them. I'll send it to you off-list.

···

On 11/08/2013 00:44, Robin Dunn wrote:

--
Robin Dunn
Software Craftsman

werner wrote:

Are these happening with any pdf file?

I don't get these exceptions with pyPDF2 and I use this latest code for
a bit now.

I just tested with a couple PDFs I happened to have sitting on my desktop. It looks like both of them are image-heavy.

···

--
Robin Dunn
Software Craftsman

I couldn't reproduce the KeyError exception but I have wrapped it in
a try statement. After resolving the TypeError it turned out to be
coming from an encoded image requiring CCITTFaxDecode, which is not
included in PyPDF2 or pyPdf. These are usually scanned images of part
or all of a document page and in fact the pdf you sent me, Robin,
consisted of nothing but 9 such complete page images. The program now
detects these images and silently ignores them, or reports them as
unimplemented if VERBOSE is True. They can in principle be decoded as
TIFF images but I don't know if or when I will follow this up.

The attached patch replaces the one I posted on July 20.
Additionally, it contains the above changes plus a fix for issues
raised since then by Werner, which he has received and tested.
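
Roughly, the two changes described above could look like the following. This is a sketch and not the contents of the patch: fetch_fonts, image_is_drawable and their arguments are illustrative names, and plain dicts and lists stand in for pyPdf objects.

```python
VERBOSE = True
UNSUPPORTED_FILTERS = ("/CCITTFaxDecode",)


def fetch_fonts(resources):
    """Return the page's font dictionary, or an empty one if it has none."""
    try:
        return resources["/Font"]
    except KeyError:
        # Pages made only of images may have no /Font entry at all.
        return {}


def image_is_drawable(filters, report=None):
    """Return False for images whose filter cannot be decoded.

    filters may be None, a single filter name, or a list of names.
    report is an optional callable used for messages when VERBOSE is True.
    """
    if filters is None:
        filters = []
    elif not isinstance(filters, (list, tuple)):
        filters = [filters]
    for name in filters:
        if name in UNSUPPORTED_FILTERS:
            if VERBOSE and report is not None:
                report("Image filter %s is not implemented" % name)
            return False
    return True
```

A caller would skip the bitmap when image_is_drawable returns False instead of passing the raw stream data on to wx.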

viewer.py.patch (34.6 KB)

···

On 12/08/2013 20:43, Robin Dunn wrote:

-- Regards
David Hughes
Forestfield Software

David Hughes wrote:

I couldn't reproduce the KeyError exception but I have wrapped it in a
try statement. After resolving the TypeError it turned out to be coming
from an encoded image requiring /CCITTFaxDecode/ which is not included
in PyPDF2 or pyPdf. These are usually scanned images of part or all of a
document page and in fact the pdf you sent me, Robin, consisted of
nothing but 9 such complete page images. The program now detects these
images and silently ignores them, or reports them as unimplemented if
VERBOSE is True. They can in principle be decoded as TIFF images but I
don't know if or when I will follow this up.

The attached patch replaces the one I posted on July 20. Additionally,
it contains the above changes plus a fix for issues raised since then by
Werner, which he has received and tested.

Thanks.

···

On 12/08/2013 20:43, Robin Dunn wrote:

On 11/08/2013 00:44, Robin Dunn wrote:

--
Robin Dunn
Software Craftsman

Hi David,

I would like to test the PDFViewer, please, but the LogMeIn link is broken.

Do you have the Fitz win32 zip file?

Thanks

···

On Saturday, 20 July 2013 05:41:50 UTC-3, David Hughes wrote:

It had expired because the sharing service only allows a maximum period of 99 days :-(

It has been re-activated as http://bit.ly/1oZBOGI, valid until 27 November 2014.

···

On 19/08/2014 15:41, suporte.itksoft@gmail.com wrote:

I would like to test the PDFViewer. But the link https:/… is broken.

--
Regards

David Hughes
Forestfield Software

Hi David,

Thanks for your help.

Does fitz not open a file object (StringIO)? Only a path given as a string?

···

On Wednesday, 20 August 2014 12:29:34 UTC-3, David Hughes wrote:

It had expired because the sharing service only allows a maximum period
of 99 days :-(

It has been re-activated as http://bit.ly/1oZBOGI - valid until 27
November 2014

Yes, as I recall, the fitz library would only accept a non-unicode file or path name string.
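
If the PDF only exists in memory, one workaround is to write it to a temporary file and pass that path on. The following is a sketch, assuming a loader callable that takes a path string. The names with_temp_pdf and loader are hypothetical, and no fitz call is shown because its exact signature is not given in this thread.

```python
import io
import os
import tempfile


def with_temp_pdf(fileobj, loader):
    """Copy a file-like object to a temporary .pdf file and call loader(path).

    The temporary file is removed once loader returns, so loader must
    finish reading (and close the file) before it returns.
    """
    data = fileobj.read()
    # Under Python 2, mkstemp returns a plain (non-unicode) str path by default.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        return loader(path)
    finally:
        os.remove(path)
```

A viewer that needs the file for longer than one call would have to keep the temporary file around and delete it when the document is closed.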

···

On 20/08/2014 19:16, suporte.itksoft@gmail.com wrote:

Thanks for your help.
The fitz does not open object file (StringIO)? Only path in string?

--
Regards

David Hughes
Forestfield Software

So sorry to be a bother, but could you upload the fitz.zip file one more time? It would appear that I've arrived at this link a couple of months too late. Thank you.

···

On Thursday, August 21, 2014 at 5:03:05 AM UTC-5, David Hughes wrote:

Unfortunately the file link mechanism had a maximum lifetime of 99 days. I must get round to using something more sensible but meanwhile the latest 99 day link is http://bit.ly/1CTu0zn

Regards
David Hughes
Forestfield Software