The alternative is to have "encode/decode" all over the place.
Hi,
No, when working with a Unicode application you should only do
conversions at input/output points and use Unicode everywhere
internally.
Agreed.
Either way one still has to pay attention to the encoding and think about
it.
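In modern Python 3 terms, the "convert only at the boundaries" pattern described above might look like the following sketch (the function names and the choice of UTF-8 are assumptions for illustration, not code from this thread):

```python
# Decode bytes to text at the input boundary, work with text (str)
# internally, and encode back to bytes only at the output boundary.

def read_names(raw: bytes) -> list[str]:
    # Input boundary: decode once, assuming the source is UTF-8.
    text = raw.decode('utf-8')
    return [line.strip() for line in text.splitlines() if line.strip()]

def write_names(names: list[str]) -> bytes:
    # Output boundary: encode once, back to UTF-8 bytes.
    return '\n'.join(names).encode('utf-8')

raw = 'Andr\xe9\nBj\xf6rk\n'.encode('utf-8')
names = read_names(raw)   # everything in between sees only text
print(names)              # ['André', 'Björk']
```

Everything between those two boundaries never has to think about encodings, which is exactly the point being made above.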
I have used it in my application for a few years now (since I switched to the
Unicode build of wxPython, defined all my character data as unicode,
and put "# -*- coding: utf-8 -*-" in all my script files). It hasn't
caused me any problems, and it has reduced the number of places where I
needed to use encode/decode.
The '#-*- coding...' line has nothing to do with how strings and bytes
are interpreted within your application. That line is only to tell the
interpreter how to handle the text in your script.
That is confusing to me. I thought it defines the encoding of the .py file, i.e. of any string, constant, or comment. IIRC one also has to ensure that the editor one is using saves the file in the same encoding, otherwise things can get confusing.
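Robin's distinction (the cookie only tells the interpreter how to decode the source text, and says nothing about the resulting string objects) can be demonstrated in Python 3 by compiling source bytes under a coding declaration; this is a sketch for illustration, not code from the thread:

```python
# Source *bytes* containing the raw byte 0xe9 after 'abc'.
# The latin-1 coding cookie tells the tokenizer to read 0xe9 as 'é'.
src = b"# -*- coding: latin-1 -*-\ns = 'abc\xe9'\n"
ns = {}
exec(compile(src, '<demo>', 'exec'), ns)
print(ns['s'])            # abcé

# Without a cookie the interpreter falls back to its default
# (UTF-8 in Python 3, ASCII in Python 2), and the same byte makes
# the *source itself* undecodable -- a compile-time failure.
try:
    compile(b"s = 'abc\xe9'\n", '<demo>', 'exec')
except (SyntaxError, ValueError) as exc:
    print('source could not be decoded:', type(exc).__name__)
```

So the cookie matters only while the file is being read; once `s` exists, it carries no memory of it.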
What is "ListCtrlPrinter"?
It is a wrapper of wx.Printout in ObjectListView.
I don't think that this is a standard
wxPython class. But my guess would be that it may be passing raw Unicode
bytes to whatever is being used to create the PDF, where that code is
expecting an encoded string.
The modified code I had posted worked for me on Windows / Python 2.6 / wxPython 2.8: the accented characters were correct on the monitor, in the preview, on the printed output, and when viewing the PDF. The OP still had a problem, but he is on *nix.
Yes, but I think it's just to specify the encoding of unicode string literals. IOW, it determines how the foo in u"foo" is converted to a unicode value at compile time, so that the value embedded in the byte-code for that literal will be a unicode object. See section 2.2.3, "Source Code Encoding", in the Python tutorial chapter "Using the Python Interpreter".
The value returned by sys.getdefaultencoding() is what is used by default to convert to/from string and unicode values when coerced by type specific code. For example
str(unicode_value)
s = "converted to string: %s" % unicode_value
The same applies to calls out to extension-module functions that use APIs like PyArg_ParseTuple and specify that either a string or a unicode type is expected.
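For context, the implicit str/unicode coercion described here is specific to Python 2; Python 3 removed it, so the equivalent mixing fails loudly instead of silently going through the default encoding. A small sketch of the Python 3 behaviour (not of the code discussed in the thread):

```python
unicode_value = 'abc\xe9'            # text: 'abcé'

# Explicit conversion always works, because the encoding is named:
encoded = unicode_value.encode('utf-8')
print(encoded)                       # b'abc\xc3\xa9'

# Implicit mixing of text and bytes, which Python 2 quietly resolved
# through sys.getdefaultencoding(), is now a TypeError:
try:
    unicode_value + b'!'
except TypeError as exc:
    print('no implicit coercion:', exc)
```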
···
On 10/25/10 2:10 PM, werner wrote:
The '#-*- coding...' line has nothing to do with how strings and bytes
are interpreted within your application. That line is only to tell the
interpreter how to handle the text in your script.
That is confusing to me. I thought it defines the encoding of the .py
file, i.e. any string/constant/remark and IIRC one also has to ensure
that the editor one is using uses the same encoding otherwise it can get
confusing.
The value returned by sys.getdefaultencoding() is what is used by
default to convert to/from string and unicode values when coerced by
type specific code. For example
str(unicode_value)
s = "converted to string: %s" % unicode_value
And probably in absurd cases like this:
>>> 'abcé'.encode('utf-8')
Traceback (most recent call last):
File "<psi last command>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in
position 3: ordinal not in range(128)
but this logically works:
>>> 'abc'.encode('utf-8')
'abc'
jmf
···
On Oct 25, 11:36 pm, Robin Dunn <ro...@alldunn.com> wrote:
You need to declare the encoding in the module (let it be written on the first line -- or on the second line if #! /usr/bin/env python is declared as well, which needs to be at the very top) by # coding=utf-8 or # -*- coding: utf-8 -*-
Then just add a 'u' in front of the string literal, like this: u'abcé'
What exactly makes you think this is absurd? It seems quite
logical?
Karsten
···
On Tue, Oct 26, 2010 at 06:59:47AM -0700, jmfauth wrote:
And probably in absurd cases like this:
>>> 'abcé'.encode('utf-8')
Traceback (most recent call last):
File "<psi last command>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in
position 3: ordinal not in range(128)
You need to declare the encoding in the module (let it be written on the
first line -- or on the second line if #! /usr/bin/env python is declared
as well, which needs to be at the very top) by # coding=utf-8 or # -*-
coding: utf-8 -*-
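As a concrete illustration of that placement rule (file contents assumed for the example, not taken from the thread), a script using both declarations would start like this, with the shebang on line one and the coding cookie on line two:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# With the cookie in place the interpreter decodes this source file as
# UTF-8, so the literal below is read correctly.  (In Python 2 the 'u'
# prefix makes it a unicode object; in Python 3 all literals are text.)
greeting = u'abcé'
print(greeting)
```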
No. Robin Dunn has already replied to such a comment here
Because 'abcé' is a string of type <str>. It *has* a coding,
it *is* in a coding format, but it can not *be encoded*.
Only strings of type <unicode> can be encoded.
jmf
···
On Oct 26, 4:27 pm, Karsten Hilbert <Karsten.Hilb...@gmx.net> wrote:
On Tue, Oct 26, 2010 at 06:59:47AM -0700, jmfauth wrote:
> And probably in absurd cases like this:
> >>> 'abcé'.encode('utf-8')
> Traceback (most recent call last):
> File "<psi last command>", line 1, in <module>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in
> position 3: ordinal not in range(128)
What exactly makes you think this is absurd? It seems quite
logical?
Because 'abcé' is a string of type <str>. It *has* a coding,
Namely either sys.getdefaultencoding() or the encoding
that was put at the top of the file in the coding
directive. That's the knack.
it *is* in a coding format, but it can not *be encoded*.
Aha, I see.
Does the file you see this in have a coding directive ? I
would assume it doesn't. If that's true the following
happens:
- python sees the string in the file
- python sees the request for "turning" it into utf8
- python searches for the *current* encoding of the string
- python does not find anything at the top of the file
- python looks at sys.getdefaultencoding
- python finds "ascii"
- python tries to (internally)
  - turn the (supposedly) "ascii"-encoded string into unicode by
    doing 'abc-strange_e'.decode('ascii') (which, of course, fails)
  - because it needs the unicode version thereof to turn
    *that* into utf8
Does that make sense ?
Karsten
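Karsten's step list can be reproduced explicitly in Python 3, where the implicit decode no longer happens but the failing step can be performed by hand. A sketch, with the byte string standing in for the latin-1 encoded 'abcé' that the Python 2 str literal held:

```python
# 'abcé' as latin-1 bytes -- what the Python 2 str held in a file
# without a (correct) coding directive.
data = b'abc\xe9'

# Python 2's str.encode first decoded the bytes with the default
# codec ('ascii'); done by hand, that step fails the same way:
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print(exc)   # 'ascii' codec can't decode byte 0xe9 in position 3 ...

# Naming the real encoding makes the decode-then-encode chain work:
print(data.decode('latin-1').encode('utf-8'))   # b'abc\xc3\xa9'
```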
···
On Tue, Oct 26, 2010 at 07:45:16AM -0700, jmfauth wrote:
On Oct 26, 4:27 pm, Karsten Hilbert <Karsten.Hilb...@gmx.net> wrote:
> On Tue, Oct 26, 2010 at 06:59:47AM -0700, jmfauth wrote:
Does the file you see this in have a coding directive ? I
would assume it doesn't. If that's true the following
happens:
- python sees the string in the file
- python sees the request for "turning" it into utf8
- python searches for the *current* encoding of the string
- python does not find anything at the top of the file
If my understanding is correct then those two steps do not actually happen. The coding specification at the top of the file is used at compile time, not run time, and string objects do not have a "current encoding"; they are just a series of bytes, and Python does not keep track of any encoding information about them. The only time Python knows what non-default encoding to use for converting a string object to a unicode object is when you specify it by passing the name to decode(); otherwise it uses the default.
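Robin's last point, that Python only knows a non-default encoding when you hand it to decode(), can be shown directly (a Python 3 sketch, with the byte value chosen for illustration):

```python
raw = b'abc\xe9'   # bytes carry no attached encoding information

# The object itself has no "current encoding"; the caller decides
# how the bytes are to be interpreted, and different choices give
# different text for the very same byte:
print(raw.decode('latin-1'))   # abcé
print(raw.decode('cp437'))     # same byte, a different character

# With the wrong (or default 'ascii') codec the bytes fail to decode:
try:
    raw.decode('ascii')
except UnicodeDecodeError:
    print('ascii cannot decode 0xe9')
```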
···
On 10/26/10 8:13 AM, Karsten Hilbert wrote:
- python looks at sys.getdefaultencoding
- python finds "ascii"
- python tries to (internally)
- turn the (supposedly) "ascii"-encoded string into unicode by
doing 'abc-strange_e'.decode('ascii') (which, of course, fails)
- because it needs the unicode-version thereof to turn
*that* into utf8
> - python sees the string in the file
> - python sees the request for "turning" it into utf8
> - python searches for the *current* encoding of the string
> - python does not find anything at the top of the file
If my understanding is correct then those two steps do not actually
happen. The coding specification at the top of the file is used at
compile time, not run time, and string objects do not have a "current
encoding",
I should have written:
- python searches for the *current* encoding of the string
- python does not find anything at the top of the file
- python looks at sys.getdefaultencoding
which is more correct but still wrong
they are just a series of bytes and Python does not keep
track of any encoding information about them.
The only time Python
knows what non-default encoding to use for converting a string object
to a unicode object is when you specify it by passing the name to
decode(), otherwise it uses the default.
That explains it, IMO, at any rate.
Karsten
···
On Tue, Oct 26, 2010 at 12:30:08PM -0700, Robin Dunn wrote:
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346