Well, I do not quite know what to do. I developed psi on my
win98 platform with an ANSI wxPython version (I should have
mentioned it). I know psi works fine on all win32 platforms
(98, XP, 2000) using an ANSI wxPython version. I have no
experience with the Unicode wxPython version.
Unicode is the wave of the future. It makes so many of these issues just go away. You really should take the time to try it.
Either you do not put any encoding information in your
scripts and Python does not work properly, or you define an
encoding and your Python modules are no longer portable.
This can be illustrated with this little sample, script a.py:

    #-*- coding: iso-8859-1 -*-
    print 'école, Äpfel'

Now, let us run a.py with and without the encoding info
in psi or PyShell.

Without the encoding line:

    execfile('a.py')
    sys:1: DeprecationWarning: Non-ASCII character '\xe9' in
    file a.py on line 4, but no encoding declared; see
    PEP 263 (Defining Python Source Code Encodings) for details

With the encoding info:

    execfile('a.py')
    école, Äpfel
It is ok, but the script is only valid in the iso-latin1 world.
I don't understand what you mean by that. If you have the encoding in the script, the script is valid all over the world, and it has the exact same meaning all over the world. There is no guarantee that the string can actually be printed on a terminal that does not support iso-8859-1, but that's not a Python problem. If the local character set does not include the 'é' character (e with an acute accent), then there IS no way to print it, and it is an error to attempt to do so. The string can still be represented in memory (at least with Unicode), but it cannot be printed, and that's the way it should be. If the user wants to print a string with iso-8859-1-specific glyphs, he's going to need fonts that support the iso-8859-1 code points.
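To make that concrete, here is a minimal Python 2 sketch (the names are invented for illustration) of a string that lives happily in memory but cannot be encoded for an ASCII-only output:

    s = u'\xe9cole'                  # u'école' -- perfectly valid in memory
    print repr(s)                    # always works, no particular terminal needed
    print s.encode('iso-8859-1')     # fine: latin-1 has a code point for 'é'
    try:
        s.encode('ascii')            # ASCII has no 'é'...
    except UnicodeEncodeError, e:
        print 'cannot be rendered here:', e   # ...so refusing is the right answer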
Python gurus (I do not belong to this group) may also have
noticed that a DeprecationWarning also appears when accented /
European characters are used in comments and no encoding info
is given. That means Python is really "interpreting"
comments and does not just skip them. This is extremely
annoying. Even if one attempts to make a script module
portable by not specifying any encoding, it is practically
impossible to comment the code in a language other than
English!
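A small sketch of that behaviour (the file name b.py is invented; this is what Python 2.3/2.4 does, as far as I can tell):

    # Write a file whose only non-ASCII bytes sit inside a comment,
    # with no coding declaration at the top.
    f = open('b.py', 'w')
    f.write('# commentaire avec un \xe9 accentu\xe9\nprint "hello"\n')
    f.close()

    execfile('b.py')
    # -> sys:1: DeprecationWarning: Non-ASCII character '\xe9' in file b.py ...
    # The warning fires even though the '\xe9' bytes only appear in a comment:
    # the declared source encoding covers the whole file, comments included.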
I don't understand why you use the word "portable" in that case. Omitting the encoding does not make the script "portable". It makes the script "erroneous". Yes, you can embed iso-8859-1 characters in a script file without an encoding, and then move that file to an Arabic computer, but that changes the meaning of the program! That's not portability. That's a bug. Surely it is better to have Python tell you that your program cannot be processed by the Arabic machine, rather than have it print characters that, for all you know, might represent some terrible ethnic slur in the local encoding.
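The point is easy to demonstrate: the same byte simply is a different character under a different encoding. A rough sketch (iso-8859-6 picked arbitrarily as an Arabic charset):

    b = '\xe9'                            # the byte that is 'é' in iso-8859-1
    print repr(b.decode('iso-8859-1'))    # u'\xe9'  -> 'é'
    print repr(b.decode('iso-8859-6'))    # an Arabic letter, not 'é' at all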
o Question
Just out of curiosity.
Does anybody know why Python introduced the encoding in
that way, as a comment in the script, and does not use a module
to define it? It would have been more practical.
No, it wouldn't. It can't be a module. It has to be some kind of "meta" indicator that affects only the interpretation of this file.
The source encoding is a compile-time problem, not a run-time problem. We don't usually think of that distinction in Python, but it is there. If you made the encoding a module import, the setting would become a run-time setting. That means the order of execution becomes important. What encoding is used for strings before the import? What happens to strings in a module without an import? Does it inherit the encoding from its parent? Can a module export its encoding to files that import it? What would that even mean?
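Here is a sketch of what "compile-time" means here (file names invented; Python 2.x as in the rest of this thread): the identical bytes in a string literal come out as different constants depending only on the declaration at the top of the file, which is settled before a single statement runs.

    same_bytes = "print repr(u'\xc3\xa9')\n"   # literal containing raw bytes 0xC3 0xA9

    def write(name, text):
        f = open(name, 'w')
        f.write(text)
        f.close()

    write('c_latin1.py', '# -*- coding: iso-8859-1 -*-\n' + same_bytes)
    write('c_utf8.py',   '# -*- coding: utf-8 -*-\n' + same_bytes)

    execfile('c_latin1.py')   # -> u'\xc3\xa9'  (two characters)
    execfile('c_utf8.py')     # -> u'\xe9'      (one character, 'é')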
What you need is some kind of pragma: something that says "all strings in THIS file use iso-8859-1 encoding; no promises about any other files." That's exactly what the comment thing does. It is a directive local to this file only. No other concept in Python is like that.
And, as it turns out, there was a precedent for this: emacs, at least, recognizes the exact comment string Python now uses. That's where the syntax came from.
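For the record, PEP 263 accepts more than one spelling of that comment; any comment on the first or second line matching coding[:=]\s*([-\w.]+) is recognized, which covers both of these:

    # -*- coding: iso-8859-1 -*-            (the Emacs convention, as in a.py above)
    # vim: set fileencoding=iso-8859-1 :    (the Vim modeline convention)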
You may argue that the file in which you write your script must
know its own encoding. I will reply that an application like (La)TeX
uses a "module version":
%preamble
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
The analogy is not valid. LaTeX does not have separate compile-time and run-time behavior. The backslash commands in LaTeX essentially ARE a form of pragma: they are configuration settings, separate from the text being rendered.
On Wed, 6 Apr 2005 18:07:48 +0200, "Jean-Michel Fauth" <jmfauth@bluewin.ch> wrote:
--
- Tim Roberts, timr@probo.com
Providenza & Boekelheide, Inc.