File output of characters chr(128) through chr(255)

This might help:
http://www.joelonsoftware.com/articles/Unicode.html

basically… because you are not using ASCII you should encode the content you try to write… like this:

b = s.encode("UTF-8")
f.write(b)

where s is the text you want to write.

Peter

···

On 10/4/07, Bob Klahn <bobklahn@comcast.net> wrote:

This is not a wxPython problem per se, but it is a critical problem I
have in a wxPython application, so I hope you’ll humor me. I hope
I’m missing something basic; I’m prepared to be embarrassed.
I have literally thousands of flat files which contain “extended
ASCII” (ord = 128 to 255 inclusive) characters. I can read
them all, but I can’t seem to write them, as the typical error I get is
“UnicodeEncodeError: ‘ascii’ codec can’t encode character u’\xdb’ in
position 4: ordinal not in range(128)”.
I simply need to write to disk the exact ordinal values, all between 0
and 255 inclusive, as is. E.g., a byte with value hex DB (decimal
219) needs to be written to disk as the byte hex DB.
How do I avoid codec issues? There’s no codec out there (that I
know of) that will allow ordinal values 0 through 255 to be written
“as is.”
Since I can’t write these files, I’m surprised I can read them. I’m
using the built-in open for both reading and writing.

I’m using wxPython 2.8.4 Unicode with Python 2.5.1 under Windows XP
SP2.

Bob


There is NO FATE, we are the creators.

Peter and Grzegorz, I found the problem, and it actually was
wxPython-related, indirectly.

I didn’t think I was using Unicode anywhere, so that’s why I couldn’t
understand why my program seemed to think I was.

But of course I was using Unicode. Specifically, I was picking up
some keystrokes via evt.GetUnicodeKey(), which I had gleaned from
wxPython’s KeyEvents demo months ago, when I thought I would need to use
Unicode. I had of course forgotten all that!

Thanks to both of you for your help. Peter, I especially appreciate
the joelonsoftware link below, and Grzegorz, your “The problem is
most likely… why are you converting those characters to unicode in
first place if you don’t need to?” was right on.

Bob
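
For anyone finding this thread later, here is a minimal sketch of the failure Bob describes. It assumes the text ended up as a Unicode string somewhere along the way. In Python 2, passing a unicode object to f.write() on a file from the built-in open makes Python encode it implicitly with the 'ascii' codec. The explicit .encode("ascii") below stands in for that implicit step so the sketch runs on Python 2.6+ and Python 3. All variable names are made up for illustration.

```python
# Hypothetical names; a sketch, not Bob's actual code.
text = u"abcd\xdb"  # Unicode string with code point 219 at index 4

try:
    # Stand-in for the implicit ASCII encoding Python 2 applies
    # when a unicode object is handed to f.write().
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_position = exc.start  # index of the first character ASCII cannot represent

# Keeping the data as a byte string from read to write involves no codec at all.
raw = b"abcd\xdb"
```

If the data never becomes Unicode (as with raw above), it can be written to a file opened with "wb" unchanged.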

Peter Damoc wrote:

This might help:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software

basically... because you are not using ASCII you should encode the content you try to write... like this:

b = s.encode("UTF-8")
f.write(b)

where s is the text you want to write.

Unfortunately that will split the byte value he wants to write into an escape sequence.

···

--
Rastertech España S.A.
  Grzegorz Adam Hankiewicz
/Jefe de Producto TeraVial/

C/ Perfumería 21. Nave I. Polígono industrial La Mina
28770 Colmenar Viejo. Madrid (España)
Tel. +34 918 467 390 (Ext.17) *·* Fax +34 918 457 889
ghankiewicz@rastertech.es *·* www.rastertech.es <http://www.rastertech.es/>

Are you sure?
I tried it with:

f = open("test.txt", "rb")
s = f.read().decode("UTF-8")
for c in s:
    print ord(c)

and I had no issue.

Peter

···

On 10/4/07, Grzegorz Adam Hankiewicz ghankiewicz@rastertech.es wrote:

Unfortunately that will split the byte value he wants to write into an
escape sequence.



Peter Damoc wrote:

    Unfortunately that will split the byte value he wants to write into an
    escape sequence.

Are you sure?
I tried it with :
f = open("test.txt", "rb")
s = f.read().decode("UTF-8")
for c in s:
    print ord(c)

and I had no issue.

Bob wanted to create a file with arbitrary bytes in the range 0-255. Fair enough, try your code on this file then:

In [1]: output = open("test.txt", "wb")
In [2]: output.write(chr(219))
In [3]: output.close()
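
A sketch of what the reading code runs into with that file, written so it runs on Python 2.6+ and Python 3. The temporary path and variable names are illustrative only.

```python
import os
import tempfile

# Write the single byte 0xDB, as in the session above (temporary path is illustrative).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as output:
    output.write(b"\xdb")

with open(path, "rb") as f:
    data = f.read()

try:
    data.decode("UTF-8")
    decode_ok = True
except UnicodeDecodeError:
    # 0xDB announces a two-byte UTF-8 sequence, but no continuation byte follows.
    decode_ok = False
```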

Besides, your code deals with reading, while I just said that encoding the data as utf-8 would split the characters into escape sequences when they get over chr(127), so writing the utf-8 of unichr(219) wouldn't be useful for his purpose:

In [1]: unichr(219)
Out[1]: u'\xdb'
In [2]: unichr(219).encode("utf8")
Out[2]: '\xc3\x9b'
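
For comparison, Latin-1 (ISO 8859-1) is one codec that maps code points 0 through 255 onto the same byte values, one byte per character. Below is a minimal sketch, assuming the data is a Unicode string whose characters all fall in that range. The names are illustrative, and it runs on Python 2.6+ and Python 3.

```python
text = u"\xdb"  # code point 219, the example used in this thread

latin1_bytes = text.encode("latin-1")  # a single byte, 0xDB
utf8_bytes = text.encode("utf-8")      # two bytes, 0xC3 0x9B

# Round trip over every byte value 0..255: decode, then encode again.
all_bytes = bytes(bytearray(range(256)))
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")
ordinals = [ord(c) for c in as_text]
```

Under that assumption, writing text.encode("latin-1") to a file opened with "wb" should put the byte hex DB on disk unchanged.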

