unicode handling

Hi all,

I have a file with special characters such as "£". I need to read the file into a list of unicode strings.

How can I do this? I tried codecs:

import codecs
filename='d:/poll/test.XST'
metaHash={}
infile = codecs.open(filename, "r", encoding='utf-16')
text = infile.read().split('\n')
print text

I am getting the error

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:/DOCUME~1/ADMINI~1/LOCALS~1/Temp/python-1928Lij.py", line 9, in ?
    text = infile.read().split('\n')
  File "C:\Python23\lib\codecs.py", line 380, in read
    return self.reader.read(size)
  File "C:\Python23\lib\encodings\utf_16.py", line 48, in read
    raise UnicodeError,"UTF-16 stream does not start with BOM"
UnicodeError: UTF-16 stream does not start with BOM

Also, a sample of the file content:

string MetaDataPrompt = "Discovery No";

string MetaDataFieldName = "Discovery No";

string MetaDataType = "string";

string MetaDataValue = "£500";

}

3{

string MetaDataPrompt = "comments";

string MetaDataFieldName = "Comments";

string MetaDataType = "string";

string MetaDataValue = "Energy Scope £500";

I know I should have asked this on python-list and not on wxpython, but when "£" is entered through the GUI everything works fine. It is only when I try reading it from a file that I have problems, so I thought I would try here as well.

Any luck?

Thomas

Thomas Thomas
thomas@mindz-i.co.nz
Phone. +64 7 855 8478
Fax. +64 7 855 8871

Thomas Thomas wrote:

Hi all,
I have a file with special characters such as "£" etc. I need to read the file into a list as unicode strings..
How can I do this.. I tried codecs
import codecs
filename='d:/poll/test.XST'

Use os.path.join() to build the path.

metaHash={}
infile = codecs.open(filename, "r", encoding='utf-16')
text = infile.read().split('\n')
print text
I am getting the error
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:/DOCUME~1/ADMINI~1/LOCALS~1/Temp/python-1928Lij.py", line 9, in ?
    text = infile.read().split('\n')
  File "C:\Python23\lib\codecs.py", line 380, in read
    return self.reader.read(size)
  File "C:\Python23\lib\encodings\utf_16.py", line 48, in read
    raise UnicodeError,"UTF-16 stream does not start with BOM"
UnicodeError: UTF-16 stream does not start with BOM

Are you sure the file was saved in UTF-16 encoding?

A UTF-16 BOM is 2 bytes: FE FF (big-endian) or FF FE (little-endian).
A UTF-8 BOM is 3 bytes: EF BB BF.
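
One way to check this is to look at the first few bytes of the file. The following is only a rough sketch, written for Python 3 rather than the Python 2.3 in the traceback. The names sniff_bom and sniff_file are made up for illustration; the BOM constants come from the standard codecs module. UTF-32 is deliberately ignored here.

```python
import codecs

# Byte order marks and a codec name that consumes each of them when decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),    # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16"),   # FE FF
    (codecs.BOM_UTF16_LE, "utf-16"),   # FF FE
]

def sniff_bom(head):
    """Return a codec name if `head` (bytes) starts with a known BOM, else None."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    # No BOM: the data could be BOM-less UTF-16, plain UTF-8, or a legacy
    # 8-bit encoding. A BOM check alone cannot tell these apart.
    return None

def sniff_file(path):
    """Hypothetical helper: apply sniff_bom to the first bytes of a file."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(3))

print(sniff_bom(b"\xff\xfes\x00"))   # little-endian BOM followed by "s"
print(sniff_bom(b"string Meta"))     # no BOM at all
```

If this returns None for the file in question, the generic 'utf-16' codec has nothing to go on, which would match the error above.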

The catch is that UTF-16 can be big-endian or little-endian. Because a
file encoded as plain "utf-16" can use either byte order, it must start
with a Byte Order Mark (BOM) so the reader can tell the two apart.
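
To make the byte-order point concrete, here is a small sketch (Python 3 syntax, not the Python 2.3 used in the thread) that encodes the pound sign both ways. It only uses the built-in str.encode and bytes.decode methods.

```python
text = "\u00a3"  # the pound sign, U+00A3

be = text.encode("utf-16-be")  # most significant byte first
le = text.encode("utf-16-le")  # least significant byte first

print(be.hex())  # 00a3
print(le.hex())  # a300

# Decoding with the wrong byte order does not raise an error here;
# it silently produces a different character.
wrong = le.decode("utf-16-be")
print(hex(ord(wrong)))  # 0xa300
```

The same two bytes mean different things depending on the order, which is why a reader needs either a BOM or an explicit 'utf-16-be' / 'utf-16-le' choice.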

If you know for certain which it is, you can open the file as
'utf-16-be' or 'utf-16-le'. Those explicit-endian codecs do not
automatically prefix the output with a BOM, so you would need to write
it yourself on output. If you have control over writing the data, I would
suggest writing UTF-8, which has no byte-order concerns, tends to be
roughly half the size on disk of UTF-16 (for text in primarily Latin
alphabets), etc.
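
A minimal sketch of that suggestion, assuming the file really is BOM-less little-endian UTF-16 (which nothing in this thread confirms). It is written for Python 3 and uses the built-in open() with an encoding argument instead of codecs.open. The helper names read_lines and write_utf8 and the temporary file names are invented for the example.

```python
import os
import tempfile

def read_lines(path, encoding):
    """Read `path` with an explicit encoding and return a list of text lines."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read().splitlines()

def write_utf8(path, lines):
    """Write the lines back out as UTF-8 with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(lines) + "\n")

# Build a small BOM-less UTF-16-LE file to stand in for the real one.
sample = 'string MetaDataValue = "\u00a3500";\r\nstring MetaDataType = "string";\r\n'
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "test.XST")
dst = os.path.join(tmpdir, "test_utf8.XST")
with open(src, "wb") as f:
    f.write(sample.encode("utf-16-le"))  # explicit byte order, so no BOM is written

lines = read_lines(src, "utf-16-le")  # byte order stated explicitly
write_utf8(dst, lines)

print(lines)
print(os.path.getsize(src), os.path.getsize(dst))  # UTF-16 size vs UTF-8 size
```

If the real file turns out to be big-endian, the only change would be passing 'utf-16-be' to read_lines.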

- Josiah
