On Thu, 3 Aug 2006 17:04:08 +1200, "Thomas Thomas" <thomas@mindz-i.co.nz> wrote:
>Then you need to figure out what encoding the file is using.
How do I do this?
> > It will be as straightforward as copying and pasting the content below
> into Notepad
> ------------------------
> string MetaDataPrompt = "Discovery No";
> string MetaDataFieldName = "Discovery No";
> string MetaDataType = "string";
> string MetaDataValue = "£500";
> string MetaDataPrompt = "comments";
> string MetaDataFieldName = "Comments";
> string MetaDataType = "string";
> string MetaDataValue = "Energy Scope £500";
> -----------------------------------------------------
> > and try reading it from that file..
>If you do that, you will have an 8-bit file encoded with whatever your
>system's default encoding is.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
>>>
>>> d=u'ENERGY SCOPE \xa3500'
>>> c = 'ENERGY SCOPE \xa3500'
>>> c.decode('latin-1') == d
True
>>>
Thanks, Josiah. This will work for me.
>Because of that, Python, by default, does not assume an encoding. When
>it encounters a byte outside of the standard ASCII range (0-127), it pukes.
>It is quite likely that your file is iso-8859-1. Try:
> inifile = codecs.open(filename, 'r', encoding='iso-8859-1')
Thanks, Tim, this works fine. But I still can't understand why, because
the system says the default encoding is ascii, and
inifile = codecs.open(filename, 'r', encoding='latin-1')
works fine as well, giving me the desired results:
>>> text = inifile.read().split('\n')
>>> text
[u'string MetaDataPrompt = "Discovery No";\r', u'\r', u'string MetaDataFieldName = "Discovery No";\r', u'\r', u'string MetaDataType = "string";\r', u'\r', u'string MetaDataValue = "\xa3500";\r', u'\r', u'string MetaDataPrompt = "comments";\r', u'\r', u'string MetaDataFieldName = "Comments";\r', u'\r', u'string MetaDataType = "string";\r', u'\r', u'string MetaDataValue = "Energy Scope \xa3500";\r', u'']
>>>
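On the "why" part: the 'ascii' default only matters when Python has to convert bytes to text without being told how. Naming an encoding explicitly bypasses the default. A minimal sketch of that difference follows. It assumes a Python version that accepts both b'' and u'' literals, and the sample bytes are made up to mirror the file above:

```python
# Raw bytes as they would sit in a Latin-1 file: 0xA3 is the pound sign.
raw = b'string MetaDataValue = "Energy Scope \xa3500";\r\n'

# Decoding as ASCII fails because 0xA3 is outside the range 0-127.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the encoding explicitly succeeds, whatever the default is.
text = raw.decode('latin-1')
has_pound = u'\xa3500' in text
```

Opening the file with codecs.open and an explicit encoding performs the same decode on every read.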
I thought u'' stands for unicode, so how come Python gives me a list of unicode values when I open the file with 'iso-8859-1' or 'latin-1'? Don't we have to use utf-8 or utf-16 for that?
Or, to put it more simply, how do both of these work?
>>> a='string MetaDataValue = "Energy Scope \xa3500";\r'
>>> b='string MetaDataValue = "Energy Scope \xa3500";\r'
>>> a==b
True
>>> b=u'string MetaDataValue = "Energy Scope \xa3500";\r'
>>> a==b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa3 in position 37: ordinal not in range(128)
>>> a.decode('latin-1')==b
True
>>> a.decode('iso-8859-1')==b
True
>>>
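On the u'' question: a unicode string holds code points rather than bytes, so it is not tied to UTF-8 or UTF-16. Latin-1, UTF-8 and UTF-16 are different ways of writing the same code points out as bytes, and decoding with the matching codec gives back the same unicode string. A small sketch, with variable names chosen only for illustration:

```python
text = u'Energy Scope \xa3500'   # U+00A3 is the pound sign

# The same text serialised three different ways.
as_latin1 = text.encode('latin-1')    # pound sign is the single byte 0xA3
as_utf8 = text.encode('utf-8')        # pound sign is the two bytes 0xC2 0xA3
as_utf16 = text.encode('utf-16-le')   # two bytes per character for this text

# The byte strings differ, but each decodes back to the same text.
same_text = (as_latin1.decode('latin-1')
             == as_utf8.decode('utf-8')
             == as_utf16.decode('utf-16-le')
             == text)
```

'iso-8859-1' and 'latin-1' are two names for the same codec, which is why both comparisons above return True.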
Finally, I want to know the best way to proceed in cases like this. I think I should find the encoding of the file and then open the file using that.
How do I do that?
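There is no fully reliable way to detect the encoding of an arbitrary 8-bit file, so any answer is an educated guess. A rough sketch under that assumption: look for a byte order mark, then try a strict UTF-8 decode, then fall back to latin-1, which accepts every byte value. The function name guess_encoding is hypothetical and used only for illustration.

```python
import codecs

def guess_encoding(raw, fallback='latin-1'):
    """Return a best-guess encoding name for the byte string raw."""
    # A byte order mark at the start is the strongest hint available.
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return 'utf-16'
    # Non-ASCII bytes in Latin-1 text rarely form valid UTF-8,
    # so a clean strict decode is reasonable evidence for UTF-8.
    try:
        raw.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        return fallback

sample = b'string MetaDataValue = "\xa3500";\r\n'
encoding = guess_encoding(sample)
text = sample.decode(encoding)
```

The fallback never raises an error, but it can still be the wrong guess, so knowing which program wrote the file remains the safer route.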
Thank you very much
Thomas Thomas
thomas@mindz-i.co.nz
Phone. +64 7 855 8478
Fax. +64 7 855 8871