Replace and inserting strings within .txt files with the use of regex

Peter Otten __peter__ at web.de
Mon Aug 9 06:06:31 EDT 2010


Νίκος wrote:

> On 9 Aug, 11:45, Peter Otten <__pete... at web.de> wrote:
>> Νίκος wrote:
>> > On 9 Aug, 10:38, Peter Otten <__pete... at web.de> wrote:
>> >> Νίκος wrote:
>> >> > Now the code looks as follows:
>> >> > for currdir, files, dirs in os.walk('test'):
>> >> >     for f in files:
>> >> >         if f.endswith('php'):
>> >> >             # get abs path to filename
>> >> >             src_f = join(currdir, f)
>> >> > I just tried to test it. I created a folder named 'test' on my 'd:\'
>> >> > drive.
>> >> > Then I put two .php files from the original project inside it, to
>> >> > test whether it works for those two files before running it on the
>> >> > whole copy and afterwards on the original project.
>>
>> >> > So I opened a 'cli' on my Win7 and tried
>>
>> >> > D:\>convert.py
>>
>> >> > D:\>
>>
>> >> > It just printed an empty line and nothing else. Why didn't it even
>> >> > try to open the folder and the files within?
>> >> > Syntactically it doesn't give me an error!
>> >> > Something with the os.walk() method perhaps?
>>
>> >> If there is a folder D:\test and it does contain some PHP files
>> >> (double-check!) the extension could be upper-case. Try
>>
>> >> if f.lower().endswith("php"): ...
>>
>> >> or
>>
>> >> php_files = fnmatch.filter(files, "*.php")
>> >> for f in php_files: ...
>>
>> >> Peter
>>
>> > The extension is in lower case. The folder is there, the php files
>> > are there, I don't know why it doesn't want to go into d:\test to
>> > find them.
>>
>> > That's one problem.
>>
>> > The other one is:
>>
>> > I made the code simpler by specifying the filename myself.
>>
>> > =========================
>> > # get abs path to filename
>> > src_f = 'd:\\test\\index.php'
>>
>> > # open php src file
>> > print ( 'reading from %s' % src_f )
>> > f = open(src_f, 'r')
>> > src_data = f.read()                # read contents of PHP file
>> > f.close()
>> > =========================
>>
>> > but although it now finds the file I get this error in the 'cli':
>>
>> > D:\>aconvert.py
>> > reading from d:\test\index.php
>> > Traceback (most recent call last):
>> >   File "D:\aconvert.py", line 16, in <module>
>> >     src_data = f.read()         # read contents of PHP file
>> >   File "C:\Python32\lib\encodings\cp1253.py", line 23, in decode
>> >     return codecs.charmap_decode(input,self.errors,decoding_table)[0]
>> > UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position 321: character maps to <undefined>
>>
>> > Something with the damn encodings again!!
>>
>> Hmm, at one point in this thread you switched from Python 2.x to Python
>> 3.2. There are a lot of subtle and not so subtle differences between 2.x
>> and 3.x, and I recommend that you stick to one while you are still in
>> newbie mode.
>>
>> If you want to continue to use 3.x I recommend that you at least use the
>> stable 3.1 version.
>>
>> Now one change from Python 2 to 3 is that open(filename, "r") gives you a
>> beast that is unicode-aware and assumes that the file is encoded in utf-8
>> unless you tell it otherwise with open(..., encoding=whatever). So what
>> is the charset used for your index.php?
>>
>> Peter
> 
> 
> Yes, yesterday I switched to Python 3.2, Peter.
> 
> When I open index.php in Notepad++ it says it's in utf-8 without
> BOM, and besides English chars it also contains Greek chars
> for printing.
> 
> The file was made by my client in Dreamweaver.
> 
> So since it's utf-8, what's the problem with opening it?

Python says it's not, and I tend to believe it. You can open the file with

open(..., errors="replace")

but you will lose data (which is already garbled, anyway). 
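
To make that concrete, here is a minimal sketch, not a fix for your actual
file. One thing the traceback hints at: the decoder that failed is cp1253, so
on that machine open() appears to have picked the Windows locale encoding
rather than utf-8, and passing encoding="utf-8" explicitly is worth trying
before falling back to errors="replace". The helper name read_text, the file
names and the sample bytes are all made up for illustration.

```python
import os
import tempfile


def read_text(path, encoding="utf-8", errors="strict"):
    """Read a whole text file with an explicit encoding (hypothetical helper)."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read()


# Illustrative data only: Greek text encoded as UTF-8, plus a copy with one
# stray byte (0x9f) appended that is not valid UTF-8 at that position.
good = "<?php echo 'Καλημέρα'; ?>".encode("utf-8")
bad = good + b"\x9f"

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.php")
    bad_path = os.path.join(tmp, "bad.php")
    with open(good_path, "wb") as f:
        f.write(good)
    with open(bad_path, "wb") as f:
        f.write(bad)

    # Valid UTF-8 decodes cleanly once the encoding is stated explicitly.
    clean = read_text(good_path)

    # With the default errors="strict" the stray byte raises.
    try:
        read_text(bad_path)
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # errors="replace" never raises, but the bad byte becomes U+FFFD,
    # so the original byte is lost.
    lossy = read_text(bad_path, errors="replace")
```

If the explicit utf-8 read also fails, the exception message gives the byte
offset, which is a good place to start looking at the file in a hex viewer.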

Again: in the unlikely case that Python is causing your problem -- you do 
understand what an alpha version is?
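
Back to the first problem quoted above (the loop that never finds a file):
one thing worth checking is the order of the names in the for statement.
os.walk() yields (dirpath, dirnames, filenames), so in
"for currdir, files, dirs in os.walk('test')" the name "files" is bound to
the subdirectory list, which is empty for a flat folder. A small sketch of
the walk with the names in that order, combined with the fnmatch filter
suggested earlier; find_php_files and the throwaway directory tree are
hypothetical, only there to make the example self-contained.

```python
import fnmatch
import os
import tempfile


def find_php_files(top):
    """Return paths of all *.php files below top (hypothetical helper)."""
    found = []
    # os.walk() yields (dirpath, dirnames, filenames), in that order.
    for currdir, dirs, files in os.walk(top):
        for f in fnmatch.filter(files, "*.php"):
            found.append(os.path.join(currdir, f))
    return sorted(found)


# Throwaway tree for illustration:
#   test/index.php, test/notes.txt, test/sub/page.php
with tempfile.TemporaryDirectory() as tmp:
    top = os.path.join(tmp, "test")
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("index.php", "notes.txt", os.path.join("sub", "page.php")):
        with open(os.path.join(top, rel), "w") as f:
            f.write("")

    # Paths relative to the top folder, for easier reading.
    php_files = [os.path.relpath(p, top) for p in find_php_files(top)]
```

fnmatch.filter() normalises case the way the platform does, so on Windows
the "*.php" pattern would also match a file called INDEX.PHP.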

Peter



