[XML-SIG] XBEL Bug Fixes

Thomas B. Passin tpassin@home.com
Wed, 13 Feb 2002 09:05:55 -0500

I had occasion to use XBEL for an experiment, and I found several bugs in
msie_parse, the script that turns IE bookmarks into the XBEL format.  I
don't know if any of the other format parsers have similar bugs or not.  I
include fixes for them.  I used the files from pyxml 0.7, but it doesn't
look like the code has changed for some time.

I'm not sure who would be adding the fixes, or whether this should go to
Sourceforge as a bug report, so I'm reporting them here.  Hope it will be

1) The code starting at line 27 that walks the bookmark directory tree fails
to leave each directory after it has been visited.  This causes massively
incorrect nesting of directories.  Successive folders tend to get nested in
previous ones, which is not what is wanted.

Here is a version that works as expected - one line is added:

    def __walk(self, subpath=[]):
        # traverse favourites folder
        path = os.path.join(self.path, string.join(subpath, os.sep))
        for file in os.listdir(path):
            fullname = os.path.join(path, file)
            if os.path.isdir(fullname):
                self.__walk(subpath + [file])
                self.bms.leave_folder()  ###### Add this line ######
            else:
                url = self.__geturl(fullname)
                if url:
                    # ... rest of the loop is unchanged
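
To see why the missing leave_folder() call matters, here is a small
self-contained sketch in current Python.  ToyBookmarks, walk and
top_level_names are made-up stand-ins for illustration, not the PyXML
classes, and the nested dict stands in for the favourites directory:

```python
class ToyBookmarks:
    """Hypothetical stand-in for the bookmark builder (not the PyXML class)."""

    def __init__(self):
        self.root = {"name": "root", "children": []}
        self._stack = [self.root]

    def add_folder(self, name):
        folder = {"name": name, "children": []}
        self._stack[-1]["children"].append(folder)
        self._stack.append(folder)   # the new folder becomes the current one

    def leave_folder(self):
        self._stack.pop()            # go back to the parent folder

    def add_bookmark(self, name):
        self._stack[-1]["children"].append({"name": name})


def walk(tree, bms, leave=True):
    """Walk a nested dict: a dict value is a folder, None is a bookmark."""
    for name, value in tree.items():
        if isinstance(value, dict):
            bms.add_folder(name)
            walk(value, bms, leave)
            if leave:
                bms.leave_folder()   # the call that the fix adds
        else:
            bms.add_bookmark(name)


def top_level_names(bms):
    return [child["name"] for child in bms.root["children"]]


# Two sibling folders, each holding one bookmark (made-up sample data).
favourites = {"News": {"BBC": None}, "Python": {"PyXML": None}}

fixed = ToyBookmarks()
walk(favourites, fixed, leave=True)

buggy = ToyBookmarks()
walk(favourites, buggy, leave=False)
```

With leave=True both folders end up at the top level; with leave=False
"Python" ends up inside "News", which is the nesting problem described
above.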

2) The format of IE bookmark files (also called "shortcuts" for some reason)
has changed - there is a new "[DEFAULT]" section at the start of the
bookmark - and the parser cannot find the url in many or most cases.
However, some bookmarks could still be in the old format, so you have to
check for both, or look for "[InternetShortcut]" beyond the first line.  The
code in question starts on line 41.  Here is a version that works, although
it makes no attempt to get the date modified value which is now sometimes
included in the bookmark:

    def __geturl(self, file):
        try:
            fp = open(file)
            while 1:
                line = fp.readline()
                if not line:
                    return None
                if line=="[InternetShortcut]\n": ### If still old format
                    s = fp.readline()
                    if not s:
                        return None
                    if s[:4] == "URL=":
                        return s[4:-1]
                elif line=="[DEFAULT]\n":  ### New format
                    s = fp.readline()
                    if not s:
                        return None
                    if s[:8] == "BASEURL=":
                        return s[8:-1]
        except IOError:
            return None
        return None
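
The other option mentioned above, looking for "[InternetShortcut]" beyond
the first line, could be sketched roughly as follows in current Python.
The function name find_shortcut_url and the sample contents are made up
for illustration.  Preferring URL= over BASEURL= is my assumption; the
version above returns BASEURL= as soon as it sees the "[DEFAULT]" section:

```python
def find_shortcut_url(lines):
    """Return the URL from the lines of a shortcut file, or None.

    Tracks the current [section] header while scanning, so the
    [InternetShortcut] section is found wherever it appears.
    """
    section = None
    fallback = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line
        elif section == "[InternetShortcut]" and line.startswith("URL="):
            return line[4:]
        elif (section == "[DEFAULT]" and line.startswith("BASEURL=")
              and fallback is None):
            fallback = line[8:]      # used only if no URL= line turns up
    return fallback


# Made-up sample contents for the two layouts described above.
old_format = ["[InternetShortcut]\n", "URL=http://example.com/old\n"]
new_format = [
    "[DEFAULT]\n",
    "BASEURL=http://example.com/base\n",
    "[InternetShortcut]\n",
    "URL=http://example.com/new\n",
]
default_only = ["[DEFAULT]\n", "BASEURL=http://example.com/base\n"]

old_url = find_shortcut_url(old_format)
new_url = find_shortcut_url(new_format)
base_url = find_shortcut_url(default_only)
```

It takes any iterable of lines, so an open file object can be passed in
directly.  Like the version above, it ignores the date modified value.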