The Windows Path to RegEx Wackiness

It all started when a cross-platform Python application, Blogofile, tried to use the standard regular expression module, re, to munge paths.

I was investigating a small Blogofile bug on Windows that involved the regular expression library operating on Windows paths. Blogofile's test server was using re.sub to map requests that would otherwise be directed to the server's base directory onto a subdirectory. This worked fine on Linux paths, but on Windows paths the match wasn't happening.

>>> print re.sub(r'C:\test', r'C:\test\_site', r'C:\test')
C:\test

Above, the text to be substituted doesn't seem to be found in the target string.
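
The miss comes from the pattern, not the target: inside a regular expression, \t stands for a tab character even when the pattern is written as a raw string, so the pattern never looks for a literal backslash followed by "t". A minimal sketch of that behaviour (Python 3 syntax; the variable names are only for illustration):

```python
import re

path = r'C:\test'      # literal backslash followed by "test"
pattern = r'C:\test'   # as a regex: "C:", a TAB, then "est"

# The regex engine reads \t as a tab, so the literal path does not match.
literal_match = re.search(pattern, path)

# A string that really contains a tab does match the same pattern.
# (Non-raw literal: Python itself turns \t into a tab here.)
tab_match = re.search(pattern, 'C:\test')

print(literal_match)          # None
print(tab_match is not None)  # True
```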

>>> print re.sub(re.escape(r'C:\test'), r'C:\test\_site', r'C:\test')
C: est\_site

This time I escape the pattern with re.escape. The match is made, but the re module now helpfully interprets the "\t" in the replacement string as a tab. I'm glad I happened to use a directory that starts with a "t", or I wouldn't have noticed this problem.
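
The replacement argument is a template with its own escape processing, separate from the pattern's. A small sketch (Python 3 syntax, illustrative variable names) that uses repr to make the tab visible:

```python
import re

target = r'C:\test'
result = re.sub(re.escape(target), r'C:\test\_site', target)

# repr() shows the tab that the replacement template produced from \t.
print(repr(result))

has_tab = '\t' in result
print(has_tab)  # True
```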

>>> print re.sub(re.escape(r'C:\test'), re.escape(r'C:\test\_site'), r'C:\test')
C\:\test\\_site

The above illustrates that using re.escape on the replacement string doesn't work: it escapes characters, such as ":" and "_", that have no special meaning in a replacement, so their backslashes are left in the output.
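
re.escape is designed for patterns, and exactly which characters it escapes has varied between Python versions (the colon is no longer escaped as of Python 3.7). The sketch below, in Python 3 syntax, therefore uses a hypothetical path containing a dot, which re.escape does escape, to show the stray backslash surviving into the result:

```python
import re

target = r'C:\test'
intended = r'C:\test\my.site'   # hypothetical path, not from Blogofile

# re.escape turns "." into "\.", but the replacement template leaves
# "\." untouched, so the extra backslash ends up in the result.
result = re.sub(re.escape(target), re.escape(intended), target)

print(result)
print(result == intended)  # False
```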

>>> print re.sub(re.escape(r'C:\test'), '\\\\'.join(r'C:\test\_site'.split('\\')), r'C:\test')
C:\test\_site

The above works, but is really just an elaborate way of doing the following string replace.

>>> print re.sub(re.escape(r'C:\test'), r'C:\test\_site'.replace('\\', '\\\\'), r'C:\test')
C:\test\_site
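
The backslash doubling can be wrapped in a small helper. This is only a sketch in Python 3 syntax, and escape_replacement is a hypothetical name rather than anything provided by the re module:

```python
import re

def escape_replacement(text):
    """Double every backslash so re.sub inserts the text literally."""
    return text.replace('\\', '\\\\')

target = r'C:\test'
result = re.sub(re.escape(target), escape_replacement(r'C:\test\_site'), target)

print(result == r'C:\test\_site')  # True
```

Backslash is the only character that introduces an escape in a replacement template, which is why doubling it is enough here.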


But if I'm going to use str.replace, I might as well just use it instead of re.sub to do the overall substitution and simplify things tremendously.

>>> print r'C:\test'.replace(r'C:\test', r'C:\test\_site')
C:\test\_site
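
When a regular expression really is needed on the matching side, another option is to pass a function as the replacement, since re.sub inserts a function's return value without processing backslash escapes. A minimal sketch in Python 3 syntax, offered as an alternative rather than as what Blogofile does:

```python
import re

target = r'C:\test'
site_dir = r'C:\test\_site'

# A function replacement is called for each match and its return value
# is inserted as-is, so \t and \_ are not reinterpreted.
result = re.sub(re.escape(target), lambda match: site_dir, target)

print(result == site_dir)                          # True
print(target.replace(target, site_dir) == result)  # True
```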