Fixing Windows Python 2.7 unicode issue with subprocess’s Popen.

TL;DR

On Windows, Python 2.7's subprocess.Popen cannot properly send unicode command lines to the system. This post provides a fix.

Here is the Python code fixing the unicode issue

Details

What

Python 2.7 is plagued by bugs that won't be fixed.

One of them is that, on Windows, you can't use unicode characters in the command line of a subprocess started via subprocess.Popen(..).

This code fixes Popen(..) in Python 2.7 on Windows, so your code can work on Python 2.7 with only minor changes.
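As an illustration of how small the change can be, here is a minimal sketch of a conditional import. It assumes the gist has been saved as subprocess_fix.py next to your code (the module name used in the test later in this post); on any other platform or Python version it falls back to the standard Popen.

```python
import subprocess
import sys

if sys.platform == "win32" and sys.version_info[0] == 2:
    # Assumption: the gist is saved as subprocess_fix.py on the import path.
    from subprocess_fix import Popen
else:
    # Python 3 and non-Windows platforms do not need the fix.
    Popen = subprocess.Popen
```

The rest of the calling code can then use Popen(..) as usual.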

How

The following code leverages ctypes to call the CreateProcessW(..) function of the Windows C API. This function should have been used in CPython 2.7 but wasn't, which is the core reason why Python 2.7 on Windows does not support unicode command lines.

Python 3.0+ already calls CreateProcessW(..), which is why it does not have this problem.
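To give an idea of what calling CreateProcessW(..) through ctypes looks like, here is a rough sketch. It is not the code from the gist: the function name create_process_w is hypothetical, it does not handle pipes, shell=True or environment variables, and it only returns the new process ID. The structure layouts follow the Win32 STARTUPINFOW and PROCESS_INFORMATION definitions.

```python
# -*- coding: utf-8 -*-
import ctypes
import sys

# Plain ctypes equivalents of the Win32 typedefs used below.
DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16
HANDLE = ctypes.c_void_p
LPWSTR = ctypes.c_wchar_p


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", LPWSTR), ("lpDesktop", LPWSTR),
        ("lpTitle", LPWSTR), ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD),
        ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
        ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
        ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", ctypes.c_void_p),
        ("hStdInput", HANDLE), ("hStdOutput", HANDLE), ("hStdError", HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("hProcess", HANDLE), ("hThread", HANDLE),
        ("dwProcessId", DWORD), ("dwThreadId", DWORD),
    ]


def create_process_w(command_line):
    """Hypothetical helper: start command_line (unicode) and return its PID."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW only exists on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        LPWSTR, LPWSTR, ctypes.c_void_p, ctypes.c_void_p, ctypes.c_int,
        DWORD, ctypes.c_void_p, LPWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION),
    ]
    kernel32.CreateProcessW.restype = ctypes.c_int
    kernel32.CloseHandle.argtypes = [HANDLE]

    startup_info = STARTUPINFOW()
    startup_info.cb = ctypes.sizeof(STARTUPINFOW)
    process_info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    buf = ctypes.create_unicode_buffer(command_line)

    ok = kernel32.CreateProcessW(
        None, buf, None, None, 0, 0, None, None,
        ctypes.byref(startup_info), ctypes.byref(process_info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())

    # This sketch does not wait for the child, so release both handles.
    kernel32.CloseHandle(process_info.hThread)
    kernel32.CloseHandle(process_info.hProcess)
    return process_info.dwProcessId
```

On Windows, create_process_w(u"python reading.py ć") would start the child with the command line passed as UTF-16, without going through a byte encoding first.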

The code

This code was not thoroughly tested and I'm posting it on a gist so I can update it and people can comment if they run into any issues.

So here is the Python code fixing the unicode issue.

A test?

As a test, we'll create a small test.py python file that you can call with the new subprocess.Popen(..) provided in the Gist.

To be complete, we also need another fix for Python 2.7 unicode support: a recipe to read the command-line arguments as unicode.
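The recipe itself is not reproduced in this post. As a rough idea of the approach, the sketch below asks Windows for the raw UTF-16 command line with GetCommandLineW and splits it with CommandLineToArgvW. It is an approximation and may differ from the actual recipe; in particular, the fallback to sys.argv on other platforms is an assumption added here so that the snippet runs anywhere.

```python
import ctypes
import sys


def win32_unicode_argv():
    """Return the command-line arguments as unicode strings.

    On non-Windows platforms this sketch simply returns a copy of sys.argv.
    """
    if sys.platform != "win32":
        return list(sys.argv)

    kernel32 = ctypes.WinDLL("kernel32")
    shell32 = ctypes.WinDLL("shell32")

    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = ctypes.c_wchar_p
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p
    shell32.CommandLineToArgvW.argtypes = [
        ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(
        kernel32.GetCommandLineW(), ctypes.byref(argc))
    if not argv:
        # The call failed; keep whatever Python already parsed.
        return list(sys.argv)
    try:
        # The raw command line also holds the interpreter and its options;
        # keep only the trailing entries that correspond to sys.argv.
        start = max(argc.value - len(sys.argv), 0)
        return [argv[i] for i in range(start, argc.value)]
    finally:
        # CommandLineToArgvW allocates the array; release it with LocalFree.
        kernel32.LocalFree(argv)
```

A module built this way would end with sys.argv = win32_unicode_argv(), so that importing it is enough to replace the arguments.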

Here's the full code for the test. It relies on two modules:

  • subprocess_fix is the module containing the code from the Gist. It correctly sends the full unicode command line to the system, so it is used on the calling side.
  • commandline_fix is the module containing the recipe that decodes the current program's unicode command line, with a final sys.argv = win32_unicode_argv() statement added. It is used on the called side.

The caller side (named test.py):

# -*- coding: utf-8 -*-

from subprocess import PIPE
from subprocess_fix import Popen

def indent(text, chars=" ", first=None):
    if first:
        first_line = text.split("\n")[0]
        rest = '\n'.join(text.split("\n")[1:])
        return '\n'.join([(first + first_line).rstrip(),
                          indent(rest, chars=chars)])
    return '\n'.join([(chars + line).rstrip()
                      for line in text.split('\n')])

p = Popen(u"python reading.py ć", shell=True,
          stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = p.communicate()

if p.returncode != 0:
    print("errlvl: %s" % p.returncode)

if err:
    print("stderr:\n%s" % indent(err, " | "))
if out:
    print("stdout:\n%s" % indent(out, " | "))

The called side (named reading.py):

# -*- coding: utf-8 -*-

import sys

import commandline_fix

print "command line received (repr): %r" % (sys.argv, )
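# stdout is a pipe here, so Python 2 would try to encode unicode as ASCII
# and fail on non-ASCII characters; encode to UTF-8 explicitly instead.
print "command line received (str): %s" % (u" ".join(sys.argv).encode("utf-8"), )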

test.py uses the fixed Popen(..) to call reading.py in a subprocess and it specifies a unicode character in the command line arguments. reading.py will simply read the unicode command line and print it to the console.

To run the test:

python test.py

Note that, depending on your active code page (you can check it with chcp), the console might fail to display the character correctly. In that case, you might need to run chcp 65001 to switch the console to UTF-8.
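If you would rather check the code page from Python, a small sketch along these lines could help. The helper name console_output_codepage is hypothetical; it relies on the Win32 GetConsoleOutputCP function, whose result is assumed here to match what chcp reports, and it treats a return value of 0 as "no code page available".

```python
import ctypes
import sys


def console_output_codepage():
    """Return the console's active output code page, or None if unavailable."""
    if sys.platform != "win32":
        return None
    # Assumption: a result of 0 means the call failed (e.g. no console).
    return ctypes.WinDLL("kernel32").GetConsoleOutputCP() or None


codepage = console_output_codepage()
if codepage is not None and codepage != 65001:
    print("console code page is %d; consider running: chcp 65001" % codepage)
```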