Fixing Windows Python 2.7 unicode issue with subprocess’s Popen.

TL;DR

Fixing Windows Python 2.7 unicode issue with subprocess.Popen.

Here is the Python code that fixes the unicode issue.

Details

What

Python 2.7 is plagued by bugs that will never be fixed, yet you may still want or need to ship software that actually works with Python 2.7 on Windows, for compatibility or legacy reasons.

One of these bugs: it is not possible to send non-ASCII characters on the command line via subprocess.Popen(..).

How

The following code leverages ctypes to call the CreateProcessW(..) function of the Windows C API. CPython 2.7 should have used this function but did not, and this is the core reason why Python 2.7 on Windows does not support unicode command lines.
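To make the idea concrete, here is a minimal sketch of a CreateProcessW(..) call through ctypes. It is not the code from the gist: the helper names build_command_line and run_unicode are hypothetical, and the sketch only starts a process and waits for its exit code, without the pipe, shell or environment handling that a full Popen replacement needs.

```python
import ctypes
import subprocess
import sys

# Windows type aliases written with plain ctypes so this file imports anywhere.
DWORD = ctypes.c_uint32
WORD = ctypes.c_uint16
HANDLE = ctypes.c_void_p
LPWSTR = ctypes.c_wchar_p
INFINITE = 0xFFFFFFFF


class STARTUPINFOW(ctypes.Structure):
    _fields_ = [
        ("cb", DWORD), ("lpReserved", LPWSTR),
        ("lpDesktop", LPWSTR), ("lpTitle", LPWSTR),
        ("dwX", DWORD), ("dwY", DWORD),
        ("dwXSize", DWORD), ("dwYSize", DWORD),
        ("dwXCountChars", DWORD), ("dwYCountChars", DWORD),
        ("dwFillAttribute", DWORD), ("dwFlags", DWORD),
        ("wShowWindow", WORD), ("cbReserved2", WORD),
        ("lpReserved2", ctypes.c_void_p),
        ("hStdInput", HANDLE), ("hStdOutput", HANDLE), ("hStdError", HANDLE),
    ]


class PROCESS_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("hProcess", HANDLE), ("hThread", HANDLE),
        ("dwProcessId", DWORD), ("dwThreadId", DWORD),
    ]


def build_command_line(args):
    """Return a writable unicode buffer holding the whole command line."""
    if isinstance(args, (list, tuple)):
        args = subprocess.list2cmdline(args)
    # CreateProcessW may modify lpCommandLine, so it must not be a constant.
    return ctypes.create_unicode_buffer(args)


def run_unicode(args):
    """Hypothetical helper: run a command via CreateProcessW, return its exit code."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # Declare argument types so handles are not truncated on 64-bit Python.
    kernel32.CreateProcessW.argtypes = [
        LPWSTR, LPWSTR, ctypes.c_void_p, ctypes.c_void_p, ctypes.c_int,
        DWORD, ctypes.c_void_p, LPWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION)]
    kernel32.WaitForSingleObject.argtypes = [HANDLE, DWORD]
    kernel32.GetExitCodeProcess.argtypes = [HANDLE, ctypes.POINTER(DWORD)]
    kernel32.CloseHandle.argtypes = [HANDLE]

    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(STARTUPINFOW)
    info = PROCESS_INFORMATION()
    ok = kernel32.CreateProcessW(
        None, build_command_line(args), None, None, False, 0, None, None,
        ctypes.byref(startup), ctypes.byref(info))
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        kernel32.WaitForSingleObject(info.hProcess, INFINITE)
        code = DWORD()
        kernel32.GetExitCodeProcess(info.hProcess, ctypes.byref(code))
        return code.value
    finally:
        kernel32.CloseHandle(info.hThread)
        kernel32.CloseHandle(info.hProcess)
```

On Windows, run_unicode([u"python", u"reading.py", u"\u0107"]) would hand the non-ASCII argument to the child process intact, because the command line never goes through the ANSI code page.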

The code

This code has not been thoroughly tested. I'm posting it as a gist so that I can update it and people can comment if they run into any issues.

So here is the Python code that fixes the unicode issue.

A test?

As a test, I suggest writing a second Python program and launching it with the new subprocess.Popen(..) above.

But as Python 2.7's unicode support is really incomplete, you'll also need this recipe to read the arguments as unicode.

Here is the full code for the test. The subprocess_fix module is the code in the gist, and the commandline_fix module is the recipe that decodes the unicode command line, with a final sys.argv = win32_unicode_argv() statement added.
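For readers who do not have the recipe at hand, the following is a rough sketch of what a win32_unicode_argv() function can look like. It is an approximation written for illustration, not the recipe itself: it asks Windows for the wide-character command line with GetCommandLineW and splits it with CommandLineToArgvW. The fallback to a plain copy of sys.argv on other platforms is an assumption added so the module can be imported anywhere.

```python
import ctypes
import sys


def win32_unicode_argv():
    """Return sys.argv rebuilt from the wide-character command line.

    On non-Windows platforms, return a plain copy of sys.argv.
    """
    if sys.platform != "win32":
        return list(sys.argv)

    kernel32 = ctypes.WinDLL("kernel32")
    shell32 = ctypes.WinDLL("shell32")

    kernel32.GetCommandLineW.argtypes = []
    kernel32.GetCommandLineW.restype = ctypes.c_wchar_p
    shell32.CommandLineToArgvW.argtypes = [
        ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_int)]
    shell32.CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)
    kernel32.LocalFree.argtypes = [ctypes.c_void_p]
    kernel32.LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = shell32.CommandLineToArgvW(
        kernel32.GetCommandLineW(), ctypes.byref(argc))
    if not argv:
        # The call failed; fall back to the arguments Python already parsed.
        return list(sys.argv)
    try:
        # The raw command line also holds the interpreter path and its
        # options; keep only the trailing entries that match sys.argv.
        start = max(0, argc.value - len(sys.argv))
        return [argv[i] for i in range(start, argc.value)]
    finally:
        # CommandLineToArgvW allocates the array; release it with LocalFree.
        kernel32.LocalFree(ctypes.cast(argv, ctypes.c_void_p))
```

With this in a commandline_fix module, the final statement mentioned above would simply be sys.argv = win32_unicode_argv() at module level, so that importing the module is enough to patch sys.argv.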

The launcher (named test.py):

# -*- coding: utf-8 -*-

from subprocess import PIPE
from subprocess_fix import Popen


def indent(text, chars="  ", first=None):
    if first:
        first_line = text.split("\n")[0]
        rest = '\n'.join(text.split("\n")[1:])
        return '\n'.join([(first + first_line).rstrip(),
                          indent(rest, chars=chars)])
    return '\n'.join([(chars + line).rstrip()
                      for line in text.split('\n')])


p = Popen(u"python reading.py ć", shell=True,
          stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = p.communicate()

if p.returncode != 0:
    print("errlvl: %s" % p.returncode)

if err:
    print("stderr:\n%s" % indent(err, "  | "))
if out:
    print("stdout:\n%s" % indent(out, "  | "))

The launched (named reading.py):

# -*- coding: utf-8 -*-

import sys

# Importing commandline_fix replaces sys.argv with its unicode version.
import commandline_fix

print "command line received (repr): %r" % (sys.argv, )
print "command line received (str): %s" % (u" ".join(sys.argv), )

Make sure you are running Windows and Python 2.7 before launching this:

python test.py ć

Note that depending on your active code page (you can check it with chcp), the console might fail to display the character correctly. In that case, you might need to run chcp 65001.
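If changing the code page is not an option, one possible workaround is to escape whatever the console cannot encode before printing it. The helper below is a small sketch of that idea; console_safe is a hypothetical name and is not part of the gist or of the recipe.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot encode shown as escapes."""
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to ASCII in that case.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "backslashreplace").decode(encoding)
```

For example, print(console_safe(u"\u0107")) shows the character itself when the console encoding supports it, and a backslash escape otherwise, instead of raising UnicodeEncodeError.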