UnicodeWriter and UnicodeDictWriter - write unicode strings out to Excel compatible CSV files
Exporting unicode data to Excel in a CSV file is surprisingly difficult. After much experimentation, it turns out the magic combination is UTF-16, a byte order mark and tab-delimiters. This snippet provides two classes - UnicodeWriter and UnicodeDictWriter - which can be used to output Excel-compatible CSV.
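As a bare-bones illustration of that combination (a one-shot write, not the snippet itself; the file name and cell values are arbitrary), it comes down to writing the BOM first and then tab-delimited, UTF-16-encoded rows:

import codecs

# BOM first, then tab-delimited, CRLF-terminated rows encoded as UTF-16-LE
f = open("report.csv", "wb")
f.write(codecs.BOM_UTF16_LE)
f.write(u"name\tcity\r\nL\xe9on\tK\xf8benhavn\r\n".encode("utf-16-le"))
f.close()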
Specifying the encoding as "utf-16-le" rather than "utf-16", and manually writing the BOM [0xff, 0xfe] to the response before calling writerows(), works for me in Excel 2007 with SP3: no garbage characters. The problem with the original version is that the encode call to utf-16 prepends the BOM to each and every row, when it should only appear at the start of the file, hence the garbage characters.
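A minimal sketch of that variant (untested against the original snippet; UnicodeWriterLE is a made-up name, and the target f can be any binary-mode file or a Django response object):

import csv, codecs, StringIO

class UnicodeWriterLE(object):
    def __init__(self, f, dialect=csv.excel_tab, **kwds):
        self.queue = StringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        # Write the BOM by hand, once, at the very start of the stream
        self.stream.write(codecs.BOM_UTF16_LE)

    def writerows(self, rows):
        for row in rows:
            self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        data = self.queue.getvalue().decode("utf-8")
        # "utf-16-le" never prepends a BOM, so the rows stay clean
        self.stream.write(data.encode("utf-16-le"))
        self.queue.truncate(0)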
Note that my solution would still produce an excess BOM if the file was reopened in a new UnicodeWriter object, so you might want to check if there's a BOM at the start of the file.
Something like (untested):
import codecs

# read first two bytes in file, then restore the position
fpos = f.tell()
f.seek(0)
fstartbytes = f.read(2)
f.seek(fpos)
# Write BOM if needed
if fstartbytes != codecs.BOM_UTF16:
    f.write(codecs.BOM_UTF16)
Writing the BOM by hand and stripping it from each row seems to work:
import csv, codecs, StringIO

class UnicodeWriter(object):
    def __init__(self, f, dialect=csv.excel_tab, encoding="utf-16", **kwds):
        # Redirect output to a queue
        self.queue = StringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        # Force BOM: write it by hand, once, at the start of the stream
        if encoding == "utf-16":
            self.stream.write(codecs.BOM_UTF16)
        self.encoding = encoding

    def writerow(self, row):
        # Modified from original: now using unicode(s) to deal with e.g. ints
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = data.encode(self.encoding)
        # strip the BOM that encode() prepends to every row
        if self.encoding == "utf-16":
            data = data[2:]
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)
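For completeness, here is a sketch of UnicodeDictWriter built on the class above, modeled on csv.DictWriter, followed by a usage example; the file name and field names are arbitrary:

class UnicodeDictWriter(UnicodeWriter):
    def __init__(self, f, fields, dialect=csv.excel_tab, encoding="utf-16", **kwds):
        UnicodeWriter.__init__(self, f, dialect, encoding, **kwds)
        self.fields = fields

    def writerow(self, drow):
        # Missing keys become empty cells, as with csv.DictWriter(restval="")
        UnicodeWriter.writerow(self, [drow.get(field, u"") for field in self.fields])

# Usage: open the target in binary mode so the BOM and UTF-16 bytes survive
f = open("export.csv", "wb")
writer = UnicodeDictWriter(f, [u"name", u"visits"])
writer.writerow({u"name": u"name", u"visits": u"visits"})
writer.writerow({u"name": u"L\xe9on", u"visits": 3})
f.close()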