HTML to text filter

This filter converts HTML to nicely formatted text using the text browser W3M. I use it to construct e-mail bodies, so I don't need two templates (one HTML, one plain text) for each detailed e-mail I want to send. Besides the obvious maintenance benefit, this helps because Django's templating system isn't well suited to plain text, where whitespace and line breaks are significant.
I chose W3M because it renders tables nicely and can take HTML from STDIN (which Lynx can't do). An alternative is ELinks; to use it, change "cmd" to the following: elinks -force-html -stdin -dump -no-home
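The approach described above can be sketched roughly as follows. This is a minimal sketch, not the original filter code: the names html_to_text and W3M_CMD are hypothetical, the fallback to the raw input on failure is an assumption, and the w3m flags shown (-dump to print the rendered page, -T text/html to treat STDIN as HTML) are one plausible invocation.

```python
import subprocess

# Assumed w3m invocation: -dump prints the rendered page to stdout,
# -T text/html tells w3m to treat the data on stdin as HTML.
W3M_CMD = ["w3m", "-dump", "-T", "text/html"]


def html_to_text(html, cmd=W3M_CMD):
    """Render HTML to plain text by piping it through a text browser."""
    try:
        result = subprocess.run(
            cmd,
            input=html.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Browser not installed or it exited with an error:
        # fall back to returning the input unchanged (an assumption).
        return html
    return result.stdout.decode("utf-8", errors="replace")
```

To expose this in templates, registering it on a Django template.Library() instance (for example register.filter("html2text", html_to_text)) should be enough. To try ELinks instead, pass cmd=["elinks", "-force-html", "-stdin", "-dump", "-no-home"].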


Answers:




Great idea. I'm trying this and am getting:

File "C:\Python25\lib\subprocess.py", line 885, in _communicate
IOError: [Errno 22] Invalid argument

After a little debugging, I haven't found a solution. Any ideas?


cmd = "lynx -force_html -stdin -dump" works for me.

Right you are, but I don't see any reason to use Lynx. Both W3M and ELinks render HTML much better than Lynx.

What's wrong with ...

The etree.tostring() method is just a serialization method. It is not an HTML rendering engine, which is what W3M and ELinks provide.
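To illustrate the difference, here is a small sketch of what text-mode serialization does to a table. It uses the standard library's xml.etree.ElementTree instead of lxml (a swap made only to keep the example dependency-free), and the sample markup and the flatten helper are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Made-up sample: a well-formed two-row table.
SAMPLE = (
    "<table>"
    "<tr><td>Item</td><td>Price</td></tr>"
    "<tr><td>Tea</td><td>3</td></tr>"
    "</table>"
)


def flatten(markup):
    """Serialize only the text nodes; no layout is applied."""
    root = ET.fromstring(markup)
    return ET.tostring(root, method="text", encoding="unicode")


flat = flatten(SAMPLE)
print(flat)  # ItemPriceTea3
```

The rows and cell boundaries are gone, whereas a rendering engine would lay the same table out in aligned columns.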

Also, html2text

That script would be appropriate for very simple HTML, but ~400 lines of Python can't replace a complete rendering engine.


What's wrong with

from lxml import etree

def convert(text):
    # Parse the HTML, then serialize only its text nodes.
    return etree.tostring(
        etree.HTML(text), encoding='utf8', method='text'
    )


cmd = "lynx -force_html -stdin -dump" works for me.