Stripping html using mutt
bob prohaska
2021-10-10 01:44:22 UTC
I use mutt via ssh and neither need nor want MIME enhancements,
just the text. Can mutt display the text portion of the message
alone? If the text is of interest, I can always go back for the
formatting and MIME enhancements. It's common these days to get
a few words of meaningful message buried in kilobytes of HTML.

Thanks for reading, and any suggestions.

bob prohaska
Roger Bell_West
2021-10-10 03:18:39 UTC
Post by bob prohaska
I use mutt via ssh and neither need nor want MIME enhancements,
just the text. Can mutt display the text portion of the message
alone? If the text is of interest, I can always go back for the
formatting and MIME enhancements. It's common these days to get
a few words of meaningful message buried in kilobytes of HTML.
This sounds like what mutt does already: display the plain text, let
you know the other parts are there. If you want the useful content out
of an HTML message,

auto_view text/html

will use your mailcap (and a text-mode web browser such as elinks) to
display HTML inline as though it were useful text. Searching for
auto_view in the manual should be helpful.
Eric Pozharski
2021-10-12 07:39:28 UTC
Post by Roger Bell_West
I use mutt via ssh and neither need nor want MIME enhancements, just
the text. Can mutt display the text portion of the message alone? If
the text is of interest, I can always go back for the formatting and
MIME enhancements. It's common these days to get a few words of
meaningful message buried in kilobytes of HTML.
This sounds like what mutt does already: display the plain text, let
you know the other parts are there. If you want the useful content out
of an HTML message,
auto_view text/html
will use your mailcap (and a text-mode web browser such as elinks) to
display HTML inline as though it were useful text. Searching for
auto_view in the manual should be helpful.
Also 'alternative_order' might be needed (unfortunately, this setting is
somewhat vaguely documented, and I can't be bothered to find out what the
defaults are). Or read the whole story in the manual; search for "MIME
Multipart/Alternative".
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
bob prohaska
2021-10-13 01:01:27 UTC
Post by Eric Pozharski
Post by Roger Bell_West
I use mutt via ssh and neither need nor want MIME enhancements, just
the text. Can mutt display the text portion of the message alone? If
the text is of interest, I can always go back for the formatting and
MIME enhancements. It's common these days to get a few words of
meaningful message buried in kilobytes of HTML.
This sounds like what mutt does already: display the plain text, let
you know the other parts are there. If you want the useful content out
of an HTML message,
auto_view text/html
will use your mailcap (and a text-mode web browser such as elinks) to
display HTML inline as though it were useful text. Searching for
auto_view in the manual should be helpful.
Also 'alternative_order' might be needed (unfortunately, this setting is
somewhat vaguely documented, and I'm not bothered to find out what are
defaults). Or, read whole story in the manual, search for "MIME
Multipart/Alternative".
I'm now reduced to reading the mutt manual 8-)

I was hopeful there might be a switch in mutt that strips markup.
Invoking a proper html interpreter is more than I think I need.

Thanks for replying,

bob prohaska
bob prohaska
2021-10-13 15:59:17 UTC
A bit of searching found these instructions for invoking lynx automatically:

https://blog.deadlypenguin.com/2009/04/21/mutt-and-lynx/

It seems to work, but acts automatically. The whole (and possibly futile)
point of my enterprise is to avoid involuntary invocation of additional
software while viewing untrusted email.

Is there some way to at least give myself a choice? I tried deleting
auto_view from the .muttrc line, but that triggered an error message.
Is there a command that prompts for permission?

Automatically stripping html would be ideal, followed by an option to
invoke an html viewer.

Thanks for reading!

bob prohaska
Rich
2021-10-13 16:53:06 UTC
Post by bob prohaska
https://blog.deadlypenguin.com/2009/04/21/mutt-and-lynx/
It seems to work, but acts automatically. The whole (and possibly futile)
point of my enterprise is to avoid involuntary invocation of additional
software while viewing untrusted email.
Mutt contains neither an HTML viewer nor an HTML stripper. Mutt's
internal viewer simply shows the message part you've chosen to view
as text data. If that message part is an HTML part, then you'd be
viewing the raw HTML (HTML being a subset of "text" data).
Post by bob prohaska
Is there some way to at least give myself a choice?
You can set auto view to the text/plain content part, which will give
you the plain text part of the message. Then you can use the "v"
command to see the mime parts, and selectively view individual parts as
desired.
Post by bob prohaska
I tried deleting auto_view from the .muttrc line, but that triggered
an error message. Is there a command that prompts for permission?
I do not know of one built in to Mutt, but you could set up a bash
script that is invoked by mutt, that asks permission (look up the
"read" command and its -p option) and then either invokes the
html to text conversion program (lynx et al.) or does not invoke it,
based upon your answer to the prompt.
Post by bob prohaska
Automatically stripping html would be ideal, followed by an option to
invoke an html viewer.
Note that "automatically stripping" would itself involve "involuntary
invocation of additional software while viewing untrusted email",
violating your wish not to do so.
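
The permission-prompt idea above could look roughly like the sketch below. Everything named here is hypothetical: the function name, the HTML_CONVERTER override variable, and the choice to read the answer from /dev/tty. It uses printf plus plain "read" instead of bash's "read -p" so it also runs under a POSIX sh. How mutt wires up the terminal for a mailcap command (especially with copiousoutput) has not been checked here, so treat this as a starting point rather than a working recipe.

```shell
#!/bin/sh
# Hypothetical wrapper: ask before converting an HTML part to text.
# HTML_CONVERTER is an assumed override hook; the default is lynx in dump mode.

ask_then_convert() {
    file=$1
    answer_src=${2:-/dev/tty}   # where to read the y/n answer from (assumption)

    # Prompt on stderr so the converted text on stdout stays clean.
    printf 'Render %s as text? [y/N] ' "$file" >&2
    IFS= read -r answer < "$answer_src" || answer=n

    case $answer in
        [Yy]*) ${HTML_CONVERTER:-lynx -dump -force_html} "$file" ;;
        *)     printf '[HTML part not rendered]\n' ;;
    esac
}

# When run as a script with a file argument, act on that file.
if [ "$#" -gt 0 ]; then
    ask_then_convert "$1"
fi
```

A mailcap entry would then point at the saved script in place of the browser, passing %s as its argument. Anything other than an answer starting with "y" leaves the part unrendered.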
bob prohaska
2021-10-14 00:55:07 UTC
Post by Rich
Note that "automatically stripping" would itself involve "involuntary
invocation of additional software while viewing untrusted email",
violating your wish not to do so.
Hoist by my own petard 8-)

Can lynx be invoked from the view menu after selecting the subpart?

The idea would be to view everything as plain text, then back up and
apply lynx to the selected sub-part if it seems worthwhile.

I can start lynx from the view menu, but it is oblivious to the
selected subpart.

Thanks for your patience!

bob prohaska
Eric Pozharski
2021-10-14 10:51:03 UTC
*SKIP*
Post by bob prohaska
Can lynx be invoked from the view menu after selecting the subpart?
Yes.
Post by bob prohaska
The idea would be to view everything as plain text, then back up and
apply lynx to the selected sub-part if it seems worthwhile.
Yes, see 'alternative_order'.
Post by bob prohaska
I can start lynx from the view menu, but it is oblivious to the
selected subpart.
This description isn't clear, however it (still) suggests your mailcap
setup isn't in desired state. I just found out The Mailcap Mechanism is
fscked up with (unknown yet) additions on part of (unknown yet)
distribution. That poses Teh Question: can we stop bitching around and
start diagnosing?
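
One way to start diagnosing is to list which mailcap files actually carry an entry for a given type, since a system-wide file may add entries the user never wrote. The sketch below is only that: the function name is made up, and which files mutt consults depends on its mailcap_path setting, so the file list in the usage note is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print every entry for a MIME type found in the given
# mailcap files, as file:line:entry. Unreadable or missing files are skipped.

show_mailcap_entries() {
    type=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        # /dev/null as a second file makes grep print the file name.
        grep -n "^$type[[:space:]]*;" "$f" /dev/null || true
    done
}
```

For example, `show_mailcap_entries text/html ~/.mailcap /etc/mailcap /usr/local/etc/mailcap` shows every text/html entry in those files. The type is used as a plain regex prefix, so wildcard entries such as text/* are not matched by this sketch.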

*CUT*
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
Eric Pozharski
2021-10-13 10:41:47 UTC
Post by bob prohaska
Post by Eric Pozharski
Post by Roger Bell_West
I use mutt via ssh and neither need nor want MIME enhancements, just
the text. Can mutt display the text portion of the message alone?
*SKIP*
Post by bob prohaska
Post by Eric Pozharski
Post by Roger Bell_West
This sounds like what mutt does already: display the plain text, let
you know the other parts are there. If you want the useful content out
of an HTML message,
auto_view text/html
will use your mailcap (and a text-mode web browser such as elinks) to
display HTML inline as though it were useful text. Searching for
auto_view in the manual should be helpful.
Also 'alternative_order' might be needed
*SKIP*
Post by bob prohaska
Post by Eric Pozharski
Or, read whole story in the manual, search for "MIME
Multipart/Alternative".
I'm now reduced to reading the mutt manual 8-)
Absolutely not. What you should read is mimeview(1), mailcap(5), and
(darn, I've totally missed it ten years ago) mailcap.order(5).
Post by bob prohaska
I was hopeful there might be a switch in mutt that strips markup.
If mutt began to have such a switch, that would be a good sign to start
looking for alternatives.
Post by bob prohaska
Invoking a proper html interpreter is more than I think I need.
Web designers would disagree that lynx is a proper HTML interpreter, I guess.

*CUT*
--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom
Eike Rathke
2021-10-13 23:25:20 UTC
Post by bob prohaska
I use mutt via ssh and neither need nor want MIME enhancements,
just the text. Can mutt display the text portion of the message
alone?
Yes it can. Note though that for mixed multipart messages often the
text/plain part does not match the text/html part, especially in mails
from shitty shops and "enterprise grade" mail systems. So it may be
desirable to be able to choose which.

In your muttrc have

# use mailcap entry for defined types
unset implicit_autoview
unauto_view *
auto_view text/html
alternative_order text/plain text text/html

and in ~/.mailcap have

text/html; /usr/bin/elinks -localhost 1 -no-connect 1 -force-html -dump %s; copiousoutput; description=HTML Text; nametemplate=%s.html

(all on one line).

Install the elinks package. The muttrc alternative_order determines
which part is preferably displayed. The mailcap entry produces a textual
view of the text/html part if there is one present and that then is
displayed by mutt. In the index view or while viewing a message you can
still press 'v' and from the multiparts select either the text/plain or
text/html part to view.

Eike
--
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
Use LibreOffice! https://www.libreoffice.org/
Eike Rathke
2021-10-13 23:27:37 UTC
Post by Eike Rathke
and in ~/.mailcap have
text/html; /usr/bin/elinks ...
Install the elinks package.
Oh and btw, I'm using elinks here because it has decent HTML table
handling, which lynx does not have at all.

Eike
--
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
Use LibreOffice! https://www.libreoffice.org/
bob prohaska
2021-10-15 01:21:18 UTC
Post by Eike Rathke
Post by bob prohaska
I use mutt via ssh and neither need nor want MIME enhancements,
just the text. Can mutt display the text portion of the message
alone?
Yes it can. Note though that for mixed multipart messages often the
text/plain part does not match the text/html part, especially in mails
from shitty shops and "enterprise grade" mail systems. So it may be
desirable to be able to choose which.
In your muttrc have
# use mailcap entry for defined types
unset implicit_autoview
unauto_view *
auto_view text/html
alternative_order text/plain text text/html
and in ~/.mailcap have
text/html; /usr/bin/elinks -localhost 1 -no-connect 1 -force-html -dump %s; copiousoutput; description=HTML Text; nametemplate=%s.html
(all on one line).
This combination seems to work nicely. If I just select the whole
message and hit return, mutt displays the plain text. If I use v
to list the attachments, select text/html and hit return, the
browser fires up and shows me the formatted text. That's a bit
nicer than I was originally looking for.
Post by Eike Rathke
Install the elinks package. The muttrc alternative_order determines
which part is preferably displayed. The mailcap entry produces a textual
view of the text/html part if there is one present and that then is
displayed by mutt. In the index view or while viewing a message you can
still press 'v' and from the multiparts select either the text/plain or
text/html part to view.
elinks is turning out to be a problem. It built and installed without
complaint, but doesn't run correctly. This is on a Raspberry Pi2B running
FreeBSD 12.2. The ports tree is stale, I'll update it and try again later.
For now lynx is good enough.
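
Swapping elinks for another text-mode browser in the mailcap entry quoted above is mostly a matter of the dump flags. The sketch below prints a one-line mailcap entry for a named browser. The function name is hypothetical, the elinks command is the one given in this thread, and the lynx, w3m and links commands are assumptions that use only each program's basic dump options; they carry no network-blocking flags, so they are not equivalent to the elinks line in that respect.

```shell
#!/bin/sh
# Hypothetical helper: print a text/html mailcap line for a given browser.
# Only the elinks command comes from the thread; the others are assumptions.

mailcap_line_for() {
    case $1 in
        elinks) cmd='elinks -localhost 1 -no-connect 1 -force-html -dump %s' ;;
        lynx)   cmd='lynx -dump -force_html %s' ;;
        w3m)    cmd='w3m -dump -T text/html %s' ;;
        links)  cmd='links -dump %s' ;;
        *)      echo "unknown browser: $1" >&2; return 1 ;;
    esac
    # %%s in the format string prints a literal %s for mailcap to expand.
    printf 'text/html; %s; copiousoutput; description=HTML Text; nametemplate=%%s.html\n' "$cmd"
}
```

For example, `mailcap_line_for lynx` prints a line that can be pasted into ~/.mailcap in place of the elinks one.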

Thank you very much!

bob prohaska
Ant
2021-10-15 10:45:12 UTC
Post by bob prohaska
Post by Eike Rathke
Post by bob prohaska
I use mutt via ssh and neither need nor want MIME enhancements,
just the text. Can mutt display the text portion of the message
alone?
Yes it can. Note though that for mixed multipart messages often the
text/plain part does not match the text/html part, especially in mails
from shitty shops and "enterprise grade" mail systems. So it may be
desirable to be able to choose which.
In your muttrc have
# use mailcap entry for defined types
unset implicit_autoview
unauto_view *
auto_view text/html
alternative_order text/plain text text/html
and in ~/.mailcap have
text/html; /usr/bin/elinks -localhost 1 -no-connect 1 -force-html -dump %s; copiousoutput; description=HTML Text; nametemplate=%s.html
(all on one line).
This combination seems to work nicely. If I just select the whole
message and hit return, mutt displays the plain text. If I use v
to list the attachments, select text/html and hit return, the
browser fires up and shows me the formatted text. That's a bit
nicer than I was originally looking for.
Post by Eike Rathke
Install the elinks package. The muttrc alternative_order determines
which part is preferably displayed. The mailcap entry produces a textual
view of the text/html part if there is one present and that then is
displayed by mutt. In the index view or while viewing a message you can
still press 'v' and from the multiparts select either the text/plain or
text/html part to view.
elinks is turning out to be a problem. It built and installed without
complaint, but doesn't run correctly. This is on a Raspberry Pi2B running
FreeBSD 12.2. The ports tree is stale, I'll update it and try again later.
For now lynx is good enough.
Bob, try Links. eLinks is based on it. :)
--
Doyers! :D So many brokenesses, oldnesses, leaks, illnesses, videos, spams, issues, software updates, games, sins, tiredness, busyness, etc. Dang colony life! D:
Note: A fixed width font (Courier, Monospace, etc.) is required to see this signature correctly.
/\___/\ Ant(Dude) @ http://aqfl.net & http://antfarm.home.dhs.org.
/ /\ /\ \ Please nuke ANT if replying by e-mail.
| |o o| |
\ _ /
( )
bob prohaska
2021-10-16 02:01:38 UTC
Post by Ant
Post by bob prohaska
For now lynx is good enough.
Bob, try Links. eLinks is based on it. :)
It's in the FreeBSD ports collection, so that should be easy.

A browser is really too capable for my purposes. Browsers, AIUI,
can spawn subordinate programs on the user's behalf, which I'd
like to avoid.

There is a port called html2text, which I know nothing about.
If true to its name, that might come closer to scraping off
the tags so I can see what the email tries to do, without it
being able to actually make good on the goal.

This thread has taught me the essentials, which turn out to be
rather arcane. Now I have to decide just how paranoid to be
about unsolicited email.

Thanks to all who've educated me!

bob prohaska
Jorgen Grahn
2021-10-16 20:13:39 UTC
Post by bob prohaska
Post by Ant
Post by bob prohaska
For now lynx is good enough.
Bob, try Links. eLinks is based on it. :)
It's in the FreeBSD ports collection, so that should be easy.
A browser is really too capable for my purposes. Browsers, AIUI,
can spawn subordinate programs on the user's behalf, which I'd
like to avoid.
Well, you need a secure browser which doesn't e.g. let mails "phone
home". I don't know which of the popular text-mode browsers (lynx,
links, elinks, w3m; any others?) do that well.
Post by bob prohaska
There is a port called html2text, which I know nothing about.
If true to its name, that might come closer to scraping off
the tags so I can see what the email tries to do, without it
being able to actually make good on the goal.
Most HTML mails would be quite unreadable if you just stripped off the
tags. But I see what you mean: a program which just takes an HTML file
and renders it as text is less likely to let the mail /do/ anything,
compared to a browser, even a browser in "dump" mode.
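
To make "just stripping off the tags" concrete, here is a deliberately crude sketch using only sed. It is not how html2text works (nothing in this thread says how that port is implemented) and the function name is made up. It removes anything that looks like a tag on a single line, decodes a handful of entities, and drops blank lines. Tags split across lines, comments, and the contents of style or script blocks are not handled, which is a large part of why the result is often unreadable on real mail.

```shell
#!/bin/sh
# Hypothetical, crude tag stripper: no browser, no network, no rendering.
# Reads the named files, or stdin when no file is given.

strip_tags() {
    sed -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&amp;/\&/g' "$@" |
    sed -e '/^[[:space:]]*$/d'
}
```

Used as a filter, for instance `strip_tags message.html | less`, it touches nothing but the text it is given.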

Personally I let mutt call w3m to render HTML mail, and hope it
protects my privacy. I don't look at the text version of the mail
(i.e. the other half of the multipart/alternative) since it's
usually useless. Then I curse w3m because it doesn't show the
links in the mail, and so I end up using mutt's view-text command
to search the HTML (and pages of useless CSS) for that link I
want. The whole thing is less than ideal, but if the sender
cannot bother to communicate well, perhaps it wasn't so important
that I read their mails after all.
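
Hunting for that one link in the raw HTML can be shortened a little by pulling out the href values first. This is a rough sketch with a hypothetical function name. It only catches double-quoted href attributes that sit on a single line, and it relies on grep's -o option, which GNU and BSD grep both provide but POSIX does not require.

```shell
#!/bin/sh
# Hypothetical helper: list the targets of double-quoted href attributes
# in an HTML file, one per line.

list_links() {
    grep -io 'href="[^"]*"' "$1" |
    sed -e 's/^[Hh][Rr][Ee][Ff]="//' -e 's/"$//'
}
```

Saving the HTML part to a file from mutt's attachment menu and running `list_links saved.html` then prints the link targets without rendering anything.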
Post by bob prohaska
This thread has taught me the essentials, which turn out to be
rather arcane. Now I have to decide just how paranoid to be
about unsolicited email.
Thanks to all who's educated me!
bob prohaska
/Jorgen
--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
Eike Rathke
2021-10-16 23:30:40 UTC
Post by Jorgen Grahn
Well, you need a secure browser which doesn't e.g. let mails "phone
home". I don't know which of the popular text-mode browsers (lynx,
links, elinks, w3m; any others?) do that well.
The elinks command line (should work for links as well) I posted
prevents exactly that with the options -localhost 1 -no-connect 1

Eike
--
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
Use LibreOffice! https://www.libreoffice.org/