[linux] attachments html=>txt

Juraj Bednar juraj at bednar.sk
Sunday May 27 22:40:17 CEST 2001


Hi,
> > How could I convert HTML attachments to plain text?
> > Ideally it would run from procmail and edit the mail
> > directly (no saving attachments to disk).
> 
> 	Plain HTML is easy to convert: lynx -dump <URI>
> 
> 	The problem is extracting the attachment from the e-mail. A Perl
> script (using the appropriate modules) would do this well, and once
> you have the attachment extraction done in Perl, converting the
> attachments to text is easy as well.
> 
> 	If you code it up and release it under the GNU GPL, please put it
> somewhere on the Web and post the link. Thanks.

From my ~/.mailcap:


# ------------------------MY MAILCAP--------------------------------
# Htmls
text/html;                      links %s; nametemplate=%s.html
text/html;                      links -dump %s; nametemplate=%s.html; copiousoutput
# The following Microsoft application MIME attachments are viewed from
# the attachment menu using QuickView Plus for UNIX. 
#
application/msword;             ~/.bin/word2text %s; copiousoutput
application/vnd.msword;         ~/.bin/word2text %s; copiousoutput
application/msword;             ~/.bin/word2text %s
application/vnd.msword;         ~/.bin/word2text %s
#
application/excel;              ~/.bin/excel2text %s; copiousoutput
application/msexcel;            ~/.bin/excel2text %s; copiousoutput
application/vnd.ms-excel;       ~/.bin/excel2text %s; copiousoutput
application/x-excel;            ~/.bin/excel2text %s; copiousoutput
application/x-msexcel;          ~/.bin/excel2text %s; copiousoutput
application/ms-Excel;           ~/.bin/excel2text %s; copiousoutput
application/excel;              ~/.bin/excel2text %s
application/msexcel;            ~/.bin/excel2text %s
application/vnd.ms-excel;       ~/.bin/excel2text %s
application/x-excel;            ~/.bin/excel2text %s
application/x-msexcel;          ~/.bin/excel2text %s
application/ms-Excel;           ~/.bin/excel2text %s

My ~/.bin/word2text:

#!/bin/sh
wvHtml -c iso-8859-2 "$1" 2> /dev/null |
perl -0777 -p -e '
	s|<img .*?>||gs;		# Delete img tags.
' |
w3m -dump -T text/html |
perl -0777 -p -e '
	s/\n\s*\n/\n\n/gs;		# Collapse runs of blank lines
					# (-0777 lets the regex see
					# more than one line).
	s/\xa0/ /gs;			# Change A0 spaces to ASCII
					# spaces.
'


I have a problem with diacritics in files converted this way. W3m
apparently can't render them in the correct charset. I use mutt, of course.
If anyone finds a solution to this problem, that would be great.
If you want to find a description of this whole setup, search Google
for something from my .mailcap; the first hit will be the page I took
it from, including all the scripts. Excel files display quite well too,
apart from the charsets :(.

From my .muttrc:


auto_view text/html
auto_view application/excel
auto_view application/ms-Excel
auto_view application/msexcel
auto_view application/msword
auto_view application/vnd.ms-excel
auto_view application/vnd.msword
auto_view application/x-excel
auto_view application/x-msexcel
set mailcap_path=~/.mailcap


      Juraj.



