[Java lista] Reading an RTF document

Böszörményi Péter zmblevlist at gmail.com
Wed, 10 Sep 2008, 13:33:27 CEST


It may be a bit of overkill, but there is a JavaCC grammar file that parses
RTF more or less; you might be able to dig the text out with it
(https://javacc.dev.java.net/files/documents/17/2960/RTFParser.jj)
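A lighter-weight workaround may also be possible. Assume (this is not verified here, only suggested by the symptoms) that Swing's RTF reader ignores the `\ansicpg1250` declaration in the file and decodes `\'xx` escapes with a Windows-1252-style table. In that case ő (byte 0xF5) would arrive as õ and ű (byte 0xFB) as û. Both of those characters lie in the U+0000..U+00FF range, so they can be turned back into their original bytes and re-decoded as Windows-1250. The sketch below shows the idea. The class name `RtfCharsetFix` and the method `reinterpret` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: repairs text that was decoded with a Windows-1252
// table although the RTF file declared \ansicpg1250 (Central European).
class RtfCharsetFix {

    private static final Charset CP1250 = Charset.forName("windows-1250");

    // Turns every char back into the single byte it came from and decodes
    // those bytes as Windows-1250. Only valid for chars in U+0000..U+00FF:
    // anything above is replaced by '?' when encoded as ISO-8859-1.
    static String reinterpret(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, CP1250);
    }

    public static void main(String[] args) {
        // Assumed (not verified) Swing output for the second line of test.rtf:
        // bytes 0xF5 and 0xFB show up as U+00F5 and U+00FB.
        String fromSwing = "\u00e1rv\u00edzt\u00fbr\u00f5 t\u00fck\u00f6rf\u00far\u00f3g\u00e9p";
        System.out.println(reinterpret(fromSwing));
    }
}
```

Even with the correct characters in the `String`, `System.out` encodes with the platform default charset and prints `?` for anything that charset cannot represent, so the console encoding can also cause the question marks.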

On 9/10/08, Verhás Péter <peter at verhas.com> wrote:
> I tried to extract the text from an RTF document (code below), but the
> accented letters ő and ű don't come through properly; I get question marks
> instead.
>
> Unfortunately Google isn't helping me today, although I can see that a few
> Japanese developers have solved this problem.
>
> What is the solution?
>
> Péter
>
>
> package com.verhas.jrtf;
>
> import java.io.InputStream;
>
> import javax.swing.JEditorPane;
> import javax.swing.text.Document;
> import javax.swing.text.rtf.RTFEditorKit;
>
> import junit.framework.TestCase;
>
> public class TestReadSimpleRtfFile extends TestCase {
>
>     // Loads the test RTF document from the classpath.
>     private InputStream getRtfResourceFile() {
>         final String rtfResourceFileName = "com/verhas/jrtf/test.rtf";
>         return getClass().getClassLoader().getResourceAsStream(rtfResourceFileName);
>     }
>
>     public void testSwingReadFile() throws Exception {
>         InputStream is = getRtfResourceFile();
>         assertNotNull("test.rtf not found on the classpath", is);
>         try {
>             // Let the Swing RTF kit parse the stream into the editor's document.
>             JEditorPane editor = new JEditorPane();
>             RTFEditorKit kit = new RTFEditorKit();
>             editor.setEditorKit(kit);
>             Document doc = editor.getDocument();
>             kit.read(is, doc, 0);
>             // Extract the plain text; this is where ő and ű come out wrong.
>             String s = doc.getText(0, doc.getLength());
>             System.out.println(s);
>         } finally {
>             is.close();
>         }
>     }
> }
>
>
>
>
>
> {\rtf1\ansi\ansicpg1250\deff1\adeflang1025
> {\fonttbl{\f0\froman\fprq2\fcharset238 Times New Roman;}
> {\f1\fnil\fprq0\fcharset238 Thorndale{\*\falt Times New Roman};}
> {\f2\fswiss\fprq2\fcharset238 Arial;}
> {\f3\fnil\fprq0\fcharset238 Thorndale{\*\falt Times New Roman};}
> {\f4\fnil\fprq2\fcharset238 Lucida Sans Unicode;}
> {\f5\fnil\fprq2\fcharset238 Tahoma;}
> {\f6\fnil\fprq0\fcharset238 Tahoma;}}
> {\colortbl;\red0\green0\blue0;\red128\green128\blue128;}
> {\stylesheet{\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af5\afs24\lang255\ltrch\dbch\af4\langfe255\hich\f1\fs24\lang1038\loch\f1\fs24\lang1038\snext1
> Normal;}
> {\s2\sb240\sa120\keepn\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\afs28\lang255\ltrch\dbch\langfe255\hich\f2\fs28\lang1038\loch\f2\fs28\lang1038\sbasedon1\snext3
> Heading;}
> {\s3\sa120\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af5\afs24\lang255\ltrch\dbch\af4\langfe255\hich\f1\fs24\lang1038\loch\f1\fs24\lang1038\sbasedon1\snext3
> Body Text;}
> {\s4\sa120\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af6\afs24\lang255\ltrch\dbch\af4\langfe255\hich\fs24\lang1038\loch\fs24\lang1038\sbasedon3\snext4
> List;}
> {\s5\sb120\sa120\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af6\afs24\lang255\ai\ltrch\dbch\af4\langfe255\hich\fs24\lang1038\i\loch\fs24\lang1038\i\sbasedon1\snext5
> caption;}
> {\s6\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af6\afs24\lang255\ltrch\dbch\af4\langfe255\hich\fs24\lang1038\loch\fs24\lang1038\sbasedon1\snext6
> Index;}
> }
> {\info{\author Peter Verhas}{\creatim\yr2008\mo9\dy10\hr12\min8}
> {\revtim\yr0\mo0\dy0\hr0\min0}{\printim\yr0\mo0\dy0\hr0\min0}
> {\comment StarWriter}{\vern6800}}\deftab709
> {\*\pgdsctbl
> {\pgdsc0\pgdscuse195\pgwsxn11905\pghsxn16837\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\pgdscnxt0
> Standard;}}
> \paperh16837\paperw11905\margl1134\margr1134\margt1134\margb1134\sectd\sbknone\pgwsxn11905\pghsxn16837\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\ftnbj\ftnstart1\ftnrstcont\ftnnar\aenddoc\aftnrstcont\aftnstart1\aftnnrlc
> \pard\plain
> \ltrpar\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af5\afs24\lang255\ltrch\dbch\af4\langfe255\hich\f1\fs24\lang1038\loch\f1\fs24\lang1038
> {\rtlch \ltrch\loch\f1\fs24\lang1038\i0\b0 \'c1RV\'cdZT\'dbR\'d5 T\'dcK\'d6RF\'daR\'d3G\'c9P}
> \par \pard\plain
> \ltrpar\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af5\afs24\lang255\ltrch\dbch\af4\langfe255\hich\f1\fs24\lang1038\loch\f1\fs24\lang1038
> {\rtlch \ltrch\loch\f1\fs24\lang1038\i0\b0 \'e1rv\'edzt\'fbr\'f5 t\'fck\'f6rf\'far\'f3g\'e9p}
> \par }
>
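If only the plain text is needed, another option is to decode the `\'xx` hex escapes directly, using the code page that the file declares with `\ansicpg1250`. This is a minimal sketch, not an RTF parser. It assumes a single text run whose control words have already been stripped, 7-bit ASCII input, and a `windows-125x` code page. The class `RtfHexEscapes` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes \'xx escapes in an RTF text run using the
// code page declared by \ansicpgN. It is NOT an RTF parser: control words,
// groups and Unicode escapes are not handled.
class RtfHexEscapes {

    private static final Pattern ANSICPG = Pattern.compile("\\\\ansicpg(\\d+)");

    // Returns the charset named by \ansicpgN (e.g. windows-1250), or
    // windows-1252 when the header has no \ansicpg. Assumes a windows-125x
    // code page; other numbers may not map to a Java charset name this way.
    static Charset codePage(String rtfHeader) {
        Matcher m = ANSICPG.matcher(rtfHeader);
        String name = m.find() ? "windows-" + m.group(1) : "windows-1252";
        return Charset.forName(name);
    }

    // Collects the raw bytes of the run (each \'xx escape as one byte,
    // other ASCII characters as themselves) and decodes them in one go.
    static String decodeRun(String run, Charset cs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < run.length()) {
            char c = run.charAt(i);
            if (c == '\\' && i + 3 < run.length() && run.charAt(i + 1) == '\'') {
                bytes.write(Integer.parseInt(run.substring(i + 2, i + 4), 16));
                i += 4;
            } else if (c == '\r' || c == '\n') {
                i++; // bare line breaks are ignored in RTF text
            } else if (c < 0x80) {
                bytes.write(c);
                i++;
            } else {
                throw new IllegalArgumentException("RTF input is expected to be 7-bit ASCII");
            }
        }
        return new String(bytes.toByteArray(), cs);
    }

    public static void main(String[] args) {
        Charset cs = codePage("{\\rtf1\\ansi\\ansicpg1250\\deff1");
        System.out.println(decodeRun("\\'e1rv\\'edzt\\'fbr\\'f5 t\\'fck\\'f6rf\\'far\\'f3g\\'e9p", cs));
    }
}
```

Decoding all collected bytes at once, rather than escape by escape, would also keep multi-byte code pages intact if the name mapping were extended. For the Hungarian test file, only the single-byte `windows-1250` case matters.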


More information about the Javalist mailing list