Word Processing Formats
Mark Terry
mark at abernackie.com
Thu Jun 26 04:53:05 PDT 2008
On Jun 26, 2008, at 12:58 AM, James Rea wrote:
>> I thought this approach might work, but I'm stymied by a bug/feature.
>> Selecting characters in a WPSO in a loop to identify their formatting
>> eventually gives the error "Too many nested subroutines." Where are
>> the nested subroutines here??
>>
>> local CNT
>> CNT=0
>> loop
>> CNT=CNT+1
>> ActiveSuperObject "SetSelection",CNT-1,CNT
>> until CNT=200
>>
>> If this worked, I could go after more data, but it won't go much over
>> 100 characters. Getting more data drops that down to 30 or 40. Can
>> this be fixed?
>
> I'm surprised by your results; I was going to suggest a technique like
> this. I haven't found it yet, but at some point I created a procedure
> that used this technique to convert WPSO text into formatted HTML (there
> were some limitations on what formatting would be converted, but it did
> the basics: font, size, color, bold, italic, centered, and right
> justification). I definitely had it working with over 100 characters.
>
> Jim Rea
> President, ProVUE Development
Thanks, Jim. Hopefully, someone will test the simple proc above, to
see if they have the same problem. All you need to do is make a
selection in a WPSO (to make it active) and then run the proc. I'm
using the latest Pan5.5.
The original procedure was, of course, much more extensive, and was
working ok up to a point. The idea was to convert the formatted text
in a WPSO to a stream of 3-digit characters that could later be
converted into formatted text in any other format. In a loop, using
SuperObject statements, each character was examined for whatever
traits Pan could identify (font, size, justification, leading,
alignment, style, text color, and background color; I was saving tabs
for later). The results were put into a simple pipe-delimited array
(formatting array), and compared to the formatting array of the
previous character. If there was no change, the 3-digit ASCII code was
added to the conversion stream. If there was a change, another 3-digit
code (500 plus the sequence number of that format change) was
inserted first, and then that new, changed formatting array was added
to a CR-delimited array of formatting arrays (all_formatting arrays)
to keep track of the changes. It was working fine, just not long
enough! ;) The hoped-for result was a conversion stream of 3-digit
characters that could be re-converted to text and formatting
instructions, perhaps using chunkfilter(. If the number represented by
each group of 3 digits was less than 500, it would be converted back to
text as ASCII. Otherwise, it was a reference to the all_formatting
array, which could tell you what the new formatting was to be.
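A minimal Python sketch of the encoding and decoding steps described above may make the scheme easier to follow. It is not the original Panorama procedure. It assumes each character arrives as a (char, formatting) pair, that the formatting is a pipe-delimited string such as "Helvetica|12|bold", and that the first format change is numbered 501. The names encode, decode, and FORMAT_BASE are hypothetical.

```python
FORMAT_BASE = 500  # codes above 500 point into all_formatting; below 500 are characters


def encode(styled_chars):
    """Turn (char, formatting) pairs into a 3-digit stream plus a format table.

    Hypothetical input shape: formatting is a pipe-delimited string such as
    "Helvetica|12|bold". The table is CR-delimited, one entry per change.
    """
    stream = []
    all_formatting = []
    previous = object()  # sentinel, so the first character always records a format
    for ch, formatting in styled_chars:
        if formatting != previous:
            all_formatting.append(formatting)
            marker = FORMAT_BASE + len(all_formatting)  # 501 for the first change
            if marker > 999:
                raise ValueError("more format changes than 3 digits can number")
            stream.append(f"{marker:03d}")
            previous = formatting
        code = ord(ch)
        if code >= FORMAT_BASE:
            raise ValueError(f"{ch!r} cannot be coded as a number below 500")
        stream.append(f"{code:03d}")
    return "".join(stream), "\r".join(all_formatting)


def decode(stream, all_formatting):
    """Rebuild (text, formatting) runs from a stream and its CR-delimited table."""
    formats = all_formatting.split("\r")
    runs = []
    for i in range(0, len(stream), 3):
        code = int(stream[i:i + 3])
        if code > FORMAT_BASE:
            # a format-change marker: start a new run with that formatting
            runs.append(["", formats[code - FORMAT_BASE - 1]])
        else:
            # an ordinary character: append it to the current run
            runs[-1][0] += chr(code)
    return [tuple(run) for run in runs]
```

For example, "Hi" in bold followed by "!" in plain text encodes to the stream 501072105502033, with two entries in the format table. Because only three digits are available, this layout tops out at 499 format changes and at characters below code 500.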
Given this roadmap (conversion stream and array of format changes),
the next step would be to build the new formatting instructions for
the preferred format. I was going to try RTF, as it appeared at one
time to be doable. The challenge, as I recall, was dealing with the
potential complexity of RTF. I'd bet it would be simpler still with
XML, or one of the newer formats.
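To illustrate that next step, here is a hedged Python sketch that turns already-decoded (text, formatting) runs into minimal RTF. It assumes a hypothetical "font|size|style" layout for each formatting entry and handles only font, point size, bold, and italic; justification and colors would need further control words (such as \qc) and a \colortbl. The function names to_rtf and rtf_escape are illustrative only.

```python
def rtf_escape(text):
    """Escape characters that are special in RTF.

    Non-ASCII characters use \\uN with a "?" fallback; this assumes code
    points below U+8000, which the 3-digit scheme (codes under 500) guarantees.
    """
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch in "\r\n":
            out.append("\\par ")
        elif ord(ch) > 127:
            out.append(f"\\u{ord(ch)}?")
        else:
            out.append(ch)
    return "".join(out)


def to_rtf(runs):
    """Build a minimal RTF document from (text, "font|size|style") runs."""
    fonts = []  # font table, in order of first use
    body = []
    for text, formatting in runs:
        font, size, style = formatting.split("|")
        if font not in fonts:
            fonts.append(font)
        # \fN selects a font, \fsN is the size in half-points
        controls = f"\\f{fonts.index(font)}\\fs{int(float(size) * 2)}"
        if "bold" in style:
            controls += "\\b"
        if "italic" in style:
            controls += "\\i"
        # each run gets its own group, so its formatting ends at the closing brace
        body.append("{" + controls + " " + rtf_escape(text) + "}")
    font_table = "".join(f"{{\\f{i} {name};}}" for i, name in enumerate(fonts))
    return "{\\rtf1\\ansi\\deff0{\\fonttbl" + font_table + "}" + "".join(body) + "}"
```

Wrapping each run in its own { } group is a deliberate simplification: formatting resets automatically at the end of the group, so the emitter never has to track which attributes to switch off with \b0 or \i0.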
When and if this bug/feature/pilot error is resolved, I think we
should be able to come up with a few custom statements that could
significantly expand the utility of the WPSO. The loop approach was
fast enough for letters, etc. I'm thinking that converting to
ArrayFilter( or CharacterFilter( to build an executable procedure, as
Gary Yonaites has demonstrated, might be much faster still.
Meanwhile, back to the day job... ;-D
M