Word Processing Formats

Mark Terry mark at abernackie.com
Thu Jun 26 04:53:05 PDT 2008


On Jun 26, 2008, at 12:58 AM, James Rea wrote:

>> I thought this approach might work, but I'm stymied by a bug/feature.
>> Selecting characters in a WPSO in a loop to identify their formatting
>> eventually gives the error "Too many nested subroutines." Where are
>> the nested subroutines here??
>>
>> local CNT
>> CNT=0
>> loop
>>     CNT=CNT+1
>>     ActiveSuperObject "SetSelection",CNT-1,CNT
>> until CNT=200
>>
>> If this worked, I could go after more data, but it won't go much over
>> 100 characters. Getting more data drops that down to 30 or 40. Can
>> this be fixed?
>
> I'm surprised by your results; I was going to suggest a technique like
> this. I haven't found it yet, but at some point I created a procedure
> that used this technique to convert WPSO text into formatted HTML
> (there were some limitations on what formatting would be converted, but
> it did the basics - font, size, color, bold, italic, centered, right
> justification). I definitely had it working with over 100 characters.
>
> Jim Rea
> President, ProVUE Development

Thanks, Jim. Hopefully, someone will test the simple proc above, to  
see if they have the same problem. All you need to do is make a  
selection in a WPSO (to make it active) and then run the proc. I'm  
using the latest Pan5.5.

The original procedure was, of course, much more extensive, and was
working OK up to a point. The idea was to convert the formatted text
in a WPSO to a stream of 3-digit codes that could later be
converted into formatted text in any other format. In a loop, using
SuperObject statements, each character was examined for whatever
traits Pan could identify (font, size, justification, leading,
alignment, style, text color and background color; I was saving tabs
for later). The results were put into a simple pipe-delimited array
(the formatting array) and compared to the formatting array of the
previous character. If there was no change, the character's 3-digit
ASCII code was added to the conversion stream. If there was a change,
another 3-digit code (500 + whichever occurrence of format change it
was) was inserted first, ahead of the character's code, and the new
formatting array was added to a CR-delimited array of formatting
arrays (all_formatting arrays) to keep track of the changes. It was
working fine, just not long enough! ;) The hoped-for result was a
conversion stream of 3-digit codes that could be re-converted to text
and formatting instructions, perhaps using chunkfilter(. If the number
represented by each 3 digits was less than 500, it would be converted
as ASCII back to text. Otherwise, it was a reference into the
all_formatting arrays, telling you what the new formatting was to be.
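
To make the scheme concrete, here is a rough sketch in Python rather
than in Panorama's own language. It is not the original procedure. The
names encode and decode, the three-field formatting array, and the
choice that the first format change gets code 501 are all assumptions
made for illustration.

```python
def encode(chars):
    """chars: iterable of (character, formatting_tuple) pairs.

    Returns (conversion_stream, all_formats), where all_formats is a
    CR-delimited list of pipe-delimited formatting arrays.
    """
    stream = []      # 3-digit codes, in order
    formats = []     # one formatting array per format change
    previous = None  # formatting array of the previous character
    for ch, fmt in chars:
        fmt_array = "|".join(str(v) for v in fmt)
        if fmt_array != previous:
            formats.append(fmt_array)
            # Assumption: the Nth format change is written as 500 + N.
            assert len(formats) <= 499, "more format changes than 3 digits allow"
            stream.append(f"{500 + len(formats):03d}")
            previous = fmt_array
        assert ord(ch) < 500, "character code collides with format codes"
        stream.append(f"{ord(ch):03d}")
    return "".join(stream), "\r".join(formats)


def decode(stream, all_formats):
    """Rebuild a list of (formatting_array, text) runs from the stream."""
    formats = all_formats.split("\r")
    runs = []
    for i in range(0, len(stream), 3):
        code = int(stream[i:i + 3])
        if code < 500:
            runs[-1][1] += chr(code)                  # ordinary character
        else:
            runs.append([formats[code - 501], ""])    # format change
    return [(fmt, text) for fmt, text in runs]


# Hypothetical sample: "Hi" in plain text followed by a bold "!".
sample = [
    ("H", ("Helvetica", 12, "plain")),
    ("i", ("Helvetica", 12, "plain")),
    ("!", ("Helvetica", 12, "bold")),
]
stream, all_formats = encode(sample)
runs = decode(stream, all_formats)
```

Because the very first character always differs from "no previous
formatting", the stream in this sketch always opens with a format code,
so the decoder never sees text before it knows how to format it.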

Given this roadmap (conversion stream and array of format changes),  
the next step would be to build the new formatting instructions for  
the preferred format. I was going to try RTF, as it appeared at one  
time to be doable. The challenge, as I recall, was dealing with the  
potential complexity of RTF. I'd bet it might be simpler still with  
XML, or one of the newer formats.
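
As a hedged illustration of that next step, the Python sketch below
turns a list of (formatting array, text) runs into a very small RTF
document. The runs_to_rtf and rtf_escape names and the
"font|size|style" layout are assumptions for this example only, and it
handles just font, size, bold and italic. Real RTF output would need
colors, justification, leading and so on.

```python
def rtf_escape(text):
    """Escape text for RTF: backslash, braces, line breaks, non-ASCII."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch in "\r\n":
            out.append("\\par ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN? is RTF's Unicode escape with "?" as the fallback
            # character. This simple form assumes codes below 32768.
            out.append(f"\\u{ord(ch)}?")
    return "".join(out)


def runs_to_rtf(runs):
    """runs: list of (fmt_array, text); fmt_array is "font|size|style"."""
    fonts = []  # font names in order of first use, for the font table
    body = []
    for fmt_array, text in runs:
        font, size, style = fmt_array.split("|")
        if font not in fonts:
            fonts.append(font)
        # \fs takes half-points, so a 12 point font is \fs24.
        codes = f"\\f{fonts.index(font)}\\fs{int(size) * 2}"
        if "bold" in style:
            codes += "\\b"
        if "italic" in style:
            codes += "\\i"
        # Wrapping each run in its own group keeps its formatting local.
        body.append("{" + codes + " " + rtf_escape(text) + "}")
    font_table = "".join(f"{{\\f{i} {name};}}" for i, name in enumerate(fonts))
    return "{\\rtf1\\ansi\\deff0{\\fonttbl" + font_table + "}" + "".join(body) + "}"


# Hypothetical runs, shaped like the output of a decoding pass.
rtf = runs_to_rtf([
    ("Helvetica|12|plain", "Hi"),
    ("Helvetica|12|bold", "!"),
])
```

Putting each run in its own brace group sidesteps having to emit "turn
bold off" codes when the formatting changes, at the cost of slightly
more verbose output.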

When and if this bug/feature/pilot error is resolved, I think we  
should be able to come up with a few custom statements that could  
significantly expand the utility of the WPSO. The loop approach was  
fast enough for letters, etc. I'm thinking that converting to  
ArrayFilter( or CharacterFilter( to build an executable procedure, as  
Gary Yonaites has demonstrated, might be much faster still.

Meanwhile, back to the day job... ;-D

M

