Just as molecules are built from atoms, text is built from characters. And like an atom, which can be divided into electrons, protons, and neutrons (among others), a character also has an internal structure. Just as with atoms, the internal structure of characters can usually be ignored, and you may want to skip the following section if you are a beginner. Sometimes, however, knowledge of the internal structure of characters can be very helpful.
On modern computer systems there are well over 65,000 possible characters. Each character has a number. The most commonly used characters have numbers from 0 to 65,535, and the standard allows values up to 1,114,111. For example, the letter A is represented by character number 65. The number value for each character is defined by an international standard called Unicode.
Panorama uses the Unicode values of characters when it compares two text items to see which is larger or smaller. Since the Unicode value of B (66) is greater than the Unicode value of A (65), the text item B is “larger” than A. However, the Unicode value of a (97) is greater than B (66), so the text item a is “larger” than B. You have to watch out for this problem whenever you compare text that is a mixture of upper and lower case.
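The comparison rule above is not unique to Panorama. As a minimal sketch, the same behavior can be seen in Python, which also compares text by Unicode value. This is an illustration only, not Panorama code: the function names are hypothetical, and converting both sides to upper case is one common general-purpose way to sidestep the mixed-case problem.

```python
# Illustration in Python (not Panorama): text is compared by Unicode value.

def is_larger(left: str, right: str) -> bool:
    """Return True if left sorts after right by raw Unicode value."""
    return left > right


def is_larger_ignoring_case(left: str, right: str) -> bool:
    """Compare after converting both sides to upper case (hypothetical helper)."""
    return left.upper() > right.upper()


print(is_larger("B", "A"))                # True: 66 is greater than 65
print(is_larger("a", "B"))                # True: 97 is greater than 66
print(is_larger_ignoring_case("a", "B"))  # False: "A" (65) is less than "B" (66)
```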
Usually it’s not necessary to worry about the numeric value of a particular character. You can just think of it as a character. However, if you want to perform any kind of math on the character itself, it is necessary to convert the character into a number. For example, you can add one to a character value to get the next character value (A ➛ B ➛ C etc.). Or you can calculate the number of characters between two characters.
Panorama has two special functions that allow you to work with character values directly. The asc( function converts a character to its Unicode numeric value. The chr( function converts a Unicode numeric value to the corresponding character.
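For readers who want to experiment with character arithmetic outside Panorama, here is a small Python sketch. It assumes only that Python's built-in ord and chr play the roles described above for asc( and chr(. The helper names are hypothetical.

```python
# Illustration in Python (not Panorama): ord() stands in for asc(,
# and Python's chr() stands in for Panorama's chr(.

def next_character(ch: str) -> str:
    """Add one to a character's Unicode value to get the next character."""
    return chr(ord(ch) + 1)


def distance(first: str, second: str) -> int:
    """Number of steps between two characters, regardless of order."""
    return abs(ord(second) - ord(first))


print(next_character("A"))  # B
print(next_character("B"))  # C
print(distance("A", "F"))   # 5
```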
The following example procedure asks the user to enter a range of letters, for example A-F. It uses the asc( function to convert the characters into the corresponding Unicode numeric values, then calculates the number of characters in the range.
local LetterRange,StartLetter,EndLetter,LetterCount
LetterRange=""
gettext "Enter character range:",LetterRange
StartLetter=LetterRange[1,1]
EndLetter=LetterRange[-1,-1]
LetterCount=abs(asc(EndLetter)-asc(StartLetter))
message LetterRange+": "+pattern(LetterCount+1,"# character~")
If the person enters A-F the procedure will display A-F: 6 characters.
The next example procedure is similar but actually displays a list of the characters in the range. It uses the chr( function to convert the numbers back into characters.
local LetterRange,StartLetter,EndLetter
local LetterCount,LetterBump,Letters
LetterRange=""
gettext "Enter character range:",LetterRange
StartLetter=asc(LetterRange[1,1])
EndLetter=asc(LetterRange[-1,-1])
LetterCount=EndLetter-StartLetter
LetterBump=LetterCount/abs(LetterCount)
Letters=""
loop
Letters=Letters+chr(StartLetter)
StartLetter=StartLetter+LetterBump
while StartLetter<>EndLetter
Letters=Letters+chr(StartLetter)
message LetterRange+": "+Letters
If the person enters A-F the procedure will display A-F: ABCDEF. If the person enters Z-U the procedure will display Z-U: ZYXWVU.
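The same range expansion can be sketched in Python for comparison. One design point is worth noting: the procedure above computes LetterBump as LetterCount/abs(LetterCount), which is zero divided by zero when both ends of the range are the same letter. The sketch below avoids the division by choosing the step direction with a comparison. The function name is hypothetical, and the sketch assumes the input is not empty.

```python
# Illustration in Python (not Panorama): list every character in a range
# such as "A-F" or "Z-U". Assumes letter_range is a non-empty string.

def expand_range(letter_range: str) -> str:
    start = ord(letter_range[0])   # Unicode value of the first character
    end = ord(letter_range[-1])    # Unicode value of the last character
    step = 1 if end >= start else -1  # direction, chosen without dividing
    # range() stops before its second argument, so go one step past the end
    return "".join(chr(code) for code in range(start, end + step, step))


print(expand_range("A-F"))  # ABCDEF
print(expand_range("Z-U"))  # ZYXWVU
print(expand_range("A-A"))  # A
```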
Warning: Don’t confuse the asc( and chr( functions with the val( and str( functions. The asc( and chr( functions convert single characters based on their Unicode values. The val( and str( functions convert entire text items based on the number the characters spell out. For example, asc(“4”) is 52, because 52 is the Unicode value of the character “4.” On the other hand, val(“4”) is 4. Confused? In most ordinary applications you almost certainly want to use val( and str( unless you are sure you know what you are doing.
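The distinction in this warning exists in most languages. As a rough analogy, and assuming Python's ord and int stand in for asc( and val(, the difference looks like this:

```python
# Illustration in Python (not Panorama): a character's Unicode value
# versus the number that the text spells out.

digit = "4"

code_point = ord(digit)      # Unicode value of the character "4"
spelled_number = int(digit)  # the number the text spells out

print(code_point)      # 52
print(spelled_number)  # 4

# Going the other way:
print(chr(52))  # 4  (the character whose Unicode value is 52)
print(str(4))   # 4  (the text that spells out the number 4)
```

Both of the last two lines print the same text, which is exactly why the two pairs of functions are easy to mix up.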
The Unicode system contains a number of characters that are normally invisible. In fact, every character with a value of 32 or lower is invisible. Normally you will not be concerned with invisible characters. However, there are three special invisible characters that do get a lot of use: space, carriage return, and tab.
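The three characters named above have the Unicode values 32 (space), 13 (carriage return), and 9 (tab), as the following paragraphs explain. As a quick sketch outside Panorama, they can be built from those values in Python. The constant and variable names are hypothetical.

```python
# Illustration in Python (not Panorama): build the three common
# invisible characters from their Unicode values.

SPACE = chr(32)
CARRIAGE_RETURN = chr(13)
TAB = chr(9)

two_lines = "first line" + CARRIAGE_RETURN + "second line"
three_columns = TAB.join(["first column", "second column", "third column"])

# repr() makes the invisible characters visible as escape sequences
print(repr(two_lines))      # 'first line\rsecond line'
print(repr(three_columns))  # 'first column\tsecond column\tthird column'
```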
The space character (Unicode value 32) is not quite invisible, because it does take up space. You can easily enter this value by pressing the Space Bar. In a formula you can enter a space directly:
" "
or using the chr( function:
chr(32)
The carriage return character is used to start a new line of text. This character has a Unicode value of 13. You can enter this value into a formula using the ¶ symbol (Option-7), the cr() function, or chr(13), for example:
"first line"+¶+"second line"
or:
"first line"+cr()+"second line"
(Trivia question: why is this character called carriage return? In a few years probably no one will remember. In case you are already too young to remember, typewriters (and teletypes) used to place the paper on a carriage that moved back and forth as you typed. When you pressed the Return key the carriage would “return” back to the beginning of the line and also advance down to the next line, hence carriage return. In fact, on old manual typewriters this was accomplished with a lever, not a key.)
The tab character is usually not found inside data, but is often found in text files created by editors or word processors. The tab character has a Unicode value of 9. You can enter this value into a formula using the ¬ symbol (Option-L), the tab() function, or chr(9), for example:
"first column"+¬+"second column"+¬+"third column"
or:
"first column"+tab()+"second column"+tab()+"third column"
Here are functions that deal with invisible characters.
Before Unicode was invented, computers used a wide variety of encoding systems for text. Panorama includes functions for converting these encoding systems into Unicode, and from Unicode into other encoding systems. You’ll only need to use these in special circumstances, basically when you need to read data from an older computer system, or write data such that it can be read by an old computer system.
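To make the idea of converting between encoding systems concrete, here is a minimal Python sketch. It does not use Panorama's conversion functions, which are not named in this section. Mac OS Roman is chosen here only as an example of a pre-Unicode encoding, and the sample bytes are an assumption made for illustration.

```python
# Illustration in Python (not Panorama): convert text from an older
# encoding (Mac OS Roman, picked as an example) to Unicode and back.

legacy_bytes = b"caf\x8e"  # "café" as Mac OS Roman bytes; 0x8E is "é"

text = legacy_bytes.decode("mac_roman")    # older encoding -> Unicode text
utf8_bytes = text.encode("utf-8")          # Unicode text -> UTF-8 bytes
round_trip = text.encode("mac_roman")      # Unicode text -> older encoding

print(text)        # café
print(utf8_bytes)  # b'caf\xc3\xa9'
print(round_trip == legacy_bytes)  # True
```

The same byte (0x8E) means different things in different older encodings, which is why the source encoding has to be known before the conversion.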
See Also
History
Version | Status | Notes |
10.0 | No Change | Carried over from Panorama 6.0 |