Date: 1 November 91
Message No: 033

To: TeX implementors and distributors
From: Barbara Beeton
Subject: Messages from DEK, part 2

Here is the second installment of DEK's September comments.

########################################################################

Incompatibility of positive/negative integer values

Date: Tue, 20 Aug 91 16:58:09 MDT
From: Nelson H.F. Beebe
Subject: Perhaps a bug (design flaw) in TeX

I think that the `feature' described below qualifies as a design flaw
in TeX, and should be reported to Don Knuth if it has not come up
before; I came across it while testing the statement on p. 178,
l. -11, of

@string{SV = "Spring{\-}er-Ver{\-}lag"}

@Book{Seroul:beginners-tex,
  author =    "Raymond Seroul and Silvio Levy",
  title =     "A Beginner's Book of {\TeX}",
  publisher = SV,
  year =      "1991",
  ISBN =      "0-387-97562-4, 3-540-7562-4",
  note =      "This is a translation and adaption by Silvio Levy of
               \cite{Seroul:tex}.",
}

about the maximum and minimum integers that TeX can handle. [The text
has an error there; I've just completed a comprehensive errata list
for it.]

TeX does not permit the input of the most negative 32-bit integer
(-2^{31}) on two's complement machines, but you can generate it by
subtraction ((-2^{31}+1) - 1) and output it correctly.

This makes the statement at the top of p. 118 of the TeXbook a lie:
registers are capable of containing the number -2147483648, NOT
-2147483647, provided the host architecture has two's complement
arithmetic, which is true for almost all machines today, and certainly
the vast majority of TeX implementations. UNIVAC and CDC mainframes
had one's complement arithmetic, but also had words of more than 32
bits, and as far as I am aware, only some calculators may use
sign-magnitude representation; both of these systems have signed
zeros, and extreme values that are equal in magnitude.

I believe that a programming language, which TeX surely is, ought to
be able to read what it can write.
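The read/write asymmetry can be modelled outside TeX. The Python
sketch below is illustrative only: it is not the code of TeX: The
Program, and the function names scan_int_positive and
scan_int_negative are hypothetical. It assumes 32-bit two's complement
registers and input consisting of an optional leading minus sign
followed by decimal digits. The first scanner bounds the magnitude by
2^{31}-1 before applying the sign; the second accumulates a negative
value and negates only at the end.

```python
# Illustrative model only; names and structure are assumptions,
# not taken from TeX: The Program.
INT_MAX = 2**31 - 1   # largest magnitude accepted when scanning positively
INT_MIN = -(2**31)    # most negative 32-bit two's complement value


def scan_int_positive(text):
    """Accumulate a positive magnitude, then apply the sign."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    value = 0
    for ch in digits:
        d = int(ch)
        # Reject if value*10 + d would exceed INT_MAX.
        if value > (INT_MAX - d) // 10:
            raise OverflowError("Number too big")
        value = value * 10 + d
    return -value if negative else value


def scan_int_negative(text):
    """Accumulate a negative value, then flip the sign if needed."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    value = 0
    for ch in digits:
        d = int(ch)
        # Reject if value*10 - d would fall below INT_MIN.
        if value < -((2**31 - d) // 10):
            raise OverflowError("Number too big")
        value = value * 10 - d
    if negative:
        return value
    if value == INT_MIN:
        # +2^31 is not representable, so the sign flip must be refused.
        raise OverflowError("Number too big")
    return -value
```

With these definitions, scan_int_positive rejects "-2147483648" while
scan_int_negative accepts it; both reject "2147483648".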

This asymmetry could be avoided if the code in section 445 of TeX: The
Program accumulated the number as a negative value, then flipped the
sign if necessary. Authors of textbooks, computer programs, and
language run-time libraries, should not make this mistake, yet the
error continues to be repeated.

While TeX detects overflow from multiplication, it does not detect
overflow from negation.

Here is an example:

    This is TeX, C Version 3.0
    % Try the value -(2^{31}) (most negative two's complement number)
    *\count0=-2147483648
    ! Number too big.
    <*> \count0=-2147483648

    % Input the value -(2^{31}-1)
    *\count0=-2147483647
    *\showthe\count0
    > -2147483647.

    % Now generate the most negative two's complement number
    *\advance \count0 by -1
    *\showthe\count0
    > -2147483648.

    % Now demonstrate that integer overflow is undetected on sign inversion:
    \count1=-\count0
    *\showthe\count1
    > -2147483648.

    % However, integer overflow is caught on multiplication:
    \multiply\count0 by \count0
    ! Arithmetic overflow.
    <*> \multiply\count0 by \count0

========================================================================

[ dek: TeX is _not_ a programming language in the general sense of
  supporting arithmetic at extreme values. There are lots of _dimen_
  values that TeX can write but not read. Probably a flaw, but a
  permanent one. In general, arithmetic in TeX is not supposed to push
  the handling conditions; making that all work would cause
  significant performance penalty. ]

************************************************************************

File name overflow of string pool

[ Since this report, I have seen a couple of other reports on this
  topic in the electronic discussion lists, mostly from Europe. While
  not a bug, it can certainly be a serious inconvenience. A couple of
  the reports have mentioned building nonstandard versions of TeX with
  a separate pool of file names; not good for compatibility. ]

Date: Fri, 12 Jul 91 19:06 +0200
From: "Johannes L. Braams"
Subject: Bug/misfeature in TeX?

We have run into a problem with TeX. We have an application where we
would like to \input about 2400 files. We can't do that because TeX
runs out of string pool space. This application is rather important
because it concerns the reports the lab has to make each quarter of a
year.

When I studied TeX the program to find out what happens when a file is
being \input I found that the name of the file is stored in string
pool. AND it never gets removed from the string pool (as far as I
could find out).

What I don't understand is why filenames are written to string pool in
the first place. Isn't it possible to use some kind of stack or array
mechanism to store filenames? It should then be possible to free the
memory used to store a filename when the file gets closed and the
filename is no longer needed.

Do you know the answer or someone who does? Or is this a bug? I would
rather call it a design flaw actually.

Regards,

Johannes Braams

PTT Research Neher Laboratorium,
P.O. box 421, 2260 AK Leidschendam, The Netherlands.
Phone  : +31 70 3325051
E-mail : JL_Braams@pttrnl.nl
Fax    : +31 70 3326477

-------

Date: Mon, 15 Jul 91 01:59:22 BST
From: Chris Thompson
Subject: Re: Bug/misfeature in TeX?

I agree that it's a design flaw, not a bug. People do keep falling
over it from time to time, though, so maybe Don could be asked to
think about it again. I suspect, however, that there is no easy fix,
for reasons I will explain below.

Johannes asks why the names go in the string pool in the first place:
the answer to that is "why not?"... it is the convenient place to keep
more or less arbitrarily long strings. The space occupied by things
added to the string pool can be reclaimed, provided it is done
straight away, before other parts of TeX have been exercised that may
add other strings (especially, control sequence names) to the pool.

There are two types of file name to think about (neither of which are
reclaimed at the moment, with one partial---and wrong---exception):

1.
The 1, 2 or 3 strings generated by |scan_file_name|. Usually these are
   used in some implementation-dependent way to open a file, and maybe
   then as arguments to |*_make_name_string|, and are then never
   needed again; and all this would usually happen straight away.
   Exception: deferred (non-\immediate) \openout's.

2. The string generated by |*_make_name_string|. For things like the
   log and DVI files, this has to be kept for ever (printing them is
   almost the last thing TeX does). The interesting case, however, is
   \input. The string is printed (immediately), and then stored in the
   |name_field| of the current input stack entry. *Almost* the only
   thing TeX uses it for thereafter is as a number > 17 (to
   distinguish the case of an input level being an \input file (as
   opposed to terminal input or a \read level)). The sole exception is
   in section 84 where it is used to deal with the "E" response to the
   error prompt: in distribution TeX as part of a message, but in
   practice as input to the implementation-dependent way of invoking
   an editor.

(BEGIN ASIDE
The ``partial and wrong exception'' is the code in section 537
introduced by change 283. |start_input| reclaims the space occupied by
the result of |a_make_name_string|, if that is still the top string in
the pool, and replaces it by the `name' part of the results of
|scan_file_name|. I have had to undo this "fix" in my implementations:
the *only* thing that the ``file name'' is needed for is as an
argument to the editor, and it is an unwarranted assumption that

  a. The values of the `area' and `extension' parts of the name are
     irrelevant to that purpose, and
  b. The output of |a_make_name_string| doesn't contain extra
     information, available as a result of the opening process, that
     may also be relevant.
END ASIDE)

In theory the contents of the strings of type 2 for \input files could
be kept on some sort of separate stack, as Johannes suggests (parallel
to the |input_file| and |line_stack| arrays), but this would be quite
convoluted and involve a lot of duplication of code. More plausible
would be an attempt to reclaim them if they are still the top string
in the pool when the file is closed (in |end_file_reading|); this
isn't so unlikely in cases like Johannes'... presumably not all 2400
files can use never-before-encountered control sequences, or he will
be running out of other things besides the string pool!

The strings of type 1 create a difficulty, however, unless they can be
got rid of just after the call of |a_make_name_string| (a certain
amount of permuting of the string pool would be required to do that).
If they, also, are to be got rid of when the file is closed, again
subject to the condition that they are at the top of the pool, one
will have to (at least) remember how many of them there were.

Some of this would, in fact, be rather easier in METAFONT than TeX.
METAFONT's string pool entries have a use count, and reclaiming space
consists of purging consecutive entries at the top of the pool whose
use counts have all fallen to zero. One could easily arrange that the
strings of type 1 had use counts of zero after the opening process was
over, and that the strings of type 2 for "input" files had a use count
of 1 which was decremented to 0 at close time; then the right things
would happen more or less automatically. However, TeX *doesn't* have
such use counts, and I don't really suppose Don is going to introduce
them in order to solve this problem.

Chris Thompson

-------

[ dek: I think the strings are also needed for font file names. For
  ordinary input files I put the special code into \S537 [which CET1
  disabled] so that the Math Reviews could input lots of files. Of
  course there's a workaround (using the operating system to
  concatenate files!)
  but otherwise all I can suggest is a local change-file routine that
  tries to reclaim string space when closing files if the unneeded
  strings are still at the end of the string pool. You could introduce
  a new array indexed by 1..max_in_open to keep relevant status
  information if it isn't already present (see \S304). ]

************************************************************************

TeX -- handling of \newlinechar within \special

Date: Thu 9 May 91 09:42:09-EST
From: Ron Whitney
Subject: \newlinechar within \special

Recently I've seen an inconsistency in the way a couple of versions of
TeX for the PC handle \newlinechar within \special commands. One
(Fuchs, \mu-TeX) gives the same treatment in this case as it does with
\write-streams. The others use a more literal interpretation of
Knuth's statement on p.228 of The TeXbook regarding what TeX does as
it writes out \special information:

   " TeX doesn't look at the token list to see if it makes sense; the
     list is simply copied to the output."

So if one has \newlinechar=`\^^J, \special{ooh^^Jaah} puts this
9-character sequence into the .dvi file instead of "oohaah". (Of
course, the ^^J gets contracted to a single token first, then gets
blown back up to the 3.)

I would have said that \mu-TeX's treatment is the proper one, but
perhaps it's understood that the string within the \special is not to
be tampered with other than to eat the tokens and then spit them out.
Is this an old issue? Is it open to interpretation?

-------

Date: Thu, 9 May 91 13:09:08 EDT
From: karl@cs.umb.edu (Karl Berry)
To: RFW@vax01.ams.com
Subject: \newlinechar within \special

ron> I would have said that \mu-TeX's treatment is the proper one, but
   > perhaps it's understood that the string within the \special is not to
   > be tampered with other than to eat the tokens and then spit them out.
   > Is this an old issue? Is it open to interpretation?

trip.tex seems not to test this.
I guess it's open to interpretation, although Knuth should probably be
asked.

My personal opinion is that ^^J should get turned into a newline
character(s); it's easy to turn this feature off (in fact, I suppose
it's off by default in plain), after all.

karl@cs.umb.edu

-------

Date: Thu, 09 May 91 23:54:07 BST
From: Chris Thompson
Cc: Ron Whitney, Karl Berry
Subject: Re: \newlinechar within \special

I am afraid that I don't really understand what the postings by Ron
Whitney and Karl Berry are saying. The suitably processed token list
in a \special ends up in the DVI file. So what does it mean to replace
characters equal to \newlinechar in this context by "newline"? What or
whose "newline"? DVI files aren't text files. And if you are going to
say
"ASCII CR, of course" or "ASCII LF, of course", be prepared to
[ dek:                ^ _or_ _both_ ]
fight off the other 50% of the world :-) If you are going to say
"should depend on the implementation", then don't: the contents of the
DVI file produced are meant to be implementation-independent.

Reference-level TeX does not treat characters equal to \newlinechar
specially in \special's; they appear unchanged in the DVI file. The
mechanical reason for this is that although |special_out| writes the
token list to the string pool (|selector:=new_string|), the special
treatment of \newlinechar in TeX sections 58--60 only applies when
|selector

Subject: RE: RE: \NEWLINECHAR WITHIN \SPECIAL and \message

I think that Chris's remark that dvi files are to be device
independent is questionable as far as specials are concerned. In fact
the special is supposed to pass some string to the dvi driver and this
means that this program is supposed to understand it. Now this means
that the driver needs to interpret the bytes inside the special in the
same way as the TeX that writes them out. But if we assume that this
is done under some ascii conversion table then why not accept ascii .
Not that I see many applications for this. Do I miss something?

The whole discussion reminded me of some related business with the
newline char of TeX which I think is a bug although one can surely
plea for a questionable feature. Compare the output of

   \newlinechar=`\@ \message{foo@bar}

to

   \newlinechar=`\^^J \message{foo^^Jbar}

The first message is broken into two lines the second comes out as is.

[ dek: I guess because of certain UNIX implementations coercing all
  tabs to spaces, those implementations cannot possibly "see" a tab.
  ? Wait, tab is ^^I. What _is_ going on? Oh, I see; Mittelbach and
  Sch\"opf are right, see below $10.24 ]

Same discrepancy happens with \errormessage which is quite unfortunate
and certainly makes macro packages non portable if certain characters
can't be entered directly.

Whether or not this is covered by the documentation in the TeXbook is
difficult to say since there are quite a few places where Don leaves
things open to interpretation.

Frank Mittelbach

-------

Date: Fri, 10 May 91 17:09:12 +0200
From: schoepf@sc.ZIB-Berlin.DE (Rainer Schoepf)
Cc: PZF5HZ@RUIPC1E.BITNET
Subject: RE: RE: \NEWLINECHAR WITHIN \SPECIAL and \message

Frank writes:

   I think that Chris's remark that dvi files are to be device
   independent is questionable as far as specials are concerned. In
   fact the special is supposed to pass some string to the dvi driver
   and this means that this program is supposed to understand it. Now
   this means that the driver needs to interpret the bytes inside the
   special in the same way as the TeX that writes them out. But if we
   assume that this is done under some ascii conversion table then why
   not accept ascii . Not that I see many applications for this. Do I
   miss something?

Yes, you do--at least as far as the new line character is concerned.
The point here is that normally the meaning of the \newlinechar is
"TeX's internal end-of-line marker", full stop. When writing to a text
file (regardless of the code table) this has a definite meaning,
namely: start a new line here, full stop.
When it comes to \specials, the notion of "lines" seems at least
questionable, even more since the sequence of characters inside a
\special need not be anything legible.

[ dek: Well, I don't intend 8-bit codes to be going there; I hope they
  are input from other files by DVI drivers. People might develop
  binary-coded special conventions but they are too non-portable. The
  main point in Chris's message is that newline is handled in three
  completely different ways (on PC, MAC, and UNIX) ]

\specials are device-dependent, true. But the consequence of your
argumentation is that the same device (say, a PostScript printer)
would see a different command on a Unix workstation and an IBM
mainframe. Keep in mind that the \special string is not written under
the control of the character conversion tables.

   The whole discussion reminded me of some related business with the
   newline char of TeX which I think is a bug although one can surely
   plea for a questionable feature. Compare the output of

      \newlinechar=`\@ \message{foo@bar}

   to

      \newlinechar=`\^^J \message{foo^^Jbar}

   The first message is broken into two lines the second comes out as
   is.

Now, this is something different, since it applies to text files where
(as I said above) the notion of "start a new line here" is perfectly
sensible. In my eyes this is a bug and should be fixed, even if this
behaviour is in conformance with the TeXbook.

Rainer Sch\"opf

-------

[ dek: Well I've thought about it some more and decided that \special
  should send 8-bit codes to DVI file without changing the printable
  ASCII form. This applies also to font file names in case future
  users want 8-bit codes in those names (nonportable but perhaps
  important to somebody to see the name in Cyrillic or something). I
  am changing TeX 3.14 to do this more logically, basically by making
  8-bit codes more equal to their printable cousins. At present there
  are several anomalies [like the string one mentioned w.r.t.
  Piff's work] [also when you have file names, job names, etc. with
  nonstandard 8-bit codes], and I think I see how to make it all come
  out right, ... as a byproduct, Mittelbach's \message problem goes
  away too. Internally characters will not be translated to ^^A form
  until the last minute when they simply must be translated. ]

########################################################################

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Character code reference
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Upper case letters: ABCDEFGHIJKLMNOPQRSTUVWXYZ
% Lower case letters: abcdefghijklmnopqrstuvwxyz
% Digits: 0123456789
% Square, curly, angle braces, parentheses: [] {} <> ()
% Backslash, slash, vertical bar: \ / |
% Punctuation: . ? ! , : ;
% Underscore, hyphen, equals sign: _ - =
% Quotes--right left double: ' ` "
% "at", "number" "dollar", "percent", "and": @ # $ % &
% "hat", "star", "plus", "tilde": ^ * + ~
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[ end of message 033 ]
-------