PRINTF 3CW "02 April 2007" "mathcw-1.00"

NAME

fprintf, printf, snprintf, sprintf, vfprintf, vprintf, vsnprintf, vsprintf - formatted-output routines

SYNOPSIS

cc [ flags ] -I/usr/local/include file(s) -L/usr/local/lib -lmcw [ ... ]

#include <stdarg.h> (required only for the functions with va_list arguments)

#include <stdio.h>

extern int printf (const char * restrict format, ...);
extern int fprintf (FILE *stream, const char * restrict format, ...);
extern int snprintf (char *s, size_t n, const char * restrict format, ...);
extern int sprintf (char *s, const char * restrict format, ...);
extern int vfprintf (FILE * restrict stream, const char * restrict format, va_list arg);
extern int vprintf (const char * restrict format, va_list arg);
extern int vsprintf (char *s, const char * restrict format, va_list arg);
extern int vsnprintf (char *s, size_t n, const char * restrict format, va_list arg);

DESCRIPTION

Functions in the printf() family convert numerical, character, and pointer data to text according to specifications in the format-string argument.

For the string-output routines (snprintf(), sprintf(), vsnprintf(), and vsprintf()), the output string is always properly terminated with a NUL character, as long as the output string is at least one character long.

Using the printf() and vprintf() functions is equivalent to supplying stdout as the first argument to fprintf() and vfprintf() respectively.
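The va_list members of the family exist so that user-written variadic functions can forward their arguments. The following is a minimal sketch using only Standard C; the name format_message() is hypothetical and is not part of the mathcw library.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: behaves like snprintf(), but is written as a
   thin variadic wrapper that forwards its arguments to vsnprintf(). */
int format_message(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int count;

    va_start(args, format);
    count = vsnprintf(buf, size, format, args);
    va_end(args);

    return count;   /* length that the complete output requires */
}
```

A call such as format_message(buf, sizeof buf, "%d-%s", 7, "x") stores 7-x in buf and returns 3.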

The mathcw library implementation of these functions fully conforms to the 1989 and 1999 ISO Standards for the C programming language, is available even with older pre-C99 compilers, and offers several useful extensions, each marked [mathcw extension] in the descriptions below.

A format string may consist of ordinary characters, which are output verbatim, and format specifications, which usually guide the conversion or use of the next argument. Successive specifications consume additional arguments, and it is the caller's responsibility to ensure that the counts and datatypes of arguments and specifications match. If there are too few arguments, or if they are of the wrong type, the user program may fail because of addressing errors, or produce erroneous conversions. Excess arguments are silently ignored.

Because datatypes are supplied in the format string, yet must match argument types, it is good programming practice to supply explicit type casts on each following argument, so that subsequent changes to datatypes in variable declarations elsewhere in the program do not invalidate the datatype correspondence in calls to functions in the printf() family.
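As an illustration of that practice, here is a small sketch in Standard C. The helper name format_item_count() is hypothetical; the point is only that the cast, not the variable declaration, determines the type that reaches the %lu specification.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: the explicit cast pins the argument type to the
   %lu specification, so the call remains correct even if the declared
   type of count is later changed elsewhere in the program. */
int format_item_count(char *out, size_t size, size_t count)
{
    return snprintf(out, size, "%lu items", (unsigned long) count);
}
```

With count equal to 42, the output string is 42 items and the return value is 8.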

A specification begins with a percent (%) character, followed by optional flags, an optional list of unsigned numbers or asterisks separated by periods, an optional datatype specifier, and finally, a single-character conversion specifier.

The format specification follows one of these templates:

%{flags}{datatype}[AaBbcdEeFfGginopQqsuXx%@]
%{flags}w{datatype}[AaBbcdEeFfGginopQqsuXx%@]
%{flags}w.p{datatype}[AaBbcdEeFfGginopQqsuXx%@]
%{flags}w.p.e{datatype}[AaBbcdEeFfGginopQqsuXx%@]
%{flags}w.p.e.g{datatype}[AaBbcdEeFfGginopQqsuXx%@]
%{flags}w.p.e.g.b{datatype}[AaBbcdEeFfGginopQqsuXx%@]

Braces indicate a string of zero or more characters, and the braces are not included in the specification. Brackets indicate a set of characters, exactly one of which must be chosen.

The flags are zero or more of these characters, in any order:

-
left justify in field width (default: right justify);
+
numeric output has a leading + or - (default: nothing or -);
#
alternate form (leading 0b or 0B for binary integer conversion, leading 0 for octal integer conversion, leading 0x or 0X for hexadecimal conversion when the value is nonzero, decimal point if the precision is zero, preserve trailing zeros for g and G conversion);
0
fill numeric conversion to field width with leading zeros (allowing room for a leading sign or space), but ignore if the - flag is specified, or, for integer conversion, if a precision is supplied;
space
like the + flag, except use a space instead of + for positive values;
=
center justify (ignored if the - flag is also specified) [mathcw extension];
/
trim trailing fractional zeros [mathcw extension];
\
show subnormals with leading zeros [mathcw extension];
^
uppercase @-conversions (ignored for all others) [mathcw extension].

Flag repetitions are permitted, but carry no additional meaning. However, the backslash flag is also the string escape character in C-language strings, so it must be doubled in format strings in order to represent a single backslash.
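The sketch below exercises only the flags that Standard C defines (-, +, #, 0, and space), so it should behave the same with any conforming library. The =, /, \, and ^ flags are mathcw extensions and are deliberately not used here. The function name format_flag_demo() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical demonstration of the Standard C flags, with fields
   separated by vertical bars:
     %-6d  left justify in a width of 6
     %+d   always show a sign
     %#x   alternate form: leading 0x
     %#o   alternate form: leading 0
     %06d  zero fill, leaving room for the sign
     % d   space in place of a plus sign                              */
int format_flag_demo(char *out, size_t size)
{
    return snprintf(out, size, "%-6d|%+d|%#x|%#o|%06d|% d",
                    42, 42, 255u, 8u, -42, 42);
}
```

The resulting string is "42    |+42|0xff|010|-00042| 42", which is 30 characters long.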

The fields w (minimum output field width), p (precision), e (number of digits in exponent), g (group size), and b (base) are strings of zero or more digits, or an asterisk, in which case the next argument, which must be of type int, supplies the value.

An omitted width value means that the conversion will take as many characters as it needs, with no additional spacing supplied around the output string. If the conversion requires more characters than the width value, the width is ignored: data are never lost in the conversion.

Leading zeros in the width are interpreted as flag characters; they do not start an octal integer as they do elsewhere in the C-language family.

When the fields are asterisks, meaning that they are obtained from the function argument list, it is possible for negative field values to be supplied. For the width field, a negative value is interpreted as a minus flag, followed by a positive width. For the remaining fields, negative values are considered to be zero values.
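A brief sketch of asterisk fields, restricted to the width and precision that Standard C supports; the helper names pad_int() and fixed_point() are hypothetical. Negative precisions are not shown, because the treatment described above for the remaining fields is specific to this library.

```c
#include <stdio.h>

/* Hypothetical helper: the width comes from the argument list.
   A negative width acts as a - flag followed by a positive width. */
int pad_int(char *out, size_t size, int width, int value)
{
    return snprintf(out, size, "%*d|", width, value);
}

/* Hypothetical helper: both width and precision are supplied at run time. */
int fixed_point(char *out, size_t size, int width, int precision, double x)
{
    return snprintf(out, size, "%*.*f", width, precision, x);
}
```

Here pad_int() with width 6 and value 42 yields "    42|", with width -6 it yields "42    |", and fixed_point() with width 8, precision 3, and value 2.5 yields "   2.500".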

For integer conversions, the precision is the minimum number of digits used to represent the value, and leading zeros are supplied if needed. The default precision is one. A zero precision means that conversion of a zero value produces no output; while this is sometimes useful, it is more likely to confuse.
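The effect of the precision on integer conversion in Standard C can be seen with a small sketch; the helper name format_digits() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: converts value with at least `precision` digits. */
int format_digits(char *out, size_t size, int precision, int value)
{
    return snprintf(out, size, "%.*d", precision, value);
}
```

A precision of 5 turns 42 into 00042, while a precision of 0 turns the value 0 into an empty string, so the return value is 0.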

If the precision is omitted for e, E, f, F, g, or G conversion, it is taken to be 6. Thus, %he and %.he are equivalent to %.6he.

If the precision is omitted for a, A, b, B, q, Q, or @ conversion, it is taken to be the minimal length required to represent the number exactly if the floating-point base FLT_RADIX is a power of two. Otherwise, Standard C requires only that it be sufficient to distinguish values of type double, except that trailing zeros may be omitted. In the mathcw library implementation, the default precision is sufficient for the data type to be correctly converted in a round-trip conversion (see the Matula precision formula cited later). For example, with IEEE 754 arithmetic, %....10@ is equivalent to %.16...10@, since 1 + 16 = 17 decimal digits are needed for the 53-bit significand.

The ISO C Standards provide no means to specify exponent width, even though there is existing practice in the decade-older 1978 ISO Fortran Standard for such control. The C Standards simply mandate that the exponent is always signed, and has at least two digits, making it impossible to align tables of numbers when some require more than two digits for the exponents. The mathcw library exponent-width extension provides the needed control; the value is the number of digits in the always-signed exponent. Thus, the %w.d.eF format specification closely resembles Fw.d.e in Fortran.

Grouping provides underscore separators between digit groups, at least one of which has the specified group size. For example, %15.10..3.2@ formats 2**(-24) as 2@1.000_000_000_0@e-24, %...3d formats 2**53 as 9_007_199_254_740_992, and %.21..5g formats 10**6 * pi as 31_41592.65358_97930_1527. The group size applies to all integer and floating-point output, including exponents, but is ignored for string and pointer conversion. Digit grouping is particularly valuable for improving the readability of numeric tables, and follows a centuries-old practice in the printing industry.
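Standard C has no grouping field, so the %...3d example cannot be run with an ordinary C library. The following sketch only mimics the integer case; group_digits() is a hypothetical helper, not a mathcw function, and it does not handle fractions or exponents.

```c
#include <stdio.h>

/* Hypothetical helper: writes value into out with an underscore between
   groups of `group` digits, counted from the right.  Returns the output
   length, or -1 if group < 1 or out is too small. */
int group_digits(char *out, size_t size, unsigned long long value, int group)
{
    char digits[32];
    int len, i, j = 0;

    if (group < 1)
        return -1;
    len = snprintf(digits, sizeof digits, "%llu", value);
    if (len < 0)
        return -1;
    /* Space needed: the digits, (len - 1) / group separators, and a NUL. */
    if ((size_t) (len + (len - 1) / group + 1) > size)
        return -1;
    for (i = 0; i < len; i++) {
        if (i > 0 && (len - i) % group == 0)
            out[j++] = '_';     /* a full group remains to the right */
        out[j++] = digits[i];
    }
    out[j] = '\0';
    return j;
}
```

For 2**53 = 9007199254740992 and a group size of 3, the result is 9_007_199_254_740_992, matching the example above.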

The base applies only to @ conversion, and is ignored for all others: for example, %.16..5.8@ formats pi as 8@3.11037_55242_10264_3@e+0. A base out of the range 2 ... 36 is replaced by 10. Bases larger than 10 use successive letters of the modern English alphabet, just as hexadecimal notation uses the additional letters a ... f, and lettercase is ignored.

The exponent of based numbers is always a power of the base. By contrast, for binary, octal, and hexadecimal floating-point formats, the exponent is a power of two. Thus, the decimal value 255.0 can be written equivalently as 0x1.fep+7, 0xffp+0, 0o1.774p+7, 0b1.1111111p+7, and in based-number form as 16@f.f@e+1, 16@ff@e+0, 8@3.77@e+2, 2@1.1111111@e+7, and so on. While C89 supports only the decimal form, C99 also allows the hexadecimal form. However, the hoc(1) language recognizes all of them.
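The equivalence of the decimal and hexadecimal forms can be checked with the C99 strtod() function, which accepts both. This is a sketch only; parses_to() is a hypothetical helper, and the octal, binary, and based-number forms are not recognized by Standard C input routines.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if strtod() consumes all of text and
   the converted value equals expected, and 0 otherwise. */
int parses_to(const char *text, double expected)
{
    char *end;
    double value = strtod(text, &end);

    return end != text && *end == '\0' && value == expected;
}
```

With a C99 library, parses_to("0x1.fep+7", 255.0) and parses_to("0xffp+0", 255.0) both return 1.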

The datatype specifiers are mandatory when arguments do not have default types (int, double, and char *). Standard C requires promotion of signed and unsigned char and short int arguments to signed and unsigned int values. The datatype specifiers allow the promotions to be reversed, and data of the shorter types recovered.
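A short sketch of datatype specifiers for arguments that are not of the default types, using only Standard C (hh requires C99). The helper name format_sized() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: tiny and small are promoted to int when passed
   to snprintf(); the hh and h specifiers tell the conversion to treat
   them as unsigned char and short again.  The l (ell) specifier is
   mandatory for wide, because long is not a default argument type. */
int format_sized(char *out, size_t size,
                 unsigned char tiny, short small, long wide)
{
    return snprintf(out, size, "%hhu %hd %ld", tiny, small, wide);
}
```

Calling format_sized() with 200, -12345, and 1234567890L produces the string 200 -12345 1234567890.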

The datatypes for integer argument conversion are:

hh
signed or unsigned char [C99 extension];
h
signed or unsigned short int;
l (ell)
signed or unsigned long int;
ll
signed or unsigned long long int [C99 extension];
j
intmax_t or uintmax_t [C99 extension];
z
size_t [C99 extension];
t
ptrdiff_t [C99 extension].

The datatypes for binary floating-point argument conversion are:

h
float [mathcw extension];
L
long double;
LL
long_long_double [mathcw extension];
hL
extended (__float80) [mathcw and HP-UX extension];
lL
quad (__float128) [mathcw and HP-UX extension].

Regrettably, the C language requires the promotion of a float argument to a double value when the argument is among those represented by an ellipsis in the prototype of a function with a variable number of arguments. That conversion converts a signaling NaN to a quiet NaN on most architectures, and usually destroys the NaN payload as well. Arguments of type decimal_float retain their identity; they are not promoted to decimal_double.

The datatypes for decimal floating-point argument conversion are:

H
decimal_float [mathcw extension];
DD
decimal_double [mathcw extension];
DL
decimal_long_double [mathcw extension];
DLL
decimal_long_long_double [mathcw extension].

The H, DD, and DL datatypes follow the proposal ISO/IEC JTC1 SC22 WG14 N1176 Extension for the programming language C to support decimal floating-point arithmetic.

The conversion types for integers are:

d
signed decimal;
i
same as d;
n
write current output character count into the next argument, which must be of type int *;
o
unsigned octal;
u
unsigned decimal;
X
unsigned uppercase hexadecimal;
x
unsigned lowercase hexadecimal;
Y
unsigned uppercase binary [mathcw extension];
y
unsigned lowercase binary [mathcw extension].
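The standard unsigned integer conversions can be compared side by side with a small sketch; format_bases() is a hypothetical helper. The Y and y binary conversions are mathcw extensions, so they are omitted to keep the example portable.

```c
#include <stdio.h>

/* Hypothetical helper: shows one value in decimal, octal, and both
   lettercases of hexadecimal. */
int format_bases(char *out, size_t size, unsigned int value)
{
    return snprintf(out, size, "%u %o %x %X", value, value, value, value);
}
```

For the value 255, the output string is 255 377 ff FF.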

The conversion type for pointers is:

p
void *. Standard C defines the output form to be implementation dependent. In the mathcw library, %#x-style is used for modern 32-bit and 64-bit systems, while for the PDP-10, it follows tradition with %06o,,%06o, and on the PDP-11, with %0o. Platform-specific conventions may be provided for other systems as well.

For maximal portability, pointer arguments that refer to functions should be prefixed with a (void *) type cast. Even so, there are segmented-memory architectures for which function pointers are longer than data pointers, and on those systems it is not possible to represent them with a void * cast for output with the printf() function family.
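For data pointers, the cast is straightforward. The sketch below uses a hypothetical helper, format_address(); because the output form is implementation dependent, no particular result is shown.

```c
#include <stdio.h>

/* Hypothetical helper: %p expects a void * argument, so the object
   pointer is cast explicitly before it is passed. */
int format_address(char *out, size_t size, int *object)
{
    return snprintf(out, size, "%p", (void *) object);
}
```

The return value is the length of whatever text the library chooses for the address.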

The conversion types for strings are:

%
literal percent (flags, field widths, and datatype are normally omitted, so the complete format specification can be written as %%). This is the only conversion type for which no argument is consumed.
c
unsigned char, or with the l (ell) modifier, unsigned wchar_t;
s
signed or unsigned char *, or with the l (ell) modifier, wchar_t *.
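The three conversions in this group can be combined in one call, as in this Standard C sketch; format_score() is a hypothetical helper. Note that %% consumes no argument.

```c
#include <stdio.h>

/* Hypothetical helper: %s takes a string, %d an int, %% prints a literal
   percent sign without consuming an argument, and %c takes a character
   (passed as an int). */
int format_score(char *out, size_t size,
                 const char *name, int percent, char grade)
{
    return snprintf(out, size, "%s scored %d%% (grade %c)",
                    name, percent, grade);
}
```

With the arguments "Ada", 95, and 'A', the output string is Ada scored 95% (grade A).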

The conversion types for floating-point values are:

a
lowercase hexadecimal floating-point (-0xh.hhh...p+nn) [C99 extension];
A
uppercase hexadecimal floating-point (-0XH.HHH...P+nn) [C99 extension];
b
lowercase binary floating-point (-0bd.ddd...p+nn) [mathcw extension];
B
uppercase binary floating-point (-0Bd.ddd...P+nn) [mathcw extension];
e
lowercase decimal floating-point (-d.ddd...e+nn);
E
uppercase decimal floating-point (-d.ddd...E+nn);
f
lowercase decimal floating-point (-d.ddd...);
F
uppercase decimal floating-point (-d.ddd...);
g
lowercase decimal floating-point (e or f style);
G
uppercase decimal floating-point (E or F style);
q
lowercase octal floating-point (-0od.ddd...p+nn) [mathcw extension];
Q
uppercase octal floating-point (-0Od.ddd...P+nn) [mathcw extension];
@
based floating-point number (-nn@d.ddd...@e+nn) [mathcw extension].

The g/G conversions use either e/E or f/F conversion, depending on the magnitude of the value to be converted. However, in the absence of the # flag, they differ in that trailing zeros, and any final decimal point, are also removed. The f/F style is used when there are no more than three leading zero fractional digits for magnitudes smaller than one and for larger magnitudes, no more than precision digits before the point, not counting filler zeros. Otherwise, e/E conversion style is used.

More precisely, Technical Corrigendum 2 of the 1999 ISO C Standard says this about g/G conversion styles:

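Let P equal the precision if nonzero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a conversion with style E would have an exponent of X:

if P > X >= -4, the conversion is with style f (or F) and precision P - (X + 1);
otherwise, the conversion is with style e (or E) and precision P - 1.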
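A short sketch of how the default P = 6 plays out for %g in a conforming C99 library: magnitudes whose decimal exponent X lies from -4 through 5 appear in f style, and all others in e style. The helper name format_g() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: converts x with %g and the default precision. */
int format_g(char *out, size_t size, double x)
{
    return snprintf(out, size, "%g", x);
}
```

The value 0.0001 (X = -4) is output as 0.0001, 0.00001 (X = -5) as 1e-05, 123456.0 (X = 5) as 123456, and 1234567.0 (X = 6) as 1.23457e+06.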

IEEE 754 NaN and Infinity are output as NAN and INFINITY, or nan or infinity, depending on the conversion lettercasing. Infinity may carry a sign, but NaN does not.
TO DO: Need flags to distinguish between quiet and signaling NaNs and show their payloads.


IMPLEMENTATION LIMITS

The 1989 ISO C Standard requires that any single conversion specification shall be able to produce at least 509 characters. The 1999 ISO C Standard increases that requirement to at least 4095 characters. The mathcw library guarantees at least 10,239 characters. All counts exclude the trailing NUL in string output. The mathcw library also guarantees that there is no limit, other than available memory or filesystem storage, for string conversion with %s.

IMPLEMENTATION-DEFINED BEHAVIOR

There are several locations in the descriptions of the printf() function family where the ISO C Standards leave behavior unspecified, or declare it to be implementation defined. Such imprecision is a barrier to portability, since user code that exploits the behavior of one particular library implementation is likely to misbehave, or fail, when linked with another implementation, either on the same system (such as might happen by choosing a different compiler), or on a different system.

Here is a list of those areas in order of their appearance in Technical Corrigendum 2 of the 1999 ISO C Standard, each with a statement of how the mathcw library implementation behaves:

In the mathcw library, all functions in the printf() family are short wrappers that call a common internal function to handle the format scanning and argument processing. They are thus guaranteed to behave identically, apart from where their output is sent. This has not been true historically of other implementations on some systems, because the family members were introduced at different times, and may have different code.

The C Standard default precision of 6 for the floating-point formats is arbitrary and regrettable. A better choice would have been to use the minimal precision needed to ensure correct round-trip conversions, as derived by I. Bennett Goldberg, 27 Bits Are Not Enough For 8-Digit Accuracy, Comm. ACM 10(2) 105--106, February 1967 and David W. Matula, In-and-Out Conversions, Comm. ACM 11(1) 47--50, January 1968. Matula shows that binary-to-decimal conversion of p-bit significands to d decimal digits requires that 2**p < 10**(d - 1). This is readily solved to obtain d = ceil(p/log2(10) + 1). The low-level conversion functions cvtob(3CW), cvtod(3CW), cvtog(3CW), cvtoh(3CW), and cvtoo(3CW) in the mathcw library provide for this, but the Standard C requirement that omitted format width and precision default to zero makes them indistinguishable from explicit zero values, and makes it impossible to interpret omitted values as requests for the Matula precision formula.
TO DO: Need a flag to get the minimal round-trip conversion precisions.
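The formula d = ceil(p/log2(10) + 1) is easy to evaluate directly. The sketch below is only an illustration; matula_digits() is a hypothetical name, and the constant is 1/log2(10) = log10(2) written out so that no math-library call is needed.

```c
/* Hypothetical helper: smallest d with 2**p < 10**(d - 1), that is,
   d = ceil(p/log2(10) + 1), for a significand of p bits (p >= 0). */
int matula_digits(int p)
{
    const double log10_of_2 = 0.30102999566398120;  /* 1 / log2(10) */
    double x = (double) p * log10_of_2 + 1.0;
    int d = (int) x;        /* truncates toward zero */

    if ((double) d < x)
        d++;                /* round up to obtain the ceiling */
    return d;
}
```

For the common significand sizes of 24, 53, 64, and 113 bits, this gives 9, 17, 21, and 36 decimal digits respectively.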

Standard C is silent about what should be done about specifications with excessively-large precisions. Most implementations of the printf() family produce digits up to the requested precision, even if they are meaningless or unnecessary. The implementation of the mathcw library produces digits only up to the precision predicted by the Matula/Goldberg formula, and thereafter, supplies zero digits. Thus, even though all rational binary numbers can be exactly represented by (possibly long) decimal numbers, only the minimal output string is produced, followed by as many trailing zeros as the precision requires. For example, 2**(-30) = 9.31322_57461_54785_15625e-10 exactly, but for a 32-bit float with a 24-bit significand, the output of that value with %.20..5e is 9.31322_57500_00000_00000e-10. Although the two decimal numbers differ, they produce identical results from decimal-to-binary conversion in the 32-bit float format.
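The claim that the two decimal strings convert to the same 32-bit value can be checked with the C99 strtof() function, assuming IEEE 754 arithmetic with a 24-bit float significand and correctly rounded input conversion. The helper name same_float() is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if both decimal strings convert to
   the same float value, and 0 otherwise. */
int same_float(const char *a, const char *b)
{
    return strtof(a, NULL) == strtof(b, NULL);
}
```

Under those assumptions, same_float("9.31322574615478515625e-10", "9.31322575e-10") returns 1, whereas the 7-digit string 9.313225e-10 converts to a different float.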


SECURITY ISSUES

There are two significant security issues with the functions in the printf() family:

Security tools such as its4(1) and rats(1), and code analyzers such as splint(1), can be used to scan source code and report these, and other, problems.


RETURN VALUES

Functions in the printf() family return the number of characters converted, or EOF (-1) on error. The count does not include the final NUL in output to a string argument.

For snprintf() and vsnprintf(), if the output buffer s is not large enough, formatting continues, but further data storage is suppressed after storing a NUL in the last buffer position. If the return value exceeds n - 1, where n is the buffer size passed in the call, then there is insufficient space in the output buffer, and the caller can then decide whether to retry with an output buffer that is larger than the return value.
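A minimal sketch of the truncation test, using Standard C only; the helper format_apples() and its message text are hypothetical. Any negative return is treated as an error, which covers the EOF (-1) value described above.

```c
#include <stdio.h>

/* Hypothetical helper: formats into buf[0..n-1] and stores the length
   that the complete output requires in *need.  Returns 1 if the text
   fit, 0 if it was truncated, and -1 on a conversion error. */
int format_apples(char *buf, size_t n, int count, int *need)
{
    *need = snprintf(buf, n, "%d apples", count);
    if (*need < 0)
        return -1;
    /* Same test as *need > n - 1, written so that n == 0 is safe. */
    return ((size_t) *need >= n) ? 0 : 1;
}
```

With an 8-character buffer and count 12345, the function returns 0, sets *need to 12, and leaves the truncated but NUL-terminated text "12345 a" in the buffer; retrying with a buffer of at least 13 characters succeeds.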


ERRORS

For file output, filesystem errors, such as a full storage device, can result in an immediate return of EOF at any output character.

For both file and string output, a return of EOF can happen if an erroneous conversion specification is encountered. In such a case, an error message and the faulty specification are reported on stderr.


SEE ALSO

cvtob(3CW), cvtod(3CW), cvtog(3CW), cvtoh(3CW), cvtoo(3CW), fclose(3), fgetc(3), fopen(3), fputc(3), fputs(3), fread(3), fscanf(3CW), fwrite(3), getc(3), getchar(3), getw(3), hoc(1), its4(1), putc(3), putchar(3), puts(3), rats(1), scanf(3CW), splint(1), sscanf(3CW), ungetc(3), ungetwc(3), vfscanf(3CW), vscanf(3CW).