APE - Arbitrary Precision Arithmetic Routines Mark E. Carson Department of Mathematics, University of California, Berkeley _1. _I_n_t_r_o_d_u_c_t_i_o_n The _a_p_e library is a set of routines for arbitrary pre- cision integral arithmetic written in the C language. These routines can be called by either C or f77 programs (for the latter, see section 5 below). For C programs, the line #include should be included in the calling program to get the neces- sary typedefs and external declarations. In the case of f77, the calling program uses only integers (which are con- verted to structure pointers) and (with two exceptions) calls only subroutines, so no special declarations are required. For both C and f77, the flag -lape should be used to tell the loader to access the library. The basic structure used by the _a_p_e routines is called a _M_I_N_T (standing for ``multiprecision integer''). It is defined by the typedef typedef struct mint { int len; short *val; } MINT, *PMINT; which is in ape.h. Here the field ``len'' is an integer whose sign is the sign of the number represented, and whose magnitude is the number of words (short integers) needed to store it. The (absolute value of the) number itself is stored as an array of short integers pointed to by ``val.'' The representation is essentially base 2^15, with the 16th bit of a short integer used only to store intermediate results in calculations. Hence the largest number representable on a 16 bit computer like a PDP-11 is (2^15)^(2^15) which is about 3e147962, though of course in July 16, 1982 - 2 - actuality the number would have to be much smaller to allow any room for calculations or overhead. To maximize the amount of data space available, it is advisible to use the -i flag when loading the program. For a comparison with some quantities from number theory, this means the largest Fermat number which is representable is F18. For a 32 bit computer the limit would be F34 (assuming, that is, that it had 4 billion bytes of core!). In an actual test, the largest Fermat number which could be directly computed and output by the routines was F17, a number of about 39,500 decimal digits. Calculations are done much as they are by hand with numbers represented base 10. All the routines deal with stucture pointers (_P_M_I_N_Ts), so it's better to declare and use these rather than the structures themselves. More space is allocated as needed (by calls to _m_a_l_l_o_c) and (at least mostly) freed when unneeded (by calls to _f_r_e_e). The following sections treat the various routines in some detail. _2. _I_n_i_t_i_a_l_i_z_a_t_i_o_n _a_n_d _R_e_m_o_v_a_l Before a _M_I_N_T or _P_M_I_N_T can be used by other routines, it must be initialized. Initialization comes in three varieties: new(pa) PMINT *pa; Like the Pascal procedure, _n_e_w(&_a) creates a zeroed _M_I_N_T which _a points to. PMINT shtom(sn) short sn; PMINT itom(n) int n; PMINT ltom(ln) long ln; PMINT stom(s) char *s; These functions return pointers to _M_I_N_Ts with values equal to their arguments (or to the numeric equivalent of its argument for _s_t_o_m). _s_t_o_m can be used for ini- tializing with numbers of arbitrary length. It follows the usual C conventions for determining input bases. July 16, 1982 - 3 - PMINT padd(a,b) PMINT psub(a,b) PMINT pmult(a,b) PMINT pdiv(a,b) PMINT pmod(a,b) PMINT psdiv(a,n) PMINT psmod(a,n) PMINT ppow(a,b,c) PMINT prpow(a,n) PMINT psqrt(a) PMINT a,b,c; int n; These functions are simply versions of the correspond- ing arithmetic routines discussed below (with the ``p'' either deleted or replaced by an ``m'' except for _p_m_o_d and _p_s_m_o_d which return the remainders from _m_d_i_v and _s_d_i_v respectively) which return pointers to their results, rather than making their last arguments point to them. They should be used only for initialization, not further calculation, else storage space may be lost. Two routines are provided for freeing unneeded space: xfree(a) afree(a) PMINT a; _x_f_r_e_e frees what _a->_v_a_l points to; _a_f_r_e_e also frees what _a points to. _x_f_r_e_e zeroes an already initialized _P_M_I_N_T while _a_f_r_e_e returns it to an uninitialized state. Obviously, these should only be used when the routines here have been used to initialize the pointers. Since _x_f_r_e_e is used frequently by the _a_p_e routines to ``clear'' _M_I_N_Ts prior to assignments, it is not advis- able to initialize the ``val'' field by methods other than those mentioned here. _3. _A_r_i_t_h_m_e_t_i_c For the following conceptual descriptions, assume these type declarations: PMINT a,b,c,d,q,r; int m,n; long ln; short *R; _R_o_u_t_i_n_e _R_e_s_u_l_t madd(a,b,c) c = a + b msub(a,b,c) c = a - b July 16, 1982 - 4 - mult(a,b,c) c = a * b sdiv(a,m,q,R) q = a / m; R = a mod m mdiv(a,b,q,r) q = a / b; r = a mod b gcd(a,b,c) c = gcd(a,b) reciprocal(a,n,b) b = 10^n / a msqrt(a,b,r) b = sqrt(a); r = a - b*b (r's length is returned) pow(a,b,c,d) d = a^b mod c rpow(a,n,b) b = a^n lpow(n,ln,b) b = n^ln In all cases, calls like _m_a_d_d(_a,_b,_a) are allowed and give the expected results. The routines dealing with _i_n_ts will all work properly no matter what the relationship between _s_h_o_r_t and _i_n_t except _s_d_i_v which needs the _i_n_t _m to be no larger than the largest _s_h_o_r_t. _r_e_c_i_p_r_o_c_a_l is the closest thing provided to decimal fractions. Note that before any of these routines can be used, all the pointers used must be initialized. As an example, this program segment will fail: PMINT a,b,c; a = itom(2); b = itom(3); madd(a,b,c); It should be changed to either PMINT a,b,c; PMINT a,b,c; a = itom(2); a = itom(2); b = itom(3); or b = itom(3); new(&c); c = padd(a,b); madd(a,b,c); Comparisons may be done with the integral function _m_c_m_p. _m_c_m_p(_a,_b) returns something positive if _a > _b, nega- tive if _a < _b, and zero if _a = _b. For seeing whether _a is positive, negative or zero, it suffices to look at _a->_l_e_n. _4. _I_n_p_u_t _a_n_d _O_u_t_p_u_t Input and output are allowed for any reasonable base, to and from both files and character strings. The proto- types are: July 16, 1982 - 5 - PMINT a; int n; [must have n > 1] FILE *f; char *s; m_in(a,n,f) input a number base ``n'' from file ``f'' into ``a'' sm_in(a,n,s) input a number base ``n'' from string ``s'' into ``a'' m_out(a,n,f) output ``a'' base ``n'' onto file ``f'' sm_out(a,n,s) output ``a'' base ``n'' onto string ``s'' Spaces are allowed in long input numbers for greater readability (and hence are _n_o_t sufficient for separating numbers); newlines may be escaped with backslashes. _m__i_n returns 0 on normal termination and EOF (-1) when the end of the input file is encountered. And again, the _P_M_I_N_Ts must be initialized before the input routines can be used. The letters a-z or A-Z may be used for the (base 10) numbers 10-35 when the input base is greater than ten. Unfortunately, no provision is made for larger input ``digits.'' On output, a-f are used for 10-15 for bases between 11 and 16; for larger bases the usual ``hybrid'' notation of space-separated base 10 numbers is used. Output numbers are _n_o_t automatically terminated by newlines or bro- ken at any width; an output formatting program such as _p_r_e_t_- _t_y_a_p_e can be used to do the latter. For _s_m__o_u_t, the string is assumed to be large enough to hold the output number. The function _o_u_t_l_e_n_g_t_h(_a,_b) can be used to get an (over) estimate of the length required. Special versions of these routines are provided for bases 8 and 10. They are PMINT a; FILE *f; Base 8 om_in(a,f) omin(a) [f=stdin] om_out(a,f) omout(a) [f=stdout] July 16, 1982 - 6 - Base 10 fmin(a,f) minput(a) [f=stdin] fmout(a,f) mout(a) [f=stdout] Because 2^15 = 8^5, conversion from the internal form to octal output is much easier and faster (about 30 times fas- ter for a hundred-digit number) for base 8 as opposed to any other base. The names used here are a bit unfortunate; in particular, _m_i_n_p_u_t was chosen to avoid clashes with _m_i_n minimum functions. _5. _U_s_a_g_e _w_i_t_h _F_o_r_t_r_a_n F77 programs can make use of the _a_p_e routines through the ``frontend'' subroutines provided. Generally, these cast the _i_n_t_e_g_e_rs passed by the Fortran program into _P_M_I_N_Ts, pass them to the _a_p_e routines, making the call by reference or value as required, and then (when appropriate) casting the resulting _P_M_I_N_Ts back into _i_n_t_e_g_e_rs. _N_o_t_e: when writing these, I assumed throughout that 4 byte integers (``integer*4'') were used. In particular, programs compiled with the -I2 flag may have difficulties with these routines! Versions that work with 2 byte integers are stored in the file ``~ape/shortran.c''. The subroutines provided generally have the same names as their C counterparts. They are: _5._1. _I_n_i_t_i_a_l_i_z_a_t_i_o_n _a_n_d _R_e_m_o_v_a_l _f_7_7 _v_e_r_s_i_o_n _C _v_e_r_s_i_o_n integer n,a int n; PMINT a; character*M s char s[M]; call new(a) new(&a); call itom(n,a) a = itom(n); call stom(s,a) a = stom(s); call xfree(a) xfree(a); call afree(a) afree(a); _5._2. _A_r_i_t_h_m_e_t_i_c Subroutines _m_a_d_d, _m_s_u_b, _m_u_l_t, _m_d_i_v, _s_d_i_v, _g_c_d, _p_o_w and _r_p_o_w are provided to call their C counterparts. The _i_n_t_e_g_e_r functions _m_c_m_p and _m_s_q_r_t likewise call the same-named C functions. July 16, 1982 - 7 - _5._3. _I_n_p_u_t _a_n_d _O_u_t_p_u_t I/O is provided only from stdin/stdout, for bases 8 and 10. _f_7_7 _v_e_r_s_i_o_n _C _v_e_r_s_i_o_n integer i,a int i; PMINT a; call minput(a,i) i = minput(a); call omin(a,i) i = omin(a); call mout(a) mout(a); call omout(a) omout(a); _5._4. _C_o_n_v_e_r_s_i_o_n_s Subroutines are also provided for converting from the _M_I_N_T structure to the closest approximation in Fortran, the pair of an _i_n_t_e_g_e_r (representing the ``len'' field) and a vector of _i_n_t_e_g_e_r*_2's (representing the array pointed to by the ``val'' field). Usage is as follows: integer a, length (a represents a PMINT) integer*2 vector(N) (must have abs(length) .le. N) call mtovec(a,length,vector) results in: length = a->len vector(i) = a->val[i-1] for i = 1 to abs(length) call vectom(length,vector,a) results in: a->len = length a->val[i-1] = vector(i) for i = 1 to abs(length) The latter is actually a method of initialization. _6. _C_o_m_p_a_r_i_s_o_n _w_i_t_h _o_t_h_e_r _a_r_b_i_t_r_a_r_y _p_r_e_c_i_s_i_o_n _p_a_c_k_a_g_e_s The only other program I am aware of for arbitrary pre- cision arithmetic on PDP-11 UNIX* is _b_c/_d_c. Since this is based on a (_y_a_c_c/_l_e_x) interpreter, it is obviously much slower; see ``help bignumbers'' for contest results. UCB _l_i_s_p/_l_i_s_z_t provides arbitrary precision integers, but I have no first-hand experience with its use. Since no working version of the _a_p_e routines has been put on a VAX yet, obviously no comparison has been done. __________________________ *UNIX is a Footnote of Bell Laboratories. July 16, 1982 - 8 - I have heard of multiprecision packages written in For- tran, Pascal, and assembly languages for various computers but have no details on their quality or their portability. The _a_p_e routines have been constructed to be easily transferable to any computer with a C compiler assuming that a) _m_a_l_l_o_c and _f_r_e_e are provided b) _l_o_n_g integers are twice as long as _s_h_o_r_t ones. An amount of reworking would be needed if b) is not true. July 16, 1982