BigFloat (Variable Precision Floating-Point Library for Ruby)

October 1st, 2000: modified to follow the changes in Ruby 1.6.




BigFloat is an extension library for the Ruby interpreter. Using the BigFloat class, you can carry out computations with any number of significant digits.

Maintenance of BigFloat has stopped; use BigDecimal, which has been bundled with Ruby since Ruby 1.8, instead. The most recent source code can be downloaded from Ruby CVS. (2003-8)

For the details about Ruby see:

NOTE:
This software is provided "AS IS" and without any express or implied warranties, including, without limitation, the implied warranties of merchantability and fitness for a particular purpose. For details, see COPYING and README included in this distribution.



Introduction

Ruby already has a built-in Bignum class, which can represent integers of any magnitude. A comparable floating-point class, however, is not yet built in. This is why I wrote BigFloat, a variable-precision floating-point class. Feel free to send any comments or bug reports to me at
shigeo@tinyforest.gr.jp. I will try (but cannot promise) to fix the bugs reported.

Installation

The latest version of Ruby can be downloaded from the official Ruby page. Once you decompress the downloaded Ruby archive, you will find the directory "ext", which is prepared for extension libraries, and the directory "win32", which holds Windows-specific files. Move to the directory "ext" and make a new subdirectory named "bigfloat". Then move to the directory "ext/bigfloat", copy the compressed bigfloat archive (.tar.gz or .lzh file) there, and decompress it. Neither the .tar.gz nor the .lzh file contains directory information.

Installation of Ruby interpreter

First of all, you need to build the Ruby interpreter. UNIX/Linux users should read README and follow the build instructions. Windows users should read win32/README.win32 and follow the build instructions.

Installation on UNIX/Linux

On UNIX/Linux, move to the bigfloat directory and enter the following commands.
    ruby extconf.rb
    make
    make install

Installation on Windows

Installation on Windows is almost the same as on UNIX/Linux. Move to the bigfloat directory and enter the following commands.
    ruby extconf.rb
    nmake
    nmake install

For users of Microsoft Visual C/C++ 6.0, project files are available. Download:

winide143.lzh and decompress it in the directory "../ruby-1.4.3/win32",
or
winide143.lzh (or winide16.tar.gz) and decompress it in the directory "../ruby-1.6.x/win32".

Usage and methods

Assuming you already know Ruby programming, a program that creates BigFloat objects looks like this:
   require 'BigFloat' # From v1.1.9 on, UNIX users must write 'bigfloat' instead
   a=BigFloat::new("0.123456789123456789")
   b=BigFloat::new("123456.78912345678",40)
   c=a+b
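
Because BigFloat is no longer maintained, the same three lines can also be tried with its successor. This is a minimal sketch, assuming the bigdecimal library that ships with Ruby is installed. `BigDecimal(string, precision)` is BigDecimal's constructor, not a BigFloat call, and the method name `usage_example_sum` is only illustrative.

```ruby
require 'bigdecimal'

# BigDecimal() is a Kernel method provided by the bigdecimal library;
# the optional second argument is a precision hint, like n in BigFloat::new.
def usage_example_sum
  a = BigDecimal("0.123456789123456789")
  b = BigDecimal("123456.78912345678", 40)
  a + b   # addition is exact, no rounding takes place
end

puts usage_example_sum.to_s("F")   # plain decimal notation instead of 0.xxxe6
```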

List of methods

In the following explanations, n specifies the minimum number of significant digits in the result. It is not an exact figure: slightly more memory than requested is allocated for a newly created object. On a 32-bit integer system, 4 decimal digits are computed at a time, which means the number of significant digits in a BigFloat is always a multiple of 4.
The following methods need no explanation.

About 'coerce'

For a binary operation such as A op B:
1. Both A and B are BigFloat objects.
A op B is performed normally.
2. A is a BigFloat object but B is not.
The operation is performed after B is translated to the corresponding BigFloat object (because BigFloat supports the coerce method).
3. A is not a BigFloat object but B is.
If A has a coerce method, B translates A to the corresponding BigFloat object and the operation is performed; otherwise an error occurs.
Attention must be paid when a String is translated to BigFloat. Translation stops without error at the first character that is not part of a number. For instance, "10XX" is translated to 10 and "XXXX" is translated to 0.
Strings representing special values, such as "Infinity", "+Infinity", "-Infinity", and "NaN", can also be translated to BigFloat unless false is specified by the mode method.
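
Ruby's own String conversions follow the same "stop at the first non-digit" rule, so they give a feel for this behaviour without a BigFloat build. The sketch below uses only core Ruby (String#to_f and Kernel#Float); the helper name `lenient_and_strict` is hypothetical, and nothing here is a BigFloat call.

```ruby
# Compares Ruby's lenient conversion (String#to_f) with its strict one (Float()).
def lenient_and_strict(text)
  strict = begin
    Float(text)            # raises ArgumentError on trailing garbage
  rescue ArgumentError
    :rejected
  end
  { lenient: text.to_f, strict: strict }
end

p lenient_and_strict("10XX")   # lenient parse keeps the leading "10"
p lenient_and_strict("XXXX")   # lenient parse falls back to zero
p lenient_and_strict("10.5")   # both conversions agree on clean input
```

When silent truncation is not wanted, validating the string first (as Float() does here) is one way to turn a typo into an error.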
The BigFloat class supports the coerce method (for details about coerce, see the Ruby documentation). This means that most binary operations can be performed if the BigFloat object is on the left-hand side of the operation.

For example:
  a = BigFloat.E(20)
  c = a * "0.123456789123456789123456789" # The String is changed to a BigFloat object.
is performed normally.
But because String does not have a coerce method, the following example cannot be performed.
  a = BigFloat.E(20)
  c = "0.123456789123456789123456789" * a # ERROR
If the error above is actually inconvenient for you, you can define a new class derived from String and define the missing behaviour within the new class.
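
The workaround can be sketched without BigFloat itself. In the example below, `ExactNumber` is a hypothetical stand-in for a BigFloat-like class (it wraps a Rational), and `NumericString` is an illustrative String subclass; neither name comes from the BigFloat distribution. It shows how core numeric classes call coerce on the right operand, why a plain String does not, and one way a subclass can hand control over.

```ruby
# Hypothetical stand-in for a BigFloat-like class; only the coerce protocol matters.
class ExactNumber
  attr_reader :value

  def initialize(value)
    @value = Rational(value)   # accepts Integer, Rational, or a numeric String
  end

  def *(other)
    other = ExactNumber.new(other) unless other.is_a?(ExactNumber)
    ExactNumber.new(value * other.value)
  end

  # Integer#* and friends call this when they do not know the right operand.
  def coerce(other)
    [ExactNumber.new(other), self]
  end
end

# Illustrative String subclass whose * asks the right operand to coerce it.
class NumericString < String
  def *(other)
    return super unless other.is_a?(ExactNumber)   # keep normal string repetition
    left, right = other.coerce(self)
    left * right
  end
end

p (3 * ExactNumber.new("0.5")).value                        # Integer on the left works
p (NumericString.new("0.5") * ExactNumber.new(2)).value     # subclass on the left works
```

A plain String on the left still raises TypeError, because String#* expects an Integer repeat count and never consults coerce.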

Infinity,Not a Number(NaN),Zero

Infinite numbers and NaN can be written in your program as the strings "+Infinity" (or "Infinity"), "-Infinity", and "NaN" respectively. Infinite numbers can also be obtained from 1.0/0.0 (=Infinity) or -1.0/0.0 (=-Infinity).

NaN (Not a Number) results from an undefined computation such as 0.0/0.0 or Infinity-Infinity. Any computation involving NaN results in NaN. Comparisons with NaN are never true, including comparison with NaN itself.

Zero has two variations, +0.0 and -0.0. Still, +0.0==-0.0 is true.
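
Ruby's built-in Float obeys the same rules for these special values, so it can serve as a quick reference point. The sketch below uses only Float, not BigFloat, and the method name `special_value_facts` is illustrative.

```ruby
# Collects the special-value rules described above, checked against Float.
def special_value_facts
  inf = 1.0 / 0.0
  nan = 0.0 / 0.0
  {
    inf_minus_inf_is_nan:  (inf - inf).nan?,
    nan_equals_itself:     nan == nan,         # comparisons with NaN are never true
    nan_less_than_one:     nan < 1.0,
    zeros_compare_equal:   0.0 == -0.0,
    zeros_distinguishable: (1.0 / 0.0) != (1.0 / -0.0)   # +Infinity vs -Infinity
  }
end

special_value_facts.each { |name, result| puts "#{name}: #{result}" }
```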

Computation results involving Infinity, NaN, +0.0, or -0.0 can be complicated. Run the following program and confirm the results. Please send me any incorrect result you find.

require "BigFloat"

aa  = %w(1 -1 +0.0 -0.0 +Infinity -Infinity NaN)
ba  = %w(1 -1 +0.0 -0.0 +Infinity -Infinity NaN)
opa = %w(+ - * / <=> > >=  < == != <=)

for a in aa
  for b in ba
    for op in opa
      x = BigFloat::new(a)
      y = BigFloat::new(b)
      eval("ans= x #{op} y;print a,' ',op,' ',b,' ==> ',ans.to_s,\"\n\"")
    end
  end
end


Internal structure

A BigFloat number is defined by the structure Real in bigfloat.h. The digits representing a floating-point number are kept in the array frac[] defined in the structure. In the program, any floating-point number (BigFloat number) is represented as:
    0.xxxxxxxxx*BASE**n

where each 'x' is a digit of the mantissa (kept in the array frac[]), BASE is the base value (=10000 on a 32-bit integer system), and n is the exponent value.
A larger BASE value allows a smaller frac[] array and increases computation speed. The value of BASE is defined in VpInit(). On a 32-bit integer system this value is 10000. On a 64-bit integer system the value becomes larger. BigFloat has not yet been compiled and tested on a 64-bit integer system; it would be very nice if anyone tried to run BigFloat on a 64-bit system and informed me of the results. When BASE is 10000, an element of the array frac[] can hold a value from 0 to 9999 (up to 4 digits).
The structure Real is defined in bigfloat.h as:
  typedef struct {
     unsigned long MaxPrec; // The size of the array frac[]
     unsigned long Prec;    // Current size of frac[] actually used.
     short    sign;         // Attribute of the value.
                            //  ==0 : NaN
                            //    1 : +0
                            //   -1 : -0
                            //    2 : Positive number
                            //   -2 : Negative number
                            //    3 : +Infinity
                            //   -3 : -Infinity
     unsigned short flag;   // Control flag
     int      exponent;     // Exponent value(0.xxxx*BASE**exponent)
     unsigned long frac[1]; // An array holding the mantissa (variable length)
  } Real;
The decimal value 1234.56784321 is represented as (BASE=10000):
    0.1234 5678 4321*(10000)**1
where frac[0]=1234, frac[1]=5678, frac[2]=4321, Prec=3, sign=2, exponent=1. MaxPrec can be any value greater than or equal to Prec.
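
The mapping from a decimal string to frac[] and exponent can be sketched in a few lines of Ruby. This is an illustration of the layout described above, not the code in bigfloat.c; the method names are hypothetical, only non-negative inputs are handled, and the sign, Prec, and MaxPrec fields are left out.

```ruby
BASE_DIGITS = 4            # decimal digits per frac[] element when BASE is 10000
BASE = 10**BASE_DIGITS

# Rounds a digit count up to a whole number of frac[] elements.
def round_up_to_element(length)
  (length + BASE_DIGITS - 1) / BASE_DIGITS * BASE_DIGITS
end

# Splits a non-negative decimal string into [frac, exponent] so that
# value == 0.frac[0]frac[1]... * BASE**exponent.
def to_frac_and_exponent(text)
  int_part, frac_part = text.split(".")
  frac_part ||= ""
  # Pad both sides so the decimal point falls on an element boundary.
  int_part  = int_part.rjust(round_up_to_element(int_part.length), "0")
  frac_part = frac_part.ljust(round_up_to_element(frac_part.length), "0")
  frac = (int_part + frac_part).scan(/\d{#{BASE_DIGITS}}/).map(&:to_i)
  exponent = int_part.length / BASE_DIGITS
  while frac.first == 0    # normalize: drop leading zero elements
    frac.shift
    exponent -= 1
  end
  frac.pop while frac.last == 0   # drop trailing zero elements
  [frac, exponent]
end

# Rebuilds the exact value as a Rational, to check the decomposition.
def from_frac_and_exponent(frac, exponent)
  frac.each_with_index.sum(Rational(0)) do |element, i|
    element * Rational(BASE)**(exponent - 1 - i)
  end
end

p to_frac_and_exponent("1234.56784321")
```

Converting back with from_frac_and_exponent and comparing against Rational("1234.56784321") confirms that no digit is lost.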

Binary or decimal number representation

I adopted decimal number representation for the BigFloat implementation. Of course, binary number representation is the common choice on most computers.

Advantages using decimal representation

The reasons why I adopted decimal number representation for BigFloat are:
Easy debugging
The floating-point number 1234.56784321 can be easily represented as:
frac[0]=1234, frac[1]=5678, frac[2]=4321, exponent=1, and sign=2.
Exact representation
The following program can add all the numbers (in decimal) in a file without any error (no rounding operation).

   file = File::open(....,"r")
   s = BigFloat::new("0")
   while line = file.gets
      s = s + line
   end
If the internal representation is binary, translation from decimal to binary is required and translation error is inevitable. For example, 0.1 cannot be represented exactly in binary:
0.1 => b1*2**(-1)+b2*2**(-2)+b3*2**(-3)+b4*2**(-4)....
where b1=0,b2=0,b3=0,b4=1...
bn (n=1,2,3,...) is an infinite series of digits, each 0 or 1, so a rounding operation is necessary. But where should we round the series? Of course, exact "0.1" is printed if the rounding operation is done properly.
The number of significant digits is determined automatically
In binary representation, 0.1 cannot be represented by a finite series of digits. In decimal representation we need only one element (frac[0]=1000 when BASE is 10000, following the layout 0.xxxx*BASE**n). This means that we can always determine the size of the array frac[] in the Real structure.
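
The translation error is easy to observe with Ruby's built-in Float, which is binary. The sketch below contrasts it with Rational, used here only as an exact stand-in for a decimal representation; it does not use BigFloat, and the method name `binary_vs_decimal` is illustrative.

```ruby
# Shows that the Float literal 0.1 is a nearby binary fraction, not one tenth.
def binary_vs_decimal
  denominator = 0.1.to_r.denominator   # exact denominator of the stored Float
  {
    float_sum_exact:          0.1 + 0.2 == 0.3,
    exact_sum_exact:          Rational("0.1") + Rational("0.2") == Rational("0.3"),
    float_is_one_tenth:       0.1.to_r == Rational(1, 10),
    denominator_power_of_two: (denominator & (denominator - 1)).zero?
  }
end

binary_vs_decimal.each { |name, result| puts "#{name}: #{result}" }
```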

Disadvantage of decimal representation

The advantages stated so far can also be disadvantages if the input from outside is represented in binary. Translation error from decimal to binary or vice versa is inevitable, so translation from Float (binary) to BigFloat (decimal) is not always exact.

Which is the first input?

Because most people use decimal notation for numeric data, BigFloat can handle such data without any translation error.

Resulting number of significant digits

For the fundamental arithmetic operations (addition, subtraction, multiplication, and division), I prepared 2 groups of methods.

1. +,-,*,/

For the operators +, -, *, and /, you cannot specify the number of significant digits of the result.
The number of significant digits of the result is defined as follows:
1.1 For * and /, the number of significant digits of the result is the sum of the significant digits of both operands.
1.2 For + and -, the number of significant digits of the result is chosen so that no rounding is needed.
For example, c has more than 100 significant digits if c is computed as:
c = 0.1+0.1*10**(-100)

Because +, -, and * are always exact (no rounding is performed), attention must be paid to programs like:
e = BigFloat.new("1")
while e + 1.0 != 1.0
  e = e / 10
end
The above example runs until all available memory is exhausted, because no rounding is performed on e+1.0.
As for a division c = a/b, c has the same number of significant digits as a*b. A division such as c=1.0/3.0 is rounded.
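
The difference between rounded and exact addition can be reproduced in plain Ruby. In the sketch below, Float plays the role of a rounding type and Rational the role of an exact one; neither is BigFloat. The helper name `steps_until_absorbed` and its iteration limit are assumptions added so that the exact case stops instead of exhausting memory.

```ruby
# Divides e by 10 until e + one == one; returns the number of divisions,
# or nil if e is still distinguishable from zero after `limit` divisions.
def steps_until_absorbed(one, limit)
  e = one
  steps = 0
  while e + one != one
    return nil if steps >= limit   # exact arithmetic never absorbs e
    e /= 10
    steps += 1
  end
  steps
end

p steps_until_absorbed(1.0, 1000)           # Float rounds, so the loop ends by itself
p steps_until_absorbed(Rational(1), 1000)   # exact addition: only the limit stops it
```

With an exact type, a loop of this kind needs an explicit stopping rule, such as a comparison against a chosen tolerance or an operation with a fixed number of significant digits.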

2. assign/assign!,add/add!,sub/sub!,mult/mult!,div/div!

The number of significant digits obtained from +, -, *, and / is always determined by the operands on the left and right sides of the operator. To specify the number of significant digits yourself, use the class methods assign!, add!, sub!, mult!, or div!, or the instance methods assign, add, sub, mult, or div. No new BigFloat object is created when you use a class method (assign! etc.), which reduces the chance that garbage collection (GC) takes place. But using class methods also reduces the readability of your source code. The following 2 examples compute the ratio of the circumference of a circle to its diameter (pi=3.14159265358979....) using J. Machin's formula.

2.1 Using class method
#
# PI (Calculates 3.1415.... using J. Machin's formula.)
#

sig = 2000 # <== Number of significant figures

exp    = -sig
sig    = sig + sig/100    # no theoretical reason
pi     = BigFloat::new("0",sig)
two    = BigFloat::new("2")
m25    = BigFloat::new("-0.04")
m57121 = BigFloat::new("-57121")

k = BigFloat::new("1")
w = BigFloat::new("1")
t = BigFloat::new("-80",sig)
v = BigFloat::new("0",sig)
u = BigFloat::new("0",sig)
r = BigFloat::new("0",sig+sig+1)

n1 = 0
n2 = 0
ts = Time::now
while (u.exponent >= exp) 
  n1 += 1
  BigFloat::mult!(v,t,m25)
  BigFloat::assign!(t,v,1)
  BigFloat::div!(u,r,t,k)
  BigFloat::add!(v,pi,u)
  BigFloat::assign!(pi,v,1)
  BigFloat::add!(w,k,two)
  BigFloat::assign!(k,w,1)
end

k = BigFloat::new("1")
w = BigFloat::new("1")
BigFloat::assign!(t,"956",1)
BigFloat::assign!(u,0,1)
while (u.exponent >= exp )
  n2 += 1
  BigFloat::div!(v,r,t,m57121)
  BigFloat::assign!(t,v,1)
  BigFloat::div!(u,r,t,k)
  BigFloat::add!(v,pi,u)
  BigFloat::assign!(pi,v,1)
  BigFloat::add!(w,k,two)
  BigFloat::assign!(k,w,1)
end
p pi   
print "# of iterations = ",n1,"+",n2,"\n"
exit
2.2 Using instance method
#
# PI (Calculates 3.1415.... using J. Machin's formula.)
#

sig = 2000 # <== Number of significant figures

exp    = -sig
sig    = sig + sig/100    # no theoretical reason
pi     = BigFloat::new("0")
two    = BigFloat::new("2")
m25    = BigFloat::new("-0.04")
m57121 = BigFloat::new("-57121")

n1 = 0
n2 = 0

u = BigFloat::new("1")
k = BigFloat::new("1")
w = BigFloat::new("1")
t = BigFloat::new("-80")
while (u.exponent >= exp) 
  n1 += 1
  t   = t*m25
  u,r = t.div(k,sig)
  pi  = pi + u
  k   = k+two
end

u = BigFloat::new("1")
k = BigFloat::new("1")
w = BigFloat::new("1")
t = BigFloat::new("956")
while (u.exponent >= exp )
  n2 += 1
  t,r = t.div(m57121,sig)
  u,r = t.div(k,sig)
  pi  = pi + u
  k   = k+two
end
p pi   
print "# of iterations = ",n1,"+",n2,"\n"
exit
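
For readers without a BigFloat build, the series that both programs above evaluate, pi = 16*arctan(1/5) - 4*arctan(1/239), can be cross-checked with plain Ruby integers. This sketch uses a different technique (fixed-point Integer arithmetic with a few guard digits) and is not a translation of the BigFloat code; the method names and the default guard-digit count are assumptions.

```ruby
# Returns arctan(1/x) multiplied by `scale`, using the alternating series
# 1/x - 1/(3*x**3) + 1/(5*x**5) - ... with truncating integer division.
def scaled_arctan_inverse(x, scale)
  term = scale / x        # scale * (1/x)
  sum  = term
  x2   = x * x
  k    = 1
  sign = -1
  until term.zero?
    term /= x2            # next odd power of 1/x
    k += 2
    sum += sign * (term / k)
    sign = -sign
  end
  sum
end

# Returns pi as a digit string: "3" followed by `digits` decimals.
# The guard digits absorb the truncation error of the integer divisions.
def machin_pi_digits(digits, guard = 10)
  scale = 10**(digits + guard)
  pi_scaled = 16 * scaled_arctan_inverse(5, scale) -
              4 * scaled_arctan_inverse(239, scale)
  (pi_scaled / 10**guard).to_s
end

puts machin_pi_digits(20)   # 3 followed by the first 20 decimals of pi
```

The constants in the BigFloat programs encode the same formula: -80 * -0.04 = 3.2 = 16/5, and 956 / -57121 = -4/239, because 956 = 4*239 and 57121 = 239**2.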

Shigeo Kobayashi (E-Mail:<shigeo@tinyforest.gr.jp>)