Learn how Python represents integers of any size with Bignum.
7 minute read · June 3, 2020
Programmers writing in lower-level languages like C/C++ must consider the amount of memory used to represent integers. They must take into account the minimum and maximum values of each variable to avoid overflow problems. These developers have to decide whether an int is enough or whether a long is necessary.
Compared to C/C++, Python has an advantage when working with integers. There is no integer overflow problem in Python; therefore, Python programmers don't have to worry about which type of variable to use for each integer. Python allows programmers to manipulate huge numbers without fear of losing precision. The only limitation of Python's integer representation is when the machine runs out of free memory, but that's a hardware limitation.
In practice, this is useful for calculating large values such as factorials. Without using external libraries, Python can calculate factorials regardless of the size of the result. Here is an example of a factorial function:
def factorial(n):
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)
Running the function with input 231 gives a sense of the magnitude of the integers that Python can represent.
>>> factorial(231)
1792233667382633521618843263044232513197622942259968207385215805123682159320161029848328112148883186161436034535802659466205111867109614573242316954383604389464524535467759401326264883566523043560811873179996072188155290081861628010250468430411854935707396605833540921031884571521279145124581094374547412403086564118143957940727734634769439112260383017302489106932716079961487372942529947238400000000000000000000000000000000000000000000000000000000
Note that there are more efficient algorithms for calculating factorials; this example only illustrates the magnitude of the output.
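As a hedged aside, the recursive version can hit Python's default recursion limit for large inputs. Here is a minimal iterative sketch that is checked against the standard library's math.factorial. The name factorial_iter is an illustrative assumption and does not come from the original example.

```python
import math

def factorial_iter(n):
    """Iteratively compute n! for a non-negative integer n."""
    result = 1
    for k in range(2, n + 1):
        result *= k  # Python ints grow as needed, so this never overflows
    return result

value = factorial_iter(231)
print(value == math.factorial(231))  # compare with the standard library
print(len(str(value)))               # number of decimal digits in 231!
```

Both versions rely on the same arbitrary-size integers; only the control flow differs.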
Integer representation
Before continuing, note that this article covers only the CPython implementation, which is the default and most widely distributed version of Python. Other implementations may represent integers differently. One advantage of CPython is that its entire codebase is publicly available in the CPython repository on GitHub.
As of Python 3, all integer values are represented in the following structure:
struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};
It is possible to expand macros and represent the structure as follows:
struct {
    ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    ssize_t ob_size;
    uint32_t ob_digit[1];
};
The first two elements of the above structure are not relevant to this discussion. The element ob_refcnt is used by Python's garbage collection, and ob_type is used to identify the type, which in this case is an integer.
The value itself is represented by the other two fields: ob_digit and ob_size. Python uses ob_digit as an array that stores each digit of the number at a separate index. The ob_size field stores two pieces of information: the length of the ob_digit array and the sign of the integer (positive or negative).
On most systems, ob_digit is a uint32_t array, but on some older machines ob_digit may be a uint16_t array. This article considers only the first case, uint32_t arrays.
This method of representing integer values as sequences of digits stored in strings or arrays is known as Bignum arithmetic. Typically, Bignum implementations represent binary values; however, this would not be space efficient with uint32_t arrays.
Base 2³⁰
On systems that use uint32_t arrays to represent integers, Python cannot use all 32 bits of an element to hold a digit. The brief explanation for this limitation is that many built-in functions in Python require a certain number of bits to represent integers for performance and practical reasons. For the more curious reader, there are comments about this limitation in the official CPython repository.
Since Python can only use 30 of the 32 bits of each element, all integers are converted to base 2³⁰. Therefore, every digit in the array has a value between 0 and 1073741823 (2³⁰-1). Note that ob_size stores the number of base 2³⁰ digits in the array, not the number of base 10 digits.
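The digit width can be inspected from Python itself. This short sketch uses sys.int_info from the standard library, which reports how the running CPython build stores integers. The variable names are illustrative.

```python
import sys

# sys.int_info describes CPython's internal integer representation.
bits = sys.int_info.bits_per_digit   # 30 on most builds, 15 on some older ones
size = sys.int_info.sizeof_digit     # bytes used to store one digit

print(bits, size)
print(2 ** bits - 1)  # largest value a single digit can hold
```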
Also, the array is stored in little-endian order: the least significant digit comes first, at the lowest index. For example, if numbers were stored in base 10 instead of base 2³⁰, 234 would be represented by the array <4,3,2>.
For example, Python first converts the number 234254646549834273498 to base 2³⁰. Since there are not enough characters to write a base 2³⁰ digit as a single symbol, each base 2³⁰ digit is shown here in base 10 for illustrative purposes. 234254646549834273498 in base 2³⁰ is 462328538,197050268,203, where 462328538 is the first (least significant) digit, and so on for the other two values. This is because 462328538 × (2³⁰)⁰ + 197050268 × (2³⁰)¹ + 203 × (2³⁰)² = 234254646549834273498.
Therefore, the number 234254646549834273498 has 3 digits in base 2³⁰: 462328538, 197050268 and 203. Python would store it with ob_size equal to 3 and ob_digit equal to <462328538, 197050268, 203>.
If that number were negative, the Python representation would have the same array, but ob_size would be -3.
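To make the role of the sign concrete, here is a hedged Python model of the two fields. The function model_long is a hypothetical helper that only imitates the layout described above; it does not read CPython's actual memory.

```python
BASE = 2 ** 30  # assumes 30-bit digits

def model_long(value):
    """Return a hypothetical (ob_size, ob_digit) pair modelling the layout."""
    magnitude = abs(value)
    digits = []
    while magnitude > 0:
        magnitude, digit = divmod(magnitude, BASE)
        digits.append(digit)
    # The sign of ob_size carries the sign of the integer.
    ob_size = len(digits) if value >= 0 else -len(digits)
    return ob_size, digits

print(model_long(234254646549834273498))   # (3, [462328538, 197050268, 203])
print(model_long(-234254646549834273498))  # (-3, [462328538, 197050268, 203])
```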
Optimization of common integers
Converting and representing integers with Bignum arithmetic costs time at runtime. For this reason, and because the int type is immutable in Python, Python creates objects for all values between -5 and 256 before running the program. During execution, Python reuses these objects whenever they are requested.
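The reuse can be observed with the is operator, which compares object identity. This is a small sketch whose result is specific to CPython. The helper is_shared is a hypothetical name; it builds each value at runtime so that compile-time constant sharing does not affect the result.

```python
def is_shared(v):
    """Return True if two runtime-created ints with value v are the same object."""
    return int(str(v)) is int(str(v))

# Values in the preallocated range are expected to be shared on CPython.
print(all(is_shared(v) for v in range(-5, 257)))

# 257 is outside the range, so CPython typically builds two separate objects.
print(is_shared(257))
```

Identity checks like this are only a way to observe the cache; ordinary code should compare integers with ==.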
One clear disadvantage of using Bignum arithmetic is memory usage. Any integer value in Python takes up at least 28 bytes of memory, which is 7 to 14 times what C needs for a variable of type int.
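The memory cost can be measured with sys.getsizeof. This hedged sketch prints the size of integers with one, two and three digits. The exact byte counts depend on the CPython version and platform, so none are stated here.

```python
import sys

bits = sys.int_info.bits_per_digit  # 30 on most CPython builds

for value in (1, 2 ** bits, 2 ** (2 * bits)):
    # Each extra digit adds sys.int_info.sizeof_digit bytes to the object.
    print(value, sys.getsizeof(value))
```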
Bignum addition
One of the benefits of using Bignum Arithmetic is the simplicity of performing arithmetic operations. In this article we will only discuss addition, but the rest of the operations are based on the same concept.
The idea behind Bignum addition is to add the same way people add base 10 numbers with pencil and paper. The process starts with the least significant digit and continues toward the most significant digits. The digits at each position are added together, and any carry value is moved to the next higher digit.
Bignum arithmetic takes this approach using arrays. The process is to add the values at the same index of each array and carry the amount that exceeds the base to the next index. The algorithm starts at index 0 and iterates up to the length of the smaller array, adding digits using the carry method.
The algorithm starts by creating a new empty array to hold the result. Note that the sum of two values has at most one digit more than the larger of the two operands. For example, the sum of 9 and 93 is 102. The larger value in this example is 93, which has 2 digits. The result of the addition has 3 digits, one more than 93. In some cases, the number of digits in the result equals the number of digits in the larger number. In that case, the algorithm shrinks the array so that the result has no leading 0s. The only case where the last cell in the array (the highest index) holds the value 0 is when it represents the number 0.
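The size bound can be checked directly. This is a minimal sketch; the helper digit_count is a hypothetical name that counts the digits of a positive integer in any base.

```python
def digit_count(value, base):
    """Number of digits needed to write a positive int in the given base."""
    count = 0
    while value > 0:
        value //= base
        count += 1
    return count

a, b = 9, 93
print(digit_count(a + b, 10))                           # 3
print(max(digit_count(a, 10), digit_count(b, 10)) + 1)  # 3, the upper bound
```

The same bound holds in base 2³⁰, which is why the result array is allocated with one extra cell.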
For illustrative purposes, here is an example of how to add two numbers represented with Bignum arithmetic. The values added are 234254646549834273498 and 23425464654983.
The number 234254646549834273498 is represented by the array <462328538, 197050268, 203> with ob_size equal to 3.
The number 23425464654983 is represented by the array <713022599, 21816> with ob_size equal to 2.
The algorithm starts by creating a new array of size 4, one cell longer than the array that represents the larger operand of the addition.
The algorithm then adds the digits index by index, carrying any overflow to the next index.
At index 0, 462328538 + 713022599 exceeds 2³⁰-1, so the cell holds 101609313 and 1 is carried. At index 1, 197050268 + 21816 + 1 gives 197072085 with no carry. The remaining digit of the longer number, 203, is copied unchanged, and the last cell stays empty.
Finally, the program reduces the size of the array by 1 to remove the last empty cell. The result is stored with ob_size equal to 3 and ob_digit equal to <101609313, 197072085, 203>.
The original algorithm is written in C, but here's a Python function that simulates the process.
The function add takes two Python lists. Each list represents one of the integers being added. The numbers are already converted to base 2³⁰, and each item in the list stores one digit of the value.
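The following is a hedged reconstruction of such a function. It is neither the author's original listing nor CPython's C code, and it assumes non-negative numbers stored as little-endian lists of 30-bit digits.

```python
BASE = 2 ** 30  # assumes 30-bit digits, as in the rest of the article

def add(a, b):
    """Add two non-negative numbers given as little-endian base-2**30 digit lists."""
    if len(a) < len(b):
        a, b = b, a  # make a the longer list
    result = [0] * (len(a) + 1)  # at most one digit more than the longer operand
    carry = 0
    for i in range(len(a)):
        total = a[i] + carry
        if i < len(b):
            total += b[i]
        result[i] = total % BASE   # digit that stays at this index
        carry = total // BASE      # amount moved to the next index
    result[len(a)] = carry
    # Drop unused high cells, but keep a single 0 for the number zero.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result

x = [462328538, 197050268, 203]  # 234254646549834273498
y = [713022599, 21816]           # 23425464654983
print(add(x, y))  # [101609313, 197072085, 203]
```

This sketch uses a single loop over the longer list for brevity, instead of first iterating up to the length of the smaller array and then handling the remaining digits separately.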
In short, Python uses Bignum arithmetic to represent integers. Compared to other languages like Java and C/C++, Python makes working with integers a breeze. While other languages require the programmer to choose a variable size large enough to hold the number, Python removes this need. However, this method has a drawback in terms of memory consumption. While languages like C use 2 or 4 bytes to represent a variable of type int, Python requires at least 28 bytes. For simple scripts, this additional memory usage makes no difference; for data-intensive programs, however, it may be worth considering other languages such as C.
Thank you very much for reading this article! I'll be posting more about Python and other programming topics soon.