In the previous article, I said that concatenating strings was an O(n) operation on the length of the first string.
I feel the need to explain that.
In erlang, strings are lists of integer. Since they are integers, they can store any unicode characters such as the snowman ☃ which has a value of 9731.
When one wants to concatenate 2 strings, one can use string:concat/2 implemented in lib/stdlib/src/string.erl as follows:
(All code extracts in this article are subject to the Erlang Public License.)
%% concat(String1, String2)
%% Concatenate 2 strings.
-spec concat(String1, String2) -> String3 when
String1 :: string(),
String2 :: string(),
String3 :: string().
concat(S1, S2) -> S1 ++ S2.Ok, strings are lists, just concatenate lists!
The ++ operator is a BIF, a Built-In Function implemented in C in the Beam VM. It is implemented in erts/emulator/beam/erl_bif_lists.c.
/*
* erlang:'++'/2
*/
Eterm
ebif_plusplus_2(BIF_ALIST_2)
{
return append(BIF_P, BIF_ARG_1, BIF_ARG_2);
}
static BIF_RETTYPE append(Process* p, Eterm A, Eterm B)
{
Eterm list;
Eterm copy;
Eterm last;
size_t need;
Eterm* hp;
int i;
if ((i = list_length(A)) < 0) {
BIF_ERROR(p, BADARG);
}
if (i == 0) {
BIF_RET(B);
} else if (is_nil(B)) {
BIF_RET(A);
}
need = 2*i;
hp = HAlloc(p, need);
list = A;
copy = last = CONS(hp, CAR(list_val(list)), make_list(hp+2));
list = CDR(list_val(list));
hp += 2;
i--;
while(i--) {
Eterm* listp = list_val(list);
last = CONS(hp, CAR(listp), make_list(hp+2));
list = CDR(listp);
hp += 2;
}
CDR(list_val(last)) = B;
BIF_RET(copy);
}First, the length of A is computed. This operation is O(n) where n is the length of the list. Trust me on this one, of just go look at erts/emulator/beam/utils.c.
If either lists are empty, return the other one.
On lines 29 - 30, space is allocated on the process heap for a new list. Each list item has a size of 2 Eterm: one for the list itself, one for the element.
The code on lines 31 to 42 just copies A to our newly allocated list and adds B on its tail.
last = CONS(hp, CAR(list_val(list)), make_list(hp+2));Ok, what is that?
From erts/emulator/beam/erl_term.h, we need some more definitions:
#define CONS(hp, car, cdr) \
(CAR()=(), CDR()=(), make_list())
#define CAR(x) (()[0])
#define CDR(x) (()[1])make_list is used to return an Eterm tagged as a list from a pointer on the process heap.
list_val does the opposite and returns the address on the heap of the list.
CONS puts 2 elements in hp[0] (the element in the list) and hp[1] (the next list item) and returns the list at hp as an Eterm.
That complicated expression can now be read as:
hp[0] the element of the list we want to copy,Eterm following on the stack (hp+2) since the new list is allocated as one block on the stack.When list A has been fully copied, the last element is set to B at the line 42 with CDR(list_val(last)) = B;.
If you’re curious, erts/emulator/beam/sys.h can tell you that an Eterm has usually a size of 4 bytes.
Small integers can be stored directly in those 4 bytes. There are 26 bits usable to store a small integer so it can store up to 2²⁵ = 33554432 because of the sign bit. For reference, pile of poo which was added in october 2010 has a value of 128169.
We can consider that unicode characters can be stored directly in the list.
If a string has n characters, it will use 2 * n words.
A binary on the opposite will use 3 to 6 words (depending on the data size) plus the size of the data itself.
A string stored as a list takes about 8 times the space it would take stored as a binary.
The pointed values are not copied, only the list items themsleves.
Appending a list of records of n tuples to a list of m tuples will take the same time/memory as concatenating a string of length n to one of length m.
The only difference is that tuples are not boxed in an Eterm. The list only contains an Eterm as a pointer to the tuple: where it’s stored in the heap.
Of course there is one! You should use binary and IoList whenever possible. An IoList is iolist = [char() | binary() | iolist()]. They are “deep-list”.
Want to append an IoList a to an IoList b? Simple! Just create a new list with 2 elements: a and b! Concatenation has become an O(1) operation. Most IO apis accept IoLists.