previous article, I said that concatenating strings was an O(n)
operation on the length of the first string.
I feel the need to explain that.
In erlang, strings are lists of integer. Since they are integers, they can store any unicode characters such as the snowman ☃ which has a value of 9731.
When one wants to concatenate 2 strings, one can use string:concat/2
implemented in lib/stdlib/src/string.erl
as follows:
(All code extracts in this article are subject to the Erlang Public License.)
%% concat(String1, String2)
%% Concatenate 2 strings.
-spec concat(String1, String2) -> String3 when
String1 :: string(),
String2 :: string(),
String3 :: string().
concat(S1, S2) -> S1 ++ S2.
Ok, strings are lists, just concatenate lists!
The ++
operator is a BIF, a Built-In Function implemented in C in the Beam VM. It is implemented in erts/emulator/beam/erl_bif_lists.c
.
1 /*
2 * erlang:'++'/2
3 */
4
5 Eterm
6 ebif_plusplus_2(BIF_ALIST_2)
7 {
8 return append(BIF_P, BIF_ARG_1, BIF_ARG_2);
9 }
10
11 static BIF_RETTYPE append(Process* p, Eterm A, Eterm B)
12 {
13 Eterm list;
14 Eterm copy;
15 Eterm last;
16 size_t need;
17 Eterm* hp;
18 int i;
19
20 if ((i = list_length(A)) < 0) {
21 BIF_ERROR(p, BADARG);
22 }
23 if (i == 0) {
24 BIF_RET(B);
25 } else if (is_nil(B)) {
26 BIF_RET(A);
27 }
28
29 need = 2*i;
30 hp = HAlloc(p, need);
31 list = A;
32 copy = last = CONS(hp, CAR(list_val(list)), make_list(hp+2));
33 list = CDR(list_val(list));
34 hp += 2;
35 i--;
36 while(i--) {
37 Eterm* listp = list_val(list);
38 last = CONS(hp, CAR(listp), make_list(hp+2));
39 list = CDR(listp);
40 hp += 2;
41 }
42 CDR(list_val(last)) = B;
43 BIF_RET(copy);
44 }
First, the length of A
is computed. This operation is O(n)
where n
is the length of the list. Trust me on this one, of just go look at erts/emulator/beam/utils.c
.
If either lists are empty, return the other one.
On lines 29 - 30, space is allocated on the process heap for a new list. Each list item has a size of 2 Eterm
: one for the list itself, one for the element.
The code on lines 31 to 42 just copies A
to our newly allocated list and adds B on its tail.
last = CONS(hp, CAR(list_val(list)), make_list(hp+2));
Ok, what is that?
From erts/emulator/beam/erl_term.h
, we need some more definitions:
#define CONS(hp, car, cdr) \
(CAR(hp)=(car), CDR(hp)=(cdr), make_list(hp))
#define CAR(x) ((x)[0])
#define CDR(x) ((x)[1])
make_list
is used to return an Eterm
tagged as a list from a pointer on the process heap.
list_val
does the opposite and returns the address on the heap of the list.
CONS
puts 2 elements in hp[0]
(the element in the list) and hp[1]
(the next list item) and returns the list at hp
as an Eterm
.
That complicated expression can now be read as:
hp[0]
the element of the list we want to copy,Eterm
following on the stack (hp+2
) since the new list is allocated as one block on the stack.When list A
has been fully copied, the last element is set to B
at the line 42 with CDR(list_val(last)) = B;
.
If you’re curious, erts/emulator/beam/sys.h
can tell you that an Eterm
has usually a size of 4 bytes.
Small integers can be stored directly in those 4 bytes. There are 26 bits usable to store a small integer so it can store up to 2²⁵ = 33554432 because of the sign bit. For reference, pile of poo which was added in october 2010 has a value of 128169.
We can consider that unicode characters can be stored directly in the list.
If a string has n
characters, it will use 2 * n
words.
A binary
on the opposite will use 3 to 6 words (depending on the data size) plus the size of the data itself.
A string stored as a list takes about 8 times the space it would take stored as a binary
.
The pointed values are not copied, only the list items themsleves.
Appending a list of records of n
tuples to a list of m
tuples will take the same time/memory as concatenating a string of length n
to one of length m
.
The only difference is that tuples are not boxed in an Eterm
. The list only contains an Eterm
as a pointer to the tuple: where it’s stored in the heap.
Of course there is one! You should use binary
and IoList
whenever possible. An IoList
is iolist = [char() | binary() | iolist()]
. They are “deep-list”.
Want to append an IoList a
to an IoList b
? Simple! Just create a new list with 2 elements: a
and b
! Concatenation has become an O(1) operation. Most IO apis accept IoLists.