Coccinelle (French for ladybug) can be a gardener’s good friend, and even though I enjoy taking care of my house plants, this post is about software development.

Being a C developer, I know how dangerous the language can be and how important it is to be able to protect oneself when using it.

This is why Coverity is used on Terminology: it provides some static analysis of Terminology’s code.

However, it may not be enough, and it does not have all the rules I would like.

What is Coccinelle?

Coccinelle is a program matching and transformation engine which provides the language SmPL (Semantic Patch Language) for specifying desired matches and transformations in C code.

Coccinelle can read C code and lets you write semantic patches against it. Such patches can be used to switch code to a new API (or macro), as in the following example:

@@
expression n,d;
@@

(
- (((n + d) - 1) / d)
+ DIV_ROUND_UP(n,d)
|
- ((n + (d - 1)) / d)
+ DIV_ROUND_UP(n,d)
)

This script generates patches that replace the open-coded rounding division with the macro DIV_ROUND_UP(), which makes the code easier to read.
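To make the rewrite concrete, here is a minimal sketch of what C code could look like before and after such a patch is applied. The definition of DIV_ROUND_UP is an assumption modeled on the macro of the same name in the Linux kernel, and the two function names are hypothetical.

```c
#include <stddef.h>

/* Assumed definition, modeled on the Linux kernel macro of the same name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Before: open-coded rounding division, the shape targeted by the
 * first branch of the semantic patch. */
size_t
chunks_needed_before(size_t len, size_t chunk)
{
    return (((len + chunk) - 1) / chunk);
}

/* After: the same computation expressed with the macro. */
size_t
chunks_needed_after(size_t len, size_t chunk)
{
    return DIV_ROUND_UP(len, chunk);
}
```

Both functions compute the same value; only the readability changes.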

Longer example

Recently, I stumbled across some code that mixed up the enum type a function should return. Below is an example of bad code:

enum foo {
    FOO_1,
    FOO_2,
    FOO_3
};

enum bar {
    BAR_1,
    BAR_2
};

enum bar
my_func(void)
{
    enum foo f = FOO_3;

    do_something();

    return f;
}
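To see why this matters, here is a small sketch under the same declarations. The helper bar_is_valid() is hypothetical and only serves to show that the returned value does not correspond to any enumerator of enum bar. The call to do_something() is left out to keep the sketch self-contained.

```c
enum foo { FOO_1, FOO_2, FOO_3 };
enum bar { BAR_1, BAR_2 };

/* Same shape as the bad code: an enum foo value is returned
 * from a function declared to return enum bar. */
enum bar
my_func(void)
{
    enum foo f = FOO_3;

    return f; /* implicit enum foo -> enum bar conversion */
}

/* Hypothetical helper: returns 1 when v is a declared enum bar enumerator. */
int
bar_is_valid(enum bar v)
{
    return v == BAR_1 || v == BAR_2;
}
```

FOO_3 has the value 2, while enum bar only declares the values 0 and 1, so the caller receives something it is unlikely to handle.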

Clang and Coverity can find such implicit enum conversions, but I wanted to see if I could write a Coccinelle patch to find and fix such cases.

Here’s what I wrote:

1     @@
2     type T, B != T;
3     identifier func, i;
4     expression E;
5     @@
6 
7     T func (...) {
8     ...
9     (
10     T i;
11     |
12     T i = E;
13     |
14     -B i;
15     +T i;
16     |
17     -B i = E;
18     +T i = E;
19     )
20     <+...
21     return i;
22     ...+>
23     }

On line 7, the return type of the function is set to T.

Between the parentheses on lines 9 and 19, different patterns are matched.

On line 21, the return statement is matched, which ties the declared variable i to the value being returned. The <+... and ...+> delimiters tell Coccinelle that the return statement must appear at least once and may sit at a deeper level in the code, like inside an if block.

Back to those parentheses: lines 10 and 12 match a declaration of i that already uses the type T, so nothing changes. Lines 14-15 and 17-18 match a declaration of i that uses another type B, and there Coccinelle rewrites the declaration to use the type T.

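Coccinelle produces this patch. Note that only the declared type changes: the initializer still uses FOO_3, a constant from enum foo, so the result has to be reviewed by hand: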

--- main.c
+++ /tmp/cocci-output-1131-72d04b-main.c
@@ -12,7 +12,7 @@ enum bar {
 enum bar
 my_func(void)
 {
-    enum foo f = FOO_3;
+    enum bar f = FOO_3;
 
     do_something();

Checks & CI

Such patches can run in a CI environment and ensure that some APIs are used correctly.

The following patch checks that once a function whose name ends with _lock has been called, every code path calls the matching function ending with _unlock before returning. This typically catches bugs in error-handling code, where it is easy to forget an _unlock call.

// A mutex_lock is not matched by a mutex_unlock before an error return/goto.
@@
expression l;
identifier LOCK =~ "^.*_lock$";
identifier UN =~ "^.*_unlock$";
@@
LOCK(l);
... when != UN(l)
    when any
    when strict
(
{ ... when != UN(l)
+   todo_add_unlock(l);
    return ...;
}
|
UN(l);
)

It is possible to use regular expressions on identifiers so that the patch only applies to some functions, variables, or types.
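As an illustration of the kind of bug the lock/unlock patch targets, here is a minimal sketch. The lock is a hypothetical counter-based stand-in (struct my_mutex, my_mutex_lock(), my_mutex_unlock()), not a real threading API; the names were chosen so that they would satisfy the two regular expressions in the patch.

```c
/* Hypothetical lock: a plain counter standing in for a real mutex. */
struct my_mutex {
    int held;
};

void my_mutex_lock(struct my_mutex *m)   { m->held++; }
void my_mutex_unlock(struct my_mutex *m) { m->held--; }

/* Buggy: the error path returns while the lock is still held.
 * This is the shape where the patch would insert todo_add_unlock(). */
int
store_buggy(struct my_mutex *m, int *slot, int value)
{
    my_mutex_lock(m);
    if (value < 0) {
        return -1; /* missing my_mutex_unlock(m) */
    }
    *slot = value;
    my_mutex_unlock(m);
    return 0;
}

/* Fixed: every return path releases the lock first. */
int
store_fixed(struct my_mutex *m, int *slot, int value)
{
    my_mutex_lock(m);
    if (value < 0) {
        my_mutex_unlock(m);
        return -1;
    }
    *slot = value;
    my_mutex_unlock(m);
    return 0;
}
```

After a failed call to store_buggy(), the counter stays at 1, which is the leak the semantic patch is meant to point out.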

I wrote a small script to test Terminology’s code against such semantic patches. If Coccinelle produces a patch, then the CI fails and one has to fix the code.

I’ve added some scripts that check the following:

two comparisons of the same expression to different constants
a pointer should not be compared to 0
continue at the end of a for loop has no purpose
calling free() on variables on the stack
!x&y: mixing a bit-wise operator with negation
NULL tests where the value is known not to be NULL
pointer is dereferenced when known to be NULL
a variable is only initialized and never used
a variable used as an iterator is used outside the iterator loop, without breaking out of the loop, and thus the variable has a bad value
missing unlock
missing usage of macros DIV_ROUND_UP, ROUND_UP, MIN, MAX, …
bad type used in return statement
…
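To illustrate one item from this list, the !x&y check, here is a short sketch of the mistake and one possible fix. The flag value FLAG_VISIBLE and both function names are hypothetical.

```c
/* Hypothetical flag bit used for illustration only. */
#define FLAG_VISIBLE 0x2u

/* Buggy: `!` binds tighter than `&`, so this computes
 * (!flags) & FLAG_VISIBLE. Since !flags is 0 or 1 and the flag
 * is 0x2, the result is always 0. */
int
is_hidden_buggy(unsigned int flags)
{
    return !flags & FLAG_VISIBLE;
}

/* Fixed: test the bit first, then negate the result. */
int
is_hidden_fixed(unsigned int flags)
{
    return !(flags & FLAG_VISIBLE);
}
```

The buggy version compiles and returns 0 for every input, which makes this kind of error easy to miss in review.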

Sources & Pointers

Most of the scripts I used can be found on the Coccinelle website. The following links also helped me write some semantic patches:

Writing Coccinelle patches can be a bit tedious. I therefore encourage anyone who wants to write semantic patches with Coccinelle to first write a test case that validates that the script does what is expected of it.

I only presented Coccinelle as a tool to find bugs, but semantic patches are also great for automatically refactoring code.
