Factors Affecting Code Performance.

12 January 2021

“Premature optimisation is the root of all evil.” Whilst that famous quote from Donald Knuth obviously has some truth in it, thinking about performance early on, when designing and writing code, can be a very good thing too.

Here are some small things Software Engineers could think about when writing software that might yield performance benefits for little or no extra work. It is not a complete list, of course, and it does not consider databases, web browser DOMs, GPUs, etc. It just considers basic, everyday code, with C# as the particular focus.

Numeric Variable Types

Programming languages typically have multiple numeric variable types. In C#, there are int, long, float, double and decimal, plus the unsigned and smaller integer types. The int type is 32-bit and therefore has a smaller range than the 64-bit long type. If you know you will not need that larger range, i.e. you will be using numbers less than about 2 billion in magnitude, then using a long is just wasteful. It will take longer to process, longer to read or write from memory and longer to read or write to disk. Such gains, however, are small fry.

The biggest differences are between the floating-point types. The decimal type gives 128-bit floating-point numbers based on powers of 10 and therefore has advantages over the base-2, 64-bit double type, e.g. in financial calculations. A double cannot store even £0.01 precisely and might instead end up as £0.01000000000000000021, which is very close, but if you then round up to the nearest penny due to some rule, you end up with £0.02 and look pretty silly. The base-2 types, however, are native to the chip architectures, so typically execute hundreds of times faster than the non-native base-10 one. This can be a free way of making a huge improvement in code performance: don't use decimal without good reason.
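
As a small illustration of the currency point, assuming we print the double with enough significant digits to expose the stored value (the exact trailing digits can vary with formatting):

    using System;

    class DecimalVsDouble
    {
        static void Main()
        {
            // 0.01 has no exact base-2 representation, so the double holds
            // the nearest representable value instead.
            double  priceAsDouble  = 0.01;
            decimal priceAsDecimal = 0.01m;

            Console.WriteLine(priceAsDouble.ToString("G20")); // e.g. 0.010000000000000000208
            Console.WriteLine(priceAsDecimal);                // 0.01 exactly
        }
    }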

Memory Cache Size

CPUs are about 1000 times faster than they were 30 years ago, but RAM is only about 30 times faster, so RAM needs to be treated with a little respect at times. To help hide this issue, chips typically have multiple levels of memory cache built into them. For example, I am typing this on a laptop with 9MB of level 3 cache. This is the largest but slowest of the caches on the chip and is the last stand before a call has to be made to the much slower main memory. If your application’s active memory usage sits nicely within your cache, it will perform significantly faster.

The graph below shows the result of a little experiment, which involved increasing the size of a dataset from a small fraction of the L3 (level 3) cache up to many times its size. If the time taken were proportional to the size of the dataset, i.e. if the time taken per element were fixed, then this graph would be a horizontal line. But as you can see, there is a notable increase in time per element, beginning once the data size approaches the cache size.

[Graph: time taken per element against dataset size, relative to the L3 cache size]

Firstly, the chart makes an important point about assessing code performance: don't test on a small dataset and expect the time taken to scale naively up to a larger one. Secondly, just don't be wasteful with memory. For example, if you can process items in one go as you create them, rather than storing them in a large array to be processed later in multiple steps, you might see a huge benefit in performance.
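
A rough sketch of the kind of experiment behind the graph, assuming a simple repeated summing pass over a working set swept from well below to well above the L3 cache size (the exact numbers will of course depend on the machine):

    using System;
    using System.Diagnostics;

    class CacheSizeExperiment
    {
        static void Main()
        {
            const int passes = 10;

            // Sweep the working set from 512 KB up to 128 MB.
            for (int size = 1 << 16; size <= 1 << 24; size <<= 1)
            {
                var data = new long[size];       // 8 bytes per element
                var timer = Stopwatch.StartNew();
                long sum = 0;
                for (int pass = 0; pass < passes; pass++)
                    for (int i = 0; i < data.Length; i++)
                        sum += data[i];
                timer.Stop();

                double nsPerElement =
                    timer.Elapsed.TotalMilliseconds * 1e6 / ((double)passes * size);
                Console.WriteLine($"{size * 8L / 1024} KB: {nsPerElement:F2} ns/element ({sum})");
            }
        }
    }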

Memory Locality

You can get huge zero-effort gains if you keep memory locality in mind. For example, if looping over a two-dimensional array, you need to choose whether to loop over the first or second index in the inner loop.

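As a sketch of the kind of code in question, assuming a large square double[,] array being summed (with the array indexed as a[i, j]):

    const int n = 5000;
    var a = new double[n, n];
    double sum = 0;

    // Inner loop over j: consecutive iterations touch adjacent memory.
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            sum += a[i, j];

    // Inner loop over i: consecutive iterations jump a whole row apart.
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++)
            sum += a[i, j];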

In the above C# example, changing j alters the memory address far less than changing i: incrementing j moves to the adjacent element, whereas incrementing i jumps a whole row's width. Having the inner loop change j rather than i is therefore many times faster for large arrays. It is also often faster to use a true multi-dimensional array, e.g. double[,], rather than an array-of-arrays, e.g. double[][], at least for large arrays and for certain operations. An array-of-arrays is a set of separate arrays, which cannot be handled as efficiently as the single block of memory used by a true multi-dimensional array. For certain operations, however, e.g. swapping rows, an array-of-arrays is much faster, since whole rows can be swapped simply by exchanging references, and it is also much easier to code.

LINQ

LINQ is a useful toolbox in C# and can make code easier to write, more standard, and easier to read. It can, however, yield some slow results. A key point is that LINQ does not understand your data, but you do.

  • Avoid using the LINQ Count() extension method on a List, since the List.Count property executes faster and involves less typing, though frankly you will not be saving much CPU time here.
  • Avoid using LINQ as an apparent coding shortcut rather than using an appropriate collection class correctly. For example, dictionary[key] is likely to be of the order of one hundred times faster than dictionary.Single(keyValuePair => keyValuePair.Key == key).
  • Avoid LINQ with very large arrays when an in-place implementation is likely to be much faster. For example, in the code below the LINQ version is probably more readable, but it uses twice as much memory to achieve the same result and is several times slower.

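A sketch of the kind of comparison meant, assuming the task is doubling every element of a large array (the array size here is just illustrative):

    using System.Linq;

    var values = new double[50_000_000];

    // LINQ version: readable, but Select(...).ToArray() allocates a second
    // array the same size as the original, doubling the memory in use.
    values = values.Select(v => v * 2).ToArray();

    // In-place version: same result, no extra allocation.
    for (int i = 0; i < values.Length; i++)
        values[i] *= 2;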

  • Do consider that LINQ can use lazy evaluation, which can prevent costly operations from being carried out when they are not actually necessary.
  • Do consider using .AsParallel() (PLINQ) to readily parallelise a query. Both points appear in the sketch below.
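
A sketch of both points, assuming an expensive predicate (the predicate and range here are purely illustrative):

    using System;
    using System.Linq;

    class LinqLaziness
    {
        static bool IsExpensiveMatch(int n)
        {
            // Stand-in for some costly test.
            return n % 1_000_003 == 0;
        }

        static void Main()
        {
            var numbers = Enumerable.Range(1, 100_000_000);

            // Lazy evaluation: Where produces matches on demand, so First
            // stops the scan at the first hit rather than testing everything.
            int first = numbers.Where(IsExpensiveMatch).First();
            Console.WriteLine(first);

            // PLINQ: AsParallel spreads the full count across cores.
            int total = numbers.AsParallel().Count(IsExpensiveMatch);
            Console.WriteLine(total);
        }
    }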

Miscellaneous

Here are some extra pointers:

  • Keep in mind disk data locality and size. Disk I/O is even slower than RAM, and some of the same ideas carry over: keep your reads close together and keep your files small if you can.
  • Note that binary data formats are smaller than human-readable ones and require less parsing effort from the code, but they come at the obvious cost that a human cannot read them. With a large file, however, this can make a night-and-day difference in performance.
  • Be mindful of metadata for small objects. For example, if your data is a collection of 32-bit integers and you use a doubly linked list on a machine using 64-bit memory addresses, then each node carries two 8-byte pointers for every 4 bytes of data, so your metadata is actually four times the size of your data.
  • Avoid throwing exceptions in an inner loop. Exceptions should not be thrown unnecessarily in any case, since they should be reserved for exceptional circumstances, but also bear in mind that throwing one is much slower than, say, returning false from a function.
  • Avoid boxing and unboxing value types into object, e.g. object o = 1. Boxing copies the value into a new object on the heap, which costs an allocation and, later, garbage collection.
  • Put computationally cheap items first in e.g. test1 || test2 || test3, if possible, since short-circuiting means the later items are only evaluated if the earlier ones have not already determined the result.
  • Avoid concatenating large numbers of strings without e.g. a StringBuilder, since each concatenation creates a whole new string (see the sketch after this list).
  • Avoid using reflection in an inner loop.
  • Avoid using yield return in an inner loop unless you know it will help for other reasons.
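
As an illustration of the boxing and string-concatenation points above, a minimal sketch:

    using System;
    using System.Text;

    class MiscSketch
    {
        static void Main()
        {
            // Boxing copies the int into a new heap object; unboxing copies it back.
            object boxed = 1;
            int unboxed = (int)boxed;
            Console.WriteLine(unboxed);

            // Naive concatenation builds a new, longer string on every pass.
            string slow = "";
            for (int i = 0; i < 10_000; i++)
                slow += i + ",";

            // StringBuilder appends into a growable buffer instead.
            var sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++)
                sb.Append(i).Append(',');
            string fast = sb.ToString();

            Console.WriteLine(fast.Length == slow.Length); // True
        }
    }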