Pitfalls and optimisations in Entity Framework

William Frankish

There are plenty of reasons to choose Entity Framework as your database ORM (object-relational mapping) over something like Dapper that exposes SQL directly. In addition to providing a database migration framework, EF lets you write your queries in C# and therefore get the advantages of C# style code reuse, refactoring and IntelliSense. Lastly, it means in addition you can avoid adding SQL as yet another language in your project’s list.

What it does not do is save you from understanding the SQL behind it. If you don’t know what EF is doing behind the hood, it is dangerously easy to write innocuous looking code with abysmal performance. This article will tell you how to avoid the worst pitfalls and show you how to get optimal performance from your queries.

This article is primarily concerned with Entity Framework Core, but much of this advice can be applied to Entity Framework 6 or similar technologies like NHibernate as well. The generated SQL is slightly simplified to be human readable.

Database vs In-Memory

For an analogy, say you want to the biggest pumpkin from your local grocery store. The absolute wrong way to achieve this would be to purchase every pumpkin in stock, have them all delivered to your house, determine the biggest one yourself and ignore the rest. In the same way, you want as much filtering as possible done within the database so that only the data you need will be delivered and processed by EF.

To achieve this, Entity Framework has an abstract collection class called a Queryable, which represents a plan that you can build up or narrow down for what to request from the database. Eventually, you must convert this into a concrete collection by calling ToList or similar, at which point the query is run and you are committed to doing the rest in memory. The biggest and easiest mistake to make is to call ToList too early in your query and tell the database to give you the entire table when you only want a few entries.

Unfortunately, it’s not the easiest case to avoid. Many architectures and best practises (such as Onion Architecture or the Repository Pattern) will encourage or outright require you to convert your query to a list earlier than you’d like.

It’s not just Lists and Arrays that cause this: casting your Queryable to an Enumerable also will cause you to run your query in memory. This is because behind the scenes EF makes its queries work via extension methods, and extension method resolution depends on what a class an object claims to be, not what class it really is.

Example: Calling ToList immediately fetches all users from the database.

// C#
IList<User> users = Context.Users.ToList();

var fredCount = users
    .Count(u => u.FirstName == "Fred");
-- Generated SQL
SELECT * FROM Users

IEnumerable will also fetch all users from the database regardless of its real type.

IEnumerable<User> users = Context.Users;

var fredCount = users
    .Count(u => u.FirstName == "Fred");
SELECT * FROM Users

Instead build your query as an IQueryable before fetching data.

IQueryable<User> users = Context.Users;

var fredCount = users
    .Count(u => u.FirstName == "Fred");
SQL
SELECT COUNT(1) FROM Users
WHERE FirstName = 'Fred'

In addition to the difference in performance, you should also know that operations can differ between database and in-memory, because of their respective SQL and C# implementations. This includes string matching, which defaults to case-insensitive in SQL and case-sensitive in C#, and null-propagation which happens automatically in SQL but must be specified in C#.

Be careful: String matching behaviour varies depending on if your query is run in-memory or on the database

// Queryable will get names starting with 'a' or 'A'
var users = Context.Users
    .Where(u => u.Name.StartsWith("a"))
    .ToList();

// List will get names starting with 'a' only
var users = Context.Users
    .ToList()
    .Where(u => u.Name.StartsWith("a"))
    .ToList();

Be careful: Null propagation behaviour also varies between in-memory and database queries

// List needs null propagation to avoid errors
var users = Context.Users
    .ToList()
    .Where(u => u.BestFriend?.Name == "Fred")
    .ToList();

// Queryable has null propagation built in
var users = Context.Users
    .Where(u => u.BestFriend.Name == "Fred")
    .ToList();

Lazy and Eager Loading

A basic concept in any database is relationships between tables, for example a People table and a Pets table, where each Person has a child collection of Pets. By default, Entity Framework will not pull in child collections until it knows it needs to. This is an excellent idea in theory, but it works terribly with loops. EF ends up asking for each entity’s child collection one at a time. If you have N entities with N children each, that results in N+1 queries. If each child has N grandchildren, you have N2 + N + 1 queries. For even a moderately sized database this can be incredibly slow.

The obvious fix to this is to use the provided Include method to tell EF ahead of time that you need the child collection. This works but requires you to remember to add a new include method every time you change your code to need another collection.

The better alternative is to make use of a less obvious feature of EF – its ability to analyse lambda functions. If you have a select function that specifies directly what it needs from your entity, then EF will notice and make sure it includes that data – and only that data. While this creates more verbose code, it not only handles child collections but also individual properties, resulting in far more data-efficient queries.

Example: Referring to a lazily loaded collection on every iteration causes multiple SQL queries.

// C#
public class UserModel
{
    public string FirstName { get; set; }
    public IList<PetModel> Pets { get; set; }

    public UserModel(User user)
    {
        FirstName = user.FirstName;
        Pets = user.Pets
            .Select(p => new PetModel(p))
            .ToList();
    }
}

var viewModels = Context.Users
    .Select(u => new UserModel(u))
    .ToList();
SQL
-- Generated SQL
SELECT * FROM Users
SELECT * FROM Pets WHERE UserId = 1
SELECT * FROM Pets WHERE UserId = 2
SELECT * FROM Pets WHERE UserId = 3
-- etc.

You could call include to tell EF to eager load these collections. However, this is an anti-pattern, as you need to remember to do so for every use of UserModel and you need to remember to change all the includes if UserModel’s constructor changes to use different collections.

var viewModels = Context.Users
    .Include(u => u.Pets)
    .Select(u => new UserModel(u))
    .ToList();
SELECT * FROM Users U
LEFT JOIN Pets P ON P.UserId = U.Id

Instead use Lambda functions that mention collections to automatically invoke eager loading while excluding unneeded properties.

Source public class UserModel
{
    public string FirstName { get; set; }
    public IList<PetModel> Pets { get; set; }

    public UserModel(string firstName,
        IList<PetModel> pets)
    {
        FirstName = firstName;
        Pets = pets;
    }
}

var viewModels = Context.Users
    .Select(u =>
        new UserModel(
            u.FirstName,
            u.Pets.Select(p =>
                new PetModel(p.Species)
            ).ToList()
        )
    )
    .ToList();
SourSELECT U.FirstName, P.Species 
FROM Users U
LEFT JOIN Pets P ON U.Id = P.UserId

To fetch single entities with the same efficiency, use Single at the end of the same query.

Sourvar viewModels = Context.Users
    .Where(u => u.Id == 1)
    .Select(u =>
        new UserModel(
            u.FirstName,
            u.Pets.Select(p =>
                new PetModel(p.Species)
            ).ToList()
        )
    )
    .Single();
SQL
SELECT U.FirstName, P.Species 
FROM (
	SELECT TOP 2 Id, FirstName
	FROM Users
	WHERE Id = 1
) AS U
LEFT JOIN Pets P ON U.Id = P.UserId

Additional Lambda magic

You don’t need to create a view model every time you want to Select specific columns. You can also call Select into an anonymous type and it works just as well. This is also useful to avoid code repetition as lambdas do not otherwise support variables, though the calculations are still repeated in the generated SQL.

This opens even more options. You can call GroupBy into a dynamic object, followed by Select. EF will recognise that properties referenced by the GroupBy need to be columns in its own SQL GROUP BY, and that convert methods like Count or Max in the Select into the SQL equivalent. There’s no need to pull entire the list into memory and group after that.

This also works with Concat, provided both concatenated Queryables end with a Select into identical objects. This becomes the SQL equivalent, a UNION ALL.

Example: Anonymous types can help with the lack of variables in C# lambda functions, but do not convert to SQL variables

// C#
var names = Context.Users
    .Select(u => new
    {
        FullName = u.FirstName + " " + u.LastName 
    })
    .Where(d => d.FullName.Contains("a"))
    .ToList();
-- Generated SQL
SELECT FirstName + ' ' + LastName FROM Users
WHERE (FirstName + ' ' + LastName) LIKE '%a%'
 

Example: Anonymous types can be used as the keys for GroupBy

var names = Context.Users
    .GroupBy(u => new
    {
        u.EyeColour, u.HairColour
    })
    .Select(g => new
    {
        g.Key.EyeColour,
        g.Key.HairColour,
        g.Count(),
        g.Max(u => u.Age)
    })
    .ToList();
SELECT 
	EyeColour, HairColour, COUNT(1), MAX(Age)
FROM Users
GROUP BY EyeColour, HairColour

Example: Anonymous types can combine with Concat to become SQL Unions

var userNames = Context.Users
    .Select(u => new {Name = u.FirstName});

var petNames = Context.Pets
    .Select(p => new {p.Name});

var allNames = userNames
    .Concat(petNames)
    .ToList();
SELECT FirstName [Name] FROM Users
UNION ALL
SELECT Name FROM Pets

Conclusions

  • Avoid In-Memory filtering by keeping your query as a Queryable until the end.
  • Beware of functional differences between database filtering and in-memory filtering.
  • Pull only the columns and child collections you need through use of lambda functions.
  • Use anonymous types to reduce duplicated code
  • Use anonymous types to access SQL commands like GROUP and UNION