
If I write something like this:

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue);

Is this the same as:

var results1 = new List<Thing>();
foreach(var t in mythings)
    if(t.IsSomeValue)
        results1.Add(t);

var results2 = new List<Thing>();
foreach(var t in results1)
    if(t.IsSomeOtherValue)
        results2.Add(t);

Or is there some magic under the covers that works more like this:

var results = new List<Thing>();
foreach(var t in mythings)
    if(t.IsSomeValue && t.IsSomeOtherValue)
        results.Add(t);

Or is it something completely different altogether?

4 Answers


LINQ queries are lazy. That means the code:

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue);

does very little. The original enumerable (mythings) is only enumerated when the resulting enumerable (things) is consumed, e.g. by a foreach loop, .ToList(), or .ToArray().

If you call things.ToList(), it is roughly equivalent to your latter code, with perhaps some (usually insignificant) overhead from the enumerators.

Likewise, if you use a foreach loop:

foreach (var t in things)
    DoSomething(t);

It is similar in performance to:

foreach (var t in mythings)
    if (t.IsSomeValue && t.IsSomeOtherValue)
        DoSomething(t);

Two performance advantages of lazy evaluation for enumerables (as opposed to calculating all the results and storing them in a list) are that it uses very little memory (since only one result is held at a time) and that there's no significant up-front cost.

If the enumerable is only partially enumerated, this is especially important. Consider this code:

things.First();

The way LINQ is implemented, mythings will only be enumerated up to the first element that matches your where conditions. If that element is early on in the list, this can be a huge performance boost (e.g. O(1) instead of O(n)).
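
To make the laziness visible, here is a minimal sketch that counts how many source elements actually get pulled. The names LazyDemo, Source and Inspected are made up for illustration, and the int sequence stands in for mythings; the two predicates are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Number of source elements handed out so far (illustrative counter).
    public static int Inspected;

    // Hypothetical stand-in for mythings: yields 1..10 and counts each element it yields.
    public static IEnumerable<int> Source()
    {
        for (int i = 1; i <= 10; i++)
        {
            Inspected++;
            yield return i;
        }
    }

    public static void Main()
    {
        Inspected = 0;

        // Building the query does not enumerate the source.
        var things = Source()
            .Where(x => x % 2 == 0)
            .Where(y => y > 3);
        Console.WriteLine(Inspected); // 0

        // First() stops at the first element that passes both filters.
        int first = things.First();
        Console.WriteLine(first);     // 4
        Console.WriteLine(Inspected); // 4 (elements 5..10 were never touched)
    }
}
```

Calling things.ToList() instead of things.First() would pull all ten elements, since every one of them has to be tested.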

Cyanfish

The following code:

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue);

does nothing on its own: because of lazy evaluation, no element is filtered until the result is enumerated.

var things = mythings
    .Where(x => x.IsSomeValue)
    .Where(y => y.IsSomeOtherValue)
    .ToList();

This is different, because calling ToList() triggers the evaluation.

Each item of mythings will be given to the first Where. If it passes, it will be given to the second Where. If it passes, it will be part of the output.

So it looks more like this:

var results = new List<Thing>();
foreach(var t in mythings)
{
    if(t.IsSomeValue)
    {
        if(t.IsSomeOtherValue)
        {
            results.Add(t);
        }
    }
}
Cyril Gandon

Deferred execution aside (the other answers already explain it, so I'll just point out another detail), it's more like your second example.

Let's just imagine you call ToList on things.

When the source is a List<T>, the implementation of Enumerable.Where returns an Enumerable.WhereListIterator. When you call Where on that WhereListIterator (i.e. when you chain Where calls), you no longer call Enumerable.Where, but Enumerable.WhereListIterator.Where, which actually combines the predicates (using Enumerable.CombinePredicates).

So it's more like if(t.IsSomeValue && t.IsSomeOtherValue).
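
As a rough sketch of what combining predicates means, the helper below builds a single predicate out of two. Combine and PredicateDemo are illustrative names of my own, not the framework's actual code, and the sample predicates are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateDemo
{
    // Illustrative helper (not the framework's implementation): returns one
    // predicate that passes only if both inputs pass. Thanks to &&, the second
    // predicate is only called when the first one returned true.
    public static Func<T, bool> Combine<T>(Func<T, bool> first, Func<T, bool> second)
    {
        return x => first(x) && second(x);
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        Func<int, bool> isEven = n => n % 2 == 0;
        Func<int, bool> isLarge = n => n > 3;

        // Two chained Where calls versus a single Where with the combined predicate.
        var chained = numbers.Where(isEven).Where(isLarge).ToList();
        var combined = numbers.Where(Combine(isEven, isLarge)).ToList();

        Console.WriteLine(chained.SequenceEqual(combined)); // True: both contain 4 and 6
    }
}
```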

sloth

No, it is not the same. In your example things is an IEnumerable, which at this point is still only an iterator, not an actual array or list. Moreover, since things is never consumed, the loop is never even run. The IEnumerable type lets you iterate over the elements yielded by LINQ operators and process them further with more operators, which means that in the end you really have only one loop.

But as soon as you add a call like .ToArray() or .ToList(), you're ordering the creation of an actual data structure, which puts an end to your lazy chain.
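
One way to see the difference is to change the source after building the query. This is only a small sketch; SnapshotDemo and the sample numbers are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    public static void Main()
    {
        var source = new List<int> { 1, 2, 3 };

        // Still just an iterator: nothing has been filtered yet.
        IEnumerable<int> lazy = source.Where(n => n > 1);

        // ToList() runs the loop now and stores the results in a real list.
        List<int> snapshot = source.Where(n => n > 1).ToList();

        source.Add(4);

        Console.WriteLine(lazy.Count());  // 3: the query runs now and sees 2, 3, 4
        Console.WriteLine(snapshot.Count); // 2: the list was built before 4 was added
    }
}
```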

See this related SO question: https://stackoverflow.com/questions/2789389/how-do-i-implement-ienumerable

Julien Guertault