This post is mostly for my own sanity. I tried to pull the LINQ bits apart a few months ago and, whilst I didn't get completely there, I got most of it. Because I have the memory of a goldfish, I can no longer remember it.
So, here goes. This is in my own words and is almost certainly not complete, and possibly not even halfway correct, but it'll perhaps help me again in the future and could possibly help someone else along a bit as well.
This is based on the May CTP of LINQ rather than a later one, so things have changed a little since then.
1 Object Initialisers
Given a class such as;
class Person
{
private string firstName;
public string FirstName
{
get { return firstName; }
set { firstName = value; }
}
private string lastName;
public string LastName
{
get { return lastName; }
set { lastName = value; }
}
}
Object initialisers allow me to construct an instance in a single line and do exactly what you'd expect in terms of the code that's generated.
Person p = new Person() { FirstName="Fred", LastName="Smith" } ;
2 Collection Initialisers
Collection initialisers allow what we could previously do for arrays;
int[] numbers = { 10, 20, 30, 40, 50, 60, 70, 80, 90, 100 };
to be done with generic types that implement ICollection<T>;
List<int> list = new List<int>() { 10, 20, 30 };
and the compiler calls ICollection<T>.Add once per entry in order to slot the entries into the list.
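As a rough sketch of what that means, the two methods below should build identical lists. The class and method names are mine and purely illustrative. I'm compiling this against the released C# 3.0 rules rather than the May CTP; as far as I know the released compiler only asks for a type that implements IEnumerable and has an accessible Add method, rather than insisting on ICollection<T>.

```csharp
using System;
using System.Collections.Generic;

static class CollectionInitialiserSketch
{
    // Built with a collection initialiser.
    public static List<int> ViaInitialiser()
    {
        return new List<int>() { 10, 20, 30 };
    }

    // Hand-written equivalent: construct the list, then call Add per entry.
    public static List<int> ViaAdd()
    {
        List<int> list = new List<int>();
        list.Add(10);
        list.Add(20);
        list.Add(30);
        return list;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", ViaInitialiser())); // 10,20,30
        Console.WriteLine(string.Join(",", ViaAdd()));         // 10,20,30
    }
}
```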
3 Implicit Typing
Implicit typing lets the compiler work out what type the variable should be. The variable is still strongly typed. It can only be used for locally scoped variables (not members, parameters etc).
var i = 10;
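To make the strong typing point a bit more concrete, here is a small sketch (my own names throughout) comparing a var with an object. The var is an int as far as the compiler is concerned, so arithmetic just works, whereas the object needs a cast first.

```csharp
using System;

static class ImplicitTypingSketch
{
    public static string Describe()
    {
        var i = 10;      // the compiler infers int
        object o = 10;   // a boxed int whose static type is object

        int doubled = i * 2;        // fine: i is an int at compile time
        // int broken = o * 2;      // would not compile: no '*' for object
        int unboxed = (int)o * 2;   // object needs an explicit cast first

        return i.GetType().Name + " " + doubled + " " + unboxed;
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // Int32 20 20
    }
}
```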
Initially this seems perverse because it looks as opaque as declaring everything as object, with only a limited set of places where you can use it. It only really makes sense (to me) with the follow-on feature...
4 Anonymous Types
Create a data type without giving it a name - hence the "need" for implicit typing as it is hard to declare a variable which has a type that is unnamed (or at least unnamed to the programmer). Combines awfully nicely with object initialisers :-)
object o = new { FirstName="Mike", LastName="Taulty" };
or more realistically;
var v = new { FirstName="Mike", LastName="Taulty" };
Console.WriteLine(v.FirstName);
Console.WriteLine(v.LastName);
The anonymous type is a real type and can be reflected upon etc but the most natural use is through the implicitly typed reference. In the May CTP it seems like an anonymous type of a particular "shape" in one method is not the same type as an anonymous type of the same shape in another method (this seemed to differ to some extent in VB). That is;
static void F1()
{
var v = new { FirstName="Mike", LastName="Taulty" };
Console.WriteLine(v.GetType().ToString());
}
static void F2()
{
var v = new { FirstName="Mike", LastName="Taulty" };
Console.WriteLine(v.GetType().ToString());
}
static void Main(string[] args)
{
F1();
F2();
}
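In the May CTP, F1 does not print the same type as F2. (The released C# 3.0 compiler reuses a single type for anonymous types in the same assembly whose properties have the same names and types in the same order, so there both methods print the same type name.)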
Anonymous types are odd in that the compiler needs to "have visibility" of the places where the anonymous type is used which gives the impression that they're not particularly useful.
static void Main(string[] args)
{
var v = new { FirstName="Mike", LastName="Taulty" };
// How the heck do we pass "v" to a fn to do something
// useful with it?
}
which is where Lambdas come into their own but, first, something else.
5 Inference
I never knew this but apparently given a function such as;
static P Fn<P>(P p)
{
return (p);
}
The C# V2.0 compiler can infer the generic type parameter when this is called with something like;
static void Main(string[] args)
{
int x = Fn(10);
}
Neat.
6 Lambda Expressions and Statements
Lambdas are like anonymous methods. i.e.
static void Main(string[] args)
{
List<int> list = new List<int>() { 5, 1, 6, 2, 7, 3, 8, 4 };
list.Sort(
delegate(int p1, int p2) { return(p1 - p2); });
}
can be replaced with;
static void Main(string[] args)
{
List<int> list = new List<int>() { 5, 1, 6, 2, 7, 3, 8, 4 };
list.Sort((int p1, int p2) => p1 - p2);
}
which is neat but it goes a little further than that because the compiler can perform inference around them such as;
static void Main(string[] args)
{
List<int> list = new List<int>() { 5, 1, 6, 2, 7, 3, 8, 4 };
list.Sort((p1, p2) => p1 - p2);
}
and we can miss the types out, which initially seems like a nicety but, in the light of anonymous types, becomes very significant: an anonymous type has no name that we could write in a parameter list. Take a method such as Convert;
delegate R ConversionRoutine<R, P>(P p);
static R Convert<R, P>(P p, ConversionRoutine<R,P> routine)
{
return (routine(p));
}
This lets us write code such as;
int y = Convert<int,string>("Hello",
delegate(string s) { return(s.Length); });
but (at least in the May CTP) using a Lambda lets us take this further in that we can write;
int x = Convert("Hello", (string s) => s.Length);
and even less explicit;
int x = Convert("Hello", s => s.Length);
and that means that we can use Convert with anonymous types because we never have to mention the type anywhere :-)
var person = new { FirstName="Mike", LastName="Taulty", Age=36 };
var otherPerson = Convert(person,
x => new {
Name = x.FirstName + " " + x.LastName,
AdultYears = x.Age - 18 });
7 Extension Methods
Extension methods allow us to write a method that looks like it's part of a class when it isn't. It's syntactic sugar: the extension method only has public access to the class that it extends, and "local" member methods are preferred over extension methods.
So, we can go ahead and move our Convert method to an extension (keyword "this" on first parameter is crucial);
public delegate R ConversionRoutine<R, P>(P p);
public static class MyExtensions
{
public static R Convert<R, P>(this P p, ConversionRoutine<R, P> routine)
{
return (routine(p));
}
}
The compiler will look for extensions in all the "in scope" namespaces and so a switch of a using statement can change which identically defined extension method will be used over another. Our original code becomes slightly nicer in that it now looks like;
var person = new { FirstName="Mike", LastName="Taulty", Age=36 };
var otherPerson = person.Convert(
x => new {
Name = x.FirstName + " " + x.LastName,
AdultYears = x.Age - 18 });
Console.WriteLine(otherPerson.Name);
Console.WriteLine(otherPerson.AdultYears);
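The point about member methods being preferred over extension methods can be sketched as below. Greeter and GreeterExtensions are hypothetical names of mine, not anything from the LINQ bits, and I've compiled this against the released C# 3.0 rules.

```csharp
using System;

class Greeter
{
    public string Hello() { return "instance"; }
}

static class GreeterExtensions
{
    // Same name and shape as the instance method, so instance-call
    // syntax never picks this one.
    public static string Hello(this Greeter g) { return "extension"; }

    // No instance method called Goodbye exists, so this one is used.
    public static string Goodbye(this Greeter g) { return "extension"; }
}

static class ExtensionPrecedenceSketch
{
    static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Hello());                  // instance
        Console.WriteLine(g.Goodbye());                // extension
        Console.WriteLine(GreeterExtensions.Hello(g)); // extension
    }
}
```

Calling the extension as an ordinary static method is the only way to reach it once an instance method of the same shape exists.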
8 IEnumerable, Execution Now, Execution Later
With extension methods, we can now go and build a method like Select which is a first step towards "Linq'ing".
Initially, going from one IEnumerable<T> to another IEnumerable<S> seems like a great thing to be able to do and as an example;
public delegate R Projection<R, P>(P p);
public static class MyExtensions
{
public static IEnumerable<R> Select<R, P>(
this IEnumerable<P> p, Projection<R,P> projection)
{
List<R> projectedData = new List<R>();
foreach (P entry in p)
{
R r = projection(entry);
projectedData.Add(r);
}
return(projectedData);
}
}
we can build a function like Select here which will turn one kind of IEnumerable into another kind of IEnumerable by using a supplied Lambda (here called Projection) and then we can use it with something like;
var person = new [] {
new { FirstName="Mike", LastName="Taulty", Age=36 }
};
var otherPerson = person.Select(
x=> new { FullName = x.FirstName + x.LastName,
AdultYears = x.Age - 18 });
foreach (var op in otherPerson)
{
System.Console.WriteLine("{0}, {1}", op.FullName,
op.AdultYears);
}
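Now that implementation of Select becomes interesting in that if we do something like this below (fragment A). Note that it assigns to a property of an anonymous type, which the May CTP allows; in the released C# 3.0 the properties of anonymous types are read-only.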
var person = new [] {
new { FirstName="Mike", LastName="Taulty", Age=36 }
};
var otherPerson = person.Select(
x=> new { FullName = x.FirstName + x.LastName,
AdultYears = x.Age - 18 });
person[0].FirstName = "Fred";
var morePeople = person.Select(
x=> new { FullName = x.FirstName + x.LastName,
AdultYears = x.Age - 18 });
foreach (var op in otherPerson)
{
System.Console.WriteLine("{0}, {1}", op.FullName,
op.AdultYears);
}
foreach (var op in morePeople)
{
System.Console.WriteLine("{0}, {1}", op.FullName,
op.AdultYears);
}
Then we'll find that, because this version of Select builds the whole projection at the moment it is called (once before and once after the change to FirstName), we get 2 different result sets here - i.e.
MikeTaulty, 18
FredTaulty, 18
However, if we wrote Select like this;
public static class MyExtensions
{
public static IEnumerable<R> Select<R, P>(
this IEnumerable<P> p, Projection<R,P> projection)
{
foreach (P entry in p)
{
yield return projection(entry);
}
}
}
Then if we re-ran fragment A with this new version of Select we get a different result from what we got before in that we will see the same result set come back twice rather than 2 different result sets. This second version of Select defers execution: because of the way in which the iterator code generation works, nothing is projected until the foreach loops run, by which time FirstName has already been changed.
In some ways this is a bit scary because it means that you can't know exactly what you're going to get unless you know how Select has been implemented although there are methods like ToList() and ToArray() kicking around which let you force the issue.
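Here is a minimal sketch of forcing the issue with ToList(). To keep it compilable on a released compiler I've made some assumptions: it uses the System.Linq namespace and the built-in Select rather than the May CTP's System.Query or my own MyExtensions, and it uses a small named Person class (mine, just for this example) with a settable property instead of an anonymous type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Person
{
    public string FirstName { get; set; }
}

static class DeferredSketch
{
    public static string[] Run()
    {
        Person[] people = { new Person { FirstName = "Mike" } };

        // Deferred: nothing is projected until somebody enumerates.
        IEnumerable<string> deferred = people.Select(p => p.FirstName);

        // Forced: ToList() enumerates right now and keeps the results.
        List<string> snapshot = people.Select(p => p.FirstName).ToList();

        people[0].FirstName = "Fred";

        return new[] { deferred.First(), snapshot[0] };
    }

    static void Main()
    {
        string[] results = Run();
        Console.WriteLine(results[0]); // Fred
        Console.WriteLine(results[1]); // Mike
    }
}
```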
At this point, I can rewrite my query to use the query expression syntax rather than the explicit method call that I've been using so that we end up with something like;
var person = new [] {
new { FirstName="Mike", LastName="Taulty", Age=36 }
};
var otherPerson =
from p in person
select new { FullName = p.FirstName + p.LastName,
AdultYears = p.Age - 18 };
and this is still using my version of Select from my class MyExtensions.
I can then go and build a version of Where;
public static class MyExtensions
{
public static IEnumerable<R> Select<R, P>(
this IEnumerable<P> p, Projection<R,P> projection)
{
foreach (P entry in p)
{
R r = projection(entry);
yield return r;
}
}
public static IEnumerable<T> Where<T>(
this IEnumerable<T> list, Predicate<T> predicate)
{
foreach (T entry in list)
{
if (predicate(entry))
{
yield return entry;
}
}
}
}
and then I can go ahead and do something like;
static void Main(string[] args)
{
int[] numbers = { 10, 20, 30, 40, 50 };
var data =
from n in numbers
where n > 30
select n * 5;
}
And all that's happening is that a list of numbers flows into my Where function where it gets transformed into another list of numbers that flows into my Select function where it gets transformed into another list of numbers (good job we have garbage collection :-)).
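To spell out that flow, the query expression is just another way of writing chained method calls. The sketch below assumes the released System.Linq operators rather than my own Where and Select, and the method names are mine; both methods should return the same numbers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class QueryTranslationSketch
{
    // Query expression syntax.
    public static List<int> QuerySyntax(int[] numbers)
    {
        var data =
            from n in numbers
            where n > 30
            select n * 5;
        return data.ToList();
    }

    // What the compiler turns the query above into: Where feeds Select.
    public static List<int> MethodSyntax(int[] numbers)
    {
        return numbers.Where(n => n > 30).Select(n => n * 5).ToList();
    }

    static void Main()
    {
        int[] numbers = { 10, 20, 30, 40, 50 };
        Console.WriteLine(string.Join(",", QuerySyntax(numbers)));  // 200,250
        Console.WriteLine(string.Join(",", MethodSyntax(numbers))); // 200,250
    }
}
```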
But...this would be horrendous for SQL. If we had;
var bigCustomers = from c in customers where c.CustNumber > 10000 select c;
then that would mean that the entire customers table would need to come back from the database in order to flow it into the Where function which would be a bit problematic.
So...we need another way of dealing with this for something like SQL in order that the whole query can be picked up, translated into something else (SQL) and executed in some fashion.
9 Expressions
This is where it gets a bit tricky/interesting depending on your point of view. Along with executing a Lambda, we can also grab a Lambda as a piece of data to store somewhere.
That is;
Expression<Predicate<int>> ex = x => x < 10;
and we can rip apart that Expression as though it was a piece of data. It's a complicated enough (and recursive) task that the LINQ preview has a debugger visualizer in the samples that will show the tree for you.
The visualizer tells us that the compiler has taken all its knowledge of (x => x < 10) and turned it into a recursive data structure referenced by our variable ex above.
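Without the visualizer, a small sketch like the one below can pick the same tree apart in code. It assumes the released System.Linq.Expressions namespace (not the May CTP one), and the class and method names are mine.

```csharp
using System;
using System.Linq.Expressions;

static class ExpressionSketch
{
    // Walk one level into the tree: a binary node with two children.
    public static string Describe()
    {
        Expression<Predicate<int>> ex = x => x < 10;

        BinaryExpression body = (BinaryExpression)ex.Body;
        ParameterExpression left = (ParameterExpression)body.Left;
        ConstantExpression right = (ConstantExpression)body.Right;

        return body.NodeType + " " + left.Name + " " + right.Value;
    }

    // The same tree can still be turned back into something executable.
    public static bool Evaluate(int value)
    {
        Expression<Predicate<int>> ex = x => x < 10;
        Predicate<int> compiled = ex.Compile();
        return compiled(value);
    }

    static void Main()
    {
        Console.WriteLine(Describe());   // LessThan x 10
        Console.WriteLine(Evaluate(5));  // True
        Console.WriteLine(Evaluate(50)); // False
    }
}
```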
To solve "the SQL problem" of getting a whole picture of the query it'd be nice to effectively build up a list of these expressions in pieces as the various clauses of the query (select, where, group, etc.) contribute additional items into it.
This means that we need an implementation of Where, Select and so on that takes an Expression<T> rather than T where T is some delegate type. We also need to be able to differentiate that version of Where, Select from the other one that we've already built around IEnumerable. This is where IQueryable<T> comes in.
IQueryable<T> inherits from IEnumerable<T> and adds a property to store an Expression and a couple of methods called CreateQuery.
Another set of extension methods can be defined that take an IQueryable rather than an IEnumerable and they can also take expression trees as parameters rather than delegates. As examples (notice I haven't implemented them :-));
public static System.Query.IQueryable<R> Select<R,P>(
this System.Query.IQueryable<P> list,
Expression<Projection<R,P>> selector)
{
return(null);
}
public static System.Query.IQueryable<T> Where<T>(
this System.Query.IQueryable<T> list,
Expression<Predicate<T>> predicate)
{
return(null);
}
These look just like the existing versions based upon IEnumerable<T> except that now the "list" parameter is passed as IQueryable<T> and the Lambdas are now passed as Expressions rather than purely as delegates to be invoked.
Rather than executing anything, these methods need to contribute to the Expression that's being built up. In my case I want my new Select method to call my existing Select method and I want my Where method to behave similarly. I achieve this (as far as I know) by adding MethodCallExpression instances into the Expression and then when the resultant Expression is executed the methods specified in those MethodCallExpressions will get invoked.
In terms of when the calls to these methods occur - it depends on when the query is actually executed and you'd expect this not to happen until someone ultimately enumerates the resulting IQueryable<T> (remembering that IQueryable<T> : IEnumerable<T>).
Here's how the extensions now line up - I'm not really sure quite how I find a particular overloaded generic method from a Reflection point of view and hence the rather hacky looking FindGenericMethod that's in there.
public static class MyExtensions
{
public static System.Query.IQueryable<R> Select<R,P>(
this System.Query.IQueryable<P> list,
Expression<Projection<R,P>> selector)
{
// Find the method that we want to have called.
MethodInfo genericSelect = FindGenericMethod("Select",
typeof(System.Query.IQueryable<R>), typeof(R), typeof(P));
// Turn that into a method call expression for the tree
MethodCallExpression methodCall =
Expression.Call(genericSelect, null,
new Expression[] { list.Expression, selector });
// Ask the IQueryable to turn that into a query
System.Query.IQueryable<R> queryable =
(System.Query.IQueryable<R>)list.CreateQuery<R>(methodCall);
return(queryable);
}
public static System.Query.IQueryable<T> Where<T>(
this System.Query.IQueryable<T> list,
Expression<Predicate<T>> predicate)
{
MethodInfo genericWhere = FindGenericMethod("Where",
typeof(System.Query.IQueryable<T>), typeof(T));
MethodCallExpression methodCall =
Expression.Call(genericWhere, null,
new Expression[] { list.Expression, predicate });
System.Query.IQueryable<T> queryable =
(System.Query.IQueryable<T>)list.CreateQuery<T>(methodCall);
return(queryable);
}
public static IEnumerable<R> Select<R, P>(
this IEnumerable<P> p, Projection<R,P> projection)
{
foreach (P entry in p)
{
R r = projection(entry);
yield return r;
}
}
public static IEnumerable<T> Where<T>(
this IEnumerable<T> list, Predicate<T> predicate)
{
foreach (T entry in list)
{
if (predicate(entry))
{
yield return entry;
}
}
}
private static MethodInfo FindGenericMethod(string name,
Type returnType,
params Type[] typeParams)
{
MethodInfo[] methods =
typeof(MyExtensions).GetMethods(
BindingFlags.Static | BindingFlags.Public);
MethodInfo chosenMethod = null;
foreach (MethodInfo mi in methods)
{
if (mi.Name == name)
{
MethodInfo candidate = mi.MakeGenericMethod(typeParams);
if (candidate.ReturnType == returnType)
{
chosenMethod = candidate;
break;
}
}
}
return(chosenMethod);
}
}
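On the question of how to find one particular overloaded generic method via Reflection, one possible alternative to matching on the return type is to look at the open generic type of the first parameter. This is only a sketch under my own assumptions: Overloads, Pick and Find are hypothetical names, and it is written against the released System.Linq rather than the May CTP.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class Overloads
{
    public static string Pick<T>(IEnumerable<T> source) { return "enumerable"; }
    public static string Pick<T>(IQueryable<T> source) { return "queryable"; }
}

static class OverloadFinderSketch
{
    // Choose the overload whose first parameter is built from the given
    // open generic type, e.g. typeof(IQueryable<>), then close it.
    public static MethodInfo Find(Type holder, string name,
        Type openFirstParameter, params Type[] typeArgs)
    {
        foreach (MethodInfo mi in holder.GetMethods(
            BindingFlags.Static | BindingFlags.Public))
        {
            if (mi.Name != name || !mi.IsGenericMethodDefinition) continue;
            if (mi.GetGenericArguments().Length != typeArgs.Length) continue;

            ParameterInfo[] ps = mi.GetParameters();
            if (ps.Length == 0 || !ps[0].ParameterType.IsGenericType) continue;

            if (ps[0].ParameterType.GetGenericTypeDefinition() == openFirstParameter)
            {
                return mi.MakeGenericMethod(typeArgs);
            }
        }
        return null;
    }

    static void Main()
    {
        MethodInfo m = Find(typeof(Overloads), "Pick",
            typeof(IQueryable<>), typeof(int));
        object result = m.Invoke(null,
            new object[] { new[] { 1, 2 }.AsQueryable() });
        Console.WriteLine(result); // queryable
    }
}
```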
Now, the remaining problem is - where the heck do you get an IQueryable<T> to pass to all this stuff because arrays and Lists don't implement it.
There's a handy method System.Query.Queryable.ToQueryable which will take an IEnumerable<T> and wrap it up in an IQueryable<T> (it's an extension method but I'm fully qualifying it because I'm using my own extensions and therefore not bringing in System.Query as a namespace).
This allows us to do something like;
static void Main(string[] args)
{
int[] numbers = { 10, 20, 30, 40, 50, 60, 70, 80 };
System.Query.IQueryable<int> iq =
System.Query.Queryable.ToQueryable(numbers);
var q =
from n in iq
where n > 50
select n + 1;
}
and the interesting thing is that all that has been built up here is an expression tree. We can see this in the debugger by looking at the Expression property on the IQueryable<int> represented by q above.
(It's a bit of a whopper and it's a pretty simple query!).
Judicious use of breakpoints/tracepoints/Debug.WriteLine would show that the query doesn't get evaluated until someone does;
foreach (int i in q)
{
Console.WriteLine(i);
}
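For anyone without the debugger to hand, here is a sketch that peeks at the tree in code before enumerating it. The assumptions: it uses the released System.Linq operators, with AsQueryable standing in for the May CTP's ToQueryable, rather than my own extensions, and the class and method names are mine.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class QueryableSketch
{
    public static IQueryable<int> BuildQuery(int[] numbers)
    {
        IQueryable<int> iq = numbers.AsQueryable();
        return from n in iq
               where n > 50
               select n + 1;
    }

    // The outermost node is the call to Select; its first argument
    // is the call to Where that feeds it.
    public static string[] NodeNames(IQueryable<int> q)
    {
        MethodCallExpression outer = (MethodCallExpression)q.Expression;
        MethodCallExpression inner = (MethodCallExpression)outer.Arguments[0];
        return new[] { outer.Method.Name, inner.Method.Name };
    }

    static void Main()
    {
        int[] numbers = { 10, 20, 30, 40, 50, 60, 70, 80 };
        IQueryable<int> q = BuildQuery(numbers);

        // Nothing has been evaluated yet; we only have a tree.
        Console.WriteLine(string.Join(",", NodeNames(q))); // Select,Where

        // Enumerating is what finally runs the query.
        foreach (int i in q)
        {
            Console.WriteLine(i); // 61, 71, 81
        }
    }
}
```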
10 Implementing IQueryable<T>
There appears to be a fairly crucial class in the framework called SequenceQuery which does the donkey work for the IQueryable<T> implementation over in-memory objects and I guess there's a similar class for LINQ to XML and LINQ to SQL. The variable q in the previous code would be of type SequenceQuery and in the previous calls that did;
public static System.Query.IQueryable<R> Select<R,P>(
this System.Query.IQueryable<P> list,
Expression<Projection<R,P>> selector)
{
...
// Ask the IQueryable to turn that into a query
System.Query.IQueryable<R> queryable =
(System.Query.IQueryable<R>)list.CreateQuery<R>(methodCall);
...
}
The list parameter would be a SequenceQuery.
Whilst I've stared at the implementation of SequenceQuery quite a bit, I've not really figured it out too well so that'll have to be for another post.
Posted Wed, Jan 31 2007 12:14 AM by mtaulty