Mike Taulty's Blog
Bits and Bytes from Microsoft UK
Deconstructing LINQ to SQL (Part 2)

Blogs

Mike Taulty's Blog

Elsewhere

Following up on this previous post, I wanted to play a bit more with how this works for LINQ to SQL.

If I've got classes such as these (probably from the code generation tools but I typed this one in myself);

 [Table(Name="Customers")]
 class Customer
 {
  [Column(IsPrimaryKey=true)]
  public string CustomerID;
  [Column]
  public string Country;
 }

 and;

 class NorthwindContext : DataContext
 {
  public NorthwindContext(string conStr) : base(conStr)
  {
  }
  public Table<Customer> Customers;
 }

then I can go and write a query such as;

 

   NorthwindContext ctx = new NorthwindContext("server=.");

   var query = from c in ctx.Customers
               where c.Country == "UK"
               select c;

   foreach (Customer c in query)
   {
    Console.WriteLine(c.CustomerID );
   }

<updateToPost correctness="NotSure" status="Checking" date="19th March">

The really clever thing to me is that I can also do something like this;

  NorthwindContext ctx = new NorthwindContext("server=.");

   IQueryable<Customer> queryable = ctx.Customers;

   var query = queryable.Where(c => c.Country == "UK");

   foreach (Customer c in query)
   {
    Console.WriteLine(c.CustomerID);
   }

   queryable = new List<Customer>()
               {
                new Customer() { CustomerID="Foo", Country="France" },
                new Customer() { CustomerID="Bar", Country="UK" }
               }.AsQueryable();

   foreach (Customer c in query)
   {
    Console.WriteLine(c.CustomerID);
   }

Notice what happens in this example. We define a query (called query) once. The query is captured.

We then enumerate that query twice. The first time we enumerate it, it will cause T-SQL to be sent to a SQL Server. The second time we enumerate it, it will just call the methods on the Enumerable class in order to filter one list into another list.

This was really the point that I was trying to make in the previous post. The fact that the query can be rewritten at the point where we come to enumerate it is really "wacky/clever" because it means that the same query can be used in different "contexts". Here, we re-use the same query against Table<T> and also against IEnumerable<Customer> and it works "the way we expect" but under the covers that's really smart.

The way in which that happens should be at least semi-apparent from the previous post and, hopefully, it illustrates how smart this stuff can be because;

  1. The query does not execute at the point where we define it.
  2. The query does not even "know" how it will execute at the point where we define it. As we can see here, the same query might ultimately cause T-SQL code to be generated when we enumerate it or it might cause in-memory filtering operations to execute against a list when we enumerate it.

</updateToPost>

How does this specifically happen in the case of LINQ to SQL? I had a quick flick through in Reflector but it's pretty involved.

We're calling Where() here on Table<T> and Table<T> is IQueryable<T> so the extension method in Queryable<Customer>.Where() kicks in . This method calls back to CreateQuery on Table<T> itself.

Table<T>.CreateQuery() creates an instance of a class called DataQuery<T> which is IQueryable<T> and passes the query expression to it along with the DataContext.

As far as I can see, when we enumerate the query, we end up calling DataQuery.Execute which calls into context.Provider.ExecuteQuery(). The Provider property is of type SqlProvider.

So, logically we should be able to continue our example above by doing something like;

 

   SqlProvider provider = new SqlProvider();
   provider.ExecuteQuery(query.Expression);

but we can't actually do that because SqlProvider needs an internal class that implements IDataServices for it and I don't think we're actually allowed to create one of those so the framework can do this but we can't. Fair enough.

Continuing on with the story, what does SqlProvider.ExecuteQuery do ?

It seems to first check to see if it's got a cached copy of that query and then it uses a method called BuildQuery to build it if it doesn't already have it. Once it's "built", another ExecuteQuery method goes ahead and runs the built query through the regular ADO.NET infrastructure of SqlCommand, SqlConnection and so-on and so we do see this stuff executing its work through ADO.NET.

In terms of BuildQuery, it looks like...

internal QueryInfo BuildQuery(Expression query, SqlNodeAnnotations annotations)
{
    this.CheckDispose();
    query = Funcletizer.Funcletize(query);
    QueryConverter converter = new QueryConverter(this.services, this.typeProvider, this.translator, this.sqlFactory);
    converter.ConverterStrategy = (this.Mode == ProviderMode.Sql2005) ? ConverterStrategy.SkipWithRowNumber : ConverterStrategy.Default;
    SqlNode node = converter.ConvertOuter(query);
    return this.BuildQuery(this.GetResultType(query), node, annotations);
}

You have to love the naming here :-) Note that the provider looks to have a SQL 2000 and a SQL 2005 mode on it which makes sense but I hadn't thought about it before even though I had thought that functions such as Take might use something 2000/2005 specific.

From there on in, it gets "really hard, really fast" or at least it did to me. From BuildQuery onwards there are a lot of classes (SqlSelect, SqlBinder, SqlRetyper, SqlMultiplexer, SqlFormatter and a whole lot more) that look to be involved in taking what we have as a tree and turning it into T-SQL.

I've not followed this through completely but, from what I can see, we end up in a mapping process from the original tree to a "more SQL specific tree" and I guess this is the point where our MethodCallExpressions disappear and get turned into "SQL nodes". We end up using a class called SqlFormatter with a method Format which uses an internal class called Visitor to wander around that "SQL node" tree and it has a tonne of methods such as VisitSelect which goes ahead and turns a "Select" node into the actual SELECT, FROM, WHERE and so on that it needs. There are similar methods for VisitDelete and so on and so forth.

So...I feel a bit happier that I can answer Daniel's original questions in that I do now know where the T-SQL comes from and I do know that the infrastructure uses SqlCommand and so on but I still wouldn't like to say that I have this "nailed" because I think it's pretty hard to walk through without actually single-stepping the live source code and I don't have that (and, at some point, it's sensible to give up and accept that "it works" :-)).


Posted Fri, Mar 16 2007 3:01 AM by mtaulty

Comments

Craig Nicholson wrote Deconstructing LINQ to SQL
on Mon, Mar 19 2007 2:54 PM
I happened to stumble upon the blog of Mike Taulty and the following three very interesting articles
Microsoft » Blog Archives » T-SQL 2005 wrote Microsoft &raquo; Blog Archives &raquo; T-SQL 2005
on Wed, Aug 1 2007 4:48 PM