In talking about Parallel FX yesterday, I was using syntax such as;
Task t = new Task(() => { Console.WriteLine("Hello World"); });
and it generated some puzzled looks from the attendees.
Not surprisingly, not everyone is on .NET Framework V3.5 Service Pack 1 yet, and not everyone is writing LINQ queries and using Lambdas, so there’s still a need to catch up with some of what’s been going on.
This is going to be particularly true in the VB world where some of the language features are not present until Visual Studio 2010 ships.
I thought I’d try and write a “potted history” of what’s been going on with all these lambdas and anonymous methods and so on to see if that’s of any help.
In .NET, from day 1 we had this idea of the delegate which is a type-safe function pointer. So, in C++ I used to write stuff like;
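    #include <iostream>
    #include <tchar.h>

    class Foo
    {
    public:
        int Add(int x, int y) { return (x + y); }
    };

    int _tmain(int argc, _TCHAR* argv[])
    {
        Foo* pFoo = new Foo();

        // Pointer to a member function of Foo taking (int, int) and returning int.
        int (Foo::*fn_ptr)(int, int) = &Foo::Add;

        std::cout << (pFoo->*fn_ptr)(10, 20) << std::endl;

        delete pFoo;
        return 0;
    }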
and that was, erm, nice but it had a syntax from hell and you could generally find ways to apply casts all over the place to break type-safety using it.
Along comes .NET
.NET came along and said that we didn’t need to do that. We could use delegates as in;
    public delegate void Action();

    class Program
    {
        static void Main(string[] args)
        {
            Action a = new Action(Foo);
            a();
        }

        static void Foo()
        {
        }
    }
so we define a delegate type ( here it’s called Action ) and that delegate type defines the “shape” of the methods that it can be pointed at ( in this case a method with no parameters and no return value ) and then we can safely point it at such methods like we do above.
We can also have delegates with parameters and return values of course as in;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            Action a = new Action(Add);
            a(10, 20);
        }

        static int Add(int x, int y)
        {
            return (x + y);
        }
    }
and, in a very nice way, we can use the same delegate type regardless of whether we’re pointing at an instance method or a static method and the delegate will do the right thing to make sure it invokes the right method on the right object as in;
    public delegate int Action(int x, int y);

    class MyCalculator
    {
        public int Add(int x, int y)
        {
            return (x + y);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            MyCalculator calc = new MyCalculator();
            Action a = new Action(calc.Add);
            a(10, 20);
        }

        static int Add(int x, int y)
        {
            return (x + y);
        }
    }
I think that was pretty much it. We also had the idea of multi-cast delegates ( i.e. a list of function pointers that can be invoked in one go ) and so we can do this kind of thing;
    public delegate int Action(int x, int y);

    class MyCalculator
    {
        public int Add(int x, int y)
        {
            return (x + y);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            MyCalculator calc = new MyCalculator();
            Action instanceMethod = new Action(calc.Add);
            Action staticMethod = new Action(Add);
            Action both = (Action)MulticastDelegate.Combine(instanceMethod, staticMethod);
            both(20, 30);
        }

        static int Add(int x, int y)
        {
            return (x + y);
        }
    }
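As a rough sketch of what that multi-cast call does ( BinaryOp and MulticastDemo are names I’ve made up for illustration, they’re not framework types ), the C# + operator on delegates is shorthand for Delegate.Combine, and when the delegate type returns a value the caller only sees the result of the last method in the invocation list;

```csharp
using System;

// Hypothetical delegate type for this sketch; not a framework type.
public delegate int BinaryOp(int x, int y);

public static class MulticastDemo
{
    public static int Add(int x, int y) { return x + y; }
    public static int Multiply(int x, int y) { return x * y; }

    // Builds a multi-cast delegate that calls Add first and Multiply second.
    public static BinaryOp BuildBoth()
    {
        BinaryOp add = new BinaryOp(Add);
        BinaryOp multiply = new BinaryOp(Multiply);

        // The + operator compiles down to a call to Delegate.Combine.
        return add + multiply;
    }

    public static void Main()
    {
        BinaryOp both = BuildBoth();

        // Two targets sit in the invocation list.
        Console.WriteLine(both.GetInvocationList().Length); // prints 2

        // Both methods run, but only the last return value comes back.
        Console.WriteLine(both(20, 30)); // prints 600
    }
}
```

If you need every return value rather than just the last one, walking the array returned by GetInvocationList and invoking each entry yourself is one way to go about it.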
Along comes .NET 2.0
.NET 2.0 came along and introduced two things that impacted delegates. One was generics and the other was anonymous methods.
In terms of generics, it meant that we can now have generic delegates. That means that we can define types such as;
public delegate void Action<T>(T t);
and we can use that one definition to point to any function with a single parameter of any type. That is;
    public delegate void Action<T>(T t);

    class Program
    {
        static void Main(string[] args)
        {
            Action<int> a = new Action<int>(FuncInt);
            Action<string> b = new Action<string>(FuncString);
        }

        static void FuncInt(int i)
        {
        }

        static void FuncString(string s)
        {
        }
    }
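Note that the framework itself did ship a single-parameter Action<T> in V2.0 ( in the System namespace ), so the Action type above is defined by me purely for illustration. The non-generic Action, the multi-parameter Action types and the Func family didn’t show up in the .NET Framework until V3.5. Similarly, we could write other generic delegates to try and cope with more parameters and also with functions that have return values such as;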
    public delegate void Action<T>(T t);
    public delegate void Action<T, U>(T t, U u);
    public delegate void Action<T, U, V>(T t, U u, V v);

    public delegate TResult Func<TResult, U>(U u);
    public delegate TResult Func<TResult, U, V>(U u, V v);
    public delegate TResult Func<TResult, U, V, W>(U u, V v, W w);

    class Program
    {
        static void Main(string[] args)
        {
            Func<string, int, int> func = new Func<string, int, int>(Foo);
        }

        static string Foo(int x, int y)
        {
            return ((x + y).ToString());
        }
    }

and now we’ve got some very general purpose generic delegate types that we can pretty much point at anything. ( My Func definitions here put TResult first; the Func types that later shipped in the framework put the result type last, as in Func<T, TResult>. )
Now, my memory’s a little rusty here but C# V2.0 also taught the compiler how to implicitly create a delegate instance for us ( a “method group conversion” ) in the sense that instead of having to write this;

    public delegate void Action<T>(T t);

    class Program
    {
        static void Main(string[] args)
        {
            Action<int> a = new Action<int>(FuncInt);
        }

        static void FuncInt(int i)
        {
        }
    }
we could just write this;
    public delegate void Action<T>(T t);

    class Program
    {
        static void Main(string[] args)
        {
            Action<int> a = FuncInt;
        }

        static void FuncInt(int i)
        {
        }
    }
and miss out the explicit instantiation of the delegate type Action<int>.
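Here’s a small sketch of that implicit conversion at work. It uses the framework’s own System.Action<T> rather than my hand-rolled one, and MethodGroupDemo and Record are hypothetical names for illustration. The point is that the short form produces a delegate equal to the explicitly constructed one, and that it also works when passing a method straight to something like Array.ForEach;

```csharp
using System;
using System.Collections.Generic;

public static class MethodGroupDemo
{
    static readonly List<int> seen = new List<int>();

    // A method whose shape matches System.Action<int>.
    static void Record(int i) { seen.Add(i); }

    // Compares the explicit and implicit ways of creating the delegate.
    public static bool SameDelegate()
    {
        Action<int> explicitForm = new Action<int>(Record);
        Action<int> implicitForm = Record; // method group conversion

        // Delegates compare equal when they point at the same method and target.
        return explicitForm.Equals(implicitForm);
    }

    // Passes the method group directly where an Action<int> is expected.
    public static int CountViaForEach()
    {
        seen.Clear();
        Array.ForEach(new int[] { 1, 2, 3 }, Record);
        return seen.Count;
    }

    public static void Main()
    {
        Console.WriteLine(SameDelegate());    // prints True
        Console.WriteLine(CountViaForEach()); // prints 3
    }
}
```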
The other thing that the compiler learned how to do was to write anonymous methods which was a really neat trick and a big trick 🙂 Instead of having to write code such as;
    public delegate void Action();

    class Program
    {
        static void Main(string[] args)
        {
            Action a = SomeSeparateFunction;
        }

        static void SomeSeparateFunction()
        {
            Console.WriteLine("Hello");
        }
    }
We could write the function inline as in;
    public delegate void Action();

    class Program
    {
        static void Main(string[] args)
        {
            Action a = delegate()
            {
                Console.WriteLine("Hello");
            };
        }
    }
and that’s also true of functions that take arguments and return values as in;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            Action a = delegate(int p, int q)
            {
                return (p + q);
            };
            a(10, 20);
        }
    }
if I use anonymous methods like this then my C# V3.0 compiler adds a method to the class Program called <Main>b__0 and in that method it adds the two integer values together. So, it’s literally just doing;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            Action a = b__0;
            a(10, 20);
        }

        static int b__0(int x, int y)
        {
            return (x + y);
        }
    }
except that it uses a name that I’m not allowed to use as a function name in C#.
If we were using instance state then the compiler would make an instance method as in;
    public delegate void Action(int x);

    public class Calculator
    {
        public void Accumulate(int x)
        {
            Action a = delegate(int y)
            {
                total += y;
            };
            a(x);
        }

        int total;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Calculator c = new Calculator();
            c.Accumulate(10);
        }
    }
which ( with my V3 compiler ) leads to something like;
    public delegate void Action(int x);

    public class Calculator
    {
        public void Accumulate(int x)
        {
            Action a = b__0;
            a(x);
        }

        void b__0(int y)
        {
            total += y;
        }

        int total;
    }

    class Program
    {
        static void Main(string[] args)
        {
            Calculator c = new Calculator();
            c.Accumulate(10);
        }
    }
now, where ( for me ) this got a little funky was if we were to do something like this;
    public delegate void Action(int x);

    class Program
    {
        static void Main(string[] args)
        {
            int x = 100;
            Action a = delegate(int y)
            {
                Console.WriteLine(y + x);
            };
            a(10);
        }
    }
because the anonymous method now accesses a local variable which is not passed as a parameter into the delegate and so the compiler has to work quite a lot harder and what it does looks something like this;
    public delegate void Action(int x);

    class Program
    {
        class c__DisplayClass1
        {
            public void b__0(int y)
            {
                Console.WriteLine(y + x);
            }

            public int x;
        }

        static void Main(string[] args)
        {
            c__DisplayClass1 hidden = new c__DisplayClass1();
            hidden.x = 100;
            Action a = hidden.b__0;
            a(10);
        }
    }
and that means that we get a “natural” sort of semantics around that local variable x such as;
    public delegate void Action(int x);

    class Program
    {
        static void Main(string[] args)
        {
            int x = 100;
            Action a = delegate(int y)
            {
                Console.WriteLine(y + x);
            };
            a(10);
            x++;
            a(10);
        }
    }
prints the value 110 and then prints the value 111 which is a little odd until you realise that it’s basically been rewritten as;
    public delegate void Action(int x);

    class Program
    {
        class c__DisplayClass1
        {
            public void b__0(int y)
            {
                Console.WriteLine(y + x);
            }

            public int x;
        }

        static void Main(string[] args)
        {
            c__DisplayClass1 hidden = new c__DisplayClass1();
            hidden.x = 100;
            Action a = hidden.b__0;
            a(10);
            hidden.x++;
            a(10);
        }
    }
and then it becomes obvious how the stack based variable x has become captured by the class that the compiler generated.
It’s key to note that this is different from the case where the stack based variable is passed into the anonymous method as a parameter as in;
    public delegate void Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            int x = 100;
            Action a = delegate(int y, int z)
            {
                Console.WriteLine(y + z);
            };
            a(x, 10);
        }
    }

because there’s no need for the compiler to try and do its clever capturing trick around the variable x here.
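One place where the capture-versus-copy distinction tends to bite is a for loop. The sketch below ( IntSource and CaptureDemo are hypothetical names for illustration ) creates three anonymous methods inside a loop. In the first version they all share the single loop variable i, so they all see its final value. In the second version each iteration captures its own fresh local, so each delegate remembers a different value;

```csharp
using System;
using System.Collections.Generic;

// Hypothetical delegate type for this sketch; not a framework type.
public delegate int IntSource();

public static class CaptureDemo
{
    // Every anonymous method captures the same loop variable i.
    public static List<int> SharedVariable()
    {
        List<IntSource> sources = new List<IntSource>();
        for (int i = 0; i < 3; i++)
        {
            sources.Add(delegate() { return i; });
        }
        return InvokeAll(sources);
    }

    // Each iteration declares a new local, so each delegate gets its own copy.
    public static List<int> CopyPerIteration()
    {
        List<IntSource> sources = new List<IntSource>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            sources.Add(delegate() { return copy; });
        }
        return InvokeAll(sources);
    }

    static List<int> InvokeAll(List<IntSource> sources)
    {
        List<int> results = new List<int>();
        foreach (IntSource source in sources)
        {
            results.Add(source());
        }
        return results;
    }

    public static void Main()
    {
        // The loop has finished by the time the delegates run, so i is 3.
        Console.WriteLine(string.Join(",", SharedVariable().ToArray()));   // prints 3,3,3
        Console.WriteLine(string.Join(",", CopyPerIteration().ToArray())); // prints 0,1,2
    }
}
```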
Along comes C# V3.0
When C# V3.0 came along with VS 2008 and .NET Framework V3.5 the Lambda syntax got introduced and the wider family of Action and Func delegate types ( such as Func<T, TResult> ) entered the framework. Initially, anonymous methods and Lambdas look very, very similar.
We can write code such as;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            Action anonymous = delegate(int x, int y) { return (x + y); };
            Action lambda = (int x, int y) => { return (x + y); };
        }
    }
and it looks almost exactly the same and if we went back to the situation where we were capturing some local variable as in;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            int a = 100;
            Action anonymous = delegate(int x, int y) { return (a + x + y); };
            Action lambda = (int x, int y) => { return (a + x + y); };
        }
    }
then we still get the same effect as before with the generated class and so on.
One of the tricks of the Lambda is that if its body is a single expression we can drop the braces and we can also drop the return keyword which makes the Lambda neater;

    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            Action anonymous = delegate(int x, int y) { return (x + y); };
            Action lambda = (int x, int y) => x + y;
        }
    }
and ( most intriguingly ) we can often drop the data types of the parameters and let the compiler figure them out for itself;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            Action anonymous = delegate(int x, int y) { return (x + y); };
            Action lambda = (x, y) => x + y;
        }
    }

this can be useful because the compiler can do some clever tricks with inference of types. For example, if we want to write some generic Select method which filters a sequence of any type like this ( .NET Framework V3.5 has lots of similar methods, although its own filtering method is called Where and its Select does projection );

    static IEnumerable<T> Select<T>(IEnumerable<T> list, Func<T, bool> predicate)
    {
        foreach (T item in list)
        {
            if (predicate(item))
            {
                yield return (item);
            }
        }
    }
then we can use this in equivalent ways with both the Lambda syntax and the anonymous method syntax as in;
    public delegate int Action(int x, int y);

    class Program
    {
        static void Main(string[] args)
        {
            IEnumerable<int> ints = System.Linq.Enumerable.Range(1, 100);

            foreach (int odds in Select(ints, i => (i % 2) == 1))
            {
                Console.WriteLine(odds);
            }

            foreach (int evens in Select(ints, delegate(int i) { return ((i % 2) == 0); }))
            {
                Console.WriteLine(evens);
            }
        }

        static IEnumerable<T> Select<T>(IEnumerable<T> list, Func<T, bool> predicate)
        {
            foreach (T item in list)
            {
                if (predicate(item))
                {
                    yield return (item);
                }
            }
        }
    }

and it’s starting to become clear that the Lambda syntax is a lot neater than the anonymous method syntax but there are 2 more important things.
The first is that the Lambda syntax can cope without explicit reference to the types being passed. This can be important because C# V3.0 introduces anonymous types which don’t have any type names so we can use Lambdas like this;
    class Program
    {
        static void Main(string[] args)
        {
            var list = new[]
            {
                new { Firstname = "Fred", Lastname = "Smith", Age = 60 },
                new { Firstname = "Bob", Lastname = "Jones", Age = 30 }
            };

            foreach (var person in Select(list, i => i.Age > 40))
            {
                Console.WriteLine(person);
            }
        }

        static IEnumerable<T> Select<T>(IEnumerable<T> list, Func<T, bool> predicate)
        {
            foreach (T item in list)
            {
                if (predicate(item))
                {
                    yield return (item);
                }
            }
        }
    }

whereas there’s no equivalent way to achieve this with the anonymous method syntax, because that syntax would need us to spell out the name of the parameter’s type.
The other thing that is important is that Lambdas have this magical ability to be converted into expression trees by the compiler in that we can write something like;
Expression<Func<int, int>> exp = x => x + 1;
and work with the Lambda as a piece of data rather than as an executable function per se and that’s the sort of thing that powers IQueryable in LINQ and allows translation by frameworks like LINQ to SQL and LINQ to Entities.
We can’t do this with the anonymous method syntax ( although it seems technically reasonable, as far as I know it just isn’t there ) so we can’t do;
Expression<Func<int,int>> exp = delegate (int x) { return(x + 1); };
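To make the “Lambda as data” idea a bit more concrete, here’s a minimal sketch ( ExpressionDemo and its method names are made up for illustration ) that takes the same x => x + 1 Lambda, pokes at the tree the compiler built for it and then compiles it back into something we can call;

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionDemo
{
    // The Lambda is stored as a tree describing the code, not as compiled code.
    static readonly Expression<Func<int, int>> exp = x => x + 1;

    // Looks at the root of the Lambda body; for x + 1 that is an Add node.
    public static ExpressionType BodyKind()
    {
        return exp.Body.NodeType;
    }

    // Turns the tree back into an ordinary delegate and invokes it.
    public static int CompileAndRun(int input)
    {
        Func<int, int> compiled = exp.Compile();
        return compiled(input);
    }

    public static void Main()
    {
        Console.WriteLine(BodyKind());        // prints Add
        Console.WriteLine(CompileAndRun(41)); // prints 42
    }
}
```

A LINQ provider does something along the lines of BodyKind, only recursively, walking the whole tree and emitting SQL ( or whatever its target happens to be ) instead of ever calling Compile.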
Bringing It All Back Home
One of the “problems” with Microsoft is that they go and build something ( e.g. Lambdas and LINQ and so on ) and then they go build a bunch of stuff on top of it before you’ve had a chance to catch your breath.
We’re seeing that with the arrival of Parallel Extensions. We have three constituent parts to it;
- Co-ordination Data Structures
- Task Parallel Library
- Parallel LINQ
Now, the Task Parallel Library is using a lot of delegate types which means that when people are demonstrating it they’re possibly going to end up showing Lambdas because they’re the neatest syntax for doing something like this;
Task t = new Task(() => { Console.WriteLine("Hello"); });
and so getting some familiarity with that syntax will be a useful thing so that when you encounter code like this;
    static void Main(string[] args)
    {
        int y = 100;
        Task t = new Task((x) => { Console.WriteLine(x); }, y);
        y = 200;
        t.Start();
        t.Wait();
    }
it’s “easy” to see that it’s different from code like this;
    static void Main(string[] args)
    {
        int y = 100;
        Task t = new Task(() => { Console.WriteLine(y); });
        y = 200;
        t.Start();
        t.Wait();
    }
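The difference is that the first version hands the value of y to the Task as a state argument at construction time, so the task sees 100, whereas the second version captures the variable y itself, so the task sees whatever y holds when it finally runs, which is 200. Here’s a sketch that puts the two side by side. It assumes the System.Threading.Tasks API as it ships in .NET Framework V4.0 ( the preview bits may differ ), and TaskCaptureDemo and its method names are hypothetical;

```csharp
using System;
using System.Threading.Tasks;

public static class TaskCaptureDemo
{
    // y is copied (boxed) into the task's state when the Task is constructed.
    public static int PassedAsState()
    {
        int y = 100;
        int seen = 0;
        Task t = new Task(state => { seen = (int)state; }, y);
        y = 200; // too late: the task already holds its own copy
        t.Start();
        t.Wait();
        return seen;
    }

    // The Lambda captures the variable y, not its value at construction time.
    public static int Captured()
    {
        int y = 100;
        int seen = 0;
        Task t = new Task(() => { seen = y; });
        y = 200; // the task reads y when it runs, after this assignment
        t.Start();
        t.Wait();
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(PassedAsState()); // prints 100
        Console.WriteLine(Captured());      // prints 200
    }
}
```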
Hopefully I’ve not messed up any of the examples above thereby adding to the confusion – let me know if I did and I’ll fix it.