From the Windows 8 Camps: GridViews/ListViews and Large Data Sets

One of the questions asked at a recent Windows 8 developer camp was how a control like the XAML GridView/ListView can deal with a large amount of data or, even, an “infinite” amount of data.

I thought I’d write that up here. The essence of it is that the controls that present items do have support for large collections.

This process is usually called “virtualisation” and the essence of it for me is something like;

  1. You have a large number of items to display.
  2. You have limited screen space which can display only a certain number of items at a time. Of course, that relies on the control;
    1. Being able to realise that it has limited screen space – in XAML terminology this means not putting it into a container which offers it ‘infinite’ space in the direction of scrolling.
    2. Being able to determine the size of the visual items to be displayed so that it can figure out what items to ask for.
  3. You don’t want to load all of the items up-front as it would potentially take a long time and waste a tonne of whatever resources you need to create and store an item (CPU, bandwidth, memory).
  4. You need to be able to load items asynchronously because a lot of WinRT APIs (e.g. storage, network) force you to work asynchronously and, fact is, in many cases loading the items will involve crossing a network.
  5. In a lot of cases, you want to load a set of items at a time (in a paged way) rather than loading them individually.
  6. In an ideal world, if you have loaded many 1000s of items but are only displaying 10 then the UI controls held in memory would only relate to the 10 on display rather than the 1000s you have loaded.
  7. In an ideal world, using XAML’s templated controls, if an item N is scrolled out of view to be replaced by an item ( N + DISPLAY_SIZE ) then the control could avoid re-creating all the controls involved in that new item’s visual display template and simply “reset and reuse” the one that was being used for item N to show the new data from the new item.
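As a rough sketch of the arithmetic behind point 2 (the method name and the fixed-item-size assumption here are mine, not the framework’s), a virtualising panel that knows its viewport and item sizes can work out which items it needs to realise rather than realising all of them;

```csharp
// Illustrative only - given a scroll offset, a viewport and a (fixed) item
// size, work out the range of items that actually needs to be realised.
static void GetRealisedRange(
  double verticalOffset, double viewportHeight, double itemHeight,
  int totalItems, out int firstItem, out int itemCount)
{
  firstItem = (int)(verticalOffset / itemHeight);

  // +1 covers an item that is only partially visible at the bottom edge.
  itemCount = (int)Math.Ceiling(viewportHeight / itemHeight) + 1;
  itemCount = Math.Min(itemCount, totalItems - firstItem);
}
```

The real controls do something richer than this (buffering items either side of the viewport, for instance) but the principle is the same.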

Having written all of that, I then found an article on MSDN which drills into it and gives these areas their proper terms – data virtualisation and UI virtualisation and it’s worth a read;

Using Virtualisation with a List or Grid

The other thing to say is that if you are using data from the file system then there is special help in the framework. Specifically, take a look at FileInformationFactory because it already has methods GetVirtualizedFilesVector() and GetVirtualizedFoldersVector() which return virtualised collections of data that serve these purposes. In the JavaScript world this lines up with the StorageDataSource whereas in the XAML world (C++/.NET) you simply take the returned value and set it as an ItemsSource for your control.
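To sketch what that looks like in C# (this is my own wiring rather than code from this post’s project; it assumes the usual Windows.Storage, Windows.Storage.Search, Windows.Storage.BulkAccess and Windows.Storage.FileProperties namespaces, plus the Pictures Library capability declared in the manifest);

```csharp
// Sketch: handing a virtualised collection of files from the Pictures
// library straight to a GridView named myGridView.
var query = KnownFolders.PicturesLibrary.CreateFileQuery(
  CommonFileQuery.OrderByName);

var factory = new FileInformationFactory(
  query, ThumbnailMode.PicturesView);

// The returned vector does the virtualisation for us - items are loaded
// as the control asks for them.
this.myGridView.ItemsSource = factory.GetVirtualizedFilesVector();
```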

The WinJS JavaScript control also supports working with large data sets but I’ll focus on the XAML control here and might come back and make a post on the WinJS control.

Playing with On-Demand, Random Access Loading of Items

Let’s say I’ve got a GridView control and I want it to display a large number of items, perhaps 10,000 which (for me) seems to be a larger number than most users are ever going to bother scrolling through.

Here’s my GridView;

    <GridView
      x:Name="myGridView"
      ItemTemplate="{StaticResource Standard250x250ItemTemplate}">

    </GridView>

and I can quickly give it a set of 10,000 items to display, making use of two things: in a Windows 8 app the XAML framework (hooray) supports binding to anonymous types, and the Standard250x250ItemTemplate has a TextBlock in it that binds to a property called Title.

As an aside, binding to anonymous types is nice but I’m not 100% sure what you then do with those objects when you get them back out of the control because you know very little about their shape unless you want to try and go all dynamic on them.

I set up the source;

this.myGridView.ItemsSource = new MySource();

with my newly minted implementation of IList;

public class MySource : IList
{
  public int Count
  {
    get { return (10000); }
  }
  public object this[int index]
  {
    get
    {
      return (
        new
        {
          Title = string.Format("Item {0}", index)
        }
      );
    }
    set
    {
      throw new NotImplementedException();
    }
  }

  // Dear caller - don't call any method beyond here.

  public int Add(object value)
  {
    throw new NotImplementedException();
  }

  public void Clear()
  {
    throw new NotImplementedException();
  }

  public bool Contains(object value)
  {
    throw new NotImplementedException();
  }

  public int IndexOf(object value)
  {
    throw new NotImplementedException();
  }

  public void Insert(int index, object value)
  {
    throw new NotImplementedException();
  }

  public bool IsFixedSize
  {
    get { throw new NotImplementedException(); }
  }

  public bool IsReadOnly
  {
    get { throw new NotImplementedException(); }
  }

  public void Remove(object value)
  {
    throw new NotImplementedException();
  }

  public void RemoveAt(int index)
  {
    throw new NotImplementedException();
  }



  public void CopyTo(Array array, int index)
  {
    throw new NotImplementedException();
  }

  public bool IsSynchronized
  {
    get { throw new NotImplementedException(); }
  }

  public object SyncRoot
  {
    get { throw new NotImplementedException(); }
  }

  public IEnumerator GetEnumerator()
  {
    throw new NotImplementedException();
  }
}

and my GridView displays 10,000 items in the now-familiar ‘fast and fluid’ manner;

[image: the GridView displaying its items]

And, if the control does the right thing, you’d imagine that it’s only holding on to a certain set of the items that I gave it rather than the whole 10,000 from the list.

If I add a little simple diagnostic tracing then I see the control asking initially for around 60 items (on my screen, at my 1920×1080 resolution) and then as I scroll forward I see it asking for more items (doing some read-ahead) and if I scroll backwards I see it asking for the earlier items again.
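For what it’s worth, that ‘simple diagnostic tracing’ needs to be nothing fancier than a Debug.WriteLine in the indexer – something like;

```csharp
public object this[int index]
{
  get
  {
    // Trace which item the control is actually asking for.
    Debug.WriteLine(string.Format("Control asked for item {0}", index));

    return (new
    {
      Title = string.Format("Item {0}", index)
    });
  }
  set
  {
    throw new NotImplementedException();
  }
}
```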

There are some interesting settings on the control like DataFetchSize, IncrementalLoadingThreshold and IncrementalLoadingTrigger but I don’t believe that they are related to this behaviour (more on those later) and changing my DataFetchSize to 120 made no difference to the behaviour I see here.

In the asynchronous world of Windows 8 apps, it’s unlikely that you’ll be able to create your items synchronously on demand (especially if those items come from a web service of some sort) and so it’s necessary to be able to create items asynchronously and have them show up later on. I can simulate that with (e.g.) a 50ms delay in producing my items;

public class MySource : IList, INotifyCollectionChanged
{
  public event NotifyCollectionChangedEventHandler CollectionChanged;
  Dictionary<int, object> _storage;

  public MySource()
  {
    this._storage = new Dictionary<int, object>();
  }
  public int Count
  {
    get { return (10000); }
  }
  public object this[int index]
  {
    get
    {
      object returnedValue = null;

      if (!this._storage.TryGetValue(index, out returnedValue))
      {
        LoadItemAsync(index);
      }
      return (returnedValue);
    }
    set
    {
      throw new NotImplementedException();
    }
  }
  async void LoadItemAsync(int index)
  {
    // Simulate something that takes a little while.
    await Task.Delay(50);

    // Now tell the world that our item is ready to go.
    var item = new
    {
      Title = string.Format("Item {0}", index)
    };
    this._storage[index] = item;

    RaiseItemReplaced(index, item);
  }
  void RaiseItemReplaced<T>(int index, T item)
  {
    var handlers = this.CollectionChanged;

    if (handlers != null)
    {
      var args = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Replace,
        item, null, index);

      handlers(this, args);
    }
  }

  // Dear caller - don't call any method beyond here.

Now, that’s a lot of change notifications, but it’s because I’m loading my items one by one, which is perhaps a bit unrealistic. In ‘the real world’ I’d be more likely to load them in pages of (say) 10 or 50 or similar, thereby cutting down the number of change notifications that I fire.

I could try and simulate that by changing my use of the dictionary such that if it contains a key of 0 then it means that items 0-19 are available (i.e. a page or chunk of data, as I’ve called it in the code below);

public class MySource : IList, INotifyCollectionChanged
{
  public event NotifyCollectionChangedEventHandler CollectionChanged;
  Dictionary<int, object> _storage;
  List<int> _loadingChunks;
  readonly int CHUNK_SIZE = 20;

  public MySource()
  {
    this._storage = new Dictionary<int, object>();
    this._loadingChunks = new List<int>();
  }
  public int Count
  {
    get { return (10000); }
  }
  public object this[int index]
  {
    get
    {
      object returnedValue = null;

      if (!this._storage.TryGetValue(index, out returnedValue))
      {
        int chunkIndex = index / CHUNK_SIZE;

        LoadItemsChunkAsync(chunkIndex * CHUNK_SIZE);
      }
      return (returnedValue);
    }
    set
    {
      throw new NotImplementedException();
    }
  }
  async void LoadItemsChunkAsync(int startIndex)
  {
    if (!this._loadingChunks.Contains(startIndex))
    {
      this._loadingChunks.Add(startIndex);

      // Simulate something that takes a little while.
      await Task.Delay(50);

      // Now generate CHUNK_SIZE of items
      var items = new List<object>();

      for (int i = startIndex; i < startIndex + CHUNK_SIZE; i++)
      {
        var item = new 
        {
          Title = string.Format("Item {0}", i)
        };
        items.Add(item);
        this._storage[i] = item;
      }
      RaiseItemsReplaced(startIndex, items);

      this._loadingChunks.Remove(startIndex);
    }
  }
  void RaiseItemsReplaced(int startIndex, IList<object> items)
  {
    var handlers = this.CollectionChanged;

    if (handlers != null)
    {
      // NB: still sending too many change notifications but didn't get quite the right
      // combination of arguments into the constructor to get it to do what I want
      // it to do.
      for (int i = 0; i < items.Count; i++)
      {
        var args = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Replace,
          items[i], null, startIndex + i);

        handlers(this, args);
      }
    }
  }

There are a few ‘problems’ here. One is that if the user is scrolling really quickly then there’s a chance that I’m still loading items 0-20 when they are already demanding items 100-120 and I may as well forget about loading items 21-100 at that point or, at least, load them in a different order.

The second is that my underlying storage (Dictionary<int,object>) constantly grows and I never clear anything out of it. So, if I have loaded items 0-20 then they are in memory for the duration.

A third is that I don’t seem to be able to cast the right spell over NotifyCollectionChangedEventArgs to be able to fire a single event into the GridView to tell it that multiple items have changed in one go. That’s probably just me not getting it right, though, so I’ll put that one to one side for the moment.
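For completeness, NotifyCollectionChangedEventArgs does have a constructor that takes lists of items (for a Replace, both the new and the old lists must be non-null). The shape is below, though whether the GridView actually honours a multi-item Replace is exactly the bit I haven’t managed to confirm, so treat this as a sketch;

```csharp
// Shape of the list-based constructor (untested against the GridView here).
void RaiseItemsReplacedInOneGo(int startIndex, List<object> items)
{
  var handlers = this.CollectionChanged;

  if (handlers != null)
  {
    // The 'old' items were the nulls previously returned from the indexer;
    // a Replace needs a non-null old-items list of matching slots.
    var oldItems = new List<object>(new object[items.Count]);

    var args = new NotifyCollectionChangedEventArgs(
      NotifyCollectionChangedAction.Replace,
      (IList)items,
      (IList)oldItems,
      startIndex);

    handlers(this, args);
  }
}
```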

To solve the first problem, rather than having multiple asynchronous requests for data in-flight at the same time it’s perhaps better to queue them as they come in from the control and process them in LIFO order (to reflect where the user currently “is” on the control);

public class MySource : IList, INotifyCollectionChanged
{
  public event NotifyCollectionChangedEventHandler CollectionChanged;
  Dictionary<int, object> _storage;
  Stack<int> _loadingChunks;
  bool _loading;
  readonly int CHUNK_SIZE = 20;

  public MySource()
  {
    this._storage = new Dictionary<int, object>();
    this._loadingChunks = new Stack<int>();
  }
  public int Count
  {
    get { return (10000); }
  }
  public object this[int index]
  {
    get
    {
      object returnedValue = null;

      if (!this._storage.TryGetValue(index, out returnedValue))
      {
        int chunkIndex = index / CHUNK_SIZE;

        LoadItemsChunkAsync(chunkIndex * CHUNK_SIZE);
      }
      return (returnedValue);
    }
    set
    {
      throw new NotImplementedException();
    }
  }
  async void LoadItemsChunkAsync(int startIndex)
  {
    if (!this._loadingChunks.Contains(startIndex))
    {
      this._loadingChunks.Push(startIndex);
    }
    if (!this._loading)
    {
      this._loading = true;

      while (this._loadingChunks.Count > 0)
      {        
        var requestIndex = this._loadingChunks.Pop();

        // Simulate something that takes a little while.
        await Task.Delay(50);

        // Now generate CHUNK_SIZE of items
        var items = new List<object>();

        for (int i = requestIndex; i < requestIndex + CHUNK_SIZE; i++)
        {
          var item = new
          {
            Title = string.Format("Item {0}", i)
          };
          items.Add(item);
          this._storage[i] = item;
        }
        RaiseItemsReplaced(requestIndex, items);
      }
      this._loading = false;
    }
  }
  void RaiseItemsReplaced(int startIndex, IList<object> items)
  {
    var handlers = this.CollectionChanged;

    if (handlers != null)
    {
      // NB: still sending too many change notifications but didn't get quite the right
      // combination of arguments into the constructor to get it to do what I want
      // it to do.
      for (int i = 0; i < items.Count; i++)
      {
        var args = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Replace,
          items[i], null, startIndex + i);

        handlers(this, args);
      }
    }
  }

  // Dear caller - don't call any method beyond here.

In writing that, I must say that I got to about the edge of what I was prepared to reason about in terms of async/await, especially in trying to think about re-entrancy and whether variables were shared across threads (I think not, but I’m prone to getting things wrong).

I found myself thinking it would be ‘easier’ (or clearer) to just spin up a separate task to process the requests present in my Stack<int>.

The peculiar thing about that implementation is that if the user doesn’t scroll to an exact CHUNK_SIZE boundary (and it’s unlikely that they will) then it’s more than likely that the items will appear in an unnatural order. That is, if they scroll to item 1030 then items 1020-1040 can appear before items 1000-1019, which feels odd to a user, so that’s not perfect.

Another problem is that if the user scrolls straight to item 1000 then the code will go back and load all items from 999 down to 0, because it has queued up all of those requests.

I could avoid that work and leave it until the user scrolls back to those items by ignoring load requests that are ‘some distance’ from the last requested item. I define that distance as 100 items in the following code;

public class MySource : IList, INotifyCollectionChanged
{
  public event NotifyCollectionChangedEventHandler CollectionChanged;
  Dictionary<int, object> _storage;
  Stack<int> _loadingChunks;
  int _lastRequest;
  bool _loading;
  static readonly int CHUNK_SIZE = 20;
  static readonly int DISCARD_SIZE = 5 * CHUNK_SIZE;

  public MySource()
  {
    this._storage = new Dictionary<int, object>();
    this._loadingChunks = new Stack<int>();
  }
  public int Count
  {
    get { return (10000); }
  }
  public object this[int index]
  {
    get
    {
      object returnedValue = null;

      this._lastRequest = index;

      if (!this._storage.TryGetValue(index, out returnedValue))
      {
        int chunkIndex = index / CHUNK_SIZE;

        LoadItemsChunkAsync(chunkIndex * CHUNK_SIZE);
      }
      return (returnedValue);
    }
    set
    {
      throw new NotImplementedException();
    }
  }
  async void LoadItemsChunkAsync(int startIndex)
  {
    if (!this._loadingChunks.Contains(startIndex))
    {
      this._loadingChunks.Push(startIndex);
    }
    if (!this._loading)
    {
      this._loading = true;

      while (this._loadingChunks.Count > 0)
      {        
        var requestIndex = this._loadingChunks.Pop();

        if (ItemsWithinRequestRange(requestIndex))
        {
          // Simulate something that takes a little while.
          await Task.Delay(50);

          if (ItemsWithinRequestRange(requestIndex))
          {
            // Now generate CHUNK_SIZE of items
            var items = new List<object>();

            for (int i = requestIndex; i < requestIndex + CHUNK_SIZE; i++)
            {
              var item = new
              {
                Title = string.Format("Item {0}", i)
              };
              items.Add(item);
              this._storage[i] = item;
            }
            RaiseItemsReplaced(requestIndex, items);
          }
        }
      }
      this._loading = false;
    }
  }
  bool ItemsWithinRequestRange(int index)
  {
    return (Math.Abs(index - this._lastRequest) < DISCARD_SIZE);
  }
  void RaiseItemsReplaced(int startIndex, IList<object> items)
  {
    var handlers = this.CollectionChanged;

    if (handlers != null)
    {
      // NB: still sending too many change notifications but didn't get quite the right
      // combination of arguments into the constructor to get it to do what I want
      // it to do.
      for (int i = 0; i < items.Count; i++)
      {
        var args = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Replace,
          items[i], null, startIndex + i);

        handlers(this, args);
      }
    }
  }

  // Dear caller - don't call any method beyond here.

Even in that preceding code, I’m still keeping all loaded items around in my _storage forever, so it might be sensible to get rid of some set of them as/when I realise that the user is a long distance from them (again, perhaps 100 items away). That happens in the TrimList function below;

public class MySource : IList, INotifyCollectionChanged
{
  public event NotifyCollectionChangedEventHandler CollectionChanged;
  Dictionary<int, object> _storage;
  Stack<int> _loadingChunks;
  int _lastRequest;
  bool _loading;
  static readonly int CHUNK_SIZE = 20;
  static readonly int DISCARD_SIZE = 5 * CHUNK_SIZE;

  public MySource()
  {
    this._storage = new Dictionary<int, object>();
    this._loadingChunks = new Stack<int>();
  }
  public int Count
  {
    get { return (10000); }
  }
  public object this[int index]
  {
    get
    {
      object returnedValue = null;

      this._lastRequest = index;

      if (!this._storage.TryGetValue(index, out returnedValue))
      {
        int chunkIndex = index / CHUNK_SIZE;

        LoadItemsChunkAsync(chunkIndex * CHUNK_SIZE);
      }
      return (returnedValue);
    }
    set
    {
      throw new NotImplementedException();
    }
  }
  async void LoadItemsChunkAsync(int startIndex)
  {
    if (!this._loadingChunks.Contains(startIndex))
    {
      this._loadingChunks.Push(startIndex);
    }
    if (!this._loading)
    {
      this._loading = true;

      while (this._loadingChunks.Count > 0)
      {        
        var requestIndex = this._loadingChunks.Pop();

        if (ItemsWithinRequestRange(requestIndex))
        {
          // Simulate something that takes a little while.
          await Task.Delay(50);

          if (ItemsWithinRequestRange(requestIndex))
          {
            // Now generate CHUNK_SIZE of items
            var items = new List<object>();

            for (int i = requestIndex; i < requestIndex + CHUNK_SIZE; i++)
            {
              var item = new
              {
                Title = string.Format("Item {0}", i)
              };
              items.Add(item);
              this._storage[i] = item;
            }
            RaiseItemsReplaced(requestIndex, items);
          }
        }
      }
      this._loading = false;

      TrimList();
    }
  }
  bool ItemsWithinRequestRange(int index)
  {
    return (Math.Abs(index - this._lastRequest) < DISCARD_SIZE);
  }
  void TrimList()
  {
    var keys = this._storage.Keys.Where(k => !ItemsWithinRequestRange(k)).ToList();

    foreach (var key in keys)
    {
      this._storage.Remove(key);
    }
  }
  void RaiseItemsReplaced(int startIndex, IList<object> items)
  {
    var handlers = this.CollectionChanged;

    if (handlers != null)
    {
      // NB: still sending too many change notifications but didn't get quite the right
      // combination of arguments into the constructor to get it to do what I want
      // it to do.
      for (int i = 0; i < items.Count; i++)
      {
        var args = new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Replace,
          items[i], null, startIndex + i);

        handlers(this, args);
      }
    }
  }

  // Dear caller - don't call any method beyond here.

Again, this gives some slightly odd loading behaviour from the user’s perspective, but I think it demonstrates the point: I can work on this to my heart’s content and come up with more varied schemes for loading items on demand as the control requests them, and then decide how many of those items I actually keep in memory, because the control isn’t demanding that I keep the entire collection around.

Time for a different approach…

Playing with a Sequentially Growing Collection

All along, my XAML hasn’t really changed in that I’ve still got my GridView;

   <GridView
      x:Name="myGridView"
      ItemTemplate="{StaticResource Standard250x250ItemTemplate}">

    </GridView>

I can rework the data source that I’m giving it such that it implements the ISupportIncrementalLoading interface. This is a fairly simple interface with just a couple of methods on it;

  • HasMoreItems
  • LoadMoreItemsAsync

Like a lot of interfaces, it says a lot but it can’t say everything. The implication to me is that this is a way of loading items on demand as the user goes from item 0 to item N because the notion of “HasMoreItems” isn’t expressed relative to a current item. It’s simply asking “have we loaded the entire set yet?”.

Also, if you take a look at the LoadMoreItemsResult that comes back asynchronously, you’ll spot that its job is to return a simple Count of items that were actually loaded. It doesn’t have any way of saying “here are the new items”, so that has to be communicated back to the control via some other means – i.e. the data source being observable and firing change notifications.

If I rework my MySource class as below;

class MySource : ObservableCollection<object>, ISupportIncrementalLoading
{
  public bool HasMoreItems
  {
    get
    {
      return (this.Count < 10000);
    }
  }
  public IAsyncOperation<LoadMoreItemsResult> LoadMoreItemsAsync(uint count)
  {
    // Simulate a delay
    var delay = Task.Delay(50);

    var load = delay.ContinueWith(
      t =>
      {
        var startSize = this.Count;

        count = (uint)Math.Min(count, (10000 - startSize));

        // We 'fake' up as many items as we were asked for but we don't have to.
        // We could return a different number.
        for (int i = 0; i < count; i++)
        {
          this.Add(new
          {
            Image = "http://www.microsoft.com/global/surface/en/us/publishingimages/new/hero.jpg",
            Title = (startSize + i).ToString()
          });
        }
        return (new LoadMoreItemsResult() { Count = count });
      },
      TaskScheduler.FromCurrentSynchronizationContext()
    );
    return (load.AsAsyncOperation<LoadMoreItemsResult>());
  }
}

then with that in place, I see this pattern of calls as I run the code and scroll the GridView;

  1. I see a call to ObservableCollection<object>.Count which returns 0.
  2. I see a call to HasMoreItems where I return true.
  3. I see a call to LoadMoreItemsAsync() asking for 1 item. Presumably this 1st item is for the control to take a look at it.
  4. I see a call to HasMoreItems where I return true.
  5. I see a call to LoadMoreItemsAsync() asking for 120 items.

and that pattern of (4-5) repeats as I scroll the control up to item 10,000. The control displays items, so it is, presumably, reaching into the ObservableCollection<object> indexer to grab those items as it needs them once it gets the change notifications that they have arrived.

[image: the GridView displaying incrementally loaded items]

Where does the 120 come from? It’s based on the DataFetchSize of the control. If I change my control’s definition to say;

 <GridView
      x:Name="myGridView"
      ItemTemplate="{StaticResource Standard250x250ItemTemplate}"
      DataFetchSize="1">

    </GridView>

then I find that the number changes from 120 to 40 and so the initial setting looks to be loading 3 pages worth of data. If I change my screen resolution down a little (to 1366×768) then I see the number come down to 18.

If I then change that DataFetchSize to be 5 (and reset my screen resolution) then I see the control requesting 200 items (i.e. 5 pages at a size of 40).

What wasn’t 100% clear to me though was when the control requests those 200 items – how is it deciding when to make that call?

At the start point, my screen displays items 0 to 31. As I scroll the GridView I find that it displays up to around item 160 before it calls into my data source again asking for another 200 items.

This relates to another property on the control, the IncrementalLoadingThreshold.

If I change my IncrementalLoadingThreshold as below;

<GridView
      x:Name="myGridView"
      ItemTemplate="{StaticResource Standard250x250ItemTemplate}"
      DataFetchSize="5"
      IncrementalLoadingThreshold="2">

    </GridView>

then I see that when I scroll to around item 80 (2 * page size of 40) then the control calls my code again asking for more data so, again, this value is based on the control’s view of the page size.

There’s another setting in that the IncrementalLoadingTrigger can have a value of Edge or None which, for now, seems to equate to On or Off.

Obviously, there’s a trade-off between how much data to load and how far ahead of the user’s need for it you go and load it. That depends on how the application is likely to be used and how far, and how quickly or slowly, the user is going to scroll.

This scheme, though, allows me to load a minimal amount of data initially and then asynchronously load more on demand as the user requires it.

Mixing the Two?

I haven’t experimented with this too much yet but, even if I use incremental loading to bring items into my collection in the first place, it doesn’t seem unreasonable to combine that with the earlier scheme of setting their values back to null at a later point to get rid of them if they aren’t currently on the screen, and then re-loading them as/when the control asks for them.

An implementation of IList should be able to do that, as I experimented with in the first half of the post. It’s a shame that ObservableCollection<T> doesn’t allow you to override the indexer, which would make such a list easier to implement, but it seems like something you could definitely undertake if you had a specific optimisation to make.
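To sketch what that mixed scheme might look like (this is hypothetical and untested; the class name, thresholds and helper methods are mine, and as in the earlier listings the remaining IList members are elided and would throw);

```csharp
// Hypothetical sketch of the 'mixed' idea: ISupportIncrementalLoading grows
// the list sequentially while TrimDistantItems() sets far-away entries back
// to null so that the indexer can re-load them on demand later.
class MixedSource : IList, INotifyCollectionChanged, ISupportIncrementalLoading
{
  public event NotifyCollectionChangedEventHandler CollectionChanged;
  List<object> _items = new List<object>();
  int _lastRequest;
  static readonly int MAX_ITEMS = 10000;
  static readonly int KEEP_RANGE = 100;

  public bool HasMoreItems
  {
    get { return (this._items.Count < MAX_ITEMS); }
  }
  public IAsyncOperation<LoadMoreItemsResult> LoadMoreItemsAsync(uint count)
  {
    return (AsyncInfo.Run(async cancel =>
    {
      // Simulate something that takes a little while.
      await Task.Delay(50);

      uint added = (uint)Math.Min(count, MAX_ITEMS - this._items.Count);

      for (int i = 0; i < added; i++)
      {
        var item = new { Title = string.Format("Item {0}", this._items.Count) };
        this._items.Add(item);
        RaiseChange(new NotifyCollectionChangedEventArgs(
          NotifyCollectionChangedAction.Add, (object)item, this._items.Count - 1));
      }
      TrimDistantItems();

      return (new LoadMoreItemsResult() { Count = added });
    }));
  }
  public object this[int index]
  {
    get
    {
      this._lastRequest = index;

      if (this._items[index] == null)
      {
        ReloadItemAsync(index); // fire-and-forget, as in the first half
      }
      return (this._items[index]);
    }
    set { throw new NotImplementedException(); }
  }
  public int Count
  {
    get { return (this._items.Count); }
  }
  async void ReloadItemAsync(int index)
  {
    // Simulate re-loading a trimmed item, then notify as before.
    await Task.Delay(50);

    var item = new { Title = string.Format("Item {0}", index) };
    this._items[index] = item;
    RaiseChange(new NotifyCollectionChangedEventArgs(
      NotifyCollectionChangedAction.Replace, (object)item, null, index));
  }
  void TrimDistantItems()
  {
    for (int i = 0; i < this._items.Count; i++)
    {
      if (Math.Abs(i - this._lastRequest) > KEEP_RANGE)
      {
        this._items[i] = null;
      }
    }
  }
  void RaiseChange(NotifyCollectionChangedEventArgs args)
  {
    var handlers = this.CollectionChanged;

    if (handlers != null)
    {
      handlers(this, args);
    }
  }

  // Dear caller - don't call any method beyond here (elided, they throw).
}
```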