Indexing for objects in your code – Not only a data storage thing


As you probably already know if you landed on this page, indexing is and will always be about two things:

  • It comes from a simple need, whether development-related or not.
  • It helps you get to the data you want FASTER.

“I want to find all the people whose first name starts with S”: I can sort that out with sorted folders.

When it comes to data storage, things are the same:

  • File systems index the position of files on the actual hard drive or flash drive,
  • Databases allow you to create multiple indexes on certain columns, so that you can search and sort them faster.

Why make the point for code, then? Well, it comes down to the exact same problem: performance.

Most of the time, people use LINQ to filter data and get subsets of it for processing, in a loop for example:

foreach (var countryCode in _countryCodes)
{
    // Where() scans the whole list again on every iteration.
    var countriesPerCode = _entitiesLists.Where(e => e.Country == countryCode).ToList();
    var count = countriesPerCode.Count;
    // Code supposedly doing something with the entities.
}

This can be fine if not too many of these loops are run, but why is it so different when it runs on, let’s say, a million rows?

The same problem occurs with a database: if the SQL engine running the query doesn’t know anything about how to get the rows that match your WHERE clause, it has to scan all rows.

In the case of our loop above, LINQ does the same: how would LINQ know which items match your lambda function until it has tried it on every item in your list?

To solve that issue we are going to use an uncommon LINQ object: Lookup.

The goal is simple: we are going to use it to build an index out of our data, grouping it by a given key. Running this only once on our dataset fixes our problem, in the sense that getting the data subset back on each loop iteration becomes a near-instant key lookup instead of a full scan.
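As a rough illustration of the idea, here is a minimal sketch of the same loop rewritten on top of a Lookup. The Entity class, its Name property, and the LookupDemo.CountPerCountry helper are hypothetical names made up for this example (they are not taken from the repro code); only ToLookup and the lookup indexer are actual LINQ members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity type, standing in for whatever _entitiesLists holds.
public class Entity
{
    public string Country { get; set; }
    public string Name { get; set; }
}

public static class LookupDemo
{
    // Counts entities per country code using an index that is built only once.
    public static Dictionary<string, int> CountPerCountry(
        IEnumerable<Entity> entities, IEnumerable<string> countryCodes)
    {
        // Single pass over the data: every entity is grouped under its Country key.
        ILookup<string, Entity> byCountry = entities.ToLookup(e => e.Country);

        var counts = new Dictionary<string, int>();
        foreach (var countryCode in countryCodes)
        {
            // Indexing a lookup with an unknown key returns an empty sequence,
            // so no existence check is needed here.
            var countriesPerCode = byCountry[countryCode].ToList();
            counts[countryCode] = countriesPerCode.Count;
            // Code doing something with the entities would go here.
        }
        return counts;
    }
}
```

The Where call inside the loop is replaced by a key access on the lookup, so the full scan of the list happens once instead of once per country code.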

Here is the performance difference you can get from our test app (output from our console app):

Data: building
Data: done
Test1: start
Test1: processed in 535 milliseconds
Test2: start
Test2: lookup done in 140 milliseconds
Test2: processed in 141 milliseconds

To summarize our little article (TL;DR):

  • Building the lookup first takes some time, but doing it once offers around a x4 performance gain
  • Make use of indexing capabilities when your dataset grows above a certain number of items (that is to say, most of the time!)
  • This is a really simple sample that does not reflect other things that could happen around your code; I have seen processing time go from 45 minutes down to 5 (a x9 improvement)
  • Repro code can be found on GitHub here.


Multi tenancy with Azure – Guide

Dealing with client data is quite important, especially when GDPR is coming 🙂

Still, building apps for multiple clients is and has always been a complicated task for multiple valid reasons:

  • Monetization: consumer vs company data segmentation
  • Resources Cost: data split strategy vs costs
  • Non-technical: legal or data protection for country or unions (e.g. European Union)

Let’s go through the options that actually work, with their pros and cons, for each of those concerns.


Entity Framework – Code First Migration – Solving merge errors

Using Entity Framework with git and having multiple branches updating a model can be quite challenging.

Here is a concrete example:

  • develop branch is on code first migration v1
  • feature X updates model to add fields to an entity, adding migration v2
  • feature Y also updates model to add fields on an entity, adding migration v3
  • feature X is merged into develop
  • feature Y is merged, coming on top of previous one


First, both feature migrations are independent and both built on top of v1:

EF will be able to process them, but what will happen is this:

The changes noted here as incoming will bring back the migration v2 changes when running “Add-Migration” again, even though they are already present.

It basically means EF does not see v2, because v3 was generated at a time when v2 was not yet present in the branch.

This has to do with the data found in the migration RESX file, which contains Base64-encoded binary information used to keep track of the link between migrations.

Bottom line, to avoid this, you can:

  1. Rebuild migration v3 on top of v2 in the develop branch, by deleting v3 after getting v2 from develop
  2. Build a dedicated model update feature that everyone merges into their branch once it is updated with their model change
  3. Merge feature X into feature Y only for the db change, then add your fields in feature Y so that the new fields rely on migration v2

Happy merging!


Connecting to CosmosDB with Microsoft Azure Storage Explorer now

You’ve probably noticed the CosmosDB announcement a couple of weeks ago, and this is a great step towards getting secondary indexes for the Table Storage-like data you are using today in Azure.

I rely quite often on Microsoft Azure Storage Explorer to access my table data, but the Cosmos DB part is not done yet, so how do you do that?


Because Cosmos DB now has a Table API that behaves exactly the same as a Storage Table, just:

  • open your “Local and Attached” top root navigation node in Explorer
  • right-click “Storage Accounts”, and select “Connect to Azure Storage”
  • select “Use a connection string or a shared access signature URI” and follow the rest of the process to add your Cosmos DB table and use it as a Storage table!

This is a work-around to play with your CosmosDB data in a simple way, without having to wait.

Still, CosmosDB does not work the same way as the traditional Table Storage, especially when importing or exporting large volumes of data: where Table Storage limits query performance, CosmosDB just cuts the connection straight away.

Happy indexing!

VSTS – build & publish your .NET Console App as a Webjob

Building App Services quite often, I came across the need to deploy Webjobs as part of the CI/CD capabilities of VSTS.

What is so special about WebJobs?

  • They run in an App Service
  • They provide a simple way to build jobs using a variety of different languages (F#, cmd, exe, bash, nodejs, php…)
  • They can be found using Kudu:
    • under “d:\home\site\wwwroot\app_data\jobs\triggered\{job name}” for triggered jobs
    • under “d:\home\site\wwwroot\app_data\jobs\continuous\{job name}” for always running jobs

You can go here to read more details about Webjobs and how they work in Kudu.

VSTS configuration to build a Webjob

As part of our CI/CD strategy, we would like to deploy those jobs from VSTS.

VSTS does not yet have a predefined task to do so, but if we look again at where the files should be placed, we can actually push artifacts to the proper place to make it work the way we need.

First let’s look at how we need to build our console app.

What we are going to do here is add a “Copy” task to copy our build output to the proper folder, using the same App Service folder structure:

What you can see here is that we are copying the output of the console app build from “$(Build.SourcesDirectory)/MyApp/MayAppConsoleApp/bin/Prod” to “$(build.artifactstagingdirectory)\App_Data\jobs\continuous\MyAppJob”.

Looks similar to what we saw earlier right? 🙂

By doing so we create our build artifact with the required structure already in place, so we can then deploy our artifacts straight to the root App Service folder, as the WebJob folder will already be there.

This means, when looking at the App Service deployment part of the release:

As you can see, nothing special is to be found here 🙂


If you want to deploy your Webjob in your App Service:

  1. In your build, create a “Copy” task which outputs the copy to your artifact directory: “$(build.artifactstagingdirectory)\App_Data\jobs\continuous\MyAppJob”,
  2. Nothing to do in your deployment part, TADA!
  3. There is no 3.

Happy coding!

Using Hangfire, or how to run your code as jobs

I encountered Hangfire a while ago and tried it out about a year ago, but did not have the time or the need to properly dig into its job capabilities.

If you are reading this, it’s because you are either trying to understand what Hangfire is, or looking at how it can address some of your needs.

Hangfire is about running some portion of your code as jobs away from your main (server/web) process. It also adds the capability to run this code as recurring (very convenient to put in place simple update/cleaning/reminder/mailing jobs).

The most important thing to understand when you want to run Hangfire jobs is that your code has to be able to give itself a proper context:

  • no HttpContext.Current or similar objects: only what you give to your object at method calling time matters (this is what gets serialized as JSON on the Hangfire back-end).
  • no complex object graph: if the class/service you want to instantiate has many dependencies (other object initializations or similar), please make sure everything is in proper order from the call you initiate with Hangfire OR let your object initialize itself properly.
  • Bottom line, be context friendly! If you have keys or ids to identify the data you want to manipulate, pass these values on for serialization: simple to serialize and easier to maintain.

When digging into implementing Hangfire, you’ll see by yourself going over the documentation that almost all you need has been thought through.

When it comes to writing code with Hangfire, here are a few hints:

  • You only need to add the Hangfire.Core NuGet package to a given project if you only intend to add jobs from it (less is better)
  • When using an IoC container, make sure you use the proper Enqueue prototype; if you don’t, Hangfire will simply store the actual type (not the interface) that was used at job enqueuing time, which might work at first, but won’t switch to your new type if you change your interface implementation in your IoC container:
BackgroundJob.Enqueue<IMyInterface>(x => x.MyInterfaceMethod(param1, param2, param3));
  • If you are planning to run the Hangfire server part in an ASP.NET app, don’t forget to keep it running all the time! Hangfire does not auto-start the web app just because it’s there 🙂
  • As you can access multiple queues when building with Hangfire, don’t forget to assign Hangfire processing servers to different queues.
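To make the hints above a bit more concrete, here is a minimal sketch, assuming the Hangfire.Core NuGet package is referenced and a job storage is configured elsewhere in your app. IMailingService, SendReminder, MailingJobs and the “mailing” queue name are hypothetical names invented for this example; BackgroundJob.Enqueue<T> and BackgroundJobServerOptions.Queues are the Hangfire members being illustrated.

```csharp
using Hangfire;

// Hypothetical service interface; the concrete implementation is expected
// to be resolved by your IoC container when the job actually runs.
public interface IMailingService
{
    // Takes a simple id rather than a full object: easy to serialize.
    void SendReminder(int customerId);
}

public static class MailingJobs
{
    // Enqueues the job against the interface type, so that Hangfire stores
    // the interface (not a concrete class) with the job.
    // Requires a configured job storage at call time.
    public static string EnqueueReminder(int customerId)
    {
        return BackgroundJob.Enqueue<IMailingService>(s => s.SendReminder(customerId));
    }

    // Options for a processing server that listens on a dedicated queue
    // first, then on the default one.
    public static BackgroundJobServerOptions MailingServerOptions()
    {
        return new BackgroundJobServerOptions
        {
            Queues = new[] { "mailing", "default" }
        };
    }
}
```

Passing only customerId keeps the serialized job small, and the job code loads whatever it needs from that id once it runs.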

Thanks for reading, happy coding!

Azure Directory Library & TokenCache persistence – upgrade issue

When using the Microsoft.IdentityModel.Clients.ActiveDirectory NuGet package to deal with the ADAL token cache, you can actually serialize the state of your cache to use it at a later point:

public class RefreshTokenCache : TokenCache {
    private void AfterAccessNotification(TokenCacheNotificationArgs args) {
        if (this.HasStateChanged) {
            // Serialized cache state, to be persisted for later use.
            var data = Convert.ToBase64String(this.Serialize());
        }
    }
}

When migrating from 2.x to 3.x versions of the library, I encountered some issues trying to deserialize my token cache back to its initial state:

var tc = new RefreshTokenCache();

The problem was that serialized data obtained on v2.x was not de-serialized properly when using 3.x versions.

In that case, I force the end user to request a new token, so that I can serialize it back in the 3.x format.

Happy coding!