To var or not to var (C#)
The var keyword has many different uses in different languages. In C# 3.0 the var implicit type was added so that we could declare a local variable without explicitly stating its type, leaving the compiler to infer it.
Overnight we had a new way to declare variables. Should we go through the code base and replace all explicitly typed variables with the new implicit type? Should we use implicit types at all? Even today, 15 years later, "To var or not to var?" is still a relevant question and sometimes a controversial topic.
Why was var added?
C# v3.0 was released as part of .NET 3.5 in Nov 2007. This release also introduced partial methods, object and collection initialisers, anonymous types and of course LINQ.
var was a pivotal syntax requirement for anonymous types, which are used extensively in LINQ expressions and the projected results from those expressions. Without var the type declarations needed to support LINQ would have been so verbose that LINQ may not have been adopted at all.
The introduction of var opened up opportunities for more fluent and expressive syntax.
Most development groups settle down pretty quickly on implicit global code style conventions, and usually on appropriate use of var. Like spaces vs tabs, if we want to be productive as a team then we need to be consistent, adopt a convention across the whole team and try not to fight against it. What surprised me in my time on CodeMentor is how many dev teams have a clear policy to avoid the use of var, but do not have any justification at all for it.
How is var in C# different to other languages
As briefly mentioned in this great summary, Settling the Debate Surrounding var and C#, other weakly typed or late bound languages commonly use var as a variant, loose or generic type definition. Some scripting or functional languages use this keyword similarly to the C# ref modifier, to declare that an argument is passed by reference between execution scopes instead of by value.
In C# a var variable is still strongly typed. The type of the variable is not dynamic, nor is it boxed or undefined/ambiguous like object. This non-transient type is specifically known and easily inferred at compile time and cannot be changed.
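To make the compile-time nature of var concrete, here is a minimal sketch. The helper CompileTimeTypeOf is hypothetical (it is not a framework method); it relies on generic type inference so that typeof(T) reports the static type the compiler assigned to the variable, not the runtime type of the value.

```csharp
using System;

public static class VarIsStaticallyTyped
{
    // Hypothetical helper: T is bound to the compile-time type of the argument,
    // so typeof(T) shows what the compiler inferred for the variable passed in.
    public static Type CompileTimeTypeOf<T>(T value) => typeof(T);

    public static void Main()
    {
        var count = 42;     // inferred as int at compile time
        object boxed = 42;  // static type is object; the int value is boxed
        var name = "Ada";   // inferred as string

        Console.WriteLine(CompileTimeTypeOf(count)); // System.Int32
        Console.WriteLine(CompileTimeTypeOf(boxed)); // System.Object
        Console.WriteLine(CompileTimeTypeOf(name));  // System.String

        // count = "forty-two"; // does not compile: count is an int and stays an int
    }
}
```

The commented-out assignment is the important line: once the type has been inferred, the variable behaves exactly like one declared with an explicit int.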
Why the argument?
Developers are creatures of habit who believe they are always right, just ask me! We like to fix things, and when reading code our eyes start to water when we see something that we don't agree with. The amusing part is that, just like light mode vs dark mode, the main argument both for and against var is that it makes the code easier (or harder) to read.
I personally find the use of var can improve code readability, but many teams who require all variables be explicitly typed say that an explicitly typed variable is easier to read. If we are all always right, then which of us is wrong?
When the initialization is visible alongside the logic that uses the variable, then in most cases the type is visually implicit to the reader anyway. Explicitly defining the type just makes more code to read, and in some cases forces you to scroll horizontally or, if line wrapping is enabled, reduces the number of lines of code that fit in the current view port.
The following variable initialization is hideous either way, but the second line, using the implicit type, is much simpler to read.
Dictionary<string, Dictionary<string, string>> groupsOfLookupLists = new Dictionary<string, Dictionary<string, string>>();
var groupsOfLookupLists = new Dictionary<string, Dictionary<string, string>>();
My personal amusement derives from the observation that in many blocks of code the initialization is not visible in the same view port as the line of code you are trying to read, so both of the main arguments become moot at that point.
Is it even helpful to know the type of a variable?
... No. Unless we are talking about primitive or common framework types, the more abstract or generic your logic gets, the less information we gain from knowing the specific type anyway. Due to C#'s polymorphism, and many modern code bases using IoC patterns to prefer composition of interfaces over strict OO inheritance, any individual block of code may contain type references that the reader is not familiar with anyway. Most new reviewers who might benefit from knowing the type would be familiar with the concrete implementations but not the specific interface types in use in the code.
What is more important than the type of a given variable is its business meaning. The name should convey both meaning within the business domain and, if possible, the intent. This can lead to some long variable names, but it goes a long way towards improving the readability of your code. If you need to create a list of users to delete, then you should name your variable to that effect:
var listOfUsersForDelete = new HashSet<User>();
var userIds4Delete = new HashSet<int>();
To reduce the length of your variable names but still maintain the meaning and intent (the what and why), it is common to implement a naming convention that either strips out vowels or follows your own flavour of Apps Hungarian. And no, I do not advocate the use of Systems Hungarian; remember, my argument is that the type itself is irrelevant... A good concept to keep in mind when establishing a naming convention, or choosing a name at all, is the Principle of Least Astonishment: the name of a function or variable should reflect what it does, and the result of a function should be inferable from its name. Otherwise you will leave the reader or caller of the code unpleasantly surprised.
This post from Chris Sutton deserves a read. He implies that the type really doesn't matter:
An obvious place to use var is when you do a LINQ query and you are selecting a limited set of columns back from the original type:
NorthwindDataContext north = new NorthwindDataContext();
var emps = from e in north.Employees
           where e.City == "London"
           select new { e.FirstName, e.LastName };
There isn’t much choice here since emps is an anonymous type made up of the employee’s FirstName and LastName. The custom or partial projection (everything after the select keyword) forces the result to be an anonymous type.
Then the suggestion came up that you should only use var when you don’t know the type. Here is where I differ in opinion and usage. Look at the following snippet:
using System.ServiceProcess;
. . .
var procs = from p in ServiceController.GetServices()
            where p.Status == ServiceControllerStatus.Running
            select p;
procs.ToList().ForEach(p => Console.WriteLine(p.ServiceName));
procs is certainly IEnumerable<ServiceController> but it doesn’t matter to me. I primarily care that procs is a list and that the individual items in the list have a Property called ServiceName. The underlying type is important to the compiler, but the people that have to read code aren’t compilers right?
I’m very glad that the compiler and runtime do what they do with types, but I want the reader’s focus on the simple LINQ query and iterating over the list to get a property, not on the type. Replacing var with IEnumerable<ServiceController> is harder to read and less useful.
Can var be used for all variables
var CANNOT be used everywhere, so if you like all or nothing policies, then you're stuck on the side of "not to var". You can only use implicitly typed variables in statements where an existing reference or value is passed in, or where the variable declaration includes the initialisation of the variable. The easiest way to think about this is that the type of the variable must be known at the time the variable is declared; there cannot be ambiguity and it cannot be resolved later. As with all explicitly typed variable declarations, the type cannot be changed at runtime.
Initializing var type in LINQ
OP in this post has fallen into a common var trap. The type MUST be resolved in the initialization of the variable, so if you have (multiple) possible initialization branches, then you must declare the variable with an explicit type or refactor it so that the type is explicit.
OutputType response = null;
if (Type == 1)
    response = new OutputType(FixedValue, "this is a special case...");
else
    response = new OutputType(Type);
And you know what, that's OK!
Often there are other forms of refactoring or structuring of the code that will allow you to use var if you really want to, but ultimately you can't adopt a policy to always use implicit variables; that is going to lead to unnecessarily convoluted code.
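As a sketch of one such refactoring, the branches can be folded into a single conditional expression so that the initialiser has exactly one type for the compiler to infer. The OutputType class and the FixedValue constant below are hypothetical stand-ins for the ones in the snippet above, which are not shown in the original post.

```csharp
using System;

// Hypothetical stand-in for the OutputType used in the snippet above.
public sealed class OutputType
{
    public int Type { get; }
    public string Note { get; }

    public OutputType(int type) : this(type, "") { }

    public OutputType(int type, string note)
    {
        Type = type;
        Note = note;
    }
}

public static class BranchingInitialisation
{
    private const int FixedValue = 99; // assumed value, for illustration only

    public static OutputType CreateResponse(int type)
    {
        // Both branches produce an OutputType, so the whole expression has a
        // single type and var can infer it from the initialiser.
        var response = type == 1
            ? new OutputType(FixedValue, "this is a special case...")
            : new OutputType(type);
        return response;
    }

    public static void Main()
    {
        var special = CreateResponse(1);
        var regular = CreateResponse(7);
        Console.WriteLine($"{special.Type}: {special.Note}");
        Console.WriteLine($"{regular.Type}: {regular.Note}");
    }
}
```

Whether this reads better than the explicit if/else is a judgment call; with more than two branches the explicit declaration is usually the clearer choice.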
There are even some cases where you should avoid var, and that is generally when the usage of the variable becomes ambiguous, or when the code is visually ambiguous to read. (Remember, the type itself cannot be ambiguous.) Whilst there are many technically valid ways to use var, you should avoid using it if to do so would raise questions.
- Always stay true to the Principle of Least Astonishment!
There is also a common case when resolving references in LINQ expressions. In LINQ-to-SQL or LINQ-to-Entities queries, everything inside the lambda becomes part of the expression that the provider tries to translate, and a member access or method call inside it may well be treated as an expression in its own right. In many of these queries we want to pass a plain value into the expression as a variable, not chain the expression that produces it.
For this reason you may see many LINQ expressions that resolve an Id or other value from an expression or object into an explicitly typed local variable immediately before using that variable in the expression:
int id = myItem.Id;
var query = context.ContainerItems.Where(i => i.Id == id);
If you see this pattern, resist the urge to in-line the id reference into the query; in many cases code like this exists to avoid common runtime errors. Just to re-iterate, don't blindly change the previous example to this:
var query = context.ContainerItems.Where(i => i.Id == myItem.Id);
FYI: if you find yourself fixing issues by elevating arguments in your LINQ predicates to local variables, then be courteous to your fellow developers and include a comment that will make the next person think twice before trying to fix your code.
var vs numeric type argument
Numeric (and Nullable<T>) initialisation poses an interesting scenario that is often touted as the killer argument against using var. However, once again, this is also an argument as to why you should try to use implicit types.
The problem is this: what is the type of the following variable x, and what was the intended type?
var x = 0;
The C# compiler will infer that x is an int in this case, and for many scenarios that is expected, but what if you really wanted a double? If you were going to use that value in simple arithmetic such as division, you might be surprised to find the results truncated to integers! In that case you need to state explicitly that you intended a double, either in the variable declaration or in the initialization:
double x = 0;
var x = 0.0;
var x = 0d;
var x = (double)0;
NOTE: The same concept applies to other numeric types
- In the first case, because the variable is declared as double, the value 0 is written as an integer but is implicitly converted to a double when stored in the variable.
- The second case is technically correct, but visually ambiguous; there is a chance that an unsuspecting dev might clean this back into 0; without realising that this alters the type.
- The third line shows the correct way to use literal notation for a double that does not result in any form of cast or conversion.
- The last is an explicit cast, which really shouldn't be used in this way, not when there is a literal notation that avoids the cast. This initialization works the same way as the first line does.
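The inference rules above are easy to check for yourself. The following is a small sketch (the class and method names are only illustrative) that prints the type the compiler infers for each literal suffix and shows how the inferred type changes the result of a division.

```csharp
using System;

public static class NumericLiteralInference
{
    // var x = 1 is an int, so the division below is integer division.
    public static int HalfAsInt()
    {
        var x = 1;
        return x / 2; // the fractional part is discarded
    }

    // var x = 1d is a double, so the same expression keeps the fraction.
    public static double HalfAsDouble()
    {
        var x = 1d;
        return x / 2;
    }

    public static void Main()
    {
        var i = 0;   // no suffix: int
        var l = 0L;  // L suffix: long
        var f = 0f;  // f suffix: float
        var d = 0d;  // d suffix (or a decimal point, 0.0): double
        var m = 0m;  // m suffix: decimal

        Console.WriteLine(i.GetType().Name); // Int32
        Console.WriteLine(l.GetType().Name); // Int64
        Console.WriteLine(f.GetType().Name); // Single
        Console.WriteLine(d.GetType().Name); // Double
        Console.WriteLine(m.GetType().Name); // Decimal

        Console.WriteLine(HalfAsInt() == 0);      // True
        Console.WriteLine(HalfAsDouble() == 0.5); // True
    }
}
```

Seen this way, the literal suffix documents the intended type right where the value is written, which is arguably harder to "clean up" by accident than a bare 0.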
Do not prevent the use of var
In review sessions for teams that restrict or prevent the use of var, there are three common red flag patterns that come up with respect to var, or rather scenarios where var should have been used:
- The wrong type was assumed in an initialization.
- The Git Diff logs are bloated with changes to variable declarations so it is really hard to focus in on changes to actual logic instead of changes to the variable initializations.
- There are no anonymous type declarations.
The first two issues are really the same thing, or rather the effort that follows the detection of an incorrect type initialization will be reflected in the number of changes in the Git Diff.
This type of change pattern can be referred to as Shotgun Surgery: a seemingly small change that requires alterations to multiple files, or to multiple methods within the same file, looks conceptually like someone has fired a shotgun at your code. Such changes are notoriously hard to review, and often we turn a blind eye and don't bother reviewing them at all. Usually this results in follow-up shotgun blasts trying to plug any holes that we missed the first time.
Most modern application logic follows a basic flow:
Load data => Transform \ Process => Output
It is common to facilitate requests against arrays of data, so in the middle of that flow there will be at least one form of iteration or loop construct. This is one place where var excels: when declaring interim variables that are simple placeholders for data that is passing through. The actual type of the underlying variable is usually irrelevant to the iteration logic, and using an implicit type for such variables gives us freedom to change the actual types being processed without having to change large chains of interconnected logic.
To illustrate, let us use a simple example that will iterate some bookings and send a notification:
var bookingsToUpdate = db.Bookings.Where(x => x.StatusModified > x.StatusLastSent)
    .ToList();
foreach (var booking in bookingsToUpdate)
{
    SendNotification(booking.Id);
}
In this contrived example, the type of booking is not relevant, other than the fact that it has an Id property. A common optimisation for this type of logic is to limit the data that is read from the database by using a LINQ projection. We can apply that optimisation here safely without needing to modify the core logic process:
var bookingsToUpdate = db.Bookings.Where(x => x.StatusModified > x.StatusLastSent)
    .Select(x => new { x.Id })
    .ToList();
foreach (var booking in bookingsToUpdate)
{
    SendNotification(booking.Id);
}
Contrast this however with a codebase that does not allow implicit types:
List<int> bookingIdsToUpdate = db.Bookings.Where(x => x.StatusModified > x.StatusLastSent)
    .Select(x => x.Id)
    .ToList();
foreach (int bookingId in bookingIdsToUpdate)
{
    SendNotification(bookingId);
}
Even in this very simple example, more than 50% of the lines of code have been changed. A change like that stands out in the diffs and needs to be reviewed.
On its own, this is not a bad optimisation; the code is still readable and will function as before, but it is far less extensible. Things get worse when we need to extend this logic to bring back more than a single field. If we MUST avoid implicit type declarations then we are forced to use ordinal tuples or an explicitly defined type:
List<Tuple<int, DateTimeOffset, DateTimeOffset>> bookingsToUpdate = db.Bookings.Where(x => x.StatusModified > x.StatusLastSent)
    .Select(x => new Tuple<int, DateTimeOffset, DateTimeOffset>(x.Id, x.StatusModified, x.StatusLastSent))
    .ToList();
foreach (Tuple<int, DateTimeOffset, DateTimeOffset> booking in bookingsToUpdate)
{
    TimeSpan age = booking.Item2.Subtract(booking.Item3);
    Trace.WriteLine($"Send Notification Triggered: BookingId: {booking.Item1}, LastModified: {booking.Item2}, LastSent: {booking.Item3}, Age: {age}");
    if (age > TimeSpan.FromMinutes(5))
        SendNotification(booking.Item1);
}
Ordinal Tuples are themselves a bit of an anti-pattern. System.Tuple was introduced in .NET Framework 4 (alongside C# 4) to simplify passing semi-structured data objects without needing to formally define a class or struct to represent that type. The concept was a time and code saving device, but the implementation leads to code that is much harder to read and understand, overall creating negative work for your team. You should NEVER pass an ordinal tuple outside of the current method scope!
ValueTuple (or Tuple Literal), added in C# 7, improves on Tuples by allowing named elements, which makes them viable (though still not recommended) for passing outside of the current method, but their usage is still restricted to certain scenarios and the syntax is still verbose.
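For contrast with the tuple version, here is a sketch of the same multi-field logic using an anonymous type and var. The Booking class is a hypothetical stand-in for the entity behind db.Bookings, an in-memory list replaces the database, and the ids are collected into a list instead of calling SendNotification so that the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Booking entity used in the examples above.
public sealed class Booking
{
    public int Id { get; set; }
    public DateTimeOffset StatusModified { get; set; }
    public DateTimeOffset StatusLastSent { get; set; }
    public string Notes { get; set; } = ""; // a field the loop never needs
}

public static class AnonymousProjection
{
    // Returns the ids that would be notified.
    public static List<int> IdsToNotify(IEnumerable<Booking> bookings)
    {
        // The projection keeps only three fields, and each keeps its name.
        var bookingsToUpdate = bookings
            .Where(x => x.StatusModified > x.StatusLastSent)
            .Select(x => new { x.Id, x.StatusModified, x.StatusLastSent })
            .ToList();

        var notified = new List<int>();
        foreach (var booking in bookingsToUpdate)
        {
            var age = booking.StatusModified.Subtract(booking.StatusLastSent);
            if (age > TimeSpan.FromMinutes(5))
                notified.Add(booking.Id); // SendNotification(booking.Id) in the article's version
        }
        return notified;
    }

    public static void Main()
    {
        var now = DateTimeOffset.UtcNow;
        var bookings = new List<Booking>
        {
            new Booking { Id = 1, StatusModified = now, StatusLastSent = now.AddMinutes(-10) },
            new Booking { Id = 2, StatusModified = now, StatusLastSent = now.AddMinutes(-1) },
            new Booking { Id = 3, StatusModified = now.AddMinutes(-30), StatusLastSent = now },
        };
        Console.WriteLine(string.Join(", ", IdsToNotify(bookings))); // 1
    }
}
```

Compared with the tuple version, the loop body refers to booking.StatusModified rather than booking.Item2, and adding another projected field touches only the Select line.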
If you need to pass structured data outside of a given method scope, either as a return value or as an output parameter, then you should use an explicitly described type definition instead of tuples or anonymous types. This advice will help solve other extensibility issues in your code base, but has nothing to do with the use of implicitly typed variables.
As the team or the codebase grows, it becomes more and more important, from an overall efficiency of management point of view, to reduce the surface area of individual check-ins. Use of implicit types can lead to code patterns that allow your team and code to be more agile.
The third red flag pattern, when you come across code that does not contain any anonymous types, is similar to reviewing a SQL Trace log where most of the queries use SELECT * FROM. This leads to unnecessary network traffic and memory consumption and can be the silent application performance killer.
If a team is discouraged from using implicit type declarations, then they will commonly also try to avoid using anonymous types, because the correct explicit type syntax to use can be hard to identify or describe. This leads to avoidance of, and a lack of understanding of, anonymous types altogether.
Repeat, DO NOT PREVENT the use of var
Embracing implicit types is hard for some developers to adapt to, but it allows more expressive implementations of your code and, in some cases, allows your code to be more easily extended without adding bloat to the code itself or to the memory profile of your application.
I am not suggesting that you must go through your legacy code base and change every eligible type declaration to use implicit types, but keep this powerful tool available so that it can be used when necessary. You will find that embracing this pattern of code will make it easier for you to onboard the next generation of developers who were not burdened with having to code from first principles before LINQ was introduced.
The C# language specification has evolved and continues to do so; as developers we need to do so as well. Though we might not always agree with all the new language or syntax features, we must recognise that these features are introduced to improve the overall SDLC experience. It is not always about writing code quickly; many language features have other direct and sometimes indirect benefits that will help you manage the codebase long term. If you cannot see the value in a new feature, then perhaps you are misunderstanding the situations and issues that it was designed to resolve, reduce or prevent.