Types of confusion

12/20/2006

When I began programming in my early teens, I was very excited to learn about programming and also excited to become a "real" programmer. I remember picking up a book from the library that purported to teach what real programmers were like. I absorbed the material and enjoyed every page. I can remember only a handful of things from that book, which I now realize was not very serious material, but I do remember one quote: "Strong typing is for people with weak minds." I thought to myself, "I don't want to have a weak mind," and I didn't want people to think that I had a weak mind, so I decided that I should definitely not embrace strong typing. But what was "strong typing" anyway, I wondered. After much reflection on my (brief) programming experience, I decided that strong typing meant using the keyboard a lot, and therefore that people with weak minds wrote overly verbose code. I then applied my newfound principle.

Later I had a similar experience when I began perusing large numbers of articles, blogs, and comments about programming and programming languages. I would routinely come across a fellow programmer making a statement like "dynamic languages are better because you need to write so much less code" or "Python is no good because it is weakly typed". Furthermore, in many cases one advocate would claim that a language was strongly typed while another would claim that the same language was weakly typed. It was obvious that my earlier misunderstanding was shared by many of my peers.

C# 3.0 extends and adds some wonderful language features focused on type inference. The same confusion about what makes a language "dynamic" or "strongly typed" surrounds language features like the "var" contextual keyword.

On Definitions

Benjamin Pierce, the author of Types and Programming Languages, wrote in an email, "I spent a few weeks... trying to sort out the terminology of 'strongly typed,' 'statically typed,' 'safe,' etc., and found it amazingly difficult.... The usage of these terms is so various as to render them almost useless."

Based on the apparent confusion, I think it is best to clarify what I mean by each of the following terms.

Type Checking - Verifying that code respects type constraints.

Statically Typed - Type checking occurs at compile time.

Dynamically Typed - Type checking occurs at run time.

Type Safe Language - A language which protects its own abstractions.

Type Unsafe Language - A language which is not type safe.

Strongly Typed and Weakly Typed - Depends on the author; the definitions are so many and so varied that the terms are practically useless. It seems that anyone can claim that language X is either strongly typed or weakly typed based on sound reasoning derived from one of the various definitions. Here are a few sources: Wikipedia, Reddit, Google Groups, and another one.

Dynamic Language - A language which enables runtime inspection or modification of a program; most languages can do this but dynamic languages make it easy. It is common for people to refer to "dynamic languages" and mean "dynamically typed languages" as the term is defined here.

Type Inference - Deducing the type of an expression from the type of the value that the expression will eventually evaluate to. Type inference enables the compiler to deduce types without type annotations.

Based on these definitions we can tackle some common misconceptions.

#1 - Dynamic languages are dynamically typed

Depends. Admittedly, most people seem to use the two terms interchangeably, but given the definitions above they are not the same. It is true that many dynamic languages are dynamically typed, but it is not a requirement. It also seems that dynamic language features tend to be more easily adapted to dynamically typed languages. However, C# can be claimed to be a dynamic language (or at least to have dynamic language features) since it has reflection, runtime code generation (from libraries), Edit and Continue (EnC, tool based), and now expression trees (which can be created, type checked, and evaluated at runtime).
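To make the expression tree point concrete, here is a minimal sketch that uses the System.Linq.Expressions factory methods to build, type check, and evaluate a small tree at runtime. The class and method names (ExpressionTreeDemo, BuildAdder, AddingBoolAndIntIsRejected) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Builds the tree for (a, b) => a + b at runtime and compiles it to a delegate.
    public static Func<int, int, int> BuildAdder()
    {
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(int), "b");
        BinaryExpression sum = Expression.Add(a, b);
        return Expression.Lambda<Func<int, int, int>>(sum, a, b).Compile();
    }

    // Returns true if the factory rejects bool + int while the tree is being built.
    public static bool AddingBoolAndIntIsRejected()
    {
        try
        {
            Expression.Add(Expression.Constant(true), Expression.Constant(5));
            return false;
        }
        catch (InvalidOperationException)
        {
            // No Add operator is defined for Boolean and Int32.
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildAdder()(2, 3));             // 5
        Console.WriteLine(AddingBoolAndIntIsRejected());   // True
    }
}
```

The second method is the interesting one: the mismatch is reported while the program is running, when the tree is constructed, rather than by the C# compiler.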

#2 - Languages that allow programming without type annotations are dynamically typed

False. Whether a language is dynamically typed or not depends on whether type checking happens at runtime or compile time. Languages with type inference, for example, do not require type annotations; the compiler infers the types (as if a programmer had typed them) and then checks at compile time that they are used consistently.
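A small sketch of this in C#, assuming a hypothetical helper named StaticTypeOf: because its type parameter is chosen by the compiler, it reports the compile-time type of its argument, which can differ from the runtime type of the value.

```csharp
using System;

public static class InferenceDemo
{
    // T is inferred by the compiler from the static (compile-time) type of the argument.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static void Main()
    {
        var n = 5;        // no annotation, but n is statically an int
        var s = "foo";    // statically a string
        object o = 5;     // annotated as object; holds a boxed int at runtime

        Console.WriteLine(StaticTypeOf(n));   // System.Int32
        Console.WriteLine(StaticTypeOf(s));   // System.String
        Console.WriteLine(StaticTypeOf(o));   // System.Object
        Console.WriteLine(o.GetType());       // System.Int32

        // n = "bar";  // would not compile: a string cannot be assigned to an int
    }
}
```

The variables declared without annotations are checked exactly as strictly as the annotated one; only the source text is shorter.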

#3 - Dynamically typed languages are more terse

False. A claim could perhaps be made that dynamically typed languages tend to be more terse, given languages like Python, Perl, and Lisp; however, statically typed languages have some very terse representatives as well (Haskell, the ML dialects).

#4 - Statically/Dynamically typed languages are better than Dynamically/Statically typed languages

False. It is hard to make this claim objectively since "better" is usually ill-defined. There are some very good languages that are statically typed and some very good languages that are dynamically typed. Define what "better" means here and then we can talk. Of course, your definition of "better" may be disputed (and often is).

#5 - A language is either dynamically or statically typed but not both

Depends. In fact, most statically typed languages have a few features that are dynamically typed. In C# there are several examples, including array bounds checking and casting. If you access an element outside the bounds of an array, the error occurs at runtime because the validity of that operation is checked at runtime. If you cast an object to a string when it in fact contains an int, the type system will complain at runtime. Despite this, a language is usually referred to as being either statically or dynamically typed.
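Both cases can be sketched in a few lines of C#. The code compiles without complaint, and the checks fire only when it runs. The class and method names here are hypothetical.

```csharp
using System;

public static class RuntimeCheckDemo
{
    // Indexing past the end of an array compiles, then fails when executed.
    public static bool OutOfRangeIndexThrows()
    {
        int[] numbers = new int[3];
        int index = 3; // one past the last valid index
        try
        {
            numbers[index] = 1;
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    // Casting object to string is a legal downcast at compile time,
    // but the value is really a boxed int, so the cast fails at runtime.
    public static bool BadCastThrows()
    {
        object boxed = 5;
        try
        {
            string s = (string)boxed;
            Console.WriteLine(s); // not reached
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(OutOfRangeIndexThrows()); // True
        Console.WriteLine(BadCastThrows());         // True
    }
}
```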

The "var" contextual keyword

Now we are ready to look at the new contextual keyword "var". It is a very handy device that enables the programmer to omit type declarations for variables that are initialized in their declaration. However, it should be carefully noted that this feature still uses static type checking. For example:

var x = true; // same as bool x = true

var y = 5; // same as int y = 5

y += 1; // ok

var z = x + y; // error! cannot add bool and int, won't compile

Some people confuse the semantics of the "var" contextual keyword with the "var" declarations used in JavaScript or the Variant data type in VB. They are very different. JavaScript is dynamically typed, so when an assignment is made to a variable there is no complaint about mismatched types. This is because a variable does not have a type per se; rather, the value assigned to the variable has a type. So values of different types may be assigned to the same variable as long as the usage of each value at runtime is consistent with its type. So in JavaScript we could do the following:

var x = "foo";

x = 5; // no error here, but if this were C# code then a compile error (not a runtime error) would occur

I love using the new C# 3.0 "var" syntax for declaring local variables. In fact, it is the C# 3.0 feature that I use most frequently. Here are several reasons to use "var":

1. In my opinion, good programming style dictates that variables should be initialized when they are declared anyway and the type can usually be deduced from the expression used to assign the variable

2. Specifying the type is often redundant

Foo foo = new Foo();

3. The programmer may not care what type a variable is as long as her usage of that variable type checks

SomeFileType file = GetFileAttributes(filename);

if (file.IsReadOnly) ...

4. The programmer may not be able to specify the name of the type

??? foo = new { X = "bar", Y = 3.14159 }; // anonymous type (type which has no name)

5. The type name may be very very very long

Dictionary<Pair<int,List<Set<double>>>, Pair<double,int>> map = new Dictionary<Pair<int,List<Set<double>>>, Pair<double,int>>();
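Reasons 4 and 5 can be sketched as follows. Pair and Set above are not types I can assume exist, so this sketch substitutes the standard Dictionary, List, and KeyValuePair types; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class VarUsageDemo
{
    public static string DescribeAnonymous()
    {
        // The anonymous type has no name that could be written here, so var is required.
        var foo = new { X = "bar", Y = 3.14159 };
        // Member access is still statically checked: X is a string, Y is a double.
        return foo.X.ToUpperInvariant() + ":" + ((int)foo.Y);
    }

    public static int CountEntries()
    {
        // The long type name appears once instead of twice.
        var map = new Dictionary<string, List<KeyValuePair<int, double>>>();
        map["a"] = new List<KeyValuePair<int, double>>();
        map["a"].Add(new KeyValuePair<int, double>(1, 2.5));
        return map["a"].Count;
    }

    public static void Main()
    {
        Console.WriteLine(DescribeAnonymous()); // BAR:3
        Console.WriteLine(CountEntries());      // 1
    }
}
```

Writing foo.Z or map["a"].Add("oops") in this sketch would be rejected at compile time, which is the point: the types are omitted from the source, not from the program.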

Only #4 is a mandated use for "var"; the other reasons are more aesthetic. A natural follow-up question is: when can't I, or when shouldn't I, use "var"?

1. When the type of the initializing expression is not the type that you want the variable to be

var x = new Bar();

if (someCondition) x = new Foo(); // error: x was inferred as Bar, and Bar derives from Foo

var y = null; // this doesn't work because null doesn't have a type, although you could do var y = default(string); instead

var z = (a, b) => a + b; // this doesn't work because lambdas do not have a type, they must be converted to a delegate type first

var y = 1;

y += 1.1; // error: y is int; it should have been declared double or initialized with 1.0

2. When you want to explicitly state the type of a variable for human readability

I personally don't think #2 is very compelling especially since with IDE support you can easily hover over "var" and see the type for each variable. Furthermore, I think that some of the arguments used by the proponents of dynamically typed languages apply here (not that this is a dynamically typed language feature). The number of type annotations in a program can distract from the intent of the program. From this perspective, the "var" contextual keyword actually improves readability. But unlike a dynamically typed language, the compiler can still type check the program at compile time and provide feedback on possible errors.
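For the cases under item 1 of the list above, one possible set of workarounds is sketched below: declare the wider type explicitly, pick an initializer of the intended type, or name the delegate type. Foo and Bar are minimal stand-ins for the classes mentioned earlier, and the remaining names are hypothetical.

```csharp
using System;

public class Foo { }
public class Bar : Foo { }

public static class VarLimitsDemo
{
    public static Foo PickInstance(bool someCondition)
    {
        Foo x = new Bar();                  // explicit base type; var would infer Bar
        if (someCondition) x = new Foo();   // fine, because x is a Foo
        return x;
    }

    public static double Accumulate()
    {
        var y = 1.0;   // the literal 1.0 makes y a double
        y += 1.5;
        return y;
    }

    public static int AddViaDelegate()
    {
        Func<int, int, int> z = (a, b) => a + b;   // the lambda needs a delegate type
        return z(2, 3);
    }

    public static bool DefaultStringIsNull()
    {
        var s = default(string);   // inferred as string; the value is null
        return s == null;
    }

    public static void Main()
    {
        Console.WriteLine(PickInstance(true).GetType().Name);   // Foo
        Console.WriteLine(PickInstance(false).GetType().Name);  // Bar
        Console.WriteLine(AddViaDelegate());                    // 5
        Console.WriteLine(DefaultStringIsNull());               // True
    }
}
```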

The "var" contextual keyword is just one of several features in C# 3.0 that use type inference. But those features will be discussed another day.
