# Make Vectorize your friend in R

Old habits die hard. For someone coming from linear programming models such as C++ or Java, it takes quite a bit of effort before a *fully-compliant* R functions can be developed. The major pit-fall is, defining a function that is not *vector-compliant*.

For example, consider the below function that accepts two numbers and returns some computed value based on them.

```
getNc2 = function(vc, ec)
{
# compute nC2 and return the ratio
max_ec = vc * (vc-1) / 2
return (trunc(ec * 100 / max_ec))
}
```

This is a valid function in R, and perhaps similar to any C++ or Java method, except it works on both single number input *and* vector inputs. For example, note the below single number input calls:

```
> getNc2(5, 4)
[1] 40
> getNc2(6, 5)
[1] 33
```

Now, those two calls can be combined into a single vector call as below and our above definition works perfectly fine without complaints:

```
> getNc2(c(5,6), c(4,5))
[1] 40 33
```

The method accepted two vectors (instead of just two individual numbers) and returned vectorized results. Something not quite natural in C++ or Java, unless special care is taken to define the argument types as templated parameters and so on.

Now, what about the limits? Our method is computing a ratio which essentially involves a division. If we are not careful, we can endup with dividing-by-zero. For example, consider the below calls:

```
> getNc2(0, 5)
[1] -Inf
> getNc2(0, 0)
[1] NaN
```

Suppose, we want to handle these cases where the argument is zero and return 0 instead of *-Inf* or *NaN*. Then we will be adding an if condition as below:

```
getNc2 = function(vc, ec)
{
# if no vertices return 0
if(vc <= 1) return (0)
# compute nC2 and return the ratio
max_ec = vc * (vc-1) / 2
return (trunc(ec * 100 / max_ec))
}
```

Now, the earlier calls will return 0 correctly as shown below:

```
> getNc2(0, 0)
[1] 0
> getNc2(0, 5)
[1] 0
```

However, the if condition introduces a new problem. *if* in R is not a vector-compatible opration. It works only for scalar inputs and as such our method hence fails on vector inputs. For example:

```
> getNc2(c(5,0), c(4,5))
[1] 40 -Inf
Warning message:
In if (vc <= 1) return(0) :
the condition has length > 1 and only the first element will be used
```

Now, there are couple of ways you can address this problem. One would be to use *apply* constructs - but they can be overkill oftentimes. Other way is, using *ifelse* instead of *if* etc. vector-capable operations explicitly. That would work, provided you have the option to change the source. What about cases where you are using a method from some package and you cannot change the method ?

Take the method *get.edge* from *igraph* package, for example. It accepts an edge id and returns the vertices for that edge. However, if you supply a vector of edge ids, it will not work - it just returns the vertices for only the first edge, as can be seen from the below output:

```
> cg <- erdos.renyi.game(8, 0.6)
> get.edge(cg, 13)
[1] 4 6
> get.edge(cg, c(13,15,16))
[1] 4 6
```

How do you make these kind of methods work for vector inputs as well as scalar inputs? R has a nice base method under its sleeve that exactly addresses this problem. Named *Vectorize*, it helps in these situations - coverting scalar methods to vector methods.

*Vectorize* creates a function wrapper that vectorizes the action of its argument function. For example, we can vectorize our *getNc2* method as shown below:

```
> Vectorize(getNc2)(c(5,0), c(4,5))
[1] 40 0
```

Instead of directly calling *getNc2*, we are rather creating a wrapper around it by calling *Vectorize* method with *getNc2* as its argument, and supplying the original vector inputs to the resultant function. In other words, the above is same as:

```
> vecNc2 <- Vectorize(getNc2)
> vecNc2(c(5,0), c(4,5))
[1] 40 0
```

Remember, the vectorized *vecNc2* method works perfectly fine on scalar inputs too. For example:

```
> vecNc2(5, 4)
[1] 40
```

By default *Vectorize* tries to vectorize all the arguments of the supplied method, which may not work sometimes. For example, the earlier example of *get.edge* in *igraph*:

```
> vecGE <- Vectorize(get.edge)
> vecGE(cg, c(13,15,16))
Error in function (graph, id) : Not a graph object
```

Here the *get.edge* method is accepting two inputs, one the graph object itself and the second the edge id - and we want to vectorize only the second argument, namely the *id* argument and not the graph object. For this, we would have to use the *vectorize.args* facility of *Vectorize* method as shown below:

```
> vecGE <- Vectorize(get.edge, vectorize.args="id")
> vecGE(cg, c(13,15,16))
[,1] [,2] [,3]
[1,] 4 1 2
[2,] 6 7 7
```

This is a simplified output of the *Vectorize* method. We can use the *SIMPLIFY* facility of *Vectorize* method to get the original results unaltered. For example:

```
> vecGE <- Vectorize(get.edge, vectorize.args="id", SIMPLIFY=FALSE)
> vecGE(cg, c(13,15,16))
[[1]]
[1] 4 6
[[2]]
[1] 1 7
[[3]]
[1] 2 7
```

Note, this is same as the *lapply* output.

```
> lapply(c(13,15,16), function(id) get.edge(cg, id))
[[1]]
[1] 4 6
[[2]]
[1] 1 7
[[3]]
[1] 2 7
```

So go ahead and use *Vectorize* whenever you want to avoid *lapply* constructs for methods that accept only scalar inputs.