Manu - SQL

SQL like aggregation in R

Mon 28 July 2014

[caption id="" align="alignright" width="75"] Group by (Photo credit: Wikipedia)[/caption]

The use case is similar to the one descibe in usual SQL quick reference guide at the "group by" section.
first of all, I need a dataset as one data.frame:
d <- data.frame(product=sample(c("fruit", "phone", "computer"), size=20, replace=TRUE), vendor=sample(c("manu", "the other guy"), size=20, replace=TRUE), note=sample(c(1:5), size=20, replace=TRUE))
to compute the mean note of product sold by each vendor, is SQL the query looks like:

SELECT vendor, MEAN(note)
FROM d
GROUP BY vendor;

in R, you can use `aggregate() <http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html>`__ function:
aggregate(note ~ vendor, d, function(x){mean(x)})
Note that the column of the data.frame should have name and that the function can be any function you make.

more complicated: group by several column: I don't know if there is a canonical way to do it, but I found one:

aggregate(note ~ vendor + product, function(x){mean(x)})

Note that you can use any formula containing both vendor and product.

Category: how to Tagged: Aggregate Group by R SQL programming