SQL like aggregation in R
Mon 28 July 2014
[caption id="" align="alignright" width="75"] Group by (Photo credit: Wikipedia)[/caption]
The use case is similar to the one descibe in usual SQL quick reference guide at the "group by" section.
first of all, I need a dataset as one data.frame:
d <- data.frame(product=sample(c("fruit", "phone", "computer"), size=20, replace=TRUE), vendor=sample(c("manu", "the other guy"), size=20, replace=TRUE), note=sample(c(1:5), size=20, replace=TRUE))
to compute the mean note of product sold by each vendor, is SQL the query looks like:
SELECT vendor, MEAN(note) FROM d GROUP BY vendor;
in R, you can use `aggregate() <http://stat.ethz.ch/R-manual/R-patched/library/stats/html/aggregate.html>`__ function:
aggregate(note ~ vendor, d, function(x){mean(x)})
Note that the column of the data.frame should have name and that the function can be any function you make.
more complicated: group by several column: I don't know if there is a canonical way to do it, but I found one:
aggregate(note ~ vendor + product, function(x){mean(x)})
Note that you can use any formula containing both vendor and product.
Category: how to Tagged: Aggregate Group by R SQL programming