SQL-like aggregation in R
The use case is similar to the one described in the "GROUP BY" section of any standard SQL quick reference guide.
First of all, I need a dataset as a single data.frame:
d <- data.frame(product=sample(c("fruit", "phone", "computer"), size=20, replace=TRUE), vendor=sample(c("manu", "the other guy"), size=20, replace=TRUE), note=sample(c(1:5), size=20, replace=TRUE))
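Because sample() draws at random, the rows of d change from one run to the next. A minimal sketch, assuming you want the same data every time: fix the seed before building the data.frame. The seed value 42 is an arbitrary choice of mine, not something the dataset requires.

```r
# Fix the random seed so sample() returns the same draws on every run.
# The value 42 is an arbitrary assumption; any integer works.
set.seed(42)

d <- data.frame(
  product = sample(c("fruit", "phone", "computer"), size = 20, replace = TRUE),
  vendor  = sample(c("manu", "the other guy"), size = 20, replace = TRUE),
  note    = sample(1:5, size = 20, replace = TRUE)
)

# Inspect the structure and the first rows
str(d)
head(d)
```

The exact rows depend on the seed and the R version, so no expected output is shown here.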
To compute the mean note of the products sold by each vendor, the SQL query looks like:
SELECT vendor, AVG(note)
FROM d
GROUP BY vendor;
In R, the equivalent is:
aggregate(note ~ vendor, d, function(x){mean(x)})
Note that the columns of the data.frame must be named, and that the function can be any function, including one you write yourself.
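To illustrate the point about custom functions, here is a small sketch on a hand-made data set, so the result is easy to check by eye. The names d_small and note_range are hypothetical, introduced only for this example.

```r
# Small hand-made data set (hypothetical values) instead of random draws
d_small <- data.frame(
  vendor = c("manu", "manu", "the other guy", "the other guy"),
  note   = c(1, 3, 4, 5)
)

# Hypothetical custom summary: gap between the best and the worst note
note_range <- function(x) {
  max(x) - min(x)
}

# Same call shape as above, once with mean and once with the custom function
res_mean  <- aggregate(note ~ vendor, d_small, mean)
res_range <- aggregate(note ~ vendor, d_small, note_range)

print(res_mean)   # mean note per vendor: manu 2, the other guy 4.5
print(res_range)  # note range per vendor: manu 2, the other guy 1
```

Any function that takes a vector and returns a single value can be passed in the same way.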
More complicated: grouping by several columns. I don't know if there is a canonical way to do it, but I found one:
aggregate(note ~ vendor + product, d, function(x){mean(x)})
Note that you can use any formula whose right-hand side contains both vendor and product, for example note ~ product + vendor.
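As a sketch of grouping by two columns, the example below uses a small hand-made data set (d_small is a hypothetical name) and compares the formula interface with the by = argument of aggregate(), which is another way to name the grouping columns. I am not claiming either one is the canonical way.

```r
# Small hand-made data set (hypothetical values) so group means are easy to verify
d_small <- data.frame(
  product = c("fruit", "fruit", "phone", "phone", "fruit"),
  vendor  = c("manu", "manu", "manu", "the other guy", "the other guy"),
  note    = c(2, 4, 5, 1, 3)
)

# Formula interface: one row per (vendor, product) combination
by_formula <- aggregate(note ~ vendor + product, d_small, mean)

# Same grouping with the by = interface: pass the grouping columns as a list
# (a data.frame is a list of columns)
by_list <- aggregate(d_small["note"],
                     by = d_small[c("vendor", "product")],
                     FUN = mean)

print(by_formula)
print(by_list)
```

Both calls should return four rows, one per vendor and product pair, with the mean note in the last column.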
big cloud data
This post should be entitled "From cloud computing to big data to fast data".
The previous next big thing: cloud computing
Once upon a …