SQL-like aggregation in R

Mon 28 July 2014

The use case is similar to the one described in the "GROUP BY" section of a typical SQL quick reference guide.
First of all, I need a dataset as a single data.frame:
d <- data.frame(product=sample(c("fruit", "phone", "computer"), size=20, replace=TRUE), vendor=sample(c("manu", "the other guy"), size=20, replace=TRUE), note=sample(c(1:5), size=20, replace=TRUE))
To compute the mean note of the products sold by each vendor, the SQL query looks like:
SELECT vendor, AVG(note)
FROM d
GROUP BY vendor;
In R, the equivalent is:
aggregate(note ~ vendor, d, function(x){mean(x)})
Note that the columns of the data.frame must be named, and that the function can be any function you write.
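As a minimal sketch of the single-column case, the example below uses a tiny hand-written data.frame so the result can be checked by hand. The name toy, its values, and the helper spread are made up for illustration and are not part of the dataset above.

```r
# Hypothetical toy data, small enough to verify by hand.
toy <- data.frame(
  vendor = c("manu", "manu", "the other guy", "the other guy"),
  note   = c(3, 5, 1, 2)
)

# Roughly: SELECT vendor, AVG(note) FROM toy GROUP BY vendor;
mean_by_vendor <- aggregate(note ~ vendor, data = toy, FUN = mean)

# Any function of a numeric vector can be passed as FUN,
# for example a user-defined spread (max minus min).
spread <- function(x) max(x) - min(x)
spread_by_vendor <- aggregate(note ~ vendor, data = toy, FUN = spread)

print(mean_by_vendor)
print(spread_by_vendor)
```

Passing mean directly as FUN is equivalent to wrapping it in function(x){mean(x)}; the wrapper form is only needed when the function takes extra arguments or does more work.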
A more complicated case is grouping by several columns. I don't know if there is a canonical way to do it, but I found one:
aggregate(note ~ vendor + product, d, function(x){mean(x)})

Note that you can use any formula containing both vendor and product.
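To make the multi-column case concrete, here is a small sketch on hand-written data, corresponding roughly to SELECT vendor, product, AVG(note) FROM toy2 GROUP BY vendor, product. The name toy2 and its values are hypothetical. It also shows the by = list(...) form of aggregate, which is documented in ?aggregate and should give the same group means as the formula form.

```r
# Hypothetical toy data with two grouping columns.
toy2 <- data.frame(
  vendor  = c("manu", "manu", "manu", "the other guy", "the other guy"),
  product = c("fruit", "fruit", "phone", "fruit", "phone"),
  note    = c(2, 4, 5, 1, 3)
)

# Formula interface: grouping columns on the right-hand side, joined with +.
by_formula <- aggregate(note ~ vendor + product, data = toy2, FUN = mean)

# Alternative interface: pass the grouping columns as a named list.
# The aggregated column is called x in this form.
by_list <- aggregate(toy2$note,
                     by = list(vendor = toy2$vendor, product = toy2$product),
                     FUN = mean)

print(by_formula)
print(by_list)
```

Only the vendor/product combinations that actually occur in the data appear in the result, which matches what GROUP BY does in SQL.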

Category: how to Tagged: Aggregate Group by R SQL programming

big cloud data

Sun 02 February 2014

This post should be titled "From cloud computing to big data to fast data".

The previous next big thing: cloud

Once upon a …

Category: maths Tagged: Apache Hadoop BigData Cloud computing Data mining Forecasting MongoDB NoSQL Platform as a service Software as a service SQL reflections
