R read.table and special characters
Wed 16 July 2014
There are some special characters that the R function `read.table() <http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html>`__ cannot handle properly (I suppose the same holds for read.csv(), but I did not test it). These are mainly quotes (single or double) and escaped quotes.
The easiest way to deal with them is to replace the quotes with other characters. In vim, the command :%s/\%x27/ /g replaces every single quote with a space, since the ASCII code of the single quote is 0x27.
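The same substitution can also be done from within R before parsing. The sketch below works on a few hypothetical tab-separated lines held in memory; for a real file, raw_lines would come from readLines("./dataset.txt"). The tab separator is an assumption, chosen so that the space replacing the quote does not split the field in two.

```r
# Hypothetical tab-separated lines; one field contains an apostrophe
raw_lines <- c("id\tname", "1\tO'Brien", "2\tSmith")

# "\x27" is the single quote (ASCII 0x27); fixed = TRUE matches it literally
clean_lines <- gsub("\x27", " ", raw_lines, fixed = TRUE)

# With no quotes left, the default quote setting can no longer misfire
dataset <- read.table(text = clean_lines, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
print(dataset)
```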
Another way to deal with it is to pass the appropriate quote argument to read.table(); an empty string disables quoting altogether:
dataset <- read.table("./dataset.txt", quote = "", ...)
The major drawback is that escaped quotes are still not handled: they end up in the data as-is.
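A minimal sketch of this behaviour, using hypothetical inline data (tab-separated, which is an assumption): with quote = "" the apostrophe in the first row is harmless, while the backslash-escaped double quotes in the second row are read literally, backslashes included, because read.table() does not process escapes by default (allowEscapes = FALSE).

```r
# Hypothetical tab-separated data: an apostrophe in row 1 and
# backslash-escaped double quotes in row 2
lines <- c("id\tcomment",
           "1\tdon't panic",
           "2\the said \\\"hi\\\"")

# quote = "" turns quoting off: ' and " become ordinary characters
dataset <- read.table(text = lines, header = TRUE, sep = "\t", quote = "",
                      stringsAsFactors = FALSE)

# The apostrophe is read fine...
print(dataset$comment[1])
# ...but the escaped quotes are kept literally, backslashes included
print(dataset$comment[2])
```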
The best approach is thus to preprocess the files to remove quotes and escaped quotes before reading them.
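Such a preprocessing step might look like the sketch below. The helpers strip_quotes() and read_table_without_quotes() are hypothetical names, not part of R; they replace escaped quotes first (so no stray backslash is left behind), then any remaining single or double quote, with a space as in the vim command, before handing the cleaned lines to read.table(). Reading the whole file with readLines() assumes it fits in memory.

```r
# Hypothetical helper: remove escaped quotes first, then any remaining quote
strip_quotes <- function(lines, replacement = " ") {
  # Escaped quotes: a backslash followed by ' or " (regex \\['"])
  lines <- gsub("\\\\['\"]", replacement, lines)
  # Remaining bare single or double quotes (regex ['"])
  gsub("['\"]", replacement, lines)
}

# Hypothetical wrapper: preprocess a file in memory, then parse it.
# Extra arguments (header, sep, ...) are passed on to read.table().
read_table_without_quotes <- function(path, ...) {
  lines <- readLines(path, warn = FALSE)
  read.table(text = strip_quotes(lines), ...)
}

# Usage on a small temporary file with hypothetical content
path <- tempfile(fileext = ".txt")
writeLines(c("id\tcomment",
             "1\tdon't panic",
             "2\the said \\\"hi\\\""), path)
dataset <- read_table_without_quotes(path, header = TRUE, sep = "\t",
                                     stringsAsFactors = FALSE)
print(dataset)
```

Replacing with a space rather than deleting keeps the character count of each field unchanged; passing replacement = "" drops the quotes instead.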
Category: R Tagged: ASCII Escape sequence Quotation mark R