[R Tips] Use “grepl” to search through text data
[For the Korean version, please click the title.]
The project I got myself into requires me to identify brands just by looking at the given store name. I could look through all 3,000,000 store names by hand, but that's silliness talking. CTRL+F is the basic command for searching for keywords, and “grepl” is the CTRL+F of R: for each element of a character vector, it returns TRUE or FALSE depending on whether the pattern is found.
Let's begin by creating a text data set.
####################
# A data set of random letter combinations is created
####################
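# First two characters: "aa", "ba", "cb", "db", ...
sample <- paste(rep(letters, 2), rep(letters, each = 2), sep = "")
# Reverse alphabet (z, y, ..., a), appended as the third character
x <- rev(letters)
sample <- paste(sample, rep(x, 2), sep = "")
# Build the data frame directly so "number" stays numeric and "name" stays character
sample <- data.frame(number = seq_along(sample), name = sample, stringsAsFactors = FALSE)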
####################
#
  number name
1      1  aaz
2      2  bay
3      3  cbx
4      4  dbw
5      5  ecv
6      6  fcu
7      7  gdt
… (the code above should give you a data frame that looks like this)
####################
#
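Example 1) Find store names with “c” in them.
#
sample[grepl("c", sample$name), ]
#
sample[grepl("c", sample[, 2]), ]
#
In grepl(A, B), A is the pattern you are looking for and B is the character vector you are searching in. The result is a logical vector, which is used here to select the matching rows.
In this example, we are looking for “c” in the vector sample$name. sample[, 2] refers to the same column by position, so both lines return the same rows.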
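To see what grepl is doing inside the square brackets, it can help to look at its return value on its own. The snippet below is a minimal sketch using a made-up vector called stores (hypothetical, not the project data); it only illustrates that grepl returns one TRUE/FALSE per element.

```r
# Hypothetical store names, for illustration only
stores <- c("cafe mocha", "book nook", "ice cream hut", "taco stand")

# grepl() returns one logical value per element of the vector
has_c <- grepl("c", stores)
print(has_c)

# The logical vector can then be used to subset, just like the rows above
print(stores[has_c])

# ignore.case = TRUE also matches an upper-case "C"
mixed_case <- grepl("c", c("Coffee", "coffee", "tea"), ignore.case = TRUE)
print(mixed_case)
```

Matching is case-sensitive by default, which is worth keeping in mind when store names are typed inconsistently.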
Example 2) Find store names beginning with “c”
#
sample[grepl("^c", sample$name), ]
#
sample[grepl("^c", sample[, 2]), ]
#
Regular expressions are a must when dealing with text data.
“^” is the regular expression symbol for the beginning of a string, so “^c” only matches names whose first character is “c”.
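The effect of “^” is easiest to see side by side with the unanchored pattern. This is a small sketch on a hypothetical stores vector (made up for illustration); startsWith() is a base R function available from R 3.3.0 and is shown only as a plain-text alternative.

```r
# Hypothetical store names, for illustration only
stores <- c("cafe mocha", "ice cream hut", "Corner Deli")

# Without "^", a lower-case "c" anywhere in the string counts as a match
anywhere <- grepl("c", stores)

# With "^", only strings whose first character is "c" match
at_start <- grepl("^c", stores)

print(data.frame(stores, anywhere, at_start))

# startsWith() does the same check without regular expressions
same_check <- startsWith(stores, "c")
print(same_check)
```

Note that “Corner Deli” is not matched by either pattern, because the default matching is case-sensitive.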
Example 3) Find store names ending with “c”
#
sample[grepl("c$", sample$name), ]
#
sample[grepl("c$", sample[, 2]), ]
#
“$” is the regular expression symbol for the end of a string, so “c$” only matches names whose last character is “c”.
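Because “$” has a special meaning in a pattern, searching for an actual dollar sign needs a little extra care. The sketch below uses a hypothetical stores vector (made up for illustration) and shows two common ways to do that: escaping the character, or passing fixed = TRUE so the pattern is treated as plain text.

```r
# Hypothetical store names, for illustration only
stores <- c("mac", "cafe mocha", "everything 5$")

# "c$" matches only strings whose last character is "c"
at_end <- grepl("c$", stores)
print(at_end)

# To find a literal dollar sign, escape it ...
escaped <- grepl("\\$", stores)
print(escaped)

# ... or switch off regular expressions with fixed = TRUE
literal <- grepl("$", stores, fixed = TRUE)
print(literal)
```

The same applies to other special characters such as “^”, “.” and “|”.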
Example 4) Find store names with “c” or “a”
#
x <- c("c", "a")
#
sample[grepl(paste(x, collapse = "|"), sample$name), ]
#
sample[grepl(paste(x, collapse = "|"), sample[, 2]), ]
#
paste(x, collapse = "|") joins all the elements of x into the single pattern "c|a". In a regular expression, “|” means “or”, so a name matches if it contains any one of the keywords.
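The same trick scales to a longer list of keywords, which is closer to the brand-matching task described at the top. This is only a sketch: the brands and stores vectors are invented for illustration, and ignore.case = TRUE is an assumption about how the real store names might be written.

```r
# Hypothetical brand keywords and store names, for illustration only
brands <- c("acme", "globex")
stores <- c("ACME Mart downtown", "globex cafe", "corner bakery")

# Join the keywords into one "or" pattern
pattern <- paste(brands, collapse = "|")
print(pattern)

# TRUE for every store name that contains at least one keyword
is_brand <- grepl(pattern, stores, ignore.case = TRUE)
print(stores[is_brand])
```

If a keyword itself contains special characters such as “.” or “$”, it would need to be escaped before being pasted into the pattern.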
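Example 5) Find store names without “c”
#
sample[!grepl("c", sample$name), ]
#
sample[!grepl("c", sample[, 2]), ]
#
The only difference from example 1 is the “!” in front of grepl.
“!” is the logical NOT operator in R. It turns every TRUE into FALSE and every FALSE into TRUE, so the result is the complement of the rows selected in example 1.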
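A quick way to convince yourself that “!” gives the complement is to check that every element lands in exactly one of the two groups. The sketch below uses a hypothetical stores vector (made up for illustration) and also shows grep with invert = TRUE, a base R alternative that returns the non-matching values directly.

```r
# Hypothetical store names, for illustration only
stores <- c("cafe mocha", "book nook", "taco stand")

has_c <- grepl("c", stores)
no_c  <- !has_c   # flips TRUE and FALSE

# Each store is in exactly one of the two groups
both_cover <- all(xor(has_c, no_c))
print(both_cover)

print(stores[no_c])

# grep() with invert = TRUE and value = TRUE returns the non-matching strings
inverted <- grep("c", stores, invert = TRUE, value = TRUE)
print(inverted)
```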
If you are new to R, I recommend the video tutorials from Google.
https://www.youtube.com/playlist?list=PLOU2XLYxmsIK9qQfztXeybpHvru-TrqAP
Please comment if you have any questions about anything.
http://1000wonicecoffee.svbtle.com/