Eugine Kang

Golden Eyes of a blitzing Data Scientist in a mosh pit of R, SQL, Python

Read this first

[R Tips] Use “grepl” to search through text data

[For the Korean version, please click the title]
The project I got myself into requires me to identify brands just by looking at the given store name. I could look through all 3,000,000 store names, but that's silliness talking. CTRL+F is the basic command for searching for keywords, and “grepl” is the CTRL+F of R.
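To make the CTRL+F comparison concrete, here is a minimal sketch of how grepl might be used to flag a brand inside messy store names. The store names and the `stores` and `is_starbucks` variables are made up for illustration; they are not from the actual project data.

```r
# Hypothetical store names, invented for illustration only
stores <- c("Starbucks Gangnam", "STARBUCKS #1024", "Cafe Bene",
            "starbucks-seoul", "Burger King")

# grepl() returns one TRUE/FALSE per element:
# TRUE when the pattern occurs somewhere in that string
is_starbucks <- grepl("starbucks", stores, ignore.case = TRUE)

# Use the logical vector to keep only the matching names
matches <- stores[is_starbucks]
print(matches)

# grep() does the same search but returns the matching positions instead
positions <- grep("starbucks", stores, ignore.case = TRUE)
print(positions)
```

Because grepl gives back a logical vector of the same length as its input, it can be used directly for subsetting or stored as a new column, which is handy when tagging brands row by row.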

Let's begin by creating a text data set.

A data set of random letter combinations is created:

# Build 52 three-letter names: aaz, bay, cbx, ...
sample <- paste(rep(letters, 2), rep(letters, each = 2), sep = "")
x <- rev(letters)  # letters in reverse order: z, y, x, ...
sample <- paste(sample, rep(x, 2), sep = "")
# Row number first, name second; number stays numeric instead of text
sample <- data.frame(number = seq_along(sample), name = sample, stringsAsFactors = FALSE)

  number name
1      1  aaz
2      2  bay
3      3  cbx
4      4  dbw
5      5  ecv
6      6  fcu
7      7  gdt
… (the code above should lead you to a data frame looking like this)

Example 1) Find store names with “c” in them.

sample[grep

...

Continue reading →


[MMA] What is a way to record all actions inside the octagon?

The Problem

Mixed Martial Arts (MMA) is one of the fastest-growing professional sports today. Although the early years of MMA were more of a circus or freak show, today the sport has settled into the realm of professional sports, with rules and professionalism.
The sport is maturing, and this means improvements are needed left and right. One area I see needing improvement is fight statistics. FightMetric LLC is the official stats keeper for the UFC. In any fight, many things happen between the two fighters. Strikes, take-downs, and submissions are all terms familiar to a regular MMA fan. FightMetric does a good job recording the SUMMARY of a fight.
However, my main disappointment is with the lack of detail in this data. Did the take-down attempt come after a quick jab? Was a rear-naked choke attempt successful due to a stunned opponent? The time and order...

Continue reading →