Tokenization Tutorial (Part 1)

Oct 22, 2022

This tutorial shows how to create a character vector, convert it to a data frame, and then tokenize the text into individual words.

# Load libraries

library(dplyr)  # provides the %>% pipe and data frame manipulation

library(tidytext)  # provides unnest_tokens() for tokenizing text

# Create a character vector, one element per line of the poem

text <- c("When things go wrong as they sometimes will",

"When the road you’re trudging seems all uphill",

"When the funds are low, and the debts are high",

"And you want to smile, but you have to sigh",

"When care is pressing you down a bit",

"Rest if you must, but don’t you quit.")

# Convert the vector into a data frame with a line number column

textdf <- data.frame(line = 1:6, text = text)

# Split the poem into tokens; in this example, each token is a word

resultstoken <- textdf %>%

  unnest_tokens(word, text)
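To get a feel for what the tokenization step produces, here is a rough base R sketch that builds a similar one-row-per-word table without tidytext. It is only an approximation and not how unnest_tokens() is implemented: the splitting rule (lowercase, then split on anything that is not a letter, digit, or straight apostrophe) and the name resultstoken_base are assumptions made for illustration, and only two lines of the poem are used, typed with straight apostrophes.

```r
# Base R approximation of word tokenization (illustrative only,
# not the tidytext implementation)
text <- c("When things go wrong as they sometimes will",
          "Rest if you must, but don't you quit.")

textdf <- data.frame(line = seq_along(text), text = text,
                     stringsAsFactors = FALSE)

# Lowercase each line, then split on runs of characters that are
# not a lowercase letter, digit, or straight apostrophe (assumed rule)
tokens_per_line <- strsplit(tolower(textdf$text), "[^a-z0-9']+")

# Drop any empty strings left over from the split
tokens_per_line <- lapply(tokens_per_line, function(x) x[nzchar(x)])

# Build one row per word, repeating the line number for each token
resultstoken_base <- data.frame(
  line = rep(textdf$line, lengths(tokens_per_line)),
  word = unlist(tokens_per_line),
  stringsAsFactors = FALSE
)

print(resultstoken_base)
```

Each row pairs a word with the line it came from, which is the same shape as resultstoken above: a line column and a word column.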

You can access the video tutorial here:

https://youtu.be/dFHXENDV0m0
