The usual first step in NLP is tokenization: breaking our documents into smaller pieces called tokens. We will look at the challenges involved and how to tokenize text in practice.
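To get a feel for why tokenization is harder than it sounds, here is a minimal sketch in plain Python. The sample sentence and the `TOKEN_PATTERN` regex are assumptions made up for illustration; this is not how spaCy tokenizes, and the demo in the video uses spaCy rather than hand-written rules.

```python
import re

# Hypothetical sample sentence with a contraction, an abbreviation, and a price.
text = "We're moving to L.A. for $3.50!"

# Naive approach: split on whitespace. Punctuation stays glued to words.
whitespace_tokens = text.split()

# Slightly better: a hand-written regex (illustrative only, not spaCy's rules).
TOKEN_PATTERN = re.compile(r"""
    (?:[A-Za-z]\.){2,}      # abbreviations such as L.A. or U.S.
  | \$?\d+(?:\.\d+)?        # numbers and prices such as 3.50 or $3.50
  | \w+(?:'\w+)?            # words, with an optional contraction suffix
  | [^\w\s]                 # any single punctuation mark
""", re.VERBOSE)

regex_tokens = TOKEN_PATTERN.findall(text)

print(whitespace_tokens)
# ["We're", 'moving', 'to', 'L.A.', 'for', '$3.50!']
print(regex_tokens)
# ["We're", 'moving', 'to', 'L.A.', 'for', '$3.50', '!']
```

Whitespace splitting leaves the exclamation mark attached to the price, while the regex separates it but needs a special rule for every case it should handle. Each new rule is a judgment call (for example, whether a contraction is one token or two), which is one reason to reach for a library tokenizer instead of writing rules by hand.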
Colab notebook: https://colab.research.google.com/github/futuremojo/nlp-demystified/blob/main/notebooks/nlpdemystified_preprocessing.ipynb
Timestamps:
00:00 Tokenization
00:12 Text as unstructured data
00:39 What is tokenization?
01:09 The challenges of tokenization
03:09 DEMO: tokenizing text with spaCy
07:55 Preprocessing as a pipeline
This video is part of Natural Language Processing Demystified – a free, accessible course on NLP.
Visit https://www.nlpdemystified.org/ for more information.