Basic Bag-of-Words

Timestamp  Description
00:00:00  Basic bag-of-words (BoW)
00:00:22  The need for vectors
00:00:53  Selecting and extracting features from our data
00:04:04  Idea: similar documents share similar vocabulary
00:04:46  Turning a corpus into a BoW matrix
00:07:10  What vectorization helps us accomplish
00:08:20  Measuring document similarity
00:11:09  Shortcomings of basic BoW
00:12:37  Capturing a bit of context with n-grams
00:14:10  DEMO: creating basic BoW with scikit-learn and spaCy, measuring document similarity, and creating n-grams
00:19:35  Basic BoW recap
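
As a rough companion to the chapters on building a BoW matrix, measuring document similarity, and n-grams, here is a minimal sketch in plain Python. It is not the code from the demo, which uses scikit-learn and spaCy. The toy corpus, the whitespace tokenizer, and the helper names (tokenize, bow_matrix, cosine_similarity) are all assumptions made for illustration, and cosine similarity is assumed as the similarity measure.

```python
import math
from collections import Counter


def tokenize(text, n=1):
    """Lowercase, split on whitespace, and return word n-grams joined by spaces.

    Hypothetical helper: a real pipeline would use a proper tokenizer.
    """
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def bow_matrix(corpus, n=1):
    """Return (vocabulary, matrix); matrix[d][j] counts term j in document d."""
    counts = [Counter(tokenize(doc, n)) for doc in corpus]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy corpus (made up for this sketch).
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "stock prices fell sharply today",
]

# Unigram BoW: one row per document, one column per vocabulary term.
vocab, matrix = bow_matrix(corpus)

# Documents that share vocabulary score higher than documents that do not.
sim_close = cosine_similarity(matrix[0], matrix[1])
sim_far = cosine_similarity(matrix[0], matrix[2])

# Bigrams keep a little word order that single-word counts discard.
bigram_vocab, bigram_matrix = bow_matrix(corpus, n=2)

print(vocab)
print(sim_close, sim_far)
print(bigram_vocab)
```

The first two documents differ by a single word, so their similarity is high, while the third shares no vocabulary with the first and scores zero. Passing n=2 swaps single words for adjacent word pairs such as "the cat", which is the n-gram idea in its simplest form.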