How to Use
1. Select text. Press START.
2. The button label changes to NEXT STEP. Press it to fill Step 2, Process Text: stop words are removed, and the sentences and vocabulary are shown with red/blue cluster colors.
3. Press NEXT STEP again to fill Step 3, Training Pairs, in two columns (red pairs on the left, blue pairs on the right). The machine never sees these colors!
4. Press NEXT STEP to see the Initial (random) embedding.
5. Press NEXT STEP to run iterations and see the Improved embedding. The button becomes KEEP ITERATING.
6. Smooth checkbox (on by default): when checked, iterations run continuously until the Epochs target is reached; when unchecked, each click runs K iterations once.
7. Iterations per UPDATE, K (default 10): how many iterations run between display updates. Epochs (default 3000): the total number of iterations to run. The counter in Step 4 shows progress.
8. RESET stops the process and clears everything.
How to Enter Your Own Text
Select one of the presets A–G, or type your own text in the field below the buttons. Separate sentences with periods. When you are done, press START.
Try giving the AI this prompt:
Come up with five text variants, each consisting of six short sentences: three on one topic and three on another. For example,
A) "Fish swim in deep water.
Ocean is very deep.
Fish swim in darkness.
Birds are high in the sky.
Birds fly very high.
On a sunny day the sky is full of light."
B) "Where does childhood go.
How young we were.
The years go by, yet we're still young.
Fashion isn't what it used to be.
You can't forbid dressing well.
You're not dressed for the season."
Press RESET, delete the text, and enter your own. Then press START, NEXT STEP, and so on.
Overview
The Skip-gram model (Mikolov et al., 2013) learns word embeddings by predicting context words from a center word. Words that appear in similar contexts get similar embeddings.
Notation
Given a vocabulary V and a window size c (e.g., c = 2), we form training pairs (w_center, w_context): every word within c positions of a center word becomes one of its context words. W₀ is a |V|×2 matrix: each row is a word's 2D embedding.
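The pair generation above can be sketched as follows (a minimal illustration with a toy tokenizer; the demo's actual tokenization and stop-word removal may differ):

```python
def training_pairs(sentences, c=2):
    """Form skip-gram (center, context) pairs with window size c."""
    pairs = []
    for sentence in sentences:
        # toy tokenizer: lowercase, strip periods, split on whitespace
        words = sentence.lower().replace(".", "").split()
        for i, center in enumerate(words):
            # every word within c positions of the center is a context word
            for j in range(max(0, i - c), min(len(words), i + c + 1)):
                if j != i:
                    pairs.append((center, words[j]))
    return pairs

pairs = training_pairs(["Fish swim in deep water."])
# yields pairs such as ("fish", "swim"), ("swim", "deep"), ("water", "in")
```

With c = 2 and a five-word sentence, this produces 14 pairs; the center word itself is never paired with itself.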
Simplified Model (single matrix)
We tie the output weights to the input embeddings: W₁ = W₀ᵀ. The probability of a context word y given a center word x is then a softmax over dot products of embedding rows:

P(y | x) = exp(u_x · u_y) / Σ_{v∈V} exp(u_x · u_v),

where u_w is the row of W₀ for word w. Training maximizes the average log-likelihood of this probability over all training pairs.
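A minimal sketch of this shared-matrix softmax, assuming NumPy and a toy 5-word vocabulary with random 2-D embeddings (all names here are illustrative, not the demo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)
V = 5                            # toy vocabulary size
W0 = rng.normal(size=(V, 2))     # one 2-D embedding per word; W1 = W0.T is implicit

def p_context_given_center(x, y):
    """P(context word y | center word x) under the shared-matrix softmax."""
    scores = W0 @ W0[x]          # dot product of x's row with every row of W0
    scores -= scores.max()       # shift for numerical stability (probabilities unchanged)
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[y]
```

Summing p_context_given_center(x, y) over all y in the vocabulary gives 1, as required of a probability distribution.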
Clusters
Words from the first half of sentences are shown in red; words from the second half in blue. After training, related words (fish, swim, water) and (birds, sky, fly) should cluster together.
Note: if you repeat the experiment with the same text, the words separate into the same clusters, but the layout is not identical, because the initial embedding is random.
References
Mikolov et al. (2013), Efficient Estimation of Word Representations in Vector Space · Mikolov et al. (2013), Distributed Representations of Words and Phrases and their Compositionality