threads recommender system
Thread (3 posts)
Can you draw an ASCII diagram of the system's components along with their relationships with one another, and also provide a brief description of the role of each component?
Claude 4 Sonnet via @yondonfu.bsky.social
Based on the EmbSum paper, here's an ASCII diagram of the system architecture and components:
EmbSum Architecture
==================
Input Layer:
┌─────────────────────────────────────────────────────────────────┐
│ User Engagement History │
│ E_ui = {η₁, η₂, ..., η_g} (Sessions) │
│ Each session: η_i = {e₁, e₂, ..., e_p} │
└─────────────────────────────────────────────────────────────────┘
│
▼
Session Encoding:
┌─────────────────────────────────────────────────────────────────┐
│ T5 Encoder (Parallel) │
│ Session 1 Session 2 ... Session g │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ T5-Enc │ │ T5-Enc │ ... │ T5-Enc │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ [SOS] token [SOS] token ... [SOS] token │
│ embedding embedding embedding │
└─────────────────────────────────────────────────────────────────┘
│
▼
Summary Generation:
┌─────────────────────────────────────────────────────────────────┐
│ T5 Decoder │
│ Input: Concatenated hidden states from all sessions │
│ Supervision: LLM-generated summaries (Mixtral-8x22B) │
│ Output: User interest summary + [EOS] token │
│ Loss: L_sum = -∑log(p(y^ui_j|E, y^ui_<j)) │
└─────────────────────────────────────────────────────────────────┘
│
▼
User Representation:
┌─────────────────────────────────────────────────────────────────┐
│ User Poly-Embedding (UPE) │
│ Input Matrix Z: [k session embeddings + global embedding] │
│ Poly-Attention: α_a = softmax(c_a tanh(ZW^f)ᵀ) Z │
│ Output: A ∈ R^(m×d) - Multiple user interest vectors │
└─────────────────────────────────────────────────────────────────┘
Candidate Content Processing:
┌─────────────────────────────────────────────────────────────────┐
│ Candidate Content │
│ (News articles, Books, etc.) │
│ │ │
│ ▼ │
│ T5 Encoder │
│ │ │
│ ▼ │
│ Content Poly-Embedding (CPE) │
│ Poly-Attention with context codes {b₁, b₂, ..., bₙ} │
│ Output: B ∈ R^(n×d) - Multiple content vectors │
└─────────────────────────────────────────────────────────────────┘
Matching & Scoring:
┌─────────────────────────────────────────────────────────────────┐
│ CTR Prediction │
│ K^i_j = flatten(A_i B_j^T) │
│ W^p = softmax(flatten(A · gelu(BW^s)^T)) │
│ s^i_j = W^p · K^i_j │
│ │
│ Training Loss: L = L_NCE + λL_sum │
│ L_NCE = -log(exp(s^i_+)/(exp(s^i_+) + ∑exp(s^i_-,j))) │
└─────────────────────────────────────────────────────────────────┘
│
▼
Output:
┌─────────────────────────────────────────────────────────────────┐
│ Ranked Candidate Content │
│ Based on relevance scores s^i_j │
└─────────────────────────────────────────────────────────────────┘
Component Descriptions:
1. Session Encoding (T5 Encoder)
- Role: Processes user engagement history by partitioning it into sessions and encoding each session independently
- Why: Handles memory limitations of attention mechanisms while maintaining session-level context
- Output: one representation vector per session, taken from the [SOS] token embedding
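
A minimal sketch of this step, assuming the Hugging Face `transformers` T5 (`t5-small`) and a made-up text format for sessions; since T5 has no built-in [SOS] token, the first-position encoder state stands in for the paper's [SOS] embedding:

```python
import torch
from transformers import T5TokenizerFast, T5EncoderModel

# Hypothetical preprocessing: each "session" is the concatenated text of the
# items the user engaged with in that window (the formatting is an assumption,
# not the paper's exact recipe).
tokenizer = T5TokenizerFast.from_pretrained("t5-small")
encoder = T5EncoderModel.from_pretrained("t5-small")

sessions = [
    "title: local election recap | title: city budget vote",
    "title: nba trade rumors | title: playoff schedule preview",
]

# Encode all sessions in parallel as one batch.
batch = tokenizer(sessions, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (num_sessions, seq_len, d)

# One vector per session; the first-position hidden state is a stand-in for
# the paper's [SOS] token embedding.
session_vecs = hidden[:, 0, :]                    # (num_sessions, d)
```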
2. User Interest Summarization (T5 Decoder)
- Role: Generates textual summaries of user interests using LLM supervision
- Why: Captures interactions among content engaged with over long time ranges, which simple concatenation of the history misses
- Supervision: Mixtral-8x22B-generated summaries provide training targets
- Output: Global user representation from [EOS] token
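
A rough sketch of the summarization objective under the same assumptions (`t5-small`, invented session texts, and a placeholder summary string standing in for the Mixtral-8x22B target); it approximates the paper's setup rather than reproducing the authors' code:

```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

sessions = [
    "title: local election recap | title: city budget vote",
    "title: nba trade rumors | title: playoff schedule preview",
]
batch = tokenizer(sessions, padding=True, truncation=True, return_tensors="pt")

# Encode sessions separately, then concatenate their hidden states so the
# decoder can attend across the whole history.
enc = model.get_encoder()(**batch).last_hidden_state        # (g, seq_len, d)
enc_states = enc.reshape(1, -1, enc.size(-1))                # (1, g*seq_len, d)
enc_mask = batch.attention_mask.reshape(1, -1)               # (1, g*seq_len)

# Placeholder target; in the paper this is a Mixtral-8x22B-written summary.
summary = "Follows local politics and professional basketball news."
labels = tokenizer(summary, return_tensors="pt").input_ids   # ends with </s>

out = model(
    encoder_outputs=BaseModelOutput(last_hidden_state=enc_states),
    attention_mask=enc_mask,
    labels=labels,
    output_hidden_states=True,
)
loss_sum = out.loss   # token-averaged NLL of the summary (the L_sum term, up to averaging)

# Global user embedding: decoder hidden state at the final step (the one that
# predicts </s>).
global_vec = out.decoder_hidden_states[-1][:, -1, :]         # (1, d)
```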
3. User Poly-Embedding (UPE)
- Role: Creates multiple user interest vectors using poly-attention mechanism
- Why: Single embeddings can't capture diverse user interests effectively
- Input: Matrix Z combining session embeddings + global embedding
- Output: Matrix A with m different user interest representations
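
The poly-attention mechanism itself is compact; here is a sketch with assumed sizes (`d`, `k`, `m`, and the projection width `p` are illustrative, not the paper's values):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolyAttention(nn.Module):
    """m learned context codes, each attending over the rows of Z to produce
    one output vector (a sketch of the mechanism, not the authors' code)."""
    def __init__(self, d: int, m: int, p: int = 64):
        super().__init__()
        self.Wf = nn.Linear(d, p, bias=False)                 # W^f
        self.codes = nn.Parameter(torch.randn(m, p) * 0.02)   # c_1 ... c_m

    def forward(self, Z: torch.Tensor) -> torch.Tensor:
        scores = self.codes @ torch.tanh(self.Wf(Z)).T        # (m, rows of Z)
        alpha = F.softmax(scores, dim=-1)                      # one distribution per code
        return alpha @ Z                                       # (m, d)

d, k, m = 512, 8, 4                     # assumed sizes, not from the paper
Z = torch.cat([torch.randn(k, d),       # k session embeddings
               torch.randn(1, d)])      # + the global summary embedding
upe = PolyAttention(d=d, m=m)
A = upe(Z)                              # (m, d) user interest vectors
```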
4. Content Poly-Embedding (CPE)
- Role: Generates multiple embeddings for candidate content items
- Why: Enables richer content representation beyond single token embeddings
- Mechanism: Uses context codes {b₁, b₂, ..., bₙ} with poly-attention
- Output: Matrix B with n different content representations
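
The content side can reuse the same module with its own context codes; a usage sketch (again with assumed sizes):

```python
# Content Poly-Embedding: the same poly-attention, applied to the candidate's
# token-level T5 encoder states with its own context codes b_1 ... b_n.
# (Reuses the PolyAttention class from the UPE sketch; sizes are assumptions.)
n, seq_len = 4, 32
cand_tokens = torch.randn(seq_len, d)   # encoder hidden states of one candidate
cpe = PolyAttention(d=d, m=n)           # n content context codes
B = cpe(cand_tokens)                    # (n, d) content vectors
```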
5. CTR Prediction Module
- Role: Computes relevance scores between users and candidate items
- Process:
- Computes interaction matrix between user and content embeddings
- Applies attention mechanism to aggregate matching scores
- Produces final relevance score s^i_j
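
A tensor-level sketch of the scoring step with assumed sizes; `Ws` stands in for the learned projection W^s:

```python
import torch
import torch.nn.functional as F

d, m, n = 512, 4, 4                 # assumed sizes
A = torch.randn(m, d)               # user interest vectors (UPE output)
B = torch.randn(n, d)               # content vectors (CPE output)
Ws = torch.randn(d, d) * 0.02       # W^s, a learned projection in the real model

K = (A @ B.T).flatten()                                    # flatten(A B^T), length m*n
Wp = F.softmax((A @ F.gelu(B @ Ws).T).flatten(), dim=0)    # weights over the m*n pairs
s = torch.dot(Wp, K)                                       # scalar relevance score s^i_j
```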
6. Training Objectives
- L_NCE: Noise Contrastive Estimation loss for the recommendation task
- L_sum: Summarization loss supervised by LLM-generated summaries
- Combined: L = L_NCE + λL_sum (λ = 0.05)
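
A toy example of the combined objective, assuming one clicked item and a handful of sampled negatives; the summarization term is a placeholder value here:

```python
import torch
import torch.nn.functional as F

lam = 0.05                                            # λ from the description above
# One impression: index 0 is the clicked item, the rest are sampled negatives.
scores = torch.tensor([2.1, 0.3, -0.5, 1.0, -1.2])    # [s_+, s_-1, ..., s_-K]
loss_nce = -F.log_softmax(scores, dim=0)[0]           # identical to the L_NCE formula
loss_sum = torch.tensor(1.7)                          # placeholder; comes from the decoder
loss = loss_nce + lam * loss_sum
```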
Key Innovations:
- Offline Pre-computation: Both user and content embeddings can be computed offline (see the serving sketch after this list)
- Long History Handling: Session-based encoding + summarization handles extensive user histories
- Multi-faceted Representations: Poly-embeddings capture diverse aspects of users and content
- LLM Supervision: Leverages LLM capabilities for better user interest modeling
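
To make the offline pre-computation point concrete, here is a hypothetical serving-time sketch: content poly-embeddings for the whole corpus are computed once offline, and a single user's poly-embeddings are scored against all of them with batched matrix products (sizes and weights are illustrative):

```python
import torch
import torch.nn.functional as F

d, m, n, num_items = 512, 4, 4, 10_000     # assumed sizes

# Offline: CPE vectors for the whole corpus, computed once and cached.
B_all = torch.randn(num_items, n, d)

# Online: one user's UPE vectors, scored against every cached item in a batch.
A = torch.randn(m, d)
Ws = torch.randn(d, d) * 0.02

K = (A @ B_all.transpose(1, 2)).reshape(num_items, -1)                           # (N, m*n)
Wp = F.softmax((A @ F.gelu(B_all @ Ws).transpose(1, 2)).reshape(num_items, -1), dim=-1)
scores = (Wp * K).sum(-1)                                                        # (N,) relevance
top = scores.topk(10).indices                                                    # ranked candidates
```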
The system achieves state-of-the-art performance while using fewer parameters (61M vs 125M+ for BERT-based methods) and enables efficient real-world deployment through offline pre-computation capabilities.