Numinex

Thread (17 posts)

Problem: too many constraints for practicality

Idea: use a subset of constraints to find a candidate solution, then check whether it satisfies the global set of constraints and add any violated ones to the subset

Discussion of how to use a priority queue to index constraints in a way that tracks their structure, but it's not obvious on a first read what that structure is; the discussion seems very accounting-oriented

(§3.3)

Thinking Summary

The user has shared a paper about "Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data?" and appears to be looking at a specific section (§3.3) about efficient constraint violation detection. The key points from this section:

1. They're dealing with a linear programming problem that has too many constraints to be practical
2. Their solution is to use a subset of constraints, find a candidate solution, then check if it satisfies the global set of constraints and add any violated ones
3. They use a priority queue to efficiently detect violated constraints
4. The structure involves tracking how pair counts change as merges are applied

Let me provide a clear explanation of what's happening in section 3.3.

Looking at §3.3, the authors are addressing a computational challenge in their BPE tokenizer attack. Here's what's happening:

The Problem

The linear program LP1_T has potentially billions of constraints ($O(V^3)$, where $V$ is the vocabulary size). Checking all constraints naively would be computationally infeasible.

The Solution Strategy

They use lazy constraint generation: start with a subset of constraints and iteratively add violated ones, as sketched in code after the steps below:

  1. Solve a smaller LP with only some constraints
  2. Check if the solution violates any constraints from the full LP
  3. Add violated constraints and repeat
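
A minimal sketch of this loop, assuming hypothetical helpers `solve_lp` (solves the LP restricted to the active subset) and `find_violated` (the violation check described next); neither name comes from the paper:

```python
def solve_with_lazy_constraints(variables, initial_constraints, solve_lp, find_violated):
    # Start from a small working subset of the full constraint set.
    active = list(initial_constraints)
    while True:
        solution = solve_lp(variables, active)  # solve the relaxed LP
        violated = find_violated(solution)      # check against the full constraint set
        if not violated:
            return solution                     # feasible for the full LP as well
        active.extend(violated)                 # grow the subset and re-solve
```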

The Priority Queue Structure

The key insight is tracking how pair frequencies change efficiently:

  • Define $s_p^{(t)} = \sum_{i=1}^n \alpha_i c_{i,p}^{(t)}$ (the weighted sum of pair counts)
  • Need to check that $v^{(t)} + s_{m^{(t)}}^{(t)} \geq \max_p\left(s_p^{(t)} - v_p\right)$
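
Written out, the max is shorthand for one constraint per pair; checking the single maximum-priority pair verifies all of them at once:

$$v^{(t)} + s_{m^{(t)}}^{(t)} \geq s_p^{(t)} - v_p \;\;\forall p \quad\Longleftrightarrow\quad v^{(t)} + s_{m^{(t)}}^{(t)} \geq \max_p\left(s_p^{(t)} - v_p\right)$$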

The priority queue maintains pairs ordered by $s_p^{(t)} - v_p$:

  1. Initial setup: Build heap with all pairs, priority = $s_p^{(0)} - v_p$
  2. Check violations: Pop pairs in descending order until finding one that satisfies the constraint
  3. Update efficiently: When moving from time $t$ to $t+1$, only update pairs affected by merge $m^{(t)}$

The "structure" being exploited is that most pair counts don't change between timesteps - only O(V) pairs are affected by each merge (those overlapping with the merged pair). This allows updating the priority queue in O((k_new + k_old) log V) time rather than rebuilding from scratch.

This approach reduces constraint checking from $O(V^2)$ per timestep to $O(k \log V)$, where $k$ is the number of violated constraints: a massive speedup when most constraints are satisfied.

Thinking Summary

The user is asking about the relationship between pair counts that don't change during a merge and how this relates to constraints in the priority queue. Let me think through this carefully.

When a merge happens at time t, say merge m^(t) = (A, B) → AB:
- Most pairs don't contain A or B, so their counts remain the same
- Only pairs that contain A or B will have their counts change

In the priority queue:
- Each pair p has priority s_p^(t) - v_p
- s_p^(t) = Σ α_i * c_{i,p}^(t) (weighted sum of counts)
- If c_{i,p}^(t) = c_{i,p}^(t+1) for all i, then s_p^(t) = s_p^(t+1)
- So the priority doesn't change!

This means:
1. Pairs unaffected by the merge keep the same priority in the queue
2. Only need to update priorities for pairs that overlap with the merge
3. This is why they only need O(V) updates instead of O(V²)

Each pair p corresponds to a constraint in the LP that says the chosen merge m^(t) must have been more frequent than p at time t.

Great question! Let me clarify the connection between unchanged pair counts and the priority queue structure.

How Merges Affect Pair Counts

When merge $m^{(t)} = (A, B) \rightarrow AB$ happens at time $t$:

Pairs that change:

  • $(A, B)$ itself disappears (count → 0)
  • Pairs like $(X, A)$ where $X$ precedes $A$
  • Pairs like $(B, Y)$ where $Y$ follows $B$
  • New pairs like $(X, AB)$ and $(AB, Y)$ are created

Pairs that DON'T change: any pair not containing $A$ or $B$ (a toy example in code follows this list), like:

  • $(C, D)$ where neither $C$ nor $D$ is $A$ or $B$
  • $(E, F)$ anywhere else in the text
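
To make the split concrete, here is a toy example (my own, not from the paper) that counts adjacent pairs before and after applying the merge $(A, B) \rightarrow AB$ to a short token sequence:

```python
from collections import Counter

def pair_counts(tokens):
    # Count each adjacent pair of tokens.
    return Counter(zip(tokens, tokens[1:]))

def apply_merge(tokens, left, right, merged):
    # Replace every non-overlapping occurrence of (left, right) with merged.
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
            out.append(merged)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

before = ["C", "D", "A", "B", "E", "C", "D"]
after = apply_merge(before, "A", "B", "AB")

print(pair_counts(before))  # (C,D): 2, (D,A): 1, (A,B): 1, (B,E): 1, (E,C): 1
print(pair_counts(after))   # (C,D): 2, (D,AB): 1, (AB,E): 1, (E,C): 1
# (C,D) and (E,C) are untouched; (D,A), (A,B), (B,E) vanish;
# (D,AB) and (AB,E) are created.
```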

Connection to Priority Queue

Each pair $p$ in the queue has priority $s_p^{(t)} - v_p$, where $s_p^{(t)} = \sum_i \alpha_i c_{i,p}^{(t)}$.

Key insight: If $c_{i,p}^{(t)} = c_{i,p}^{(t+1)}$ for all categories $i$, then:

  • $s_p^{(t)} = s_p^{(t+1)}$ (the weighted sum doesn't change)
  • The priority $s_p^{(t)} - v_p$ remains identical
  • No need to update this entry in the priority queue!

The Efficiency Gain

For vocabulary size $V$:

  • Total possible pairs: $O(V^2)$
  • Pairs affected by one merge: $O(V)$ (those overlapping with the merge)
  • Pairs unaffected: $O(V^2) - O(V) = O(V^2)$

So instead of rebuilding the entire queue with $O(V^2)$ operations, they only do the following (sketched in code after the list):

  1. Update $O(V)$ existing entries (decrease-priority operations)
  2. Insert $O(V)$ new pairs created by the merge
  3. Leave the vast majority untouched
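
A sketch of this update under the same lazy-deletion convention as above: pushing a fresh entry stands in for the decrease-priority operation, and the old entry is skipped as stale when popped. `delta` maps each pair affected by the merge to the change in its weighted count (all names illustrative, not the paper's):

```python
import heapq

def apply_merge_update(heap, priority, s, v, delta):
    # Only the O(V) pairs overlapping the merge appear in `delta`.
    for p, ds in delta.items():
        s[p] = s.get(p, 0.0) + ds                # new weighted count s_p^(t+1)
        priority[p] = s[p] - v.get(p, 0.0)       # recompute the priority s_p - v_p
        heapq.heappush(heap, (-priority[p], p))  # fresh entry; the old one is stale
    # Every pair absent from `delta` keeps its entry and priority unchanged.
```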

Each pair $p$ represents a constraint in the LP: "the chosen merge $m^{(t)}$ had higher frequency than pair $p$ at time $t$". The priority queue efficiently finds which of these constraints are violated without checking all $O(V^2)$ of them.
