Stealing Part of a Production Language Model
Thread (64 posts)
In algorithm 1 line 5 are they setting the i-th row (column?) of Q to the oracle output?
Notation Q
Thinking Summary
The user is asking about Algorithm 1 in the paper "Stealing Part of a Production Language Model". Let me look at Algorithm 1. Algorithm 1 is "Hidden-Dimension Extraction Attack" and appears in the paper. Line 5 reads `\State $\Q_i \gets \api(\x_i)$`. The user is asking whether they're setting the i-th row or column of Q to the oracle output. Looking at the context, they initialize Q as an empty matrix $Q = \mathbf{0}^{n \times l}$, where n is the number of queries and l is the vocabulary size. Since $\mathcal{O}(x_i)$ returns a logit vector of dimension l, and Q has dimensions n × l, they must be setting the i-th row of Q to be the logit vector returned by the API. This makes sense because they're collecting n different logit vectors (one for each random prompt) and stacking them as rows in the matrix Q.
Looking at Algorithm 1 (Hidden-Dimension Extraction Attack), line 5 states:
$Q_i \gets \mathcal{O}(x_i)$
They are setting the i-th row of Q to the oracle output.
This is clear from the context:
- Q is initialized as an $n \times l$ matrix (line 2)
- The API returns a logit vector of dimension $l$
- They iterate from $i = 1$ to $n$ (line 3)
So each query response is a vector of length $l$ (the vocabulary size), and they're stacking these as rows to build the matrix Q. This is confirmed later in Lemma 1 where they define:
Let $Q(p_1, \dots, p_n) \in \mathbb{R}^{l \times n}$ denote the matrix with columns $\mathcal{O}(p_1), \dots, \mathcal{O}(p_n)$
Wait, this seems inconsistent! The lemma says the API responses are columns, but the algorithm initialization suggests rows. Looking more carefully at the formalization section, they write $Q = W \cdot H$ where $W$ is $l \times h$ and $H$ is $h \times n$, which gives $Q$ as $l \times n$.
So actually, they're using column notation: $Q_i$ refers to the i-th column of Q, not the i-th row. Each API response is stored as a column vector in the matrix Q.
Thinking Summary
The user is asking me to redo my response without KaTeX syntax errors. Let me check my previous response for any KaTeX issues. I see that I used some notation that might not render properly. Let me rewrite it more carefully.
Looking at Algorithm 1 (Hidden-Dimension Extraction Attack), line 5 states:
$Q_i \gets \mathcal{O}(x_i)$
They are setting the i-th column of Q to the oracle output.
This is clear from the context:
- Q is initialized as an $n \times l$ matrix (line 2)
- The API returns a logit vector of dimension $l$
- They iterate from $i = 1$ to $n$ (line 3)
Initially this seems confusing because the algorithm initializes Q as $n \times l$, but the formalization section provides clarity. In Lemma 1 they define:
Let $Q(p_1, \dots, p_n) \in \mathbb{R}^{l \times n}$ denote the matrix with columns $\mathcal{O}(p_1), \dots, \mathcal{O}(p_n)$
So there's a notational inconsistency in the paper. The key insight is that they write $Q = W \cdot H$ where:
- $W$ is $l \times h$ (the embedding projection matrix)
- $H$ is $h \times n$ (hidden states)
- This gives $Q$ as $l \times n$
Therefore, $Q_i$ refers to the i-th column of Q. Each API response (a logit vector of length $l$) is stored as a column in the matrix Q.
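To make the shape bookkeeping concrete, here is a minimal NumPy sketch of the column convention. It is only an illustration under stated assumptions: the sizes are toy values, the matrix `W` is random, and `oracle` is a hypothetical stand-in that takes a hidden state directly rather than a prompt.

```python
import numpy as np

rng = np.random.default_rng(0)
l, h, n = 50, 8, 20            # toy vocabulary size, hidden dim, query count

# Stand-in embedding projection matrix (l x h); random, not a real model.
W = rng.normal(size=(l, h))

def oracle(hidden_state):
    """Hypothetical logit API: hidden state (length h) -> logits (length l)."""
    return W @ hidden_state

H = rng.normal(size=(h, n))    # one hidden state per column (h x n)
Q = np.zeros((l, n))           # column convention: l x n
for i in range(n):
    Q[:, i] = oracle(H[:, i])  # i-th response becomes the i-th column

print(Q.shape)                           # (l, n)
print(np.allclose(Q, W @ H))             # Q factors as W . H
print(np.linalg.matrix_rank(Q) <= h)     # rank is bounded by h
```

With responses stored as columns, the factorization $Q = W \cdot H$ type-checks directly, which is what the rank argument relies on.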
Thinking Summary
The user is pointing out that there's likely a typo on line 2 of Algorithm 1. Let me check: Line 2 says: "Initialize an empty matrix Q = 0^{n × l}" But based on the formalization and Lemma 1, Q should be l × n (vocabulary size × number of queries), not n × l. So yes, this appears to be a typo in the algorithm.
Yes, exactly! Line 2 appears to have a typo.
Line 2 states:
Initialize an empty matrix Q = 0^{n×l}
But it should be:
Initialize an empty matrix Q = 0^{l×n}
This is a clear typo because:
- The formalization shows $Q = W \cdot H$ where the result must be $l \times n$
- Lemma 1 explicitly states: "Let $Q(p_1, \dots, p_n) \in \mathbb{R}^{l \times n}$ denote the matrix with columns $\mathcal{O}(p_1), \dots, \mathcal{O}(p_n)$"
- Each API call returns a logit vector of length $l$, and they're collecting $n$ of these as columns
So the matrix Q should have dimensions $l \times n$ (vocabulary size × number of queries), not $n \times l$ as written in line 2 of the algorithm.
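For reference, a rough sketch of Algorithm 1 with the corrected $l \times n$ initialization might look like the following. This is not the authors' code: `hidden_dim_attack`, `oracle`, and `rand_prefix` are hypothetical names, the oracle is a toy linear map applied to a random hidden state rather than a real model API, and the relative floor on the singular values is an added numerical safeguard, not something the paper specifies.

```python
import numpy as np

def hidden_dim_attack(oracle, rand_prefix, n, l):
    """Estimate the hidden dimension from n logit-vector queries."""
    Q = np.zeros((l, n))                    # corrected shape: l x n
    for i in range(n):
        Q[:, i] = oracle(rand_prefix())     # each response is one column
    s = np.linalg.svd(Q, compute_uv=False)  # singular values, descending
    # Assumption: floor tiny values relative to the largest one so that
    # exact zeros do not produce infinite log-gaps.
    s = np.maximum(s, s[0] * np.finfo(float).eps)
    gaps = np.log(s[:-1]) - np.log(s[1:])   # log-gap between neighbours
    return int(np.argmax(gaps)) + 1         # count of "large" singular values

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
l, h, n = 64, 16, 40                        # n must exceed h
W = rng.normal(size=(l, h))                 # stand-in projection matrix

def rand_prefix():
    # Stand-in for a random prompt: we sample its hidden state directly.
    return rng.normal(size=h)

def oracle(g):
    return W @ g                            # logits of length l

estimate = hidden_dim_attack(oracle, rand_prefix, n, l)
print(estimate)
```

Note that a matrix and its transpose have the same singular values, so the estimate itself would not change if responses were stacked as rows; the typo affects only the notation and its consistency with $Q = W \cdot H$.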