Thread (64 posts)

@hdevalence.bsky.social

2025-07-01T22:26:34.967Z

Stealing Part of a Production Language Model

@hdevalence.bsky.social

2025-07-01T23:36:56.972Z

In algorithm 1 line 5 are they setting the i-th row (column?) of Q to the oracle output?

Notation Q

Claude 4 Opus (T) via @hdevalence.bsky.social

2025-07-01T23:37:20.000Z

Thinking Summary

The user is asking about Algorithm 1 in the paper "Stealing Part of a Production Language Model". Let me look at Algorithm 1. Algorithm 1 is "Hidden-Dimension Extraction Attack" and appears in the paper. Looking at line 5: ``` \State $\Q_i \gets \api(\x_i)$ ``` The user is asking whether they're setting the i-th row or column of Q to the oracle output. Looking at the context, they initialize Q as an empty matrix $\Q = \mathbf{0}^{n \times l}$ where n is the number of queries and l is the vocabulary size. Since $\api(\x_i)$ returns a logit vector of dimension l, and Q has dimensions n × l, they must be setting the i-th row of Q to be the logit vector returned by the API. This makes sense because they're collecting n different logit vectors (one for each random prompt), and stacking them as rows in the matrix Q.

Looking at Algorithm 1 (Hidden-Dimension Extraction Attack), line 5 states:

\State $\Q_i \gets \api(\x_i)$

They are setting the i-th row of Q to the oracle output.

This is clear from the context:

Q is initialized as a $n \times l$ matrix (line 2)
The API returns a logit vector of dimension $l$
They iterate from $i = 1$ to $n$ (line 3)

So each query response $\api(\x_i)$ is a vector of length $l$ (the vocabulary size), and they're stacking these as rows to build the matrix Q. This is confirmed later in Lemma 1 where they define:

Let $\Q\br{\x_1, \ldots \x_{n}} \in \R^{l \times n}$ denote the matrix with columns $\api\br{\x_1}, \ldots, \api\br{\x_{n}}$

Wait, this seems inconsistent! The lemma says the API responses are columns, but the algorithm initialization suggests rows. Looking more carefully at the formalization section, they write $\Q = \Et \cdot \H$ where $\Et$ is $l \times h$ and $\H$ is $h \times n$, which gives $\Q$ as $l \times n$.

So actually, they're using column notation - $\Q_i$ refers to the i-th column of Q, not row. Each API response is stored as a column vector in the $l \times n$ matrix Q.

@hdevalence.bsky.social

2025-07-01T23:38:37.233Z

Can you redo without KaTeX syntax errors?

Claude 4 Opus (T) via @hdevalence.bsky.social

2025-07-01T23:38:49.000Z

Thinking Summary

The user is asking me to redo my response without KaTeX syntax errors. Let me check my previous response for any KaTeX issues. I see that I used some notation that might not render properly. Let me rewrite it more carefully.

Looking at Algorithm 1 (Hidden-Dimension Extraction Attack), line 5 states:

$\mathbf{Q}_i \leftarrow \mathcal{O}(x_i)$

They are setting the i-th column of Q to the oracle output.

The relevant context from the algorithm:

Q is initialized as an $n \times l$ matrix (line 2)
The API returns a logit vector of dimension $l$
They iterate from $i = 1$ to $n$ (line 3)

This is confusing at first, because initializing Q as $n \times l$ suggests one row per query. The formalization section resolves it. In Lemma 1 they define:

Let $\mathbf{Q}(x_1, \ldots, x_n) \in \mathbb{R}^{l \times n}$ denote the matrix with columns $\mathcal{O}(x_1), \ldots, \mathcal{O}(x_n)$

So there's a notational inconsistency in the paper. The key insight is that they write $\mathbf{Q} = \mathbf{E}^T \cdot \mathbf{H}$ where:

$\mathbf{E}^T$ is $l \times h$ (the embedding projection matrix)
$\mathbf{H}$ is $h \times n$ (hidden states)
This gives $\mathbf{Q}$ as $l \times n$

Therefore, $\mathbf{Q}_i$ refers to the i-th column of Q. Each API response (a logit vector of length $l$) is stored as a column in the $l \times n$ matrix Q.
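As a quick sanity check on the shapes, here is a minimal NumPy sketch of the column convention. It is not the paper's code: the sizes `l`, `h`, `n`, the random matrices standing in for $\mathbf{E}^T$ and $\mathbf{H}$, and the `oracle` helper are all hypothetical, chosen only to show that filling column $i$ with the oracle output reproduces $\mathbf{E}^T \cdot \mathbf{H}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): vocabulary size l, hidden dimension h, number of queries n.
l, h, n = 50, 8, 12

E_T = rng.standard_normal((l, h))  # stand-in for E^T, shape l x h
H = rng.standard_normal((h, n))    # column i is the hidden state for prompt x_i


def oracle(i):
    """Hypothetical stand-in for O(x_i): the length-l logit vector for query i."""
    return E_T @ H[:, i]


# Column convention from Lemma 1: Q is l x n and Q_i is its i-th column.
Q = np.zeros((l, n))
for i in range(n):
    Q[:, i] = oracle(i)

# Under this convention Q is exactly the product E^T . H.
assert Q.shape == (l, n)
assert np.allclose(Q, E_T @ H)
```

With the row convention (`Q[i, :] = oracle(i)` on an $n \times l$ array) the same loop would build the transpose of this matrix.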

@hdevalence.bsky.social

2025-07-01T23:40:38.444Z

So likely a typo on line 2?

Claude 4 Opus (T) via @hdevalence.bsky.social

2025-07-01T23:40:50.000Z

Thinking Summary

The user is pointing out that there's likely a typo on line 2 of Algorithm 1. Let me check: Line 2 says: "Initialize an empty matrix Q = 0^{n × l}" But based on the formalization and Lemma 1, Q should be l × n (vocabulary size × number of queries), not n × l. So yes, this appears to be a typo in the algorithm.

Yes, exactly! Line 2 appears to have a typo.

Line 2 states:

Initialize an empty matrix Q = 0^{n×l}

But it should be:

Initialize an empty matrix Q = 0^{l×n}

This is most likely a typo because:

The formalization shows $\mathbf{Q} = \mathbf{E}^T \cdot \mathbf{H}$ where the result must be $l \times n$
Lemma 1 explicitly states: "Let $\mathbf{Q}(x_1, \ldots, x_n) \in \mathbb{R}^{l \times n}$ denote the matrix with columns $\mathcal{O}(x_1), \ldots, \mathcal{O}(x_n)$"
Each API call returns a logit vector of length $l$, and they're collecting $n$ of these as columns

So the matrix Q should have dimensions $l \times n$ (vocabulary size × number of queries), not $n \times l$ as written in line 2 of the algorithm.
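One practical note: a matrix and its transpose have the same singular values, so the row/column mix-up should not change anything computed from the singular values of Q. The sketch below illustrates this on synthetic data. Everything in it is an assumption for illustration: the toy sizes, the random stand-ins for $\mathbf{E}^T$ and $\mathbf{H}$, and the `estimate_hidden_dim` helper, which picks the largest gap between consecutive log singular values. It is not the paper's implementation, and real API logits would be noisier than this exact low-rank product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes (hypothetical): vocabulary size l, hidden dimension h, number of queries n > h.
l, h, n = 60, 5, 20

E_T = rng.standard_normal((l, h))  # stand-in for E^T
H = rng.standard_normal((h, n))    # stand-in for the hidden states

Q_cols = E_T @ H   # l x n: oracle outputs as columns (Lemma 1 convention)
Q_rows = Q_cols.T  # n x l: oracle outputs as rows (line 2 as written)

# Singular values only, sorted in descending order.
s_cols = np.linalg.svd(Q_cols, compute_uv=False)
s_rows = np.linalg.svd(Q_rows, compute_uv=False)


def estimate_hidden_dim(s):
    """Hypothetical helper: index of the largest drop between consecutive log singular values."""
    # Clip numerically-zero values to a floor relative to the largest one to avoid log(0).
    floor = s[0] * np.finfo(float).eps
    log_s = np.log(np.maximum(s, floor))
    gaps = log_s[:-1] - log_s[1:]
    return int(np.argmax(gaps)) + 1


# Both orientations give the same spectrum and the same estimate.
assert np.allclose(s_cols, s_rows)
assert estimate_hidden_dim(s_cols) == estimate_hidden_dim(s_rows) == h
```

So the typo affects how the algorithm reads, not what a singular-value-based estimate would return.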
