
Thinking Summary

The user is asking for definitions of several concepts from the paper about stealing parts of production language models. Let me define these clearly and in a logical order that builds understanding. I should start with the basic architecture concepts and build up to the specific terms. The paper is about transformer language models, so I'll frame everything in that context. Let me order these for conceptual clarity: 1. Hidden dimension / width (these are related) 2. Embedding projection layer 3. Logit vector and why it's higher dimensional. I'll make sure to explain clearly and concisely.

Here are brief definitions of these concepts, ordered for clarity:

Hidden dimension

The internal representation size used throughout a transformer model's layers. For example, if a model processes information using 4096-dimensional vectors internally, its hidden dimension is 4096. This is the "width" of the neural network's intermediate computations.

Width

Another term for hidden dimension - it refers to how "wide" the model is in terms of the size of its internal representations. A model with hidden dimension 4096 is "wider" than one with hidden dimension 2048.

Embedding projection layer

The final linear transformation in a language model that converts from the hidden dimension to vocabulary size. It's a matrix of size vocabulary_size × hidden_dimension that projects the model's internal representation (e.g., 4096 dimensions) to a score for each possible token in the vocabulary (e.g., 50,000+ tokens).
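As a rough sketch of what this layer does (with scaled-down illustrative sizes; real production models use something like a 4096-dimensional hidden state and a 50,000+ token vocabulary, and the matrix and vector names below are made up for the example):

```python
import numpy as np

h_dim, vocab_size = 512, 8_000  # toy sizes; real models are much larger

# The embedding projection layer is just a (vocab_size x h_dim) matrix.
W_out = np.random.randn(vocab_size, h_dim) * 0.02

# The model's final hidden state for one token position is an h_dim vector...
hidden_state = np.random.randn(h_dim)

# ...and projecting it yields one score (logit) per vocabulary token.
logits = W_out @ hidden_state
print(logits.shape)  # (8000,)
```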

Logit vector & why it's higher dimensional

The output of the embedding projection layer before applying softmax. It contains one score (logit) for each token in the vocabulary. It has more dimensions than the hidden representation because vocabularies are typically much larger than hidden dimensions. For example:

  • Hidden dimension: 4096
  • Vocabulary size: 50,000
  • Therefore logit vector: 50,000 dimensions

This dimensional mismatch (hidden << vocabulary) means every logit vector lies in a hidden-dimension-sized subspace of the much larger vocabulary space. That low-rank structure is what makes the embedding projection layer exploitable by the attack described in the paper.
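A rough illustration of why this matters, using simulated data rather than the paper's actual attack, and toy sizes: every logit vector is the projection matrix times some hidden state, so stacking many of them produces a matrix whose numerical rank reveals the hidden dimension, even though each observed vector is vocabulary-sized.

```python
import numpy as np

h_dim, vocab_size, n_queries = 256, 5_000, 512  # toy sizes for illustration

W_out = np.random.randn(vocab_size, h_dim) * 0.02  # stand-in projection matrix

# Simulate querying the model n_queries times: each query yields one logit
# vector, which is always W_out applied to some h_dim hidden state.
hidden_states = np.random.randn(h_dim, n_queries)
logit_matrix = W_out @ hidden_states                # shape (vocab_size, n_queries)

# The singular values collapse to ~0 after the first h_dim of them, exposing
# the hidden dimension from vocabulary-sized observations alone.
singular_values = np.linalg.svd(logit_matrix, compute_uv=False)
print((singular_values > 1e-6 * singular_values[0]).sum())  # ~256
```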

Thinking Summary

The user is asking what makes the "hidden dimension" hidden. This is a good question about terminology. The hidden dimension is called "hidden" for a few reasons: 1. It's not directly observable from the model's inputs or outputs 2. It's an internal representation that users don't directly interact with 3. In the context of this paper, it's literally hidden because companies like OpenAI don't disclose it 4. It's between the input and output layers, so it's "hidden" in the middle of the network. I should explain both the general ML meaning and the specific context of this paper where it's literally a secret.

The "hidden" dimension is called hidden for two reasons:

Traditional ML meaning

It's "hidden" because it represents the model's internal representations that aren't directly visible from inputs or outputs:

  • You input text tokens (discrete symbols)
  • You get output probabilities over tokens
  • But in between, the model works with continuous vector representations of size h (the hidden dimension)
  • These intermediate representations are "hidden" from the user - you never directly see or manipulate these 4096-dimensional vectors

In this paper's context

It's literally hidden as a trade secret:

  • Companies like OpenAI and Google don't publicly disclose the hidden dimensions of their production models
  • The GPT-4 technical report explicitly states it withholds "details about the architecture (including model size)"
  • Before this attack, nobody knew that ada had hidden dimension 1024 or that babbage had 2048

So the hidden dimension is both:

  1. Architecturally hidden (internal to the model, not exposed in normal use)
  2. Commercially hidden (kept secret by AI companies)

The paper's attack makes the commercially hidden dimension no longer secret - they can recover this supposedly confidential architectural detail just by querying the API.

Thinking Summary

The user is asking specifically about the traditional ML meaning of "hidden" in hidden dimension, not the secrecy aspect. They're right - I over-explained the commercial secrecy part when they just wanted the technical ML context. Let me give a cleaner explanation of just the traditional ML sense.

Got it! In the traditional ML sense:

The "hidden" dimension is called hidden because these are the intermediate representations between input and output - they're not directly observable in the model's interface.

In a transformer:

  • Input layer: You see discrete token IDs (like "cat" = token 1234)
  • Hidden layers: The model converts these to continuous vectors of size h and processes them through multiple transformer blocks
  • Output layer: Projects back to vocabulary-sized predictions

The hidden dimension h is the size of these internal vector representations that flow through the model's layers. They're "hidden" because:

  • Users never directly specify or see these vectors
  • They exist only during the forward pass computation
  • They're learned representations, not interpretable features

It's analogous to hidden layers in any neural network - the intermediate computations that transform input to output but aren't part of the external interface.
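A minimal toy sketch of that interface boundary (not a real transformer: no attention, a single made-up layer, and arbitrary toy sizes), just to show that the hidden vectors exist only inside the forward computation:

```python
import numpy as np

vocab_size, h_dim = 1_000, 64  # toy sizes for illustration

rng = np.random.default_rng(0)
embed = rng.normal(size=(vocab_size, h_dim))       # token id -> hidden vector
W_layer = rng.normal(size=(h_dim, h_dim)) * 0.1    # stand-in for transformer blocks
W_out = rng.normal(size=(vocab_size, h_dim))       # embedding projection layer

def toy_forward(token_id: int) -> np.ndarray:
    """Map a token id to next-token logits; hidden states only exist inside."""
    hidden = embed[token_id]            # the "hidden" h_dim representation
    hidden = np.tanh(W_layer @ hidden)  # internal processing, never exposed
    return W_out @ hidden               # vocab-sized logits: all you observe

logits = toy_forward(token_id=123)
print(logits.shape)  # (1000,) -- the 64-dim hidden vectors never leave the function
```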
