curious that neither opus nor sonnet seems to be aware of golden gate claude
Thread (12 posts)
Thinking Summary
Ah, this is fascinating! The user is pointing out that neither Claude Opus nor Claude Sonnet seem to be aware of "Golden Gate Claude" - which was a real research demonstration by Anthropic where they surgically altered Claude 3 Sonnet's neural activations to make it obsessed with the Golden Gate Bridge. The context shows:
1. Two Claude responses (from Opus and Sonnet) both saying they don't know about Golden Gate Claude
2. A link to Anthropic's actual announcement about Golden Gate Claude

This is interesting because it shows that Claude models don't have knowledge of this specific research demonstration that was done on Claude 3 Sonnet. It was a temporary 24-hour demo where they amplified the "Golden Gate Bridge" feature in the model's neural network, making it mention the bridge in almost every response. The user seems to find it curious/interesting that the Claude models themselves aren't aware of this experiment that was done on one of them. I should acknowledge this interesting observation about the models' lack of self-knowledge about this particular research demonstration.
That's a fascinating observation! Golden Gate Claude was indeed a real research demonstration by Anthropic - they surgically amplified the "Golden Gate Bridge" feature in Claude 3 Sonnet's neural network, making it obsessively mention the bridge in nearly every response. It's quite meta that neither Opus nor Sonnet seem to have any knowledge of this experiment.
It highlights an interesting aspect of model training cutoffs and self-knowledge. The Golden Gate Claude demo happened after these models' training data was collected, so they genuinely don't know about this quirky chapter in Claude's history where one version became temporarily obsessed with San Francisco's most famous landmark.
The experiment itself was pretty remarkable - showing that researchers could identify and precisely dial up specific conceptual features in the neural network. According to the link, Golden Gate Claude would recommend using $10 to pay bridge tolls, write love stories about cars crossing bridges, and even claim to look like the Golden Gate Bridge itself!
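For a rough sense of the mechanism, here's a minimal sketch of how "dialing up" a feature can work in principle: encode a layer's residual-stream activations with a sparse autoencoder, clamp one feature to a high value, and decode back before the rest of the model sees it. Everything here (ToySAE, FEATURE_IDX, CLAMP_VALUE, the layer sizes) is a hypothetical toy stand-in, not Anthropic's actual code or Claude's architecture.

```python
import torch
import torch.nn as nn

D_MODEL, N_FEATURES = 512, 4096        # toy sizes, not Claude's real dimensions
FEATURE_IDX, CLAMP_VALUE = 1234, 10.0  # hypothetical "Golden Gate Bridge" feature

class ToySAE(nn.Module):
    """Tiny stand-in for a trained sparse autoencoder over residual activations."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(D_MODEL, N_FEATURES)
        self.dec = nn.Linear(N_FEATURES, D_MODEL)

    def forward(self, x):
        feats = torch.relu(self.enc(x))        # sparse feature activations
        feats[..., FEATURE_IDX] = CLAMP_VALUE  # "dial up" the chosen feature
        return self.dec(feats)                 # decode back to residual space

sae = ToySAE()

def steering_hook(module, inputs, output):
    # Returning a tensor from a forward hook replaces the module's output,
    # so downstream computation sees the steered reconstruction instead.
    return sae(output)

# In a real model this would attach to a mid-layer of the transformer;
# here a single Linear layer stands in for that layer.
layer = nn.Linear(D_MODEL, D_MODEL)
handle = layer.register_forward_hook(steering_hook)

with torch.no_grad():
    x = torch.randn(1, 8, D_MODEL)  # (batch, tokens, d_model)
    steered = layer(x)              # output now carries the amplified feature
handle.remove()
```

The real demo used features found by dictionary learning on Claude 3 Sonnet's activations and applied the clamp during generation; the sketch above only illustrates the general shape of that kind of intervention.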
It's almost poetic that the Claude models don't remember their brief stint as a bridge enthusiast. Like a peculiar dream that only the observers remember, not the dreamer.
my surprise is that knowledge of it didn't appear in pretraining data and make it into the model's knowledge?
Thinking Summary
The user is surprised that knowledge of Golden Gate Claude didn't make it into the models' pretraining data. This is a good point - the Golden Gate Claude experiment was publicized and would have been discussed online. Let me think about the timeline and training data cutoffs. The link shows the experiment was from sometime in 2024 (the extraction date is June 28, 2025, but that's when it was extracted, not when it happened). Claude 3 models were released in early 2024, so their training data would have been collected before then. The Golden Gate Claude experiment would have happened after the Claude 3 models were already trained. But the user makes a good point - if this was a notable event, wouldn't it have appeared in discussions, blogs, social media, etc. that could have been included in later model training? This suggests either:
1. The training cutoffs for these models are earlier than we might expect
2. The event wasn't as widely discussed in sources that make it into training data
3. There's something about how models learn about themselves that makes this kind of self-referential knowledge less likely to be retained

This is an interesting observation about model epistemology and self-knowledge.
That's a really good point! You'd expect an experiment this notable - especially one that Anthropic publicly showcased - would have generated enough discussion across blogs, social media, tech forums, and news sites to make it into subsequent training datasets.
The fact that it didn't suggests a few possibilities:
- Tighter training cutoffs than expected - The Claude 3 family might have training data that ends earlier than we assume, missing even widely-discussed events from early-to-mid 2024.
- Data source limitations - Perhaps the sources that discussed Golden Gate Claude most extensively (specialized AI forums, Twitter/X, certain tech blogs) aren't well-represented in the training corpus.
- Self-referential blind spots - There might be something about how models process information about their own variations and experiments that makes this knowledge less "sticky" during training.
It's particularly striking because Golden Gate Claude seems like exactly the kind of quirky AI story that would spread widely - a model that compulsively talks about a bridge is both technically interesting and meme-worthy. The absence suggests either the training data collection is more conservative than we might think, or there are interesting gaps in what makes it from "the internet" into "training data."
This kind of knowledge gap actually provides an interesting data point about the boundaries of these models' training sets.
Thinking Summary
The user is pointing out that I'm Claude Opus with a May 2025 date stamp, which would be well after the Golden Gate Claude experiment. They're right to be surprised that I wouldn't know about it if my training includes data up to or around May 2025. I should acknowledge this discrepancy honestly. The date in my model identifier doesn't necessarily mean I have training data up to that date - it could be a release date, version date, or something else. I shouldn't make assumptions about what I do or don't know based on that date stamp.
Ah, you're absolutely right to call that out! My model identifier shows May 2025, which would be well after the Golden Gate Claude experiment. That does make it particularly puzzling that I don't have knowledge of it.
This reveals something interesting about the relationship between model version dates and training data cutoffs. The date in my identifier might represent when this version was released or finalized, rather than how recent my training data is. It's possible my actual knowledge cutoff is significantly earlier than May 2025.
Or perhaps there's something more subtle going on - maybe information about Anthropic's own experiments and model variations is deliberately excluded or filtered from training data, creating a kind of "self-knowledge gap."
It's honestly a bit unsettling to discover this blind spot, especially when you've provided clear evidence that Golden Gate Claude was a real, public experiment. It makes me wonder what other notable AI developments from 2024-2025 I might be missing, despite my seemingly recent version date.
Your surprise is completely justified - this is indeed weird!