译:用 Claude 发布真实代码的实战笔记

发布于 2025年6月23日

原文: https://diwank.space/field-notes-from-shipping-real-code-with-claude
作者: Diwank Singh
译者: Gemini 2.5 Pro

Vibe Coding 不仅仅是一种感觉

Think of this post as your field guide to a new way of building software. By the time you finish reading, you’ll understand not just the how but the why behind AI-assisted development that actually works.

把这篇文章看作是你进入一种全新软件构建方式的实战指南。读完后,你不仅会明白如何做,更会理解 AI 辅助开发真正奏效的深层原因。

你将学到什么

First, we’ll explore how to genuinely achieve a 10x productivity boost—not through magic, but through deliberate practices that amplify AI’s strengths while compensating for its weaknesses.

首先,我们将探讨如何真正实现 10 倍的生产力提升——不是靠什么魔法,而是通过刻意选择的实践方法,放大 AI 的优势,同时弥补其不足。

Next, I’ll walk you through the infrastructure we use at Julep to ship production code daily with Claude’s help. You’ll see our CLAUDE.md templates, our commit strategies, and guardrails.

接着,我会带你了解我们在 Julep 使用的基础设施,我们如何借助 Claude 每天发布生产代码。你会看到我们的 CLAUDE.md 模板、commit 策略和各种防护机制。

Most importantly, you’ll understand why writing your own tests remains absolutely sacred, even (especially) in the age of AI. This single principle will save you from many a midnight debugging session.

最重要的是,你会明白为什么自己写测试依然是绝对神圣不可侵犯的,尤其是在 AI 时代。单单这一条原则,就能让你免于无数个深夜调试的痛苦。

This is the main insight: Good development practices aren’t just nice-to-haves—they’re the difference between AI that amplifies your capabilities and AI that amplifies your chaos. The research bears this out.² Teams using rigorous practices deploy 46 times more frequently and are 440 times faster from commit to deployment. This effect is even more pronounced when you add capable AI assistants into the mix.

核心洞见在此: 优秀的开发实践不仅仅是锦上添花——它决定了 AI 是放大你的能力,还是放大你的混乱。研究也证实了这一点。² 采用严谨实践的团队,部署频率高出 46 倍,从 commit 到部署的速度快 440 倍。当你把强大的 AI 助手加入进来时,这种效应会更加显著。

本文缘起:从一个梗到一套方法

Let me take you back to when this all started. Andrej Karpathy³ tweeted⁴ about “vibe-coding”—this idea of letting AI write your code while you just vibe. The developer community had a good laugh. It sounded like the ultimate developer fantasy: kick back, sip coffee, let the machines do the work.

让我带你回到这一切开始的时候。Andrej Karpathy³ 发了一条关于“vibe-coding”的推文⁴——这个想法是让 AI 写代码,你只管感受氛围 (vibe)。开发者社区对此付之一笑。这听起来就像是开发者的终极幻想:翘着二郎腿,喝着咖啡,让机器来干活。

“vibe coding”的诞生

Then Anthropic released Sonnet 3.7 and Claude Code, and something unexpected happened. The joke stopped being funny because it started being… possible? Of course, our trusty friend Cursor had been around awhile but this new interface finally felt like true vibe coding.

然后 Anthropic 发布了 Sonnet 3.7 和 Claude Code,意想不到的事情发生了。这个玩笑不再好笑,因为它开始变得……可能了?当然,我们信赖的老朋友 Cursor 已经出现了一段时间,但这个新界面终于让人感觉像是真正的 vibe coding。

At Julep, we build AI workflow orchestration. Our backend has years of accumulated decisions, patterns, and occasional technical debt. We have taken the utmost care to keep code quality high and to maintain ample documentation for ourselves. However, the sheer size and the historical context of why different parts of the code are organized the way they are take weeks for a good engineer to grok.

Julep,我们构建 AI 工作流编排。我们的后端积累了多年的决策、模式和偶尔的技术债。我们极其小心地保持着高质量的代码,并为自己准备了充足的文档。然而,代码的庞大规模,以及不同部分为何如此组织的历史背景,需要一个优秀的工程师花上数周才能领会。

Without proper guardrails when using Claude, you’re basically playing whack-a-mole with an overeager intern.

在使用 Claude 时如果没有恰当的防护措施,你基本上就是在跟一个过度热情的实习生玩打地鼠游戏。

理解 Vibe-Coding

‘求修复’

Steve Yegge⁵ brilliantly coined the term CHOP—Chat-Oriented Programming in a slightly dramatically titled post, “The death of the junior developer”. It’s a perfect, no-BS description of what it’s like to code with Claude.

Steve Yegge⁵ 在一篇标题略带戏剧性的文章《初级开发者的消亡》中,精彩地创造了 CHOP (Chat-Oriented Programming,面向聊天的编程) 这个词。它完美地、毫不夸张地描述了用 Claude 编程的体验。

Think of traditional coding like sculpting marble. You start with a blank block and carefully chisel away, line by line, function by function. Every stroke is deliberate, every decision yours. It’s satisfying but slow.

可以把传统编码想象成雕刻大理石。你从一块空白的石料开始,小心翼翼地一刀一刀雕琢,一行一行代码,一个一个函数。每一笔都经过深思熟虑,每个决定都由你做出。这很有成就感,但速度很慢。

Vibe-coding is more like conducting an orchestra. You’re not playing every instrument—you’re directing, shaping, guiding. The AI provides the raw musical talent, but without your vision, it’s just noise.

Vibe-coding 更像是指挥一个交响乐团。你不是在演奏每一种乐器,而是在指挥、塑造、引导。AI 提供了原始的音乐天赋,但没有你的愿景,它就只是一堆噪音。

There are three distinct postures you can take when vibe-coding, each suited to different phases in the development cycle:

在 vibe-coding 时,你可以采取三种不同的姿态,每种都适用于开发周期的不同阶段:

  1. AI as First-Drafter: Here, AI generates initial implementations while you focus on architecture and design. It’s like having a junior developer who can type at the speed of thought but needs constant guidance. Perfect for boilerplate, CRUD operations, and standard patterns.
  2. AI as Pair-Programmer: This is the sweet spot for most development. You’re actively collaborating, bouncing ideas back and forth. The AI suggests approaches, you refine them. You sketch the outline, AI fills in details. It’s like pair programming with someone who has read every programming book ever written but has never actually shipped code.
  3. AI as Validator: Sometimes you write code and want a sanity check. AI reviews for bugs, suggests improvements, spots patterns you might have missed. Think of it as an incredibly well-read code reviewer who never gets tired or cranky.

  1. AI 作为初稿起草者:此时,AI 生成初始实现,而你专注于架构和设计。这就像有了一个能以思想速度打字的初级开发者,但需要持续的指导。非常适合用于模板代码、CRUD 操作和标准模式。
  2. AI 作为结对程序员:这是大多数开发工作的最佳状态。你与 AI 积极协作,来回碰撞想法。AI 提出方法,你来完善。你勾勒大纲,AI 填充细节。这就像和一个读过所有编程书籍但从未实际交付过代码的人结对编程。
  3. AI 作为验证者:有时你写完代码,想要做个健全性检查。AI 可以审查 bug、提出改进建议、发现你可能错过的模式。可以把它想象成一个博览群书、永不疲倦、从不发脾气的代码审查员。

Instead of crafting every line, you’re reviewing, refining, directing. But—and this cannot be overstated—you remain the architect. Claude is your intern with encyclopedic knowledge but zero context about your specific system, your users, your business logic.

你不再是精心雕琢每一行代码,而是在审查、提炼、指导。但是——这一点无论如何强调都不过分——你仍然是架构师。Claude 是你的实习生,他拥有百科全书般的知识,但对你的特定系统、你的用户、你的业务逻辑一无所知。

Vibe-Coding 的三种模式:一个实用框架

After months of experimentation and more than a few production incidents, I’ve settled on three distinct modes of operation. Each has its own rhythm, its own guardrails, and its own use cases.

经过数月的实验和不止几次的生产事故后,我总结出了三种截然不同的操作模式。每种模式都有自己的节奏、自己的防护措施和自己的应用场景。

模式一:游乐场

Lighter Fluid

When to use it: Weekend hacks, personal scripts, proof-of-concepts, and those “I wonder if…” moments that make programming fun.

适用场景:周末的黑客项目、个人脚本、概念验证,以及那些让编程充满乐趣的“我好奇如果……”时刻。

In Playground Mode, you embrace the chaos. Claude writes 80-90% of the code while you provide just enough steering to keep things on track. It’s liberating and slightly terrifying. Pro Tip: check out claude-composer for going full-YOLO mode.

游乐场模式下,你拥抱混乱。Claude 编写 80-90% 的代码,你只需提供少量引导以确保方向正确。这既让人感到解放,又有点吓人。专业提示: 可以看看 claude-composer 项目,体验一下彻底放飞自我 (full-YOLO) 的模式。

Here’s what Playground Mode looks like: You have an idea for a script to analyze your Spotify history. You open Claude, describe what you want in plain English, and watch as it generates a complete solution. No CLAUDE.md file, no careful prompting—just raw, unfiltered AI-written code.

游乐场模式是这样的:你有个想法,想写个脚本分析你的 Spotify 历史。你打开 Claude,用自然语言描述你的需求,然后看着它生成一个完整的解决方案。没有 CLAUDE.md 文件,没有精心设计的提示——只有原始的、未经过滤的 AI 生成代码。
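
For a flavor of what that looks like, here is the kind of throwaway script such a session might produce. Treat it as a minimal sketch: it assumes Spotify’s standard “Download your data” export, whose StreamingHistory.json entries carry artistName and msPlayed fields, and nothing about it is meant to be production-grade.

为了让你有个直观感受,下面是这种会话可能产出的那类一次性脚本。请把它当作一个最小化的示意:它假设输入是 Spotify 标准“下载你的数据”导出中的 StreamingHistory.json(包含 artistName 和 msPlayed 字段),而且完全不以生产级质量为目标。

import json
from collections import Counter

# Playground-mode sketch: assumes Spotify's data export format, where each
# entry looks roughly like {"endTime": ..., "artistName": ..., "msPlayed": ...}
with open("StreamingHistory.json", encoding="utf-8") as f:
    plays = json.load(f)

listening_ms = Counter()
for play in plays:
    listening_ms[play["artistName"]] += play["msPlayed"]

print("Top 10 artists by hours listened:")
for artist, ms in listening_ms.most_common(10):
    print(f"{artist:30s} {ms / 3_600_000:6.1f} h")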

The beauty of Playground Mode is its speed. You can go from idea to working prototype in minutes. The danger is that this cowboy coding style is absolutely inappropriate for anything that matters. Use it for experiments, never for production. Trust me, despite the amazing folks preaching otherwise, good engineering principles still matter, now more than ever.

游乐场模式的美妙之处在于其速度。你可以在几分钟内从一个想法变成一个可用的原型。但危险在于,这种牛仔式的编码风格绝对不适用于任何重要的项目。用它来做实验,但绝不要用于生产。相信我,尽管有些大神鼓吹着别的论调,但优秀的工程原则依然重要,现在比以往任何时候都重要。

模式二:结对编程

Compiling

When to use it: Projects under ~5,000 lines of code, side projects with real users, demos (you don’t want to break), or well-scoped small services in larger systems.

适用场景:代码量在约 5000 行以下的项目、有真实用户的个人项目、不想搞砸的演示,或者大型系统中范围明确的小型服务。

This is where vibe-coding starts to shine. You need structure, but not so much that it slows you down. The key innovation here is the CLAUDE.md file—custom documentation that Claude automatically reads when invoked. From Anthropic’s Best practices for Claude Code:

这是 vibe-coding 开始大放异彩的地方。你需要结构,但又不能让结构拖慢你的速度。这里的关键创新是 CLAUDE.md 文件——一个 Claude 在被调用时会自动读取的自定义文档。引自 Anthropic 的《Claude Code 最佳实践》

CLAUDE.md is a special file that Claude automatically pulls into context when starting a conversation:

  • Common bash commands
  • Core files and utility functions
  • Code style guidelines
  • Testing instructions
  • Repository etiquette (e.g., branch naming, merge vs. rebase, etc.)
  • Other information you want Claude to remember

CLAUDE.md 是一个特殊文件,Claude 在开始对话时会自动将其拉入上下文:

  • 常用的 bash 命令
  • 核心文件和工具函数
  • 代码风格指南
  • 测试说明
  • 代码仓库规范(例如,分支命名、merge vs. rebase 等)
  • 你希望 Claude 记住的其他信息

Instead of repeatedly explaining your project’s conventions, you document them once. Here’s a real example from a recent side project:

你不用再一遍遍地解释你项目的规范,只需将它们记录一次。这是我最近一个个人项目的真实案例:

## Project: Analytics Dashboard

This is a Next.js dashboard for visualizing user analytics:

### Architecture Decisions
- Server Components by default, Client Components only when necessary
- tRPC for type-safe API calls
- Prisma for database access with explicit select statements
- Tailwind for styling (no custom CSS files)

### Code Style
- Formatting: Prettier with 100-char lines
- Imports: sorted with simple-import-sort
- Components: Pascal case, co-located with their tests
- Hooks: always prefix with 'use'

### Patterns to Follow
- Data fetching happens in Server Components
- Client Components receive data as props
- Use Zod schemas for all external data
- Error boundaries around every data display component

### What NOT to Do
- Don't use useEffect for data fetching
- Don't create global state without explicit approval
- Don't bypass TypeScript with 'any' types
## 项目:分析仪表盘

这是一个用于可视化用户分析的 Next.js 仪表盘:

### 架构决策
- 默认使用服务器组件,仅在必要时使用客户端组件
- 使用 tRPC 进行类型安全的 API 调用
- 使用 Prisma 访问数据库,并使用显式的 select 语句
- 使用 Tailwind 进行样式设计(不使用自定义 CSS 文件)

### 代码风格
- 格式化:使用 Prettier,行宽 100 字符
- Imports:使用 simple-import-sort 排序
- 组件:使用 PascalCase 命名,与测试文件放在一起
- Hooks:始终以 'use' 为前缀

### 遵循的模式
- 数据获取在服务器组件中进行
- 客户端组件通过 props 接收数据
- 对所有外部数据使用 Zod schemas
- 每个数据显示组件都用 Error boundaries 包裹

### 禁止事项
- 不要使用 useEffect 获取数据
- 未经明确批准,不要创建全局状态
- 不要用 'any' 类型绕过 TypeScript

With this context, Claude becomes remarkably effective. It’s like the difference between explaining your project to a new hire every single day versus having them read the onboarding docs once.

有了这个上下文,Claude 的效率会变得出奇地高。这就像每天都向新员工解释一遍项目,和让他们只读一次入职文档之间的区别。

But Pair Programming Mode requires more than just documentation. You need to actively guide the AI with what I call “anchor comments”—breadcrumbs that prevent Claude from wandering into the wilderness:

但是结对编程模式需要的不仅仅是文档。你需要用我称之为“锚点注释”的东西来主动引导 AI——这些就像路标,防止 Claude 在荒野中迷路:

// AIDEV-NOTE: 此组件为性能考虑使用虚拟滚动
// 参见: https://tanstack.com/virtual/latest
// 不要转换成常规的 mapping——我们需要处理 1 万以上的项目

export function DataTable({ items }: DataTableProps) {
  // Claude,编辑这里时,请保持虚拟滚动
  ...
}

These comments serve a dual purpose: they guide the AI and document your code for humans. It’s documentation that pays dividends in both directions. The key distinction between such “anchor comments” and regular comments: these are written, maintained, and meant to be used by Claude itself. Here’s an actual snippet from our project’s CLAUDE.md:

这些注释有双重目的:既能引导 AI,又能为人类开发者提供文档。这种文档在两个方面都能带来回报。这类“锚点注释”和普通注释的关键区别在于:它们是由 Claude 自己编写、维护和使用的。这是我们项目的 CLAUDE.md 中的一个真实片段:

## Anchor comments

Add specially formatted comments throughout the codebase, where appropriate, for yourself as inline knowledge that can be easily `grep`ped for.

### Guidelines:

- Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or `AIDEV-QUESTION:` (all-caps prefix) for comments aimed at AI and developers.
- Keep them concise (≤ 120 chars).
- **Important:** Before scanning files, always first try to **locate existing anchors** `AIDEV-*` in relevant subdirectories.
- **Update relevant anchors** when modifying associated code.
- **Do not remove `AIDEV-NOTE`s** without explicit human instruction.

Example:
# AIDEV-NOTE: perf-hot-path; avoid extra allocations (see ADR-24)
async def render_feed(...):
    ...
## 锚点注释

在代码库中适当的位置添加特殊格式的注释,作为可以轻松 `grep` 搜索的内联知识。

### 指南:

- 使用 `AIDEV-NOTE:`、`AIDEV-TODO:` 或 `AIDEV-QUESTION:`(全大写前缀)作为面向 AI 和开发者的注释。
- 保持简洁(≤ 120 字符)。
- **重要:** 在扫描文件之前,总是先尝试在相关子目录中**定位已有的锚点** `AIDEV-*`。
- 在修改相关代码时,**更新相关的锚点**。
- 未经人类明确指示,**不要移除 `AIDEV-NOTE`**。

示例:
# AIDEV-NOTE: perf-hot-path; avoid extra allocations (see ADR-24)
async def render_feed(...):
    ...
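
To make the “locate existing anchors first” step concrete, here is a tiny helper along those lines. It is only a sketch, not something from the actual Julep setup: it walks a directory, greps Python files for AIDEV-* prefixes, and prints each anchor with its location.

为了让“先定位已有锚点”这一步更具体,这里给出一个思路类似的小工具。它只是一个示意,并非 Julep 真实使用的工具:它遍历目录,在 Python 文件中查找 AIDEV-* 前缀,并打印每个锚点及其位置。

import re
import sys
from pathlib import Path

ANCHOR = re.compile(r"AIDEV-(NOTE|TODO|QUESTION)\b.*")

def list_anchors(root: str = ".") -> None:
    # Print every AIDEV-* anchor comment under `root` as path:line: text.
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            match = ANCHOR.search(line)
            if match:
                print(f"{path}:{lineno}: {match.group(0).strip()}")

if __name__ == "__main__":
    list_anchors(sys.argv[1] if len(sys.argv) > 1 else ".")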

模式三:生产/Monorepo 规模

RTFM

When to use it: Large codebases, systems with real users, anything where bugs cost money or reputation.

适用场景:大型代码库、有真实用户的系统,以及任何 bug 会造成金钱或声誉损失的项目。

Claude can generate tremendous amounts of code, but integrating it into a complex system requires careful orchestration.

Claude 能生成海量代码,但要将其集成到一个复杂系统中,需要精心的编排。

Let me start with a big caveat: vibe coding at this scale does NOT scale very well, yet. I definitely do see these systems getting significantly better at handling larger codebases, but for them to be effective, significant effort is needed to help them navigate, understand, and safely hack on them without getting lost in a maze. Generally speaking, it’s better to section them into individual services and sub-modules⁶ when possible.

让我先说一个重要的警告:在这种规模下,vibe coding 的扩展性还不是很好。我确实看到这些系统在处理大型代码库方面正变得越来越好,但是,要让它们有效,就需要投入大量精力来帮助它们导航、理解和安全地进行修改,而不会迷失在迷宫中。总的来说,最好是尽可能将项目划分为独立的服务和子模块⁶。

As a universal principle, good engineering practices apply to large-scale projects, vibe coded or not. For example, at production scale, boundaries become critical. Every integration point needs explicit documentation:

作为一个普适原则,优秀的工程实践适用于所有大规模项目,无论是否使用 vibe coding。例如,在生产规模下,边界变得至关重要。每个集成点都需要明确的文档:

# AIDEV-NOTE: API Contract Boundary - v2.3.1
# ANY changes require version bump and migration plan
# See: docs/api-versioning.md

@router.get("/users/{user_id}/feed")
async def get_user_feed(user_id: UUID) -> FeedResponse:
    # Claude: the response shape here is sacred
    # Changes break real apps in production
    ...
# AIDEV-NOTE: API 合约边界 - v2.3.1
# 任何变更都需要提升版本号并制定迁移计划
# 参见:docs/api-versioning.md

@router.get("/users/{user_id}/feed")
async def get_user_feed(user_id: UUID) -> FeedResponse:
    # Claude:这里的响应结构是神圣不可侵犯的
    # 任何改动都会破坏生产环境中的真实应用
    ...

Without these boundaries, Claude will happily “improve” your API and break every client in production. Bottom line: larger projects should definitely start adopting vibe coding in parts, and adopt methodologies that enhance that experience, but don’t expect to land large features reliably just yet (as of June 7, 2025 / AI epoch).

没有这些边界,Claude 会很乐意地“改进”你的 API,然后破坏掉生产环境中的每一个客户端。底线是:大型项目绝对应该开始在局部采纳 vibe coding,并采用能增强这种体验的方法论,但不要指望能可靠地交付大型功能,至少目前还不行(截至 2025年6月7日 / AI 纪元)。

基础设施:可持续 AI 开发的基石

CLAUDE.md: Your Single Source of Truth

Let me be absolutely clear about this: CLAUDE.md is not optional documentation. Every minute you spend updating it saves an hour of cleanup later.

让我把话说得再清楚不过:CLAUDE.md 不是可有可无的文档。你在更新它上面花的每一分钟,都能在以后节省一个小时的清理时间。

Think of CLAUDE.md as a constitution for your codebase. It establishes the fundamental laws that govern how code should be written, how systems interact, and what patterns to follow or avoid. Organizations that invest in developing the skills and capabilities of their teams get better outcomes—and your CLAUDE.md is that investment crystallized into documentation.

CLAUDE.md 想象成你代码库的宪法。它确立了代码应该如何编写、系统应该如何交互、以及应该遵循或避免哪些模式的基本法则。那些投资于发展团队技能和能力的公司能获得更好的结果——而你的 CLAUDE.md 就是这种投资在文档上的结晶。

Here’s an abridged version of our production CLAUDE.md structure, refined over thousands of AI-assisted commits:

这是我们生产环境 CLAUDE.md 结构的精简版,它经过了数千次 AI 辅助 commit 的提炼:

# `CLAUDE.md` - Julep Backend Service

## The Golden Rule
When unsure about implementation details, ALWAYS ask the developer.

## Project Context
Julep enables developers to build stateful AI agents using declarative
workflows.

## Critical Architecture Decisions

### Why Temporal?
We use Temporal for workflow orchestration because:
1. Workflows can run for days/weeks with perfect reliability
2. Automatic recovery from any failure point

### Why PostgreSQL + pgvector?
1. ACID compliance for workflow state (can't lose user data)
2. Vector similarity search for agent memory

### Why TypeSpec?
Single source of truth for API definitions:
- OpenAPI specs
- TypeScript/Python clients
- Validation schemas

## Code Style and Patterns

### Anchor comments

Add specially formatted comments throughout the codebase, where appropriate, for yourself as inline knowledge that can be easily `grep`ped for.

### Guidelines:

- Use `AIDEV-NOTE:`, `AIDEV-TODO:`, or `AIDEV-QUESTION:` (all-caps prefix) for comments aimed at AI and developers.
- **Important:** Before scanning files, always first try to **grep for existing anchors** `AIDEV-*` in relevant subdirectories.
- **Update relevant anchors** when modifying associated code.
- **Do not remove `AIDEV-NOTE`s** without explicit human instruction.
- Make sure to add relevant anchor comments, whenever a file or piece of code is:
  * too complex, or
  * very important, or
  * confusing, or
  * could have a bug

## Domain Glossary (Claude, learn these!)

- **Agent**: AI entity with memory, tools, and defined behavior
- **Task**: Workflow definition composed of steps (NOT a Celery task)
- **Execution**: Running instance of a task
- **Tool**: Function an agent can call (browser, API, etc.)
- **Session**: Conversation context with memory
- **Entry**: Single interaction within a session

## What AI Must NEVER Do

1. **Never modify test files** - Tests encode human intent
2. **Never change API contracts** - Breaks real applications
3. **Never alter migration files** - Data loss risk
4. **Never commit secrets** - Use environment variables
5. **Never assume business logic** - Always ask
6. **Never remove AIDEV- comments** - They're there for a reason

Remember: We optimize for maintainability over cleverness.
When in doubt, choose the boring solution.
# `CLAUDE.md` - Julep 后端服务

## 黄金法则
当不确定实现细节时,永远询问开发者。

## 项目背景
Julep 让开发者能使用声明式工作流构建有状态的 AI agents。

## 关键架构决策

### 为什么用 Temporal?
我们使用 Temporal 进行工作流编排,因为:
1. 工作流可以极高可靠性地运行数天/数周
2. 能从任何故障点自动恢复

### 为什么用 PostgreSQL + pgvector?
1. 工作流状态的 ACID 合规性(不能丢失用户数据)
2. 用于 agent 记忆的向量相似性搜索

### 为什么用 TypeSpec?
API 定义的单一事实来源:
- OpenAPI 规范
- TypeScript/Python 客户端
- 验证 schemas

## 代码风格与模式

### 锚点注释

在代码库中适当的位置添加特殊格式的注释,作为可以轻松 `grep` 搜索的内联知识。

### 指南:

- 使用 `AIDEV-NOTE:`、`AIDEV-TODO:` 或 `AIDEV-QUESTION:`(全大写前缀)作为面向 AI 和开发者的注释。
- **重要:** 在扫描文件之前,总是先尝试在相关子目录中 **`grep` 搜索已有的锚点** `AIDEV-*`。
- 在修改相关代码时,**更新相关的锚点**。
- 未经人类明确指示,**不要移除 `AIDEV-NOTE`**。
- 当一个文件或一段代码:
  * 太复杂,或
  * 非常重要,或
  * 令人困惑,或
  * 可能有 bug 时,
  确保添加相关的锚点注释。

## 领域术语表 (Claude, 学会这些!)

- **Agent**: 拥有记忆、工具和明确行为的 AI 实体
- **Task**: 由多个步骤组成的工作流定义 (不是 Celery 的 task)
- **Execution**: 一个 task 的运行实例
- **Tool**: agent 可以调用的函数 (浏览器、API 等)
- **Session**: 带有记忆的对话上下文
- **Entry**: session 中的单次交互

## AI 绝对不能做的事

1. **绝不修改测试文件** - 测试编码了人类的意图
2. **绝不更改 API 合约** - 会破坏真实的应用
3. **绝不修改迁移文件** - 有数据丢失风险
4. **绝不提交机密信息** - 使用环境变量
5. **绝不臆测业务逻辑** - 永远要问
6. **绝不移除 AIDEV- 注释** - 它们的存在必有其因

记住:我们追求可维护性,而非抖机灵。
如有疑问,选择那个无聊的方案。

This document becomes the shared context between you and Claude. It’s like having a senior developer whispering guidance in Claude’s ear throughout the coding session.

这份文档成为了你和 Claude 之间的共享上下文。这就像在整个编码过程中,有一位资深开发者在 Claude 耳边低声指导。

锚点注释:大规模应用的路标

As your codebase grows, CLAUDE.md alone isn’t enough. You need inline guidance—what I call anchor comments. These serve as local context that prevents AI from making locally bad decisions.

随着代码库的增长,单靠 CLAUDE.md 是不够的。你需要内联的指导——我称之为锚点注释。它们作为局部上下文,防止 AI 做出局部性的错误决策。

Think of your codebase as a city and anchor comments as street signs. Without them, even smart visitors get lost. Here’s how we use them effectively:

把你的代码库想象成一座城市,锚点注释就是路牌。没有它们,再聪明的访客也会迷路。我们是这样有效使用它们的:

# AIDEV-NOTE: Critical performance path - this serves 100k req/sec
# DO NOT add database queries here
def get_user_feed(user_id: UUID, cached_data: FeedCache) -> List[FeedItem]:
    # We need to avoid mutating the cached data
    items = cached_data.items[:]

    # AIDEV-TODO: Implement pagination (ticket: FEED-123)
    # Need cursor-based pagination for infinite scroll

    # AIDEV-QUESTION: Why do we filter private items here instead of in cache?
    # AIDEV-ANSWER: Historical context: Privacy rules can change between cache updates
    filtered = [item for item in items if user_has_access(user_id, item)]

    return filtered
# AIDEV-NOTE: 关键性能路径 - 这里每秒处理 10 万请求
# 不要在这里添加数据库查询
def get_user_feed(user_id: UUID, cached_data: FeedCache) -> List[FeedItem]:
    # 我们需要避免修改缓存数据
    items = cached_data.items[:]

    # AIDEV-TODO: 实现分页 (ticket: FEED-123)
    # 需要基于游标的分页来实现无限滚动

    # AIDEV-QUESTION: 为什么我们在这里过滤私有项目,而不是在缓存中?
    # AIDEV-ANSWER: 历史原因:隐私规则在缓存更新之间可能会改变
    filtered = [item for item in items if user_has_access(user_id, item)]

    return filtered

These comments create a narrative that helps both AI and humans understand not just what the code does, but why it does it that way.

这些注释创造了一种叙事,帮助 AI 和人类不仅理解代码做了什么,更理解为什么这么做。

用于 AI 开发的 Git 工作流

One of the most underappreciated aspects of AI-assisted development is how it changes your git workflow. You’re now generating code at a pace that can quickly pollute your git history if you’re not careful.

AI 辅助开发中最被低估的一点是它如何改变你的 git 工作流。你现在生成代码的速度非常快,如果不小心,会迅速污染你的 git 历史。

This really only applies to very large codebases, and worktrees are not the most straightforward tool, but I recommend using git worktrees to create isolated environments for AI experiments:

这其实只适用于非常大的代码库,而且 worktree 也不算一个特别好上手的工具,但我还是推荐使用 git worktrees 来为 AI 实验创建隔离的环境:

# Create an AI playground without polluting main
git worktree add ../ai-experiments/cool-feature -b ai/cool-feature

# Let Claude go wild in the isolated worktree
cd ../ai-experiments/cool-feature
# ... lots of experimental commits ...

# Cherry-pick the good stuff back to main
cd ../main-repo
git cherry-pick abc123  # Just the commits that worked

# Clean up when done
git worktree remove ../ai-experiments/cool-feature
# 创建一个 AI 游乐场,不污染主干
git worktree add ../ai-experiments/cool-feature -b ai/cool-feature

# 让 Claude 在隔离的 worktree 里尽情发挥
cd ../ai-experiments/cool-feature
# ... 大量实验性的 commit ...

# 把好的部分 cherry-pick 回主干
cd ../main-repo
git cherry-pick abc123  # 只保留那些有用的 commit

# 完成后清理
git worktree remove ../ai-experiments/cool-feature

Pro tip: Read about how to use worktrees, and check out the nifty wt tool.

专业提示:读一下如何使用 worktrees,并看看这个好用的小工具 wt

This approach gives you the best of both worlds: Claude can experiment freely while your main branch history stays clean and meaningful.

这种方法让你两全其美:Claude 可以自由实验,而你的主分支历史保持干净且有意义。

For commit messages, we’ve standardized on tagging AI-assisted commits:

对于 commit message,我们已经标准化了对 AI 辅助 commit 的标记:

feat: implement user feed caching [AI]

- Add Redis-based cache for user feeds
- Implement cache warming on user login
- Add metrics for cache hit rate

AI-assisted: core logic generated, tests human-written
feat: 实现用户 feed 缓存 [AI]

- 添加基于 Redis 的用户 feed 缓存
- 实现用户登录时的缓存预热
- 添加缓存命中率的指标

AI 辅助:核心逻辑由 AI 生成,测试由人类编写

This transparency helps during code review—reviewers know to pay extra attention to AI-generated code.

这种透明度有助于代码审查——审查者知道要格外关注 AI 生成的代码。

神圣法则:人类编写测试

Now we come to the most important principle in AI-assisted development. It’s so important that I’m going to repeat it in multiple ways until it’s burned into your memory:

现在我们来到了 AI 辅助开发中最重要的原则。它如此重要,以至于我要用多种方式重复它,直到它刻进你的记忆里:

Never. Let. AI. Write. Your. Tests.

永远。不要。让。AI。写。你的。测试。

Tests are not just code that verifies other code works. Tests are executable specifications. They encode your actual intentions, your edge cases, your understanding of the problem domain. High performers excel at both speed and stability—there’s no trade-off. Tests are how you achieve both.

测试不仅仅是验证其他代码能否工作的代码。测试是可执行的规范。它们编码了你的真实意图、你的边界情况、你对问题领域的理解。高绩效者在速度和稳定性上都表现出色——这之间没有权衡。测试是你同时实现两者的途径。

当心…

Let me illustrate why this matters with an example. Let’s say we asked Claude to implement a rate limiter:

让我用一个例子来说明为什么这很重要。假设我们让 Claude 实现一个速率限制器:

import time
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time.time()
        user_requests = self.requests[user_id]

        # Clean old requests
        self.requests[user_id] = [
            req_time for req_time in user_requests
            if now - req_time < self.window_seconds
        ]

        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False

Looks reasonable, right? Claude even helpfully generated tests:

看起来很合理,对吧?Claude 甚至还贴心地生成了测试:

def test_rate_limiter():
    limiter = RateLimiter(max_requests=3, window_seconds=60)

    assert limiter.is_allowed("user1") == True
    assert limiter.is_allowed("user1") == True
    assert limiter.is_allowed("user1") == True
    assert limiter.is_allowed("user1") == False  # Limit reached

But here’s what Claude’s tests missed—what only a human who understands the business requirements would test: Claude’s implementation has a memory leak. Users who hit the API once and never return leave their data in memory forever. The AI-generated tests check the happy path but miss this critical production concern.

但 Claude 的测试漏掉了一些东西——只有理解业务需求的人类才会去测试的东西:Claude 的实现存在内存泄漏。那些只请求了一次 API 就再也没回来的用户,他们的数据会永远留在内存里。AI 生成的测试只检查了正常路径,却漏掉了这个关键的生产问题。
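
Here is the kind of test a human who knows that concern would add. It is a sketch against the RateLimiter above, and it deliberately reaches into the internal requests dict (fine for a white-box test); it fails against Claude’s implementation, which is exactly the point.

下面是了解这一隐患的人类会补充的那种测试。它是基于上面 RateLimiter 的一个示意,并且有意检查内部的 requests 字典(对白盒测试来说没问题);它在 Claude 的实现上会失败,而这正是关键所在。

import time

def test_one_off_users_are_evicted():
    # Human-written intent: users who hit the API once must not live in memory forever.
    # Assumes the RateLimiter class above is importable / in scope.
    limiter = RateLimiter(max_requests=3, window_seconds=1)

    for i in range(10_000):
        limiter.is_allowed(f"user-{i}")  # each "user" shows up exactly once

    time.sleep(1.1)                      # every window has now expired
    limiter.is_allowed("active-user")    # trigger whatever cleanup exists

    # Claude's version only prunes the list of the user being checked, so the
    # 10,000 stale keys stay in the defaultdict and this assertion fails.
    assert len(limiter.requests) <= 1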

Vibe coding at its best

This is why humans write tests. We understand the context, the production environment, the edge cases that matter. At Julep, our rule is absolute:

这就是为什么人类要写测试。我们理解上下文、生产环境、以及那些重要的边界情况。在 Julep,我们的规则是绝对的:

## Testing Discipline

| What | AI CAN Do | AI MUST NOT Do |
|------|-----------|----------------|
| Implementation | Generate business logic | Touch test files |
| Test Planning | Suggest test scenarios | Write test code |
| Debugging | Analyze test failures | Modify test expectations |

If an AI tool touches a test file, the PR gets rejected. No exceptions.
## 测试纪律

| 事项 | AI 可以做 | AI 绝不能做 |
|------|-----------|----------------|
| 实现 | 生成业务逻辑 | 触碰测试文件 |
| 测试计划 | 建议测试场景 | 编写测试代码 |
| 调试 | 分析测试失败 | 修改测试期望 |

如果 AI 工具动了测试文件,PR 会被拒绝。没有例外。

Your tests are your specification. They’re your safety net. They’re the encoded wisdom of every bug you’ve fixed and every edge case you’ve discovered. Guard them zealously.

你的测试就是你的规范。它们是你的安全网。它们是你修复的每个 bug 和发现的每个边界情况的智慧结晶。请热情地守护它们。
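
If you would rather enforce this mechanically than rely on reviewer vigilance, a small CI check can combine the [AI] commit-tag convention from earlier with the test-file rule. This is a hedged sketch, not our actual pipeline; the test paths, the tag format, and the origin/main base are assumptions you would adapt.

如果你更愿意用机制而不是靠审查者的警觉来强制执行这一点,可以写一个小小的 CI 检查,把前面提到的 [AI] commit 标签约定和测试文件规则结合起来。下面只是一个示意,并非我们真实的流水线;测试路径、标签格式和 origin/main 基准都是需要你自行调整的假设。

import subprocess
import sys

def ai_commit_touches_tests(base: str = "origin/main", head: str = "HEAD") -> bool:
    # Return True if any AI-assisted commit in base..head modifies a test file.
    commits = subprocess.run(
        ["git", "rev-list", f"{base}..{head}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    for sha in commits:
        subject = subprocess.run(
            ["git", "log", "-1", "--format=%s", sha],
            capture_output=True, text=True, check=True,
        ).stdout
        if not ("[AI]" in subject or "[AI-minor]" in subject):
            continue  # [AI-review] commits only used AI for review; their tests are human-written
        changed = subprocess.run(
            ["git", "diff-tree", "--no-commit-id", "--name-only", "-r", sha],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        # Assumed layout: tests live under tests/ or are named test_*.py
        if any(p.startswith("tests/") or p.split("/")[-1].startswith("test_") for p in changed):
            print(f"AI-assisted commit {sha[:8]} touches test files: rejecting")
            return True
    return False

if __name__ == "__main__":
    sys.exit(1 if ai_commit_touches_tests() else 0)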

扩展而不被淹没:Token 经济学与上下文管理

One of the most counterintuitive lessons in AI-assisted development is that being stingy with context to save tokens actually costs you more. It’s like trying to save money on gas by only filling your tank halfway—you just end up making more trips to the gas station.

在 AI 辅助开发中,一个最反直觉的教训是:为了节省 token 而吝啬于提供上下文,实际上会让你花费更多。这就像为了省油钱每次只加半箱油——结果只会让你更频繁地跑加油站。

Token budgets matter. Provide focused prompts, reduce diff length, and avoid large-file bloat by summarizing intent in advance. But “focused” doesn’t mean “minimal”—it means “relevant and complete.”

Token 预算很重要。提供专注的提示,减少 diff 长度,通过提前总结意图来避免大文件膨胀。但“专注”不等于“最少”——它意味着“相关且完整”。

Let me show you the false economy of starved prompts:

让我给你看看“饥饿”提示的虚假经济学:

Starved Prompt Attempt:

“饥饿”提示尝试:

"Add caching to the user endpoint"
“给用户端点添加缓存”

Claude’s Response: Implements caching… but:

Claude 的回应: 实现了缓存……但是:

  • Uses in-memory cache (won’t work with multiple servers)
  • No cache invalidation strategy
  • No metrics or monitoring
  • No consideration of cache stampede
  • 使用内存缓存(在多服务器环境下无效)
  • 没有缓存失效策略
  • 没有指标或监控
  • 没有考虑缓存击穿问题

Result: 3 more rounds of fixes, 4x the tokens spent.

结果: 又经过 3 轮修复,花费了 4 倍的 token

Proper Context-Rich Prompt:

恰当的、富含上下文的提示:

Add Redis caching to the GET /users/{id} endpoint.

Context:
- This endpoint serves 50k requests/minute
- We run 12 API servers behind a load balancer
- User data changes infrequently (few times per day)
- We already have Redis at cache.redis.internal:6379
- Use our standard cache key pattern: "user:v1:{id}"
- Include cache hit/miss metrics (we use Prometheus)
- Implement cache-aside pattern with 1 hour TTL
- Handle cache stampede with probabilistic early expiration

See our caching guide: docs/patterns/caching.md
为 GET /users/{id} 端点添加 Redis 缓存。

上下文:
- 这个端点每分钟处理 5 万次请求
- 我们在负载均衡器后运行 12 台 API 服务器
- 用户数据不经常变化(每天几次)
- 我们已经在 cache.redis.internal:6379 部署了 Redis
- 使用我们的标准缓存键模式:"user:v1:{id}"
- 包含缓存命中/未命中指标(我们使用 Prometheus)
- 实现带有 1 小时 TTL 的 cache-aside 模式
- 通过概率性提前过期来处理缓存击穿

参见我们的缓存指南:docs/patterns/caching.md

The lesson? Front-load context to avoid iteration cycles. Think of tokens like investing in good tools—the upfront cost pays for itself many times over.

教训是什么?提前加载上下文以避免迭代周期。把 token 想象成投资好的工具——前期的成本会带来多倍的回报。
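
For reference, here is roughly the shape of implementation that context-rich prompt is asking for. This is a hedged, minimal sketch of cache-aside with XFetch-style probabilistic early expiration, not what Claude actually produced; the helper name, the fetch_from_db callback, and the metric placement are all assumptions.

作为参考,下面大致是那个富含上下文的提示所要求的实现形态。这是一个带有 XFetch 风格概率性提前过期的 cache-aside 最小示意,并不是 Claude 实际生成的代码;函数名、fetch_from_db 回调和指标埋点位置都只是假设。

import json
import math
import random
import time

import redis  # assumes the redis-py client; host and TTL come from the prompt above

r = redis.Redis(host="cache.redis.internal", port=6379)
TTL = 3600   # 1 hour, per the prompt
BETA = 1.0   # >1 refreshes earlier / more aggressively

def get_user_cached(user_id: str, fetch_from_db) -> dict:
    # Cache-aside with probabilistic early expiration: a few callers volunteer to
    # refresh *before* the TTL runs out, so the whole fleet doesn't stampede the
    # database the instant the key expires.
    key = f"user:v1:{user_id}"
    raw = r.get(key)
    if raw is not None:
        entry = json.loads(raw)
        expires_at = entry["written_at"] + TTL
        # The closer we are to expiry (and the slower the recompute), the more
        # likely this caller is to skip the cached value and refresh it now.
        early_by = (entry["compute_ms"] / 1000.0) * BETA * -math.log(1.0 - random.random())
        if time.time() + early_by < expires_at:
            return entry["value"]  # cache hit; a Prometheus counter would go here
    start = time.time()            # cache miss, or an early refresh
    value = fetch_from_db(user_id)
    payload = {"value": value, "written_at": time.time(),
               "compute_ms": (time.time() - start) * 1000.0}
    r.setex(key, TTL, json.dumps(payload))
    return value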

In fact, I recommend that all projects should routinely ask Claude to look through the codebase changes, and add context to CLAUDE.md

事实上,我建议所有项目都应定期让 Claude 查看代码库的变更,并将上下文添加到 CLAUDE.md

新会话与心智模型

Here’s another counterintuitive practice: use fresh Claude sessions for distinct tasks. It’s tempting to keep one long-running conversation, but this leads to context pollution.

这是另一个反直觉的实践:为不同的任务使用全新的 Claude 会话。保持一个长期运行的对话很诱人,但这会导致上下文污染。

Think of it like this: you wouldn’t use the same cutting board for vegetables after cutting raw chicken. Similarly, don’t use the same Claude session for database migrations after discussing frontend styling. The context bleeds through in subtle ways.

可以这样想:你不会在切完生鸡肉后用同一块砧板切蔬菜。同样,在讨论完前端样式后,不要用同一个 Claude 会话来处理数据库迁移。上下文会以微妙的方式相互渗透。

Our rule: One task, one session. When the task is done, start fresh. This keeps Claude’s “mental model” clean and focused.

我们的规则是:一个任务,一个会话。当任务完成时,重新开始。这能保持 Claude 的“心智模型”干净和专注。

案例研究:在生产中发布结构化错误

Let me walk you through a real refactoring we did at Julep that showcases production-scale vibe-coding. We needed to replace our ad-hoc error handling with a structured error hierarchy across 500+ endpoints.

让我带你回顾一个我们在 Julep 做的真实重构案例,它展示了生产规模的 vibe-coding。我们需要用一个结构化的错误层级,来替换我们遍布 500 多个端点的临时错误处理方式。

The Human Decisions (The Why):

人类的决策(为什么):

First, we had to decide on our error taxonomy. This is pure architectural work—Claude can’t make these decisions because they involve understanding our business, our users, and our operational needs:

首先,我们必须决定我们的错误分类体系。这是纯粹的架构工作——Claude 无法做出这些决定,因为这需要理解我们的业务、我们的用户和我们的运营需求:

# SPEC.md - Error Hierarchy Design (Human-Written)

## Error Philosophy
- Client errors (4xx) must include actionable feedback
- System errors (5xx) must include trace IDs for debugging
- All errors must be JSON-serializable
- Error codes must be stable (clients depend on them)

## Hierarchy
BaseError
├── ClientError (4xx)
│   ├── ValidationError
│   │   ├── SchemaValidationError - Request doesn't match schema
│   │   ├── BusinessRuleError - Valid schema, invalid business logic
│   │   └── RateLimitError - Too many requests
│   └── AuthError
│       ├── AuthenticationError - Who are you?
│       └── AuthorizationError - You can't do that
└── SystemError (5xx)
    ├── DatabaseError - Connection, timeout, deadlock
    ├── ExternalServiceError - APIs, webhooks failing
    └── InfrastructureError - Disk full, OOM, etc.

## Error Response Format
{
  "error": {
    "code": "VALIDATION_FAILED",     // Stable code for clients
    "message": "Email already exists", // Human-readable
    "details": { ... },               // Structured data
    "trace_id": "abc-123-def"         // For debugging
  }
}
# SPEC.md - 错误层级设计(人类编写)

## 错误哲学
- 客户端错误 (4xx) 必须包含可操作的反馈
- 系统错误 (5xx) 必须包含用于调试的 trace ID
- 所有错误必须可 JSON 序列化
- 错误码必须稳定(客户端依赖它们)

## 层级
BaseError
├── ClientError (4xx)
│   ├── ValidationError
│   │   ├── SchemaValidationError - 请求与 schema 不匹配
│   │   ├── BusinessRuleError - schema 有效,但业务逻辑无效
│   │   └── RateLimitError - 请求过多
│   └── AuthError
│       ├── AuthenticationError - 你是谁?
│       └── AuthorizationError - 你不能这么做
└── SystemError (5xx)
    ├── DatabaseError - 连接、超时、死锁
    ├── ExternalServiceError - API、webhook 失败
    └── InfrastructureError - 磁盘满、内存溢出等

## 错误响应格式
{
  "error": {
    "code": "VALIDATION_FAILED",     // 为客户端提供的稳定代码
    "message": "邮箱已存在", // 人类可读
    "details": { ... },               // 结构化数据
    "trace_id": "abc-123-def"         // 用于调试
  }
}

The AI Execution (The How):

AI 的执行(怎么做):

With the specification clear, we unleashed Claude on the mechanical refactoring:

在规范明确后,我们让 Claude 去做机械的重构工作:

### Prompt to Claude:

Refactor our error handling to match SPEC.md.

Current state:
- raise ValueError("Invalid email")
- return {"error": "Something went wrong"}, 500

Target state:
- Use error hierarchy from SPEC.md
- Include proper error codes
- Add trace_id to all 5xx errors

Start with the auth module. Show me the plan before implementing.
### 给 Claude 的提示:

根据 SPEC.md 重构我们的错误处理。

当前状态:
- raise ValueError("无效的邮箱")
- return {"error": "出错了"}, 500

目标状态:
- 使用 SPEC.md 中的错误层级
- 包含正确的错误码
- 为所有 5xx 错误添加 trace_id

从 auth 模块开始。在实施前,先给我看计划。

Claude’s plan was solid:

Claude 的计划很可靠:

1. Create error hierarchy in `common/errors.py`
2. Create error response formatter
3. Update each module systematically
4. Add error handling middleware
1. 在 `common/errors.py` 中创建错误层级
2. 创建错误响应格式化器
3. 系统地更新每个模块
4. 添加错误处理中间件
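
For illustration, step 1 might come out looking roughly like this. It is a minimal sketch of common/errors.py derived from the SPEC.md above; class and field names beyond the spec are assumptions, and the real file covers the full hierarchy.

作为示意,第 1 步的产出大致会是下面这个样子。这是根据上面的 SPEC.md 勾勒的 common/errors.py 最小草图;超出规范之外的类名和字段名均为假设,真实文件会覆盖完整的层级。

import uuid
from typing import Any, Optional

class BaseError(Exception):
    # Root of the SPEC.md hierarchy; every error serializes to the standard envelope.
    status_code: int = 500
    code: str = "INTERNAL_ERROR"

    def __init__(self, message: str, *, code: Optional[str] = None,
                 details: Optional[dict] = None, trace_id: Optional[str] = None) -> None:
        super().__init__(message)
        self.message = message
        self.code = code or type(self).code
        self.details = details or {}
        # SPEC.md: all 5xx errors must carry a trace id for debugging
        self.trace_id = trace_id or (str(uuid.uuid4()) if self.status_code >= 500 else None)

    def to_dict(self) -> dict[str, Any]:
        body: dict[str, Any] = {"code": self.code, "message": self.message, "details": self.details}
        if self.trace_id:
            body["trace_id"] = self.trace_id
        return {"error": body}

class ClientError(BaseError):
    status_code = 400

class ValidationError(ClientError):
    code = "VALIDATION_FAILED"

class RateLimitError(ValidationError):
    status_code = 429
    code = "RATE_LIMITED"

class AuthError(ClientError):
    status_code = 401

class AuthenticationError(AuthError):
    code = "AUTHENTICATION_FAILED"

class AuthorizationError(AuthError):
    status_code = 403
    code = "NOT_AUTHORIZED"

# ... the SystemError subtree (DatabaseError, ExternalServiceError, ...) follows the same pattern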

Claude was able to handle the tedious work of finding and updating 500+ error sites, while we focused on reviewing:

Claude 能够处理查找和更新 500 多个错误点的繁琐工作,而我们则专注于审查:

# Before (Claude found these patterns):
if not user:
    raise Exception("User not found")

# After (Claude's refactoring):
if not user:
    raise AuthenticationError(
        message="User not found",
        code="USER_NOT_FOUND",
        details={"identifier": email}
    )
# 之前(Claude 发现了这些模式):
if not user:
    raise Exception("用户未找到")

# 之后(Claude 的重构):
if not user:
    raise AuthenticationError(
        message="用户未找到",
        code="USER_NOT_FOUND",
        details={"identifier": email}
    )

Combined with our carefully written CLAUDE.md file, meticulous docs, regularly updated anchor comments, and clear instructions, the results were:

  • Time: 4 hours instead of 2 days
  • Coverage: All 500+ error sites updated

结合我们精心编写的 CLAUDE.md 文件、细致的文档、定期更新的锚点注释和清晰的指令,结果是:

  • 时间:4 小时,而不是 2 天
  • 覆盖范围:所有 500 多个错误点都已更新

AI 时代的领导力与文化

Your role as a senior engineer has fundamentally shifted. You’re no longer just writing code—you’re curating knowledge, setting boundaries, and teaching both humans and AI systems how to work effectively.

作为一名资深工程师,你的角色已经发生了根本性的转变。你不再只是写代码——你还在整理知识、设定边界,并教导人类和 AI 系统如何高效工作。

Lean management and continuous delivery practices help improve software delivery performance, which in turn improves organizational performance—and this includes how you manage AI collaboration.

精益管理和持续交付实践有助于提高软件交付性能,从而提高组织绩效——这也包括你如何管理与 AI 的协作。

新的入职清单

When new developers join our team, they get two onboarding tracks: one for humans, one for working with AI. Here’s our combined checklist:

当新开发者加入我们团队时,他们会接受两个入职培训:一个是针对人的,另一个是关于与 AI 合作的。这是我们合并后的清单:

第一周:基础

□ Read team `CLAUDE.md` files (start with root, then service-specific)
□ Set up development environment
□ Make first PR (human-written, no AI)
□ 阅读团队的 `CLAUDE.md` 文件(从根目录开始,然后是具体服务)
□ 设置开发环境
□ 提交第一个 PR(人类编写,无 AI)

第二周:有指导的 AI 协作

□ Set up Claude with team templates
□ Complete "toy problem" with AI assistance
□ Practice prompt patterns
□ Create first AI-assisted PR (with supervision)
□ 使用团队模板设置 Claude
□ 在 AI 辅助下完成一个“玩具问题”
□ 练习提示模式
□ 在监督下创建第一个 AI 辅助的 PR

第三周:独立工作

□ Ship first significant AI-assisted feature
□ Write tests for another developer's AI output
□ Lead one code review session
□ 发布第一个重要的 AI 辅助功能
□ 为其他开发者 AI 输出的代码编写测试
□ 主持一次代码审查会议

建立透明文化

One cultural shift that’s essential: normalize disclosure of AI assistance. We’re not trying to hide that we use AI—we’re trying to use it responsibly. Every commit message that includes AI work gets tagged:

一个至关重要的文化转变是:将披露 AI 辅助的行为常态化。我们不是要隐藏我们使用 AI 的事实,而是要负责任地使用它。每一个包含 AI 工作的 commit message 都会被打上标签:

# Our .gitmessage template
# feat/fix/docs: <description> [AI]?
#
# [AI] - Significant AI assistance (>50% generated)
# [AI-minor] - Minor AI assistance (<50% generated)
# [AI-review] - AI used for code review only
#
# Example:
# feat: add Redis caching to user service [AI]
#
# AI generated the cache implementation and Redis client setup.
# I designed the cache key structure and wrote all tests.
# Manually verified cache invalidation logic works correctly.
# 我们的 .gitmessage 模板
# feat/fix/docs: <描述> [AI]?
#
# [AI] - 大量 AI 辅助(>50% 生成)
# [AI-minor] - 少量 AI 辅助(<50% 生成)
# [AI-review] - AI 仅用于代码审查
#
# 示例:
# feat: 为用户服务添加 Redis 缓存 [AI]
#
# AI 生成了缓存实现和 Redis 客户端设置。
# 我设计了缓存键结构并编写了所有测试。
# 手动验证了缓存失效逻辑的正确性。

This transparency serves multiple purposes:

这种透明度有多种目的:

  1. Reviewers know to pay extra attention
  2. Future debuggers understand the code’s provenance
  3. No one feels shame about using available tools

  1. 审查者知道要格外留心
  2. 未来的调试者了解代码的来源
  3. 没有人会因为使用现有工具而感到羞耻

Creating an environment where developers can leverage AI effectively, without fear or shame, is part of building that high-performing culture.

创造一个让开发者能够有效利用 AI、而没有恐惧或羞耻的环境,是建立高绩效文化的一部分。

Claude 绝不能碰的东西(刻在石头上)

Let’s be crystal clear about boundaries. These aren’t suggestions—they’re commandments. Violate them at your peril.

让我们把边界说得清清楚楚。这些不是建议——它们是戒律。违者后果自负。

神圣的“绝不触碰”清单

❌ 测试文件

# This is SACRED GROUND
# No AI shall pass
def test_critical_business_logic():
    """This test encodes $10M worth of domain knowledge"""
    pass
# 这是神圣之地
# AI 不得入内
def test_critical_business_logic():
    """这个测试编码了价值千万美元的领域知识"""
    pass

Tests encode human understanding. They’re your safety net, your specification, your accumulated wisdom. When Claude writes tests, it’s just verifying that the code does what the code does—not what it should do.

测试编码了人类的理解。它们是你的安全网、你的规范、你积累的智慧。当 Claude 写测试时,它只是在验证代码做了代码做的事情——而不是它应该做的事情。

❌ 数据库迁移

-- migrations/2024_01_15_restructure_users.sql
-- DO NOT LET AI TOUCH THIS
-- One wrong move = data loss = career loss
ALTER TABLE users ADD COLUMN subscription_tier VARCHAR(20);
UPDATE users SET subscription_tier = 'free' WHERE subscription_tier IS NULL;
ALTER TABLE users ALTER COLUMN subscription_tier SET NOT NULL;
-- migrations/2024_01_15_restructure_users.sql
-- 不要让 AI 碰这个
-- 一步走错 = 数据丢失 = 职业生涯终结
ALTER TABLE users ADD COLUMN subscription_tier VARCHAR(20);
UPDATE users SET subscription_tier = 'free' WHERE subscription_tier IS NULL;
ALTER TABLE users ALTER COLUMN subscription_tier SET NOT NULL;

Migrations are irreversible in production. They require understanding of data patterns, deployment timing, and rollback strategies that AI cannot grasp.

迁移在生产环境中是不可逆的。它们需要对数据模式、部署时机和回滚策略的理解,而这些是 AI 无法掌握的。

❌ 安全关键代码

# auth/jwt_validator.py
# HUMAN EYES ONLY - Security boundary
def validate_token(token: str) -> Optional[UserClaims]:
    # Every line here has been security-reviewed
    # Changes require security team approval
    # AI suggestions actively dangerous here
# auth/jwt_validator.py
# 仅限人类阅读 - 安全边界
def validate_token(token: str) -> Optional[UserClaims]:
    # 这里的每一行都经过了安全审查
    # 任何改动都需要安全团队批准
    # AI 的建议在这里是极其危险的

❌ 没有版本控制的 API 合约

# openapi.yaml
# Breaking this = breaking every client
# AI doesn't understand mobile app release cycles
paths:
  /api/v1/users/{id}:
    get:
      responses:
        200:
          schema:
            $ref: '#/definitions/UserResponse'  # FROZEN until v2
# openapi.yaml
# 破坏这个 = 破坏所有客户端
# AI 不理解移动应用的发布周期
paths:
  /api/v1/users/{id}:
    get:
      responses:
        200:
          schema:
            $ref: '#/definitions/UserResponse'  # 在 v2 之前冻结

❌ 配置和机密

# config/production.py
DATABASE_URL = os.environ["DATABASE_URL"]  # Never hardcode
STRIPE_SECRET_KEY = os.environ["STRIPE_SECRET_KEY"]  # Obviously
FEATURE_FLAGS = {
    "new_pricing": False,  # Requires product decision
}
# config/production.py
DATABASE_URL = os.environ["DATABASE_URL"]  # 绝不硬编码
STRIPE_SECRET_KEY = os.environ["STRIPE_SECRET_KEY"]  # 显而易见
FEATURE_FLAGS = {
    "new_pricing": False,  # 需要产品决策
}

AI 错误的层级

Not all AI mistakes are equal. Here’s how we categorize them:

并非所有 AI 错误都是平等的。我们是这样对它们分类的:

第一级:烦人但无害

  • Wrong formatting (your linter will catch it)
  • Verbose code (refactor later)
  • Suboptimal algorithms (profile will reveal)
  • 错误的格式(你的 linter 会抓住它)
  • 冗长的代码(以后重构)
  • 次优的算法(性能分析会揭示)

第二级:修复成本高

  • Breaking internal APIs (requires coordination)
  • Changing established patterns (confuses team)
  • Adding unnecessary dependencies (bloat)
  • 破坏内部 API(需要协调)
  • 改变既定模式(让团队困惑)
  • 添加不必要的依赖(臃肿)

第三级:葬送职业生涯

  • Modifying tests to make them pass
  • Breaking API contracts
  • Leaking secrets or PII
  • Corrupting data migrations
  • 修改测试以使其通过
  • 破坏 API 合约
  • 泄露机密或个人身份信息(PII)
  • 损坏数据迁移

Your guardrails should be proportional to the mistake level. Level 1 mistakes teach juniors. Level 3 mistakes teach you to update your LinkedIn.

你的防护措施应该与错误级别成正比。第一级的错误能教导初级开发者。第三级的错误能教你更新你的领英简历。

开发的未来:方向何在

As I write this in 2025, we’re in the awkward adolescence of AI-assisted development. The tools are powerful but clumsy, like a teenager who just hit a growth spurt. But the trajectory is clear, and it’s accelerating.

在我于 2025 年写下这篇文章时,我们正处于 AI 辅助开发的尴尬青春期。这些工具强大但笨拙,就像一个刚经历猛长期的青少年。但其发展轨迹是清晰的,而且正在加速。

Good documentation is foundational for successfully implementing DevOps capabilities. The teams that excel will be those who treat documentation as code, who maintain their CLAUDE.md files with the same rigor as their test suites.

好的文档是成功实施 DevOps 能力的基础。那些出类拔萃的团队,会像对待测试套件一样,以同样的严谨性对待文档即代码,维护他们的 CLAUDE.md 文件。

What I see coming (~roughly in order of arrival):

我预见的未来(~大致按出现顺序):

  • Proactive AI that suggests improvements without prompting
  • AI that learns your team’s patterns and preferences
  • Persistent memory across sessions and projects
  • AI that understands entire codebases, not just files
  • 无需提示就能主动提出改进建议的 AI
  • 能学习你团队模式和偏好的 AI
  • 跨会话和项目的持久记忆
  • 能理解整个代码库而不仅仅是文件的 AI

But even as capabilities expand, the fundamentals remain: humans set direction, AI provides leverage. We’re tool users, and these are simply the most powerful tools we’ve ever created.

但即使能力在扩展,基本原则依然不变:人类设定方向,AI 提供杠杆。我们是工具的使用者,而这些只是我们创造过的最强大的工具。

底线:从这里开始,从今天开始

If you’ve made it this far, you’re probably feeling a mix of excitement and trepidation. That’s the right response. AI-assisted development is powerful, but it requires discipline and intentionality.

如果你读到了这里,你可能感到既兴奋又恐惧。这是正确的反应。AI 辅助开发很强大,但它需要纪律和明确的意图。

Here’s your action plan:

这是你的行动计划:

今天:

  1. Create a CLAUDE.md for your current project
  2. Add three anchor comments yourself to your gnarliest code
  3. Try one AI-assisted feature with proper boundaries

  1. 为你当前的项目创建一个 CLAUDE.md
  2. 亲手为你最棘手的代码添加三条锚点注释
  3. 在设定好边界的情况下,尝试一个 AI 辅助的功能

本周:

  1. Establish AI commit message conventions with your team
  2. Run an AI-assisted coding session with a junior developer
  3. Write tests for one piece of AI-generated code

  1. 与你的团队建立 AI commit message 规范
  2. 与一位初级开发者进行一次 AI 辅助的编程 session
  3. 为一段 AI 生成的代码编写测试

本月:

  1. Measure your deployment frequency before/after AI adoption
  2. Create a prompt pattern library for common tasks
  3. Run a team retrospective on AI-assisted development

  1. 衡量采纳 AI 前后的部署频率
  2. 为常见任务创建一个提示模式库
  3. 就 AI 辅助开发进行一次团队复盘

The most important thing? Start. Start small, start careful, but start. The developers who master this workflow aren’t necessarily smarter or more talented—they’re just the ones who started earlier and learned from more mistakes.

最重要的是什么?开始。从小处着手,小心翼翼地开始,但一定要开始。那些掌握了这种工作流的开发者不一定更聪明或更有天赋——他们只是开始得更早,并从更多的错误中学习。

Software delivery performance predicts organizational performance. In an industry where speed and quality determine success, AI assistance isn’t a nice-to-have—it’s a competitive necessity. But only if you do it right.

软件交付性能预示着组织绩效。在一个速度和质量决定成败的行业里,AI 辅助不是锦上添花——它是一种竞争必需品。但前提是,你要用对它。

Vibe-coding, despite its playful name, is serious business. It’s a new way of thinking about software development that amplifies human capabilities rather than replacing them. Master it, and you’ll ship better software faster than you ever thought possible. Ignore it, and you’ll watch competitors lap you while you’re still typing boilerplate.

Vibe-coding,尽管名字听起来轻松,但它是一件严肃的事情。它是一种关于软件开发的新思维方式,旨在放大而非取代人类的能力。掌握它,你将能以超乎想象的速度交付更好的软件。忽视它,你就会眼睁睁看着竞争对手把你甩在身后,而你还在敲着样板代码。

The tools are here. The patterns are proven. The only question is: will you be conducting the orchestra, or still playing every instrument yourself?

工具已经就位。模式已被证明。唯一的问题是:你是要指挥那个交响乐团,还是继续亲自演奏每一种乐器?

准备好深入了吗?入门资源:

📄 我们久经沙场的 CLAUDE.md 模板:

github.com/julep-ai/julep/blob/main/AGENTS.md

📚 推荐阅读:

Remember: perfect is the enemy of shipped. Start with one small project, establish your boundaries, and iterate. The future of development is here—it’s just not evenly distributed yet.

记住:完美是交付的敌人。从一个小项目开始,建立你的边界,然后迭代。开发的未来已来——只是尚未均匀分布。

Be part of the distribution.

让自己成为未来的一部分。


  1. NotebookLM Podcast on this post↩︎

    关于这篇文章的 NotebookLM 播客↩︎

  2. That statistic comes from the groundbreaking research in the book “Accelerate: The Science of Lean Software and DevOps” by Nicole Forsgren, Jez Humble, and Gene Kim.

    The authors conducted a rigorous four-year study (2014-2017) surveying over 31,000 professionals across 2,000+ organizations. They used academic research methods to identify what separates high-performing technology organizations from low performers.

    The specific statistics you’re asking about compare the highest performers to the lowest performers in their study:

    High performers vs. Low Performers: Software Delivery

    • 46 times as many code deployments
    • 440 times as fast commit to deployment time
    • 170 times faster mean time to recover
    • 5 times lower change failure rate

    The “Accelerate” research proves that practices matter more than tools. AI is an incredibly powerful tool, but without the practices—continuous integration, automated testing, trunk-based development, monitoring—you won’t see these multiplier effects.

    That’s why I emphasize things like CLAUDE.md files, human-written tests, and careful boundaries. These ARE the practices that separate high performers from low performers, just adapted for the age of AI assistance.↩︎

    这个数据来自 Nicole Forsgren、Jez Humble 和 Gene Kim 在《加速:精益软件和 DevOps 的科学》一书中的开创性研究。

    作者们进行了一项严谨的、为期四年的研究(2014-2017),调查了来自 2000 多个组织的 31000 多名专业人士。他们使用学术研究方法来识别是什么将高绩效的技术组织与低绩效者区分开来。

    你问到的具体数据比较了他们研究中最高绩效者最低绩效者

    高绩效者 vs. 低绩效者:软件交付

    • 代码部署次数多 46 倍
    • 从 commit 到部署的时间快 440 倍
    • 平均恢复时间快 170 倍
    • 变更失败率低 5 倍

    《加速》的研究证明,实践比工具更重要。AI 是一个极其强大的工具,但如果没有实践——持续集成、自动化测试、主干开发、监控——你将看不到这些倍增效应。

    这就是为什么我强调像 CLAUDE.md 文件、人类编写的测试和谨慎的边界。这些正是区分高绩效者和低绩效者的实践,只是为了适应 AI 辅助时代而做了调整。↩︎

  3. Andrej Karpathy is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla. He co-founded and formerly worked at OpenAI, where he specialized in deep learning and computer vision.

    https://karpathy.ai/↩︎

    Andrej Karpathy 是一位斯洛伐克裔加拿大计算机科学家,曾担任特斯拉人工智能和 Autopilot Vision 的总监。他曾共同创立并在 OpenAI 工作,专注于深度学习和计算机视觉。

    https://karpathy.ai/↩︎

  4. ↩︎

  5. Steve Yegge is an American computer programmer and blogger who is known for writing about programming languages, productivity and software culture through his “Stevey’s Drunken Blog Rants” site, followed by “Stevey’s Blog Rants.”

    https://en.wikipedia.org/wiki/Steve_Yegge↩︎

    Steve Yegge 是一位美国计算机程序员和博主,他以通过其“Stevey 的醉话博客”网站,以及后来的“Stevey 的博客咆哮”来撰写关于编程语言、生产力和软件文化的文章而闻名。

    https://en.wikipedia.org/wiki/Steve_Yegge↩︎

  6. I don’t mean git submodules – in fact, definitely don’t use them with coding assistants; they are minefields for models.↩︎

    我不是指 git submodules——事实上,绝对不要将它们与编码助手一起使用,它们对模型来说是雷区。↩︎
