译:我是如何使用 LLM 辅助我写代码的

发布于 2025年7月10日

原文: https://simonwillison.net/2025/Mar/11/using-llms-for-code/
作者: Simon Willison
译者: Gemini 2.5 Pro

Online discussions about using Large Language Models to help write code inevitably produce comments from developers whose experiences have been disappointing. They often ask what they’re doing wrong—how come some people are reporting such great results when their own experiments have proved lacking?

网上关于使用大语言模型辅助编程的讨论,总会引来一些开发者,他们的经历令人失望。他们常常问自己到底做错了什么——为什么有些人说效果那么好,而他们自己的尝试却收效甚微?

Using LLMs to write code is difficult and unintuitive. It takes significant effort to figure out the sharp and soft edges of using them in this way, and there’s precious little guidance to help people figure out how best to apply them.

用 LLM 写代码是困难且反直觉的。你需要花很大力气才能摸清它的边界和能力范围,而且市面上几乎没有什么靠谱的指南能教你如何最好地运用它们。

If someone tells you that coding with LLMs is easy they are (probably unintentionally) misleading you. They may well have stumbled on to patterns that work, but those patterns do not come naturally to everyone.

如果有人告诉你用 LLM 编程很容易,那他很可能(也许是无意的)误导了你。他们或许是碰巧摸索出了一些行之有效的模式,但这些模式并非人人都能自然掌握。

I’ve been getting great results out of LLMs for code for over two years now. Here’s my attempt at transferring some of that experience and intuition to you.

两年多来,我用 LLM 辅助编程一直效果很好。这篇文章就是我试图将这些经验和直觉分享给你的一次尝试。

设定合理的预期 #

Ignore the “AGI” hype—LLMs are still fancy autocomplete. All they do is predict a sequence of tokens—but it turns out writing code is mostly about stringing tokens together in the right order, so they can be extremely useful for this provided you point them in the right direction.

别管那些“AGI”的炒作——LLM 本质上还是个高级的自动补全工具。它们所做的,只是预测一连串的 token——但事实证明,写代码很大程度上就是把 token 以正确的顺序串联起来,所以只要你给它们指明了方向,它们就能变得极其有用。

If you assume that this technology will implement your project perfectly without you needing to exercise any of your own skill you’ll quickly be disappointed.

如果你指望这门技术能完美实现你的项目,而你完全不需要动用自己的任何技能,那你很快就会大失所望。

Instead, use them to augment your abilities. My current favorite mental model is to think of them as an over-confident pair programming assistant who’s lightning fast at looking things up, can churn out relevant examples at a moment’s notice and can execute on tedious tasks without complaint.

你应该把它用作增强自身能力的工具。我目前最喜欢的心智模型是,把它们看作一个过度自信的结对编程助手:它查资料快如闪电,能随时生成相关示例,还能毫无怨言地执行繁琐任务。

Over-confident is important. They’ll absolutely make mistakes—sometimes subtle, sometimes huge. These mistakes can be deeply inhuman—if a human collaborator hallucinated a non-existent library or method you would instantly lose trust in them.

过度自信这点很重要。它们绝对会犯错——有时是细微的,有时是巨大的。这些错误可能是非人般的——如果一个人类合作者“幻觉”出一个不存在的库或方法,你会立刻对他失去信任。

Don’t fall into the trap of anthropomorphizing LLMs and assuming that failures which would discredit a human should discredit the machine in the same way.

不要陷入拟人化 LLM 的陷阱,不要认为那些会让一个程序员信誉扫地的错误,同样也会让机器信誉扫地。

When working with LLMs you’ll often find things that they just cannot do. Make a note of these—they are useful lessons! They’re also valuable examples to stash away for the future—a sign of a strong new model is when it produces usable results for a task that previous models had been unable to handle.

和 LLM 一起工作时,你经常会发现它们就是做不到某些事。把这些记下来——它们是宝贵的教训!这些也是值得为未来收藏的例子——一个强大的新模型的标志,就是它能在一件旧模型搞不定的任务上,给出可用的结果。

考虑训练数据的截止日期 #

A crucial characteristic of any model is its training cut-off date. This is the date at which the data they were trained on stopped being collected. For OpenAI’s models this is usually October 2023 or May 2024. Other providers may have more recent dates.

任何模型的一个关键特性是其训练数据截止日期。这是指它们所用训练数据的收集截止时间。对于 OpenAI 的模型,这个日期通常是 2023 年 10 月或 2024 年 5 月。其他提供商的日期可能更新一些。

This is extremely important for code, because it influences what libraries they will be familiar with. If the library you are using had a major breaking change since October 2023, some OpenAI models won’t know about it!

这对代码来说极其重要,因为它决定了模型熟悉哪些库。如果你正在使用的库在 2023 年 10 月之后有重大的破坏性更新,一些 OpenAI 模型对此将一无所知!

I gain enough value from LLMs that I now deliberately consider this when picking a library—I try to stick with libraries with good stability and that are popular enough that many examples of them will have made it into the training data. I like applying the principles of boring technology—innovate on your project’s unique selling points, stick with tried and tested solutions for everything else.

我从 LLM 中获益良多,以至于我现在选择库时会有意考虑这一点——我倾向于选择那些稳定性好、足够流行、有大量示例被纳入训练数据的库。我喜欢应用无聊技术的原则——在你项目的独特卖点上创新,其他所有事情都用经过验证的成熟方案。

LLMs can still help you work with libraries that exist outside their training data, but you need to put in more work—you’ll need to feed them recent examples of how those libraries should be used as part of your prompt.

LLM 仍然可以帮助你使用其训练数据之外的库,但你需要付出更多努力——你需要在你的 prompt 中提供这些库的最新用法示例。

This brings us to the most important thing to understand when working with LLMs:

这就引出了使用 LLM 时需要理解的最重要的一点:

上下文为王 #

Most of the craft of getting good results out of an LLM comes down to managing its context—the text that is part of your current conversation.

从 LLM 那里获得好结果的大部分技巧,都归结于管理它的上下文——也就是你当前对话中的文本。

This context isn’t just the prompt that you have fed it: successful LLM interactions usually take the form of conversations, and the context consists of every message from you and every reply from the LLM that exist in the current conversation thread.

这个上下文不仅仅是你输入的 prompt:成功的 LLM 交互通常以对话的形式进行,上下文包含了当前对话线程中你发出的每条消息以及 LLM 的每次回复。

When you start a new conversation you reset that context back to zero. This is important to know, as often the fix for a conversation that has stopped being useful is to wipe the slate clean and start again.

当你开始一个新的对话时,你就将上下文重置为零了。了解这一点很重要,因为当一段对话不再有用时,通常的解决方法就是清空重来。
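译注:在 API 层面,"上下文"通常就是一个随对话不断累积的消息列表。下面是我补充的一个极简示意(其中传入的 model 是占位的假模型函数,并非任何真实 API),用来说明为什么每一轮都会带上之前的全部内容,以及"开新对话"为什么等于把上下文清零:

```python
# 上下文 = 当前对话线程里累积的所有消息
messages = [{"role": "system", "content": "You are a coding assistant."}]


def ask(user_text, model):
    """发送一条消息:历史记录加上新消息,整体作为上下文传给模型。"""
    messages.append({"role": "user", "content": user_text})
    reply = model(messages)  # 真实场景中这里是一次 LLM API 调用
    messages.append({"role": "assistant", "content": reply})
    return reply


def new_conversation():
    """开新对话 = 把上下文重置为零(只保留 system prompt)。"""
    del messages[1:]
```

每次调用 ask 时模型看到的列表都会变长,这也是为什么一段跑偏的对话往往越聊越糟:错误的回复本身也成了上下文的一部分。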

Some LLM coding tools go beyond just the conversation. Claude Projects for example allow you to pre-populate the context with quite a large amount of text—including a recent ability to import code directly from a GitHub repository which I’m using a lot.

一些 LLM 编程工具的功能超出了对话本身。例如,Claude Projects 允许你用大量文本预先填充上下文——包括最近新增的直接从 GitHub 仓库导入代码的功能,我用得非常多。

Tools like Cursor and VS Code Copilot include context from your current editor session and file layout automatically, and you can sometimes use mechanisms like Cursor’s @commands to pull in additional files or documentation.

像 Cursor 和 VS Code Copilot 这样的工具会自动包含你当前编辑器会话和文件布局中的上下文,有时你还可以使用像 Cursor 的 @ 命令这样的机制来引入额外的文件或文档。

One of the reasons I mostly work directly with the ChatGPT and Claude web or app interfaces is that it makes it easier for me to understand exactly what is going into the context. LLM tools that obscure that context from me are less effective.

我主要直接使用 ChatGPT 和 Claude 的网页或应用界面的原因之一是,这样我能更容易地确切了解哪些内容进入了上下文。那些对我隐藏上下文的 LLM 工具,效果会更差。

You can use the fact that previous replies are also part of the context to your advantage. For complex coding tasks try getting the LLM to write a simpler version first, check that it works and then iterate on building to the more sophisticated implementation.

你可以利用之前的回复也是上下文一部分这个事实。对于复杂的编程任务,可以先让 LLM 写一个简单版本,检查它是否能用,然后再迭代构建更复杂的实现。

I often start a new chat by dumping in existing code to seed that context, then work with the LLM to modify it in some way.

我经常通过扔进现有代码来开启一个新的聊天,以此作为上下文的种子,然后与 LLM 一起对它进行修改。

One of my favorite code prompting techniques is to drop in several full examples relating to something I want to build, then prompt the LLM to use them as inspiration for a new project. I wrote about that in detail when I described my JavaScript OCR application that combines Tesseract.js and PDF.js—two libraries I had used in the past and for which I could provide working examples in the prompt.

我最喜欢的代码 prompt 技巧之一是,把我想要构建的东西相关的几个完整示例扔进去,然后让 LLM 以它们为灵感来创建一个新项目。我在描述我的 JavaScript OCR 应用时详细写过这一点,那个应用结合了 Tesseract.js 和 PDF.js——这两个库我过去都用过,并且可以在 prompt 中提供可行的示例。

向它们询问选项 #

Most of my projects start with some open questions: is the thing I’m trying to do possible? What are the potential ways I could implement it? Which of those options are the best?

我的大多数项目都是从一些开放性问题开始的:我想要做的事情可能实现吗?有哪些潜在的实现方式?这些选项中哪个是最好的?

I use LLMs as part of this initial research phase.

我会在这个初步研究阶段使用 LLM。

I’ll use prompts like “what are options for HTTP libraries in Rust? Include usage examples”—or “what are some useful drag-and-drop libraries in JavaScript? Build me an artifact demonstrating each one” (to Claude).

我会用类似这样的 prompt:“Rust 有哪些 HTTP 库选项?请包含使用示例”——或者(对 Claude 说)“JavaScript 有哪些好用的拖放库?为我构建一个 artifact 来演示每一个”。

The training cut-off is relevant here, since it means newer libraries won’t be suggested. Usually that’s OK—I don’t want the latest, I want the most stable and the one that has been around for long enough for the bugs to be ironed out.

训练数据截止日期在这里很重要,因为这意味着它不会推荐最新的库。通常这没关系——我不想要最新的,我想要最稳定的,以及那些已经存在了足够长时间、bug 都被修复得差不多的。

If I’m going to use something more recent I’ll do that research myself, outside of LLM world.

如果我要用一些比较新的东西,我会自己去研究,脱离 LLM 的世界。

The best way to start any project is with a prototype that proves that the key requirements of that project can be met. I often find that an LLM can get me to that working prototype within a few minutes of me sitting down with my laptop—or sometimes even while working on my phone.

启动任何项目的最佳方式都是先做一个原型,证明项目的关键需求可以被满足。我常常发现,在我坐到笔记本电脑前后几分钟内,LLM 就能帮我搞定一个可行的原型——有时甚至是在手机上工作时就能完成。

明确告诉它们要做什么 #

Once I’ve completed the initial research I change modes dramatically. For production code my LLM usage is much more authoritarian: I treat it like a digital intern, hired to type code for me based on my detailed instructions.

一旦我完成了初步研究,我就会彻底改变模式。对于生产代码,我对 LLM 的使用方式要专制得多:我把它当作一个数字实习生,雇来根据我的详细指令为我敲代码。

Here’s a recent example:

这里有一个最近的例子:

写一个使用 asyncio 和 httpx 的 Python 函数,函数签名如下:

async def download_db(url, max_size_bytes=5 * 1025 * 1025): -> pathlib.Path

给定一个 URL,这个函数会将数据库下载到一个临时目录并返回其路径。但是,它在开始流式传输数据时会检查 content-length 头,如果超过限制就抛出错误。下载完成后,它会使用 sqlite3.connect(…) 然后运行 PRAGMA quick_check 来确认 SQLite 数据有效——如果无效则抛出错误。最后,如果 content-length 头欺骗了我们——比如说它写的是 2MB 但我们下载了 3MB——我们一发现这个问题就立即抛出错误。

I could write this function myself, but it would take me the better part of fifteen minutes to look up all of the details and get the code working right. Claude knocked it out in 15 seconds.

我自己也能写这个函数,但要查阅所有细节并让代码正常工作,大概需要我十五分钟。而 Claude 只用了 15 秒就搞定了。
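译注:为了说明这类"签名 + 规格"式 prompt 期望得到的产出,下面是我按上述要求自己补写的一个示意实现,并非 Claude 当时生成的代码。httpx 是第三方库;另外示意里的限额用了常见的 1024 进制,而原 prompt 手滑写成了 1025:

```python
import pathlib
import sqlite3
import tempfile


class DownloadError(Exception):
    """下载或完整性校验失败时抛出。"""


def verify_sqlite(path: pathlib.Path) -> None:
    """运行 PRAGMA quick_check,结果不是 ok 就抛出 DownloadError。"""
    conn = sqlite3.connect(path)
    try:
        result = conn.execute("PRAGMA quick_check").fetchone()[0]
    finally:
        conn.close()
    if result != "ok":
        raise DownloadError(f"SQLite integrity check failed: {result}")


async def download_db(url: str, max_size_bytes: int = 5 * 1024 * 1024) -> pathlib.Path:
    import httpx  # 第三方依赖(pip install httpx),延迟导入以便其余部分只用标准库

    dest = pathlib.Path(tempfile.mkdtemp()) / "downloaded.db"
    async with httpx.AsyncClient() as client:
        async with client.stream("GET", url) as response:
            response.raise_for_status()
            # 第一道防线:开始流式传输前先检查声明的大小
            declared = int(response.headers.get("content-length", "0"))
            if declared > max_size_bytes:
                raise DownloadError(f"content-length {declared} exceeds limit")
            downloaded = 0
            with dest.open("wb") as fp:
                async for chunk in response.aiter_bytes():
                    downloaded += len(chunk)
                    # 第二道防线:content-length 头可能在撒谎,边下边数
                    if downloaded > max_size_bytes:
                        raise DownloadError("body exceeded the size limit")
                    fp.write(chunk)
    verify_sqlite(dest)  # PRAGMA quick_check,数据损坏则抛错
    return dest
```

把函数体拆成"头部检查、流式计数、事后校验"三段,正好对应 prompt 里按顺序列出的三条要求,这也是这种 prompt 风格好用的原因之一。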

I find LLMs respond extremely well to function signatures like the one I use here. I get to act as the function designer, the LLM does the work of building the body to my specification.

我发现 LLM 对我使用的这种函数签名的响应非常好。我扮演函数设计师的角色,LLM 则负责根据我的规范构建函数体。

I’ll often follow-up with “Now write me the tests using pytest”. Again, I dictate my technology of choice—I want the LLM to save me the time of having to type out the code that’s sitting in my head already.

我经常会接着说“现在用 pytest 给我写测试”。同样,我指定我选择的技术——我希望 LLM 能帮我节省时间,不用去敲那些已经在我脑子里的代码。

If your reaction to this is “surely typing out the code is faster than typing out an English instruction of it”, all I can tell you is that it really isn’t for me any more. Code needs to be correct. English has enormous room for shortcuts, and vagaries, and typos, and saying things like “use that popular HTTP library” if you can’t remember the name off the top of your head.

如果你对此的反应是“打代码肯定比打一段英文指令要快”,我只能告诉你,对我来说真的不再是这样了。代码需要正确无误,而英语则有巨大的容错空间:可以走捷径、可以表达含糊、可以有拼写错误,甚至在你一时想不起名字的时候,可以说“用那个流行的 HTTP 库”。

The good coding LLMs are excellent at filling in the gaps. They’re also much less lazy than me—they’ll remember to catch likely exceptions, add accurate docstrings, and annotate code with the relevant types.

好的编程 LLM 非常擅长填补这些空白。它们也远没有我那么懒:它们会记得捕获可能的异常,添加准确的文档字符串,并用相关的类型注解代码。

你必须测试它写的代码!#

I wrote about this at length last week: the one thing you absolutely cannot outsource to the machine is testing that the code actually works.

我上周详细写过这个:有一件事你绝对不能外包给机器,那就是测试代码是否真的能用。

Your responsibility as a software developer is to deliver working systems. If you haven’t seen it run, it’s not a working system. You need to invest in strengthening those manual QA habits.

作为一名软件开发者,你的责任是交付可用的系统。如果你没亲眼见过它运行,那它就不是一个可用的系统。你需要投入精力来加强那些手动 QA 的习惯。

This may not be glamorous but it’s always been a critical part of shipping good code, with or without the involvement of LLMs.

这可能不那么光鲜,但它一直都是交付好代码的关键部分,无论有没有 LLM 的参与。

记住,这是一场对话 #

If I don’t like what an LLM has written, they’ll never complain at being told to refactor it! “Break that repetitive code out into a function”, “use string manipulation methods rather than a regular expression”, or even “write that better!”—the code an LLM produces first time is rarely the final implementation, but they can re-type it dozens of times for you without ever getting frustrated or bored.

如果我不喜欢 LLM 写的东西,它们永远不会抱怨被要求重构!“把那段重复的代码提取成一个函数”,“用字符串操作方法而不是正则表达式”,甚至“写得更好一点!”——LLM 第一次生成的代码很少是最终实现,但它们可以为你重写几十次,而从不感到沮丧或厌烦。

Occasionally I’ll get a great result from my first prompt—more frequently the more I practice—but I expect to need at least a few follow-ups.

偶尔我的第一个 prompt 就能得到很好的结果——随着我练习得越多,这种情况也越频繁——但我通常预期至少需要几次跟进。

I often wonder if this is one of the key tricks that people are missing—a bad initial result isn’t a failure, it’s a starting point for pushing the model in the direction of the thing you actually want.

我常常在想,这是否是人们错过的关键技巧之一——一个糟糕的初始结果不是失败,而是一个起点,用来推动模型朝你真正想要的方向前进。

使用能替你运行代码的工具 #

An increasing number of LLM coding tools now have the ability to run that code for you. I’m slightly cautious about some of these since there’s a possibility of the wrong command causing real damage, so I tend to stick to the ones that run code in a safe sandbox. My favorites right now are:

越来越多的 LLM 编程工具现在都具备了为你运行代码的能力。我对其中一些持谨慎态度,因为错误的命令可能会造成真正的损害,所以我倾向于使用那些在安全沙箱中运行代码的工具。我目前最喜欢的是:

  • ChatGPT Code Interpreter, where ChatGPT can write and then execute Python code directly in a Kubernetes sandbox VM managed by OpenAI. This is completely safe—it can’t even make outbound network connections so really all that can happen is the temporary filesystem gets mangled and then reset.
  • Claude Artifacts, where Claude can build you a full HTML+JavaScript+CSS web application that is displayed within the Claude interface. This web app is displayed in a very locked down iframe sandbox, greatly restricting what it can do but preventing problems like accidental exfiltration of your private Claude data.
  • ChatGPT Canvas is a newer ChatGPT feature with similar capabilities to Claude Artifacts. I have not explored this enough myself yet.
  • ChatGPT Code Interpreter,ChatGPT 可以在 OpenAI 管理的 Kubernetes 沙箱虚拟机中直接编写并执行 Python 代码。这完全安全——它甚至不能建立出站网络连接,所以真正可能发生的就是临时文件系统被弄乱然后重置。
  • Claude Artifacts,Claude 可以为你构建一个完整的 HTML+JavaScript+CSS Web 应用,并显示在 Claude 界面内。这个 Web 应用显示在一个非常受限的 iframe 沙箱中,极大地限制了它的功能,但能防止像意外泄露你的私有 Claude 数据这类问题。
  • ChatGPT Canvas 是一个较新的 ChatGPT 功能,与 Claude Artifacts 功能相似。我自己还没来得及充分探索。

And if you’re willing to live a little more dangerously:

如果你愿意冒点险:

  • Cursor has an “Agent” feature that can do this, as does Windsurf and a growing number of other editors. I haven’t spent enough time with these to make recommendations yet.
  • Aider is the leading open source implementation of these kinds of patterns, and is a great example of dogfooding—recent releases of Aider have been 80%+ written by Aider itself.
  • Claude Code is Anthropic’s new entrant into this space. I’ll provide a detailed description of using that tool shortly.
  • Cursor 有一个可以做到这一点的 “Agent” 功能,Windsurf 以及越来越多其他编辑器也是如此。我还没花足够的时间来对它们做出推荐。
  • Aider 是这类模式的领先开源实现,也是吃自己的狗粮 (dogfooding) 的一个绝佳例子——Aider 的最近几个版本有 80% 以上的代码是由 Aider 自己编写的。
  • Claude Code 是 Anthropic 在这个领域的新产品。我稍后会详细介绍如何使用这个工具。

This run-the-code-in-a-loop pattern is so powerful that I chose my core LLM tools for coding based primarily on whether they can safely run and iterate on my code.

这种循环运行代码的模式如此强大,以至于我主要根据它们是否能安全地运行和迭代我的代码来选择我的核心 LLM 编程工具。

Vibe-coding 是一个很好的学习方式 #

Andrej Karpathy coined the term vibe-coding just over a month ago, and it has stuck:

Andrej Karpathy 在一个多月前创造了 vibe-coding 这个词,并且它流传开来:

There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. […] I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it. I “Accept All” always, I don’t read the diffs anymore. When I get error messages I just copy paste them in with no comment, usually that fixes it.

有一种我称之为“vibe-coding”的新型编程方式,你完全凭感觉来,拥抱指数级增长,甚至忘记代码的存在。[…] 我会提一些最蠢的要求,比如“把侧边栏的内边距减少一半”,因为我懒得去找。我总是“全部接受”,不再阅读 diff。当我遇到错误信息时,我只是不加评论地复制粘贴进去,通常这样就能解决问题。

Andrej suggests this is “not too bad for throwaway weekend projects”. It’s also a fantastic way to explore the capabilities of these models—and really fun.

Andrej 建议这“对于一次性的周末项目来说还不错”。这也是探索这些模型能力的绝佳方式——而且真的很有趣。

The best way to learn LLMs is to play with them. Throwing absurd ideas at them and vibe-coding until they almost sort-of work is a genuinely useful way to accelerate the rate at which you build intuition for what works and what doesn’t.

学习 LLM 的最好方法就是和它们一起玩。向它们扔出荒谬的想法,然后进行 vibe-coding,直到它们差不多能用为止,这是一种真正有用的方式,可以加速你建立起对什么行得通、什么行不通的直觉。

I’ve been vibe-coding since before Andrej gave it a name! My simonw/tools GitHub repository has 77 HTML+JavaScript apps and 6 Python apps, and every single one of them was built by prompting LLMs. I have learned so much from building this collection, and I add to it at a rate of several new prototypes per week.

在 Andrej 给它命名之前,我就已经在 vibe-coding 了!我的 simonw/tools GitHub 仓库里有 77 个 HTML+JavaScript 应用和 6 个 Python 应用,每一个都是通过 prompt LLM 构建的。通过构建这个合集,我学到了太多东西,而且我每周都会增加几个新的原型。

You can try most of mine out directly on tools.simonwillison.net—a GitHub Pages published version of the repo. I wrote more detailed notes on some of these back in October in Everything I built with Claude Artifacts this week.

你可以在 tools.simonwillison.net 上直接尝试我的大部分工具——这是该仓库的 GitHub Pages 发布版本。我在去年十月的文章《我这周用 Claude Artifacts 构建的所有东西》中对其中一些做了更详细的记录。

If you want to see the transcript of the chat used for each one it’s almost always linked to in the commit history for that page—or visit the new colophon page for an index that includes all of those links.

如果你想看每个工具所用的聊天记录,几乎总能在那一页的 commit 历史中找到链接——或者访问新的版本说明页,那里有一个包含所有这些链接的索引。

一个使用 Claude Code 的详细例子 #

While I was writing this article I had the idea for that tools.simonwillison.net/colophon page—I wanted something I could link to that showed the commit history of each of my tools in a more obvious way than GitHub.

在我写这篇文章的时候,我萌生了创建那个 tools.simonwillison.net/colophon 页面的想法——我想要一个可以链接到的页面,用比 GitHub 更直观的方式展示我每个工具的 commit 历史。

I decided to use that as an opportunity to demonstrate my AI-assisted coding process.

我决定以此为契机,展示我的 AI 辅助编程过程。

For this one I used Claude Code, because I wanted it to be able to run Python code directly against my existing tools repository on my laptop.

这次我用了 Claude Code,因为我希望它能直接在我笔记本电脑上对我现有的工具仓库运行 Python 代码。

Running the /cost command at the end of my session showed me this:

在会话结束时运行 /cost 命令,我看到了这个:

> /cost
  ⎿  Total cost: $0.61
     Total duration (API): 5m 31.2s
     Total duration (wall): 17m 18.7s

The initial project took me just over 17 minutes from start to finish, and cost me 61 cents in API calls to Anthropic.

最初的项目从开始到完成花了我 17 分钟多一点,调用 Anthropic API 花了 61 美分。

I used the authoritarian process where I told the model exactly what I wanted to build. Here’s my sequence of prompts (full transcript here).

我用了专制的方式,明确告诉模型我想要构建什么。这是我的 prompt 序列(完整记录在这里)。

I started by asking for an initial script to gather the data needed for the new page:

我从请求一个初始脚本开始,用来收集新页面所需的数据:

Almost all of the HTML files in this directory were created using Claude prompts, and the details of those prompts are linked in the commit messages. Build a Python script that checks the commit history for each HTML file in turn and extracts any URLs from those commit messages into a list. It should then output a JSON file with this structure: {“pages”: {“name-of-file.html”: [“url”], {“name-of-file-2.html”: [“url1”, “url2”], …—as you can see, some files may have more than one URL in their commit history. The script should be called gather_links.py and it should save a JSON file called gathered_links.json

这个目录里几乎所有的 HTML 文件都是用 Claude 的 prompt 创建的,这些 prompt 的详情链接都在 commit message 里。写一个 Python 脚本,依次检查每个 HTML 文件的 commit 历史,并从这些 commit message 中提取出所有的 URL,放进一个列表里。然后,它应该输出一个这样结构的 JSON 文件:{“pages”: {“name-of-file.html”: [“url”], {“name-of-file-2.html”: [“url1”, “url2”], …——如你所见,有些文件在它们的 commit 历史中可能有多个 URL。这个脚本应该叫 gather_links.py,并且应该把 JSON 文件保存为 gathered_links.json

I really didn’t think very hard about this first prompt—it was more of a stream of consciousness that I typed into the bot as I thought about the initial problem.

我其实没怎么仔细思考这个第一个 prompt——它更像是我在思考初始问题时,随手敲给机器人的意识流。

I inspected the initial result and spotted some problems:

我检查了初始结果,发现了一些问题:

It looks like it just got the start of the URLs, it should be getting the whole URLs which might be to different websites—so just get anything that starts https:// and ends with whitespace or the end of the commit message

看起来它只获取了 URL 的开头部分,它应该获取完整的 URL,这些 URL 可能指向不同的网站——所以只要获取所有以 https:// 开头并以空白或 commit message 结尾的字符串就行。
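译注:作为参考,按这两条 prompt 的要求,gather_links.py 的一个可能写法大致如下。这纯属我补写的示意,并非 Simon 仓库中的实际实现;URL 的提取规则采用的是上面修正后的"以 https:// 开头、到空白为止"版本:

```python
import json
import pathlib
import re
import subprocess

# 按修正后的规则提取 URL:https:// 开头,空白或消息结尾为止
URL_RE = re.compile(r"https://\S+")


def extract_urls(commit_message: str) -> list[str]:
    """从一条 commit message 中提取所有 https:// URL。"""
    return URL_RE.findall(commit_message)


def commit_messages(path: pathlib.Path) -> list[str]:
    """返回改动过该文件的所有 commit message,以 NUL 分隔解析。"""
    proc = subprocess.run(
        ["git", "log", "--reverse", "--format=%B%x00", "--", str(path)],
        capture_output=True, text=True,
    )
    return [m.strip() for m in proc.stdout.split("\0") if m.strip()]


def main():
    pages = {}
    for html_file in sorted(pathlib.Path(".").glob("*.html")):
        urls = []
        for message in commit_messages(html_file):
            urls.extend(extract_urls(message))
        pages[html_file.name] = urls
    pathlib.Path("gathered_links.json").write_text(
        json.dumps({"pages": pages}, indent=2)
    )


if __name__ == "__main__":
    main()
```

把 URL 提取拆成一个纯函数的好处是,后面那条"只获取完整 URL"的修正 prompt 对应的改动就只落在一个正则上。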

Then I changed my mind—I wanted those full commit messages too:

然后我改主意了——我也想要完整的 commit message:

Update the script—I want to capture the full commit messages AND the URLs—the new format should be {“pages”: {“aria-live-regions.html”: {“commits”: [{“hash”: hash, “message”: message, “date”: iso formatted date], “urls”: [list of URLs like before]

更新脚本——我想要捕获完整的 commit message 和 URL——新格式应该是 {“pages”: {“aria-live-regions.html”: {“commits”: [{“hash”: 哈希值, “message”: 消息, “date”: iso 格式日期], “urls”: [和之前一样的 URL 列表]
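译注:prompt 里随手敲的花括号并不配对。补全成结构完整的数据后,这个新格式大致是下面这样,其中 hash、日期和 URL 等具体值均为我虚构的占位示例:

```python
import json

# 按 prompt 描述补全后的目标结构(所有具体值均为虚构占位)
example = {
    "pages": {
        "aria-live-regions.html": {
            "commits": [
                {
                    "hash": "abc1234",
                    "message": "Initial version of the tool",
                    "date": "2024-12-10T20:35:00",
                }
            ],
            "urls": ["https://example.com/linked-from-commit"],
        }
    }
}

print(json.dumps(example, indent=2))
```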

Providing examples like this is a great shortcut to getting exactly what you want.

提供这样的例子是得到你想要的东西的一个绝佳捷径。

Note that at no point have I looked at the code it’s written in gather_links.py! This is pure vibe-coding: I’m looking at what it’s doing, but I’ve left the implementation details entirely up to the LLM.

注意,我自始至终都没有看过它在 gather_links.py 中写的代码!这是纯粹的 vibe-coding:我只看它在做什么,但把实现细节完全交给了 LLM。

The JSON looked good to me, so I said:

JSON 看起来不错,于是我说:

This is working great. Write me a new script called build_colophon.py which looks through that gathered JSON file and builds and saves an HTML page. The page should be mobile friendly and should list every page—with a link to that page—and for each one display the commit messages neatly (convert newlines to br and linkify URLs but no other formatting)—plus the commit message dates and links to the commits themselves which are in https://github.com/simonw/tools

这个工作得很好。给我写一个新脚本,叫 build_colophon.py,它会遍历那个收集好的 JSON 文件,然后构建并保存一个 HTML 页面。这个页面应该是移动端友好的,并且应该列出每个页面——附上指向该页面的链接——对每一个页面,都要整洁地显示 commit message(将换行符转为 br,并将 URL 转为链接,不做其他格式化)——还要加上 commit message 的日期和指向 commit 本身的链接,它们在 https://github.com/simonw/tools
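译注:prompt 里"换行转 br、URL 转链接"这两步操作有一个顺序上的坑。下面是我自己补写的一个示意,并非 Simon 的实际代码:必须先把 URL 链接化,再把换行转成 <br>;如果顺序反了,<br> 就可能被卷进 href 属性,这正是文章结尾提到的那个花 11 美分修掉的 bug。

```python
import html
import re

URL_RE = re.compile(r"https://\S+")


def format_message(message: str) -> str:
    """把 commit message 渲染成 HTML 片段:转义、链接化、再转换换行。

    顺序很重要:必须先链接化、后把换行替换成 <br>。
    反过来做的话,\\S+ 会把 <br> 当作 URL 的一部分吞进 href 里。
    """
    escaped = html.escape(message)
    linked = URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
    )
    return linked.replace("\n", "<br>")
```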

Claude knows how GitHub URLs work, so telling it to link to the commits and providing the repo name was enough for it to guess https://github.com/simonw/tools/commit/fd9daf885c924ba277806b3440457d52b0ad90a8 for those commit URLs.

Claude 知道 GitHub URL 的工作原理,所以告诉它链接到 commit 并提供仓库名称,就足以让它猜出那些 commit URL 的格式是 https://github.com/simonw/tools/commit/fd9daf885c924ba277806b3440457d52b0ad90a8

I tend to find Claude has good default taste when it comes to web page design—I said “the page should be mobile friendly” and left it at that.

我倾向于认为 Claude 在网页设计方面有不错的默认品味——我只说了“页面应该是移动端友好的”,然后就没再管了。

Claude churned away and built me a page that wasn’t right, so I said:

Claude 忙活了一阵,给我建了一个页面,但不对劲,于是我说:

it’s not working right. ocr.html had a bunch of commits but in colophon.html there is only one link and heading for the first commit and the rest are shown within that same block—there should be separate HTML chunks with links and formatted dates for each of the other commits. Also the neatly formatted date should include the HH:MM as well as the date

它工作得不对。ocr.html 有一堆 commit,但在 colophon.html 里,只有第一个 commit 有一个链接和标题,其余的都显示在同一个块里——其他的每个 commit 都应该有独立的 HTML 块,包含链接和格式化后的日期。另外,整洁格式化的日期也应该包括 HH:MM 以及年月日。

It fixed the bug all on its own, leaving just two changes I decided to make:

它自己修复了这个 bug,只剩下两个我决定要做的改动:

it’s almost perfect, but each page should have the commits displayed in the opposite order—oldest first

差不多完美了,但是每个页面的 commit 应该以相反的顺序显示——最旧的在前面。

And then:

然后:

One last change—the pages are currently listed alphabetically, lets instead list them with the most recently modified at the top

最后一个改动——页面目前是按字母顺序排列的,让我们改成按最近修改时间排序,最新的在最上面。

And that was the entire project! Here’s build_colophon.py, and the page it generated came out looking pretty good:

整个项目就这样完成了!这是 build_colophon.py,它生成的页面看起来还挺不错

Tools Colophon. This page documents the creation of the tools on tools.simonwillison.net, including links to the Claude conversations used to build them. social-media-cropper.html b4a2bc December 10, 2024 20:35 Social media cropper https://gist.github.com/simonw/12b8f88932a71450071190e1289a17e9 a10954 February 28, 2025 16:02 Support 2:1 and 14:10 ratios https://gist.github.com/simonw/e23917eddcbb368c9b6180d581f8f40a

工具版本说明页。此页面记录了 tools.simonwillison.net 上工具的创建过程,包括用于构建它们的 Claude 对话链接。social-media-cropper.html b4a2bc 2024年12月10日 20:35 社交媒体裁剪器 https://gist.github.com/simonw/12b8f88932a71450071190e1289a17e9 a10954 2025年2月28日 16:02 支持2:1和14:10比例 https://gist.github.com/simonw/e23917eddcbb368c9b6180d581f8f40a

There was one remaining task: I needed to deploy the new colophon as part of my site, but I didn’t want to check that new colophon.html page into the repository itself. I wanted a custom GitHub Pages build process.

还剩下一个任务:我需要把新的版本说明页部署到我的网站上,但我不想把那个新的 colophon.html 文件提交到仓库本身。我想要一个自定义的 GitHub Pages 构建流程。

I started a brand new Claude Code session (clearing the slate) to see if that was something Claude could figure out too:

我开启了一个全新的 Claude Code 会话(清空重来),看看 Claude 是否也能搞定这件事:

This repo is deployed using GitHub Pages. I want to modify the build slightly—before deploying to GitHub Pages it should run python gather_links.py which will generate a file called gathered_links.json—then it should run python build_colophon.py which will use that file to create colophon.html—I want both gathered_links.json and colophon.html to be published to GitHub pages but I do not want them checked into the repo itself

这个仓库是用 GitHub Pages 部署的。我想稍微修改一下构建流程——在部署到 GitHub Pages 之前,它应该运行 python gather_links.py,这会生成一个叫 gathered_links.json 的文件——然后它应该运行 python build_colophon.py,用那个文件来创建 colophon.html——我希望 gathered_links.jsoncolophon.html 都被发布到 GitHub Pages,但我不希望它们被提交到仓库本身。

Unlike the first time, this time I watched what it was doing very closely—I don’t know how to customize GitHub Pages builds in this way and I wanted to both learn how to do it and stay cautious that it might hallucinate details and fail at the task.

和第一次不同,这次我非常仔细地观察它在做什么——我不知道如何以这种方式自定义 GitHub Pages 的构建,我既想学习如何做,又想保持警惕,怕它会幻觉出细节导致任务失败。

It told me it wanted to create this file and asked for my permission:

它告诉我它想创建这个文件,并请求我的许可:

name: Deploy to GitHub Pages

on:
  # Run on push to main branch
  push:
    branches: [ main ]
  # Allow manual trigger
  workflow_dispatch:

# Sets permissions of the GITHUB_TOKEN
permissions:
  contents: read
  pages: write
  id-token: write

# Allow only one concurrent deployment
concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Need full history for git log in gather_links.py

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
      - name: Generate links and build colophon
        run: |
          python gather_links.py
          python build_colophon.py
      - name: Setup Pages
        uses: actions/configure-pages@v4

      - name: Build with Jekyll
        uses: actions/jekyll-build-pages@v1

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3

  deploy:
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

I guess that’s right? I appreciated that comment about “Need full history for git log in gather_links.py”—that’s something I could easily have missed.

我猜这是对的?我很欣赏那句关于“gather_links.py 中的 git log 需要完整的历史记录”的注释——这是我很容易忽略的地方。

Then it said it wanted to add those files to .gitignore—sounds like a good idea.

然后它说想把那些文件添加到 .gitignore——听起来是个好主意。

It spat out this summary of its actions so far:

它输出了到目前为止的行动总结:

I’ve created a GitHub Actions workflow for GitHub Pages that:

  1. Creates a .github/workflows/pages.yml file that:
    • Runs on pushes to main branch
    • Checkouts the full git history (needed for gather_links.py)
    • Sets up Python 3.12
    • Runs both scripts in sequence to generate the files
    • Builds and deploys to GitHub Pages
  2. Created .gitignore file to prevent the generated files from being committed

Now when you push to main, GitHub Actions will:

  • Run the gather_links.py script to create gathered_links.json
  • Run build_colophon.py to create colophon.html
  • Deploy both files (and all other files) to GitHub Pages

I checked to see how much it had cost:

我检查了一下花了多少钱:

> /cost
  ⎿  Total cost: $0.1788
     Total duration (API): 44.6s
     Total duration (wall): 10m 18.6s

So 17 cents and 45 seconds using the Claude API. (I got distracted, hence the 10m of total time.) Here’s the full transcript.

所以用了 17 美分和 45 秒的 Claude API。(我分心了,所以总时长是 10 分钟。)这是完整的记录

The code didn’t look like it would irreversibly break anything, so I pushed it to GitHub to see what would happen.

代码看起来不会造成不可逆的破坏,所以我把它推送到 GitHub,看看会发生什么。

… and it worked! My new colophon page was live.

……然后它成功了!我的新版本说明页上线了。

There’s a catch. I watched the GitHub Actions interface while it was running and something didn’t look right:

但有个问题。我看着 GitHub Actions 界面运行时,发现有些不对劲:

GitHub Actions interface showing three completed actions: Test for Custom pages workflow for colophon, Deploy for that same name, and another one called pages-build-deployment.

GitHub Actions 界面显示了三个已完成的 action。一个是“Test for Custom pages workflow for colophon”,一个是同名的“Deploy”,另一个是“pages-build-deployment”。

I was expecting that “Test” job, but why were there two separate deploys?

我预料到了那个“Test”任务,但为什么会有两个独立的部署?

I had a hunch that the previous, default Jekyll deploy was still running, while the new deploy ran at the same time—and it was pure luck of the timing that the new script finished later and over-wrote the result of the original.

我有一种预感,之前的默认 Jekyll 部署仍在运行,而新的部署同时进行——纯粹是运气好,新脚本完成得晚一些,覆盖了原始部署的结果。

It was time to ditch the LLMs and read some documentation!

是时候抛开 LLM,去读一些文档了!

I found this page on Using custom workflows with GitHub Pages but it didn’t tell me what I needed to know.

我找到了这篇关于在 GitHub Pages 中使用自定义工作流的页面,但它没有告诉我需要知道的信息。

On another hunch I checked the GitHub Pages settings interface for my repo and found this option:

凭着另一个直觉,我检查了我仓库的 GitHub Pages 设置界面,发现了这个选项:

GitHub Pages UI - shows your site is live at tools.simonwillison.net, deployed 7 minutes ago. - then under Build and deployment a source menu shows options for GitHub Actions or for Deploy from a branch (selected)

GitHub Pages UI - 显示你的网站已在 tools.simonwillison.net 上线,7分钟前部署。 - 然后在“构建和部署”下方的源菜单显示了 GitHub Actions 或从分支部署(已选择)的选项。

My repo was set to “Deploy from a branch”, so I switched that over to “GitHub Actions”.

我的仓库被设置为“从分支部署”,所以我把它切换到了“GitHub Actions”。

I manually updated my README.md to add a link to the new Colophon page in this commit, which triggered another build.

我手动更新了我的 README.md,在这个 commit 中添加了指向新版本说明页的链接,这触发了另一次构建。

This time only two jobs ran, and the end result was the correctly deployed site:

这次只运行了两个任务,最终结果是正确部署的网站:

现在只有两个进行中的工作流,一个是 Test,另一个是 Deploy to GitHub Pages。

(I later spotted another bug—some of the links inadvertently included <br> tags in their href=, which I fixed with another 11 cent Claude Code session.)

(我后来又发现了一个 bug——有些链接的 href= 中无意中包含了 <br> 标签,我通过另一次花费 11 美分的 Claude Code 会话 修复了它。)

Update: I improved the colophon further by adding AI-generated descriptions of the tools.

更新:我通过添加 AI 生成的工具描述进一步改进了版本说明页。

准备好随时由人接管 #

I got lucky with this example because it helped illustrate my final point: expect to need to take over.

我这次的例子很幸运,因为它帮助阐明了我的最后一点:要预料到需要自己接管。

LLMs are no replacement for human intuition and experience. I’ve spent enough time with GitHub Actions that I know what kind of things to look for, and in this case it was faster for me to step in and finish the project rather than keep on trying to get there with prompts.

LLM 无法替代人类的直觉和经验。我在 GitHub Actions 上花了足够多的时间,知道要注意什么样的事情,在这种情况下,由我介入并完成项目比继续用 prompt 尝试要快得多。

最大的优势是开发速度 #

My new colophon page took me just under half an hour from conception to finished, deployed feature.

我的新版本说明页从构思到完成部署,花了我不到半小时。

I’m certain it would have taken me significantly longer without LLM assistance—to the point that I probably wouldn’t have bothered to build it at all.

我敢肯定,如果没有 LLM 的帮助,我会花更长的时间——甚至可能根本就不会费心去构建它。

This is why I care so much about the productivity boost I get from LLMs: it’s not about getting work done faster, it’s about being able to ship projects that I wouldn’t have been able to justify spending time on at all.

这就是为什么我如此看重 LLM 带来的生产力提升:这不仅仅是为了更快地完成工作,更是为了能够交付那些我原本认为根本不值得花时间去做的项目。

I wrote about this in March 2023: AI-enhanced development makes me more ambitious with my projects. Two years later that effect shows no sign of wearing off.

我在 2023 年 3 月写过这个:AI 增强开发让我对我的项目更有野心。两年后,这种影响丝毫没有减弱的迹象。

It’s also a great way to accelerate learning new things—today that was how to customize my GitHub Pages builds using Actions, which is something I’ll certainly use again in the future.

这也是加速学习新事物的好方法——今天我学了如何使用 Actions 自定义我的 GitHub Pages 构建,这肯定是我将来会再次用到的东西。

The fact that LLMs let me execute my ideas faster means I can implement more of them, which means I can learn even more.

LLM 让我能更快地执行我的想法,这意味着我能实现更多的想法,也就意味着我能学到更多。

LLM 放大已有的专业知识 #

Could anyone else have done this project in the same way? Probably not! My prompting here leaned on 25+ years of professional coding experience, including my previous explorations of GitHub Actions, GitHub Pages, GitHub itself and the LLM tools I put into play.

其他人能以同样的方式完成这个项目吗?可能不行!我在这里的 prompt 依赖于我 25 年以上的专业编程经验,包括我之前对 GitHub Actions、GitHub Pages、GitHub 本身以及我所使用的 LLM 工具的探索。

I also knew that this was going to work. I’ve spent enough time working with these tools that I was confident that assembling a new HTML page with information pulled from my Git history was entirely within the capabilities of a good LLM.

我也知道这会成功。我和这些工具打交道的时间足够长,所以我确信,用从 Git 历史中提取的信息来组装一个新的 HTML 页面,完全在一个好的 LLM 的能力范围之内。

My prompts reflected that—there was nothing particularly novel here, so I dictated the design, tested the results as it was working and occasionally nudged it to fix a bug.

我的 prompt 也反映了这一点——这里并没有什么特别新颖的东西,所以我规定了设计,在它工作时测试结果,并偶尔推动它修复一个 bug。

If I was trying to build a Linux kernel driver—a field I know virtually nothing about—my process would be entirely different.

如果我试图构建一个 Linux 内核驱动程序——一个我几乎一无所知的领域——我的流程会完全不同。

彩蛋:回答关于代码库的问题 #

If the idea of using LLMs to write code for you still feels deeply unappealing, there’s another use-case for them which you may find more compelling.

如果用 LLM 为你写代码这个想法对你来说仍然毫无吸引力,它们还有另一个你可能会觉得更有说服力的用例。

Good LLMs are great at answering questions about code.

好的 LLM 在回答关于代码的问题方面非常出色。

This is also very low stakes: the worst that can happen is they might get something wrong, which may take you a tiny bit longer to figure out. It’s still likely to save you time compared to digging through thousands of lines of code entirely by yourself.

这也是非常低风险的:最坏的情况就是它们可能会搞错一些东西,这可能会让你多花一点点时间去搞清楚。但与完全靠自己翻阅数千行代码相比,它仍然很可能为你节省时间。

The trick here is to dump the code into a long context model and start asking questions. My current favorite for this is the catchily titled gemini-2.0-pro-exp-02-05, a preview of Google’s Gemini 2.0 Pro which is currently free to use via their API.

这里的诀窍是把代码扔进一个长上下文模型,然后开始提问。我目前最喜欢用的是名字很上口的 gemini-2.0-pro-exp-02-05,这是 Google Gemini 2.0 Pro 的一个预览版,目前可以通过他们的 API 免费使用。

I used this trick just the other day. I was trying out a new-to-me tool called monolith, a CLI tool written in Rust which downloads a web page and all of its dependent assets (CSS, images etc) and bundles them together into a single archived file.

前几天就用了这个技巧。我当时在试用一个对我来说很新的工具,叫做 monolith,这是一个用 Rust 写的 CLI 工具,它可以下载一个网页及其所有依赖资源(CSS、图片等),并将它们打包成一个单一的存档文件。

I was curious as to how it worked, so I cloned it into my temporary directory and ran these commands:

我很好奇它是如何工作的,所以我把它克隆到我的临时目录里,然后运行了这些命令:

cd /tmp
git clone https://github.com/Y2Z/monolith
cd monolith

files-to-prompt . -c | llm -m gemini-2.0-pro-exp-02-05 \
  -s 'architectural overview as markdown'

I’m using my own files-to-prompt tool (built for me by Claude 3 Opus last year) here to gather the contents of all of the files in the repo into a single stream. Then I pipe that into my LLM tool and tell it (via the llm-gemini plugin) to prompt Gemini 2.0 Pro with a system prompt of “architectural overview as markdown”.

我在这里使用了我自己的 files-to-prompt 工具(是 Claude 3 Opus 去年为我构建的),用来将仓库中所有文件的内容收集到一个数据流中。然后我把它通过管道传给我自己的 LLM 工具,并告诉它(通过 llm-gemini 插件)用“以 markdown 格式提供架构概览”这个系统 prompt 来提示 Gemini 2.0 Pro。
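For readers curious what the first half of that pipeline does conceptually: it walks the repository and concatenates every file into a single stream, each prefixed with its path, so the model sees the whole codebase in one prompt. Here is a minimal Python sketch of that idea—this is not the actual files-to-prompt implementation, which has many more options (including the `-c` Claude-XML output format used above), just an illustration of the shape of the output.

给好奇那条管道前半部分在概念上做了什么的读者:它会遍历整个仓库,把每个文件的内容串联成一个数据流,并在每段内容前加上文件路径,这样模型就能在一个 prompt 里看到整个代码库。下面是这个思路的一个极简 Python 示意——这并不是 files-to-prompt 的真实实现(真实工具有更多选项,包括上面用到的 `-c` Claude-XML 输出格式),只是为了说明输出的大致形状。

```python
import os


def files_to_prompt(root: str) -> str:
    """Concatenate every readable text file under `root` into one
    stream, each preceded by its path -- roughly what a tool like
    files-to-prompt produces for piping into an LLM."""
    chunks = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    contents = f.read()
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            chunks.append(f"{path}\n---\n{contents}\n---")
    return "\n".join(chunks)
```

The resulting string can then be sent to any long-context model, with a short system prompt like “architectural overview as markdown” doing the rest of the work.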

This gave me back a detailed document describing how the tool works—which source files do what and, crucially, which Rust crates it was using. I learned that it used reqwest, html5ever, markup5ever_rcdom and cssparser and that it doesn’t evaluate JavaScript at all, an important limitation.

这给了我一份详细的文档,描述了这个工具的工作原理——哪些源文件做了什么,以及至关重要的是,它使用了哪些 Rust crates。我了解到它使用了 reqwesthtml5evermarkup5ever_rcdomcssparser,而且它根本不执行 JavaScript,这是一个重要的限制。

I use this trick several times a week. It’s a great way to start diving into a new codebase—and often the alternative isn’t spending more time on this, it’s failing to satisfy my curiosity at all.

我每周都会用这个技巧好几次。这是开始深入研究一个新代码库的好方法——而且通常,如果不这样做,替代方案不是花更多时间,而是根本无法满足我的好奇心。

I included three more examples in this recent post.

我在最近的这篇文章中还包含了另外三个例子。
