Deep research agents are amazing tools. Given a research task or question, they scour the web, synthesizing data from numerous sources to prepare a nicely packaged report. There are many deep research agents, each with its own quirks and design decisions. ChatGPT sports one, as does Claude. Google has a couple. Each is unique in how it plans, explores, clarifies, and synthesizes.

Today we're going to build several deep research agents, increasing their complexity as we go. By the end of this article you'll have a better idea of how these agents work and how to build them. More importantly, you'll also learn plenty about DSPy. You'll understand how it helps devs prototype quickly, iterate easily, and build reliable software. All the code for these examples is accessible in a GitHub repo.

Introducing Signatures & Modules as We Build Our First Agent

At a high level, our researcher does one thing: it accepts a research request and returns a research report. It's easy to imagine additional requests or constraints (include at least 3 sources, make it at least 500 words), but to begin we'll start with the minimum viable task description.

Today, we're using DSPy to implement our agent. DSPy is an open-source software framework that lets us declaratively describe the goals of our program, rather than how it should accomplish the task. Think of DSPy as a higher-level language for AI programming, similar to Python compared to assembly. In Python we don't have to manage our memory, track our pointers, or handle register allocation. In DSPy we don't have to hand-craft our prompts, use weird context hacks, manually parse outputs, or manage the details of our test-time strategy.

So let's focus on what we want: a program that takes in a research request (as text) and outputs a research report (also as text).
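To appreciate what DSPy spares us, consider a rough sketch of the hand-rolled alternative. This is an illustration, not code from the repo: the prompt wording, the `<report>` delimiters, and the `call_llm` helper are all hypothetical, and every one of them becomes your maintenance burden.

```python
def hand_rolled_researcher(research_request: str, call_llm) -> str:
    # Hand-crafted prompt: wording, delimiters, and format are all on us.
    prompt = (
        "You are a research assistant.\n"
        f"Request: {research_request}\n"
        "Write your report between <report> and </report> tags."
    )
    completion = call_llm(prompt)  # hypothetical LLM call

    # Manual output parsing: brittle if the model ignores our format.
    start = completion.find("<report>") + len("<report>")
    end = completion.find("</report>")
    if start < len("<report>") or end == -1:
        raise ValueError("Model did not follow the output format")
    return completion[start:end].strip()
```

Every tweak to the task means revisiting the prompt string and the parsing logic together. DSPy's declarative approach lets us state the goal once instead.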
In DSPy, we can express this as a Signature, like so:

researcher_signature = "research_request: str -> report: str"

A Signature can be a string or a class (more on that later). Here we're giving our input a name (research_request) and specifying its type (str). Our output, delineated by the ->, is also named and typed (report and str, respectively).

When I first read about DSPy Signatures, I immediately began looking through the documentation for a list of "terms" I could use to describe my inputs and outputs. I thought there was a preset list! But this was old software thinking: we can name our inputs and outputs whatever we want, and the LLM will infer our goals from them. This means it's important that we give our Signature attributes descriptive names, ones that give the LLM a clue to what we want. Amazingly, research_request and report turn out to be sufficiently detailed for our goals! No further instruction is needed.

On its own, a Signature does nothing other than express what you want. It's intentionally very simple. If we send it to an LLM as-is, we'll be rather disappointed. DSPy provides components that use our signature to perform operations with an LLM, the most essential of which is the Module. A DSPy Module executes your task, as defined by your signature. In AI programming, this means it manages test-time execution, including prompting techniques, tool definitions, environment procurement, and template formatting. It also handles LLM responses, either continuing a multi-turn process or extracting and formatting the final output.

There are several predefined Modules that come with DSPy, the most basic of which is Predict. There's no test-time strategy here; Predict just formats your Signature's information into a prompt template, inserts your input parameters, and manages data extraction:

researcher = dspy.Predict(researcher_signature)

Once we've set up a language model, we can use our program.
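That "manages data extraction" step deserves a moment. Conceptually, the module must pull each named output field back out of the model's delimited completion. Here's a minimal sketch of that idea, assuming the `[[ ## field ## ]]` delimiter convention shown below; this is an illustration of the concept, not DSPy's actual parser:

```python
import re

def extract_field(completion: str, field: str) -> str:
    """Pull one output field from a completion that uses
    [[ ## field ## ]] delimiters. Illustrative sketch only."""
    # Capture everything between this field's header and the next
    # [[ ## ... ## ]] header (or the end of the completion).
    pattern = rf"\[\[ ## {re.escape(field)} ## \]\]\s*(.*?)(?=\[\[ ##|\Z)"
    match = re.search(pattern, completion, re.DOTALL)
    if match is None:
        raise ValueError(f"Field {field!r} not found in completion")
    return match.group(1).strip()
```

The point is that none of this plumbing appears in our program: we declared `report: str` in the Signature, and the module owns the rest.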
Together, the whole thing looks like this:

import dspy

# Connect to and set an LLM
lm = dspy.LM('anthropic/claude-haiku-4-5', api_key='ANTHROPIC_API_KEY', max_tokens=64000)
dspy.configure(lm=lm)

# Create our program
researcher_signature = "research_request: str -> report: str"
researcher = dspy.Predict(researcher_signature)

# Call our program
result = researcher(research_request="Write a history of Coyote Hills, a park in the East Bay Regional Parks District in California.")
print(result.report)

Behind the scenes, the Predict module prepares inputs to send to the LLM. In practice, this is often a string formatting our inputs and additional instructions. In this specific case, first a system prompt is generated:

Your input fields are:
1. `research_request` (str):

Your output fields are:
1. `report` (str):

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## research_request ## ]]
{research_request}

[[ ## report ## ]]
{report}

[[ ## completed ## ]]

In adhering to this structure, your objective is:
Given the fields `research_request`, produce the fields `report`.

Then it sends the user message:

[[ ## research_request ## ]]
Write a history of Coyote Hills, a park in t