Toolformer: LMs can teach themselves to use tools

  • Toolformer: Language Models Can Teach Themselves to Use Tools
  • we show that LMs can teach themselves to use external tools via simple APIs
  • This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, a search engine, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
  • Existing approaches either rely on large amounts of human annotations (Komeili et al., 2022; Thoppilan et al., 2022) or limit tool use to task-specific settings only (e.g., Gao et al., 2022; Parisi et al., 2022).
  • limitations include an inability to access up-to-date information on recent events
  • a tendency to hallucinate facts
  • difficulties in understanding low-resource languages
  • a lack of mathematical skills to perform precise calculations
  • an unawareness of the progression of time
  • The use of tools should be learned in a self-supervised way without requiring large amounts of human annotations.
  • The LM should not lose any of its generality and should be able to decide for itself when and how to use which tool.
  • Our aim is to equip a language model M with the ability to use different tools by means of API calls. We require that inputs and outputs for each API can be represented as text sequences. This allows seamless insertion of API calls into any given text, using special tokens to mark the start and end of each such call.
  • using large LMs with in-context learning (Brown et al., 2020) to generate entire datasets from scratch
  • Given just a handful of human-written examples of how an API can be used, we let a LM annotate a huge language modeling dataset with potential API calls. We then use a self-supervised loss to determine which of these API calls actually help the model in predicting future tokens. Finally, we finetune the LM itself on the API calls that it considers useful.
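The filtering step above can be sketched as a comparison of losses. The function name and flat threshold `tau` are assumptions for illustration; the three loss values stand in for the model's (weighted) negative log-likelihood on the tokens following the call under each prefix:

```python
def keep_api_call(loss_with_result: float,
                  loss_no_call: float,
                  loss_call_no_result: float,
                  tau: float = 1.0) -> bool:
    """Self-supervised filter (sketch): keep a sampled API call only if
    conditioning on the call *and* its result lowers the loss on future
    tokens by at least tau, compared with the better of
      (a) no call at all, and
      (b) the call without its result.
    """
    baseline = min(loss_no_call, loss_call_no_result)
    return baseline - loss_with_result >= tau
```

Comparing against option (b) matters: it rules out calls that "help" merely because their surface text is predictive, rather than because the tool's result carries useful information.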
  • As a next step, we execute all API calls generated by M to obtain the corresponding results. How this is done depends entirely on the API itself – for example, it can involve calling another neural network, executing a Python script or using a retrieval system to perform search over a large corpus
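A toy illustration of such execution, assuming a hypothetical registry that maps tool names to text-in/text-out functions. The calculator and calendar below are simplified sketches, not the paper's implementations:

```python
import datetime

def calculator(expr: str) -> str:
    # Restricted arithmetic evaluation (sketch only, not production-safe).
    allowed = set("0123456789+-*/(). ")
    if not set(expr) <= allowed:
        raise ValueError(f"unsupported expression: {expr!r}")
    return str(eval(expr))

def calendar(_: str) -> str:
    # The calendar tool takes no meaningful input; it reports the current date.
    return datetime.date.today().strftime("Today is %A, %B %d, %Y.")

# Every tool shares the same text-to-text interface, so dispatch is uniform.
TOOLS = {"Calculator": calculator, "Calendar": calendar}

def execute(name: str, inp: str) -> str:
    return TOOLS[name](inp)
```

The uniform string interface is the point: whatever happens behind a tool (a neural model, a script, a retrieval system), its result can be pasted straight back into the training text.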
  • For each API, we write a prompt P(x) that encourages the LM to annotate an example x = x1, … , xn with API calls.
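A paraphrased sketch of what such a prompt P(x) might look like for the Q&A tool. The wording and the few-shot example below are illustrative, not the paper's verbatim prompt:

```python
# Illustrative few-shot annotation prompt (hypothetical wording).
QA_PROMPT = (
    'Your task is to add calls to a Question Answering API to a piece of text. '
    'The calls should help you get information required to complete the text. '
    'You can call the API by writing "[QA(question)]".\n\n'
    'Input: Joe Biden was born in Scranton, Pennsylvania.\n'
    'Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, '
    '[QA("In which state is Scranton?")] Pennsylvania.\n\n'
)

def annotation_prompt(x: str) -> str:
    """Build P(x): the few-shot examples followed by the new input to annotate."""
    return QA_PROMPT + f"Input: {x}\nOutput:"
```

The LM then samples continuations of P(x), producing candidate positions and arguments for API calls, which are later executed and filtered.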
  • Model Finetuning: After sampling and filtering calls for all APIs, we finally merge the remaining API calls and interleave them with the original inputs
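The merge-and-interleave step could be sketched as follows, assuming each surviving call is stored together with the character position at which it belongs (the function name and data layout are assumptions):

```python
def interleave(text: str, calls: list[tuple[int, str]]) -> str:
    """Splice linearized API calls back into the original text.

    `calls` is a list of (position, call_text) pairs; calls from all APIs
    are merged and inserted in order, yielding the finetuning example.
    """
    out, last = [], 0
    for pos, call in sorted(calls):
        out.append(text[last:pos])
        out.append(call + " ")
        last = pos
    out.append(text[last:])
    return "".join(out)
```

Because the result is still ordinary text, the model is finetuned with the standard language modeling objective and keeps its generality; it simply learns when emitting a call span lowers its own loss.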
  • One such limitation is the inability of Toolformer to use tools in a chain (i.e., using the output of one tool as an input for another tool). This is due to the fact that API calls for each tool are generated independently; as a consequence, there are no examples of chained tool use in the finetuning dataset.
  • We found models trained with Toolformer to often be sensitive to the exact wording of their input when deciding whether or not to call an API; this is perhaps unsurprising given that LMs are known to be very sensitive to the prompt they are provided with in both zero- and few-shot settings.