LLMs and Data-to-Text
  • Since then I’ve had many discussions with people, in both academic and commercial contexts, about whether it makes sense to use LLMs to generate texts that summarise data. Below is a high-level summary of my current views on this topic.
  • The latest versions of GPT and other LLMs have improved somewhat, but they are still not very good at analytics and at extracting insights from data. Perhaps we shouldn’t expect them to be good at analytics; after all, they are language models!
  • The best approach is to do analytics and insight creation separately (outside the LLM), and then provide these insights to the LLM as part of its input data.
  • I think asking LLMs to do analytics fundamentally makes little sense, but it’s certainly possible that LLMs will get better at discourse-level issues and hence generate better long-form texts.
  • LLMs are very good at microplanning and surface realisation, at least in academic leaderboard contexts. However, there are some important caveats about real-world usage.
  • LLMs currently don’t do nearly as well at document planning, but perhaps this will change over time.
  • LLMs are poor at signal analysis and data interpretation, and it is a mistake to expect them to do these tasks.
  • Of course, in the real world we don’t need to be purists and insist on 100% LLM solutions; we can build systems which use LLM technology where it makes sense and other technologies elsewhere.
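
The hybrid approach described above can be sketched in a few lines. This is an illustrative toy, not code from the post: the sales figures, the insight functions, and the `call_llm` stub are all my own assumptions. The point is the division of labour — signal analysis and data interpretation happen in ordinary code, and the LLM only receives pre-computed insights to verbalise.

```python
# Hybrid data-to-text pipeline: analytics outside the LLM,
# insights passed to the LLM as input. Toy data and function
# names are illustrative assumptions, not from the post.

def extract_insights(sales):
    """Signal analysis in plain code -- the part an LLM should
    not be trusted to do. Finds the best month and the overall
    percentage change across the period."""
    months = list(sales)
    best = max(months, key=lambda m: sales[m])
    first, last = months[0], months[-1]
    change_pct = (sales[last] - sales[first]) / sales[first] * 100
    return {"best_month": best, "trend_pct": round(change_pct, 1)}

def build_prompt(insights):
    """Microplanning and surface realisation are delegated to the
    LLM; note that it receives only the insights, never the raw
    data it would have to interpret itself."""
    return (
        "Write a short sales summary using only these facts:\n"
        f"- Best month: {insights['best_month']}\n"
        f"- Change over the period: {insights['trend_pct']}%\n"
    )

sales = {"Jan": 100, "Feb": 120, "Mar": 150}  # toy input data
insights = extract_insights(sales)
prompt = build_prompt(insights)
# The prompt would now go to whichever LLM you use, e.g.:
# summary = call_llm(prompt)  # hypothetical client call
```

In this split, a bug in the analytics is found by ordinary testing rather than by reading generated prose, which is one practical reason to keep insight creation outside the LLM.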