Frequency of Starting New Conversations with LLMs

By Andrej Karpathy

When working with LLMs I am used to starting a “New Conversation” for each request. But there is also the polar opposite approach of keeping one giant conversation going forever. The standard approach can still use a Memory tool to write things down in between conversations (e.g. ChatGPT does so), so the “One Thread” approach can be seen as the extreme special case of using memory always and for everything.

The other day I came across someone saying that their conversation with Grok (which was free to them at the time) has now grown way too long for them to switch to ChatGPT, i.e. it functions like a moat, hah.

LLMs are rapidly growing in the maximum context length they allow *in principle*, and it’s clear that this might allow the LLM to have a lot more context and knowledge of you, but there are some caveats. A few of the major ones, as examples:

– Speed. A giant context window will cost more compute and will be slower.
– Ability. Just because you can feed in all those tokens doesn’t mean that they can also be manipulated effectively by the LLM’s attention and its in-context-learning mechanism for problem solving (the simplest demonstration is the “needle in the haystack” eval).
– Signal to noise. Too many tokens fighting for attention may *decrease* performance by being too “distracting”, diffusing attention too broadly and decreasing the signal-to-noise ratio in the features.
– Data, i.e. train/test data mismatch. Most of the conversations in the finetuning data are likely ~short. Indeed, a large fraction of them in academic datasets are single-turn (one single question -> answer). One giant conversation forces the LLM into a new data distribution it hasn’t seen that much of during training. This is in large part because…
– Data labeling. Keep in mind that LLMs still primarily and quite fundamentally rely on human supervision. A human labeler (or an engineer) can understand a short conversation and write optimal responses or rank them, or inspect whether an LLM judge is getting things right. But things grind to a halt with giant conversations. Who is supposed to write or inspect an alleged “optimal response” for a conversation of a few hundred thousand tokens?

Certainly, it’s not clear if an LLM should have a “New Conversation” button at all in the long run. It feels a bit like an internal implementation detail that is surfaced to the user for developer convenience and for the time being, and like the right solution is a very well-implemented memory feature, along the lines of active, agentic context management, something I haven’t really seen at all so far.

Anyway, curious to poll whether people have tried One Thread and what the word is.

Dan Calle replied

I manage hundreds (thousands?) of conversations that fall into four groups:

1) Long-running, bookmarked: basically my staff. Three examples:
– I have an AI personal trainer/nutritionist I always return to for training/nutrition questions.
– I have one conversation that helped me build my current home Linux box, and I return to it for any HW/OS/SW questions related to it.
– I have several AI professors I use to learn various subjects, one per subject.

2) Useful, may return to, but not necessarily. Examples:
– I saw nice sweet potatoes at the grocery store, asked about sweet potato soup, and made soup. A week later, I saw a nice pumpkin and wanted to make a similar soup. I remembered that convo, which already knows my equipment and preferences, and returned to it for a different soup.
– In general, if I think I’ve asked a question before, and the context from before will save me some time now, I use search to look at previous conversations, and might continue one of them rather than start a new one.

3) One-off questions: I usually ask them in a fresh conversation.

4) Truly throwaway questions: not only do I start a fresh conversation, but I will usually archive/delete it when I’m done. This is when the subject is pretty trivial, and I view it as clutter.

Special case for some long-running conversations: I have also noticed that an overly long context can sometimes start to produce weird effects. The LLM starts to hallucinate more, is less reliable about remembering details, and so on. In situations like this I sometimes ask it to generate a detailed summary of everything we have been working on, and I may ask follow-up questions, and then I paste the results into a new conversation and continue from there.

Andrej Karpathy

Actually, this seems quite interesting, thank you. You basically create “Projects” and group queries into them (in the form of a long convo), combined with a one-off default, and a manual “summarize and move on” step to manage *too long* contexts.
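The manual “summarize and move on” step could be automated along the following lines. This is only a rough sketch under stated assumptions: `Thread`, `rough_token_count`, and `first_words_summarizer` are hypothetical names made up for illustration, the token count is a crude word count rather than a real tokenizer, and the summarizer is a toy function standing in for what would really be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical summarizer type: takes the full transcript, returns a summary.
# In practice this would be an LLM call; here it is any plain function.
Summarizer = Callable[[List[str]], str]


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


@dataclass
class Thread:
    max_tokens: int          # assumed budget beyond which quality degrades
    summarize: Summarizer
    messages: List[str] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(rough_token_count(m) for m in self.messages)

    def add(self, message: str) -> "Thread":
        """Append a message; if the budget is exceeded, roll over.

        Rolling over means: summarize everything so far and return a new
        Thread whose only message is that summary.
        """
        self.messages.append(message)
        if self.total_tokens() <= self.max_tokens:
            return self
        summary = self.summarize(self.messages)
        return Thread(self.max_tokens, self.summarize,
                      ["Summary so far: " + summary])


def first_words_summarizer(messages: List[str]) -> str:
    """Toy summarizer: keeps the first three words of each message."""
    return " / ".join(" ".join(m.split()[:3]) for m in messages)


thread = Thread(max_tokens=12, summarize=first_words_summarizer)
thread = thread.add("I built a Linux box with 64GB of RAM")
thread = thread.add("Which filesystem should I pick for it")  # exceeds budget
print(len(thread.messages))
print(thread.messages[0])
```

The design choice here is that `add` returns the thread to keep using, so the caller does not need to know whether a rollover happened. A real version would presumably let the user review the summary before moving on, as in the manual workflow described above.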

𝐍𝐒𝐟𝐞𝐦𝐒 𝐀π₯𝐩𝐒𝐧𝐞

With a well-written persistent memory RAG implementation, the system will retain memory of key moments and store references to relevant information to be used as context later on.
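As a rough illustration of the retrieval half of that idea, here is a minimal sketch. It makes loud assumptions: `MemoryStore`, `remember`, and `recall` are hypothetical names, nothing is actually persisted to disk, and word-count overlap with cosine similarity stands in for the learned embeddings and vector store a real RAG setup would likely use.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    """Hypothetical memory: stores notes, retrieves the closest ones."""

    def __init__(self) -> None:
        self.notes: List[Tuple[str, Counter]] = []

    def remember(self, note: str) -> None:
        self.notes.append((note, bag_of_words(note)))

    def recall(self, query: str, k: int = 1) -> List[str]:
        q = bag_of_words(query)
        ranked = sorted(self.notes,
                        key=lambda item: cosine(q, item[1]),
                        reverse=True)
        return [note for note, _ in ranked[:k]]


store = MemoryStore()
store.remember("user owns an immersion blender and dislikes cream in soup")
store.remember("user runs a home linux box with an amd gpu")
store.remember("user trains three times a week and tracks protein")

question = "which pumpkin soup can i make in my blender"
context = store.recall(question, k=1)
# Only the retrieved note is placed in the prompt, not the whole history.
prompt = "Relevant memory: " + "; ".join(context) + "\nQuestion: " + question
print(prompt)
```

The point of the sketch is that a fresh conversation can start with a small, targeted slice of memory instead of the full history.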

if they take away my ability to start a fresh thread I will riot

Actually, I feel the same way, btw. It feels a little bit irrational (?) but real. It gives some degree (or illusion?) of control, and some degree of interpretability of what is happening when I press go.

Spreadsheets were the software 1.0 version of this. They can be the same for the LLM age: 1/ scalable 2/ interpretable 3/ capable of complexity. Each agent step shows you its inputs and outputs. The effectively functional programming model makes it easy to build flows. A small context for each action helps accuracy.
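One way to picture this spreadsheet-style flow is a chain of pure functions where every step records what it saw and what it produced. The sketch below is an assumption-heavy illustration, not anyone's actual product: `Cell` and `run_flow` are hypothetical names, and the three lambdas are toy stand-ins for LLM calls.

```python
from typing import Callable, List, NamedTuple, Tuple


class Cell(NamedTuple):
    """One recorded step, like a spreadsheet cell showing input and output."""
    name: str
    input_text: str
    output_text: str


Step = Callable[[str], str]  # a pure function: same input, same output


def run_flow(steps: List[Tuple[str, Step]], value: str) -> List[Cell]:
    """Run steps in order; each step sees only the previous step's output."""
    trace: List[Cell] = []
    for name, step in steps:
        result = step(value)
        trace.append(Cell(name, value, result))
        value = result  # small context: only this value is passed on
    return trace


# Toy steps standing in for LLM calls.
flow: List[Tuple[str, Step]] = [
    ("extract", lambda text: text.split(":")[1].strip()),
    ("normalize", lambda text: text.lower()),
    ("tag", lambda text: f"[ingredient] {text}"),
]

trace = run_flow(flow, "Shopping note: Sweet Potato")
for cell in trace:
    print(f"{cell.name}: {cell.input_text!r} -> {cell.output_text!r}")
```

Because each step receives only the previous output, a surprising result can be traced back to the one cell where it first appeared.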
