A note on the importance of prompt and template formatting - as seen from starcoder

noneabove1182 · 2 years ago

einsteinx2@programming.dev · 2 years ago

Thanks, this is really helpful. I recently started playing with that web UI and various models and was getting really terrible results. I think this is exactly why.

micheal65536@lemmy.micheal65536.duckdns.org · 2 years ago

More generally, make sure that you have the correct template format selected in the chat settings when you’re using a conversational model.

Some models supposedly require an additional “instruction” template, where the “instruction” is something like "Continue the following conversation between and by writing a single reply for ". Personally, I get better results without this, even on models that are instruction-tuned rather than conversation-tuned. Most models that have any tuning beyond a bare “continue/complete the text” model (which requires an entirely different approach to prompting) seem to understand the basic format and concept of a conversation.
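To make the template point concrete, here is a minimal sketch of how the same conversation renders under two different formats. The template strings and the `build_prompt` helper are hypothetical, made up for illustration; the exact format a given model expects should come from its model card, not from this example.

```python
# Hypothetical templates for illustration only. Real models each document
# their own format; these strings are not taken from any specific model.
TEMPLATES = {
    "chat": {
        "user": "USER: {text}\n",
        "bot": "ASSISTANT: {text}\n",
        "bot_prefix": "ASSISTANT:",
    },
    "instruct": {
        "user": "### Instruction:\n{text}\n\n",
        "bot": "### Response:\n{text}\n\n",
        "bot_prefix": "### Response:\n",
    },
}


def build_prompt(template_name, turns):
    """Render (role, text) turns, ending with the cue for the model's next reply."""
    template = TEMPLATES[template_name]
    parts = [template[role].format(text=text) for role, text in turns]
    # The prompt ends with the bot prefix so the model writes the reply itself.
    parts.append(template["bot_prefix"])
    return "".join(parts)


turns = [
    ("user", "Write a rate limiter."),
    ("bot", "Here is one."),
    ("user", "Make it block instead of dropping."),
]
print(build_prompt("chat", turns))
print("-----")
print(build_prompt("instruct", turns))
```

If the selected template does not match what the model was tuned on, the model sees text that looks unlike its training conversations, which is one plausible reason for poor results.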

micheal65536@lemmy.micheal65536.duckdns.org · 2 years ago

How would you ask for a follow-up change using this instruction template?

Personally, I interpreted the request as “if it’s been less than 2 minutes, sleep/block until it’s 2 minutes since last time” rather than dropping/discarding the string immediately and continuing. Suppose this is what I had actually wanted: can you ask the model to modify its code accordingly without having to go back and edit the original prompt to start over?

I find that a lot of programming questions require multiple rounds of refinement. I tend to favor models that are able to modify existing code in a back-and-forth discussion, and that are capable of writing out just the modified parts of their code with each change to save on time and token count (seriously, so many models will insist on repeating the entire thing no matter how firmly you tell them not to - if you’re lucky, they’ll actually include the changes in their second reply instead of getting stuck in a loop of writing out identical code every time).

noneabove1182 · 2 years ago

Your best bet is likely going to be editing the original prompt to add information until you get the right output. However, you can also get clever with it and add to the model’s own response. Remember, all it’s doing is filling in the most likely next word, so you could just add extra text at the end that says “now, to implement it in X way” or "I noticed I made a mistake in Y, to fix that " and then hit generate and let it continue the sentence.
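As a rough sketch of that trick: you take the prompt, the reply the model already wrote, and your own steering text, and send the whole thing back for completion. The `continue_from` helper and the prompt format below are hypothetical, and how you actually submit the text depends on your UI or API.

```python
def continue_from(prompt, partial_reply, steer):
    """Build the text to resubmit so generation resumes right after `steer`."""
    # No end-of-turn marker is added: the steering text is meant to look like
    # part of the model's own reply, so it keeps writing the same message.
    return prompt + partial_reply.rstrip() + "\n\n" + steer


# Hypothetical prompt format and reply, for illustration only.
prompt = "USER: Write a function that rate limits a string.\nASSISTANT:"
partial_reply = " Here is a version that drops strings that arrive too soon.\n"
steer = "I noticed I made a mistake: it should block instead of dropping. To fix that, "

print(continue_from(prompt, partial_reply, steer))
```

The steering text deliberately ends mid-sentence, so the most natural continuation is the fix itself rather than a fresh answer.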

VraethrDalkr · 2 years ago

There’s actually much more going on than just filling in the most likely next word. We don’t fully understand how LLMs work. I found this article very interesting: https://arstechnica.com/science/2023/07/a-jargon-free-explanation-of-how-ai-large-language-models-work/

noneabove1182 · 2 years ago

Sure, it’s a simplistic view. I meant it more in the sense that you can guide it towards completing a sentence, but you’re right that it’s worth recognizing what’s actually going on!

noneabove1182 · 2 years ago

It is interesting how you interpreted the question, though. I think the principle of “rate limiting” is playing in my favour here: typically, when you rate limit something, you don’t throw it into a queue; you deny it and wait for the next request (think APIs).
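For reference, the “deny it” reading looks roughly like this. It is a sketch and not the code the model produced in the post; the `DroppingLimiter` name and the injectable `clock` parameter are my own assumptions, and the 2-minute interval comes from the question upthread.

```python
import time


class DroppingLimiter:
    """Accept at most one item per `interval` seconds; reject anything sooner."""

    def __init__(self, interval=120.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self.last = None    # time of the last accepted item

    def allow(self):
        now = self.clock()
        if self.last is not None and now - self.last < self.interval:
            # Too soon: the caller drops the item or returns an error.
            return False
        self.last = now
        return True


limiter = DroppingLimiter(interval=120.0)
print(limiter.allow())  # first item is accepted
print(limiter.allow())  # an immediate second item is rejected
```

A rejected item does not reset the timer, so the window is always measured from the last accepted item.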

micheal65536@lemmy.micheal65536.duckdns.org · 2 years ago

I have also encountered “rate limits” where the request is not dropped/errored out but is simply stalled until the timeout expires.

Usually this happens in a client library rather than over the network itself: the library blocks the thread until it knows that the rate limit is due to expire before issuing the request to the server (and then blocks and reissues again if the server still returns a rate-limit error). This lets the application developer know that their request will complete “at some point” rather than having to handle the error and timeout themselves. This is usually preferred in a single-threaded application, or one where all the API stuff happens on a single thread (i.e. one request at a time, with no new request issued until the previous one has completed).
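A minimal sketch of that blocking behaviour, assuming a hypothetical `send` callable that raises a hypothetical `RateLimited` error carrying a retry delay. None of these names come from a real library; the `clock` and `sleep` parameters are injectable only so the logic can be exercised without real waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by `send` when the server rejects a request."""

    def __init__(self, retry_after):
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


class BlockingClient:
    """Block the calling thread until the request can go out, then send it."""

    def __init__(self, send, interval=120.0, clock=time.monotonic, sleep=time.sleep):
        self.send = send
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # earliest time the next request may be issued

    def request(self, payload):
        while True:
            if self.next_allowed is not None:
                wait = self.next_allowed - self.clock()
                if wait > 0:
                    self.sleep(wait)  # stall instead of raising an error
            try:
                result = self.send(payload)
            except RateLimited as err:
                # The server still said no: block for its delay and reissue.
                self.next_allowed = self.clock() + err.retry_after
                continue
            self.next_allowed = self.clock() + self.interval
            return result
```

With this design the caller just writes `client.request(payload)` and gets a result eventually, which is convenient on a single thread but would stall everything else sharing that thread.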