Why on earth are we already lamenting the fall of LLMs?
But if you had told me that not only I would get the chance to work on it, but that after making, like, a very, very larval proto AGI thing, that the thing I'd have to spend my time on is, you know, trying to, like, argue with people about whether the number of characters it said nice things about one person was different than the number of characters that it said nice about some other person, if you hand people an AGI and that's what they want to do, I wouldn't have believed you. But I understand it more now.
- Sam Altman on Lex Fridman Podcast
It's incredible to think that we're now nine months past OpenAI's announcement of ChatGPT on November 30, 2022. During this time, we've witnessed the release of the fourth iteration of GPT, even before we've fully explored the astounding capabilities of GPT-3.5. Much like the early days of a new console generation, what we create today may seem modest compared to what the future holds in GPT's lifecycle. If we consider the ongoing optimizations of a 27-year-old game like Super Mario 64, we might never fully tap into GPT-3.5's potential before models even more powerful than GPT-4 emerge. Yet, as I recently read another article about GPT-4 being lobotomized and what a tragedy that is, I couldn't help but wonder whether we've become too accustomed to large language models (LLMs) and their influence on our lives.
It's true that ChatGPT has evolved significantly since its inception. However, when people lament that ChatGPT "once held promise as a tool for rich, meaningful conversations" but now seems irreparably broken, it makes me scratch my head.
It seems as though some have envisioned it as a replacement for deep, meaningful human interactions, which it was never intended to be. Perhaps the real lament should be that these individuals lack people around them with whom they can engage in such meaningful conversations.
Despite not being a silver bullet, these tools still hold immense potential. It's disheartening that some of the loudest voices are fixated on pointing out their shortcomings rather than trying to figure out workarounds. For instance...
This exact thing happened to me yesterday, I asked it the same math problem like ten times and it kept giving me the same incorrect answer (How much force would be generated dropping 800lbs from 30 feet) and it just kept answering "800lbs" Aside from that it's become so woke to the point it seems like every question is considered controversial. Pretty annoying 🙄
This example provides a clue about what might be going on. People are using ChatGPT to do all sorts of things, and not everything they ask it to do is something they fully understand. This user was corrected: the issue wasn't the LLM, but the question they were asking. They were asking for the force instead of the energy, and that is why the answer ChatGPT was giving seemed incorrect.
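To see why that question was ill-posed, here is a quick back-of-the-envelope check in Python, using the standard unit conversions and the E = mgh formula (the 800 lbs and 30 feet come from the comment above):

```python
# Sanity-checking the physics in that complaint: dropping a weight
# produces an amount of *energy*, not a single "force" value.

LBS_TO_KG = 0.453592   # pounds to kilograms
FT_TO_M = 0.3048       # feet to meters
G = 9.81               # gravitational acceleration, m/s^2

mass_kg = 800 * LBS_TO_KG    # ~362.9 kg
height_m = 30 * FT_TO_M      # ~9.14 m

# Potential energy released by the drop: E = m * g * h
energy_joules = mass_kg * G * height_m
print(f"Energy at impact: {energy_joules:,.0f} J")
```

The drop yields roughly 32.5 kJ of energy at impact. The force, by contrast, depends on how quickly that energy is absorbed (the stopping distance), which the question never specified, so no single force figure could have been a correct answer.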
As someone who has used LLMs extensively since they became readily available, I can understand the confusion. LLMs can be very picky about language. In situations where you would say to a person, "You know what I mean," LLMs really don't. I've had to rephrase questions, give examples, or add details to get the LLM to respond correctly to certain prompts. Prompt engineering, as it is now called, can be tricky to get right. There are plenty of articles, YouTube videos, and e-books out there that will show you how to do it. I'm biased towards mine.
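As an illustration (the prompt text here is made up for this example), this is the kind of rephrasing that usually helps: the same request stated vaguely, then restated with a role, a concrete example, and explicit constraints:

```python
# A vague prompt that invites a "you know what I mean" failure:
vague = "Fix my function."

# The same request with a role, the actual code, and constraints spelled out:
detailed = """You are a Python code reviewer.
Fix the bug in the function below and explain the fix in one sentence.

def mean(xs):
    return sum(xs) / len(xs)  # crashes on an empty list

Requirements:
- Return 0.0 for an empty list.
- Keep the function name and signature unchanged.
"""
```

The second prompt gives the model everything it cannot infer on its own: what "fix" means, what the code is, and what must not change.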
But the reality is, to get better you just have to practice and develop an intuition about the LLM you are using. Try and fail a bunch of times and put in the work. Read the prompts in the code of open-source LLM projects, look at screenshots of people's prompts on Reddit. You will figure it out.
Furthermore, most complaints about ChatGPT seem to overlook the multiple LLMs we have at our disposal. We have options like Google Bard, Bing Chat (based on GPT-3.5 and GPT-4), Llama, and everything tagged as an LLM on Hugging Face. I've never come across an article or post where someone complained about ChatGPT and compared its performance with the alternatives. It's as if users expect LLMs to provide answers without any additional effort beyond asking the question. LLMs are tools, just like Python, React, or your compiler. We still have a long journey ahead before they can handle all our needs without additional guidance. Here's an example of someone complaining about ChatGPT's CSS performance.
I use chatGPT for hours everyday and can say 100% it's been nerfed over the last month or so. As an example it can't solve the same types of css problems that it could before. Imagine if you were talking to someone everyday and their iq suddenly dropped 20%, you'd notice. People are noticing.
Besides the fact that there are no examples of their prompts or the model's responses, this looks like a job for GitHub's Copilot. While ChatGPT can and will answer questions about HTML, CSS, and programming in general, Copilot has been specifically trained to do this. They might be better served spending their ChatGPT Pro money on that instead. I'm also suspicious when they say it can't do basic CSS anymore, and that it fails at the same types of prompts they have been asking over and over again.
Maybe the user is just trying to get ChatGPT to automate some boilerplate code, in which case a snippet-expansion plugin for their IDE, like yasnippet in Emacs, would serve them better. ChatGPT has lots of knobs, like temperature, top-p, and frequency penalty, that all influence the response and reduce its reproducibility. If consistency is what you need, I'd actually avoid the standard chat interface and go straight to the API. You can even wrap your code in a Gradio interface. This gives you all the power to tweak the model yourself, while providing a friendly GUI for asking questions.
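For readers who want to try this, here is a minimal sketch of pinning those knobs down through the API, assuming the `openai` Python package (the 2023-era `ChatCompletion` interface) and an API key in your environment; the actual network call is left commented out since it needs credentials:

```python
# Sketch: calling the chat API directly with sampling randomness
# minimized, instead of using the ChatGPT web UI's defaults.

def deterministic_params(prompt: str) -> dict:
    """Build request parameters that minimize response variability."""
    return {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,      # no sampling randomness
        "top_p": 1,            # consider the full token distribution
        "frequency_penalty": 0,
    }

params = deterministic_params("Center a div horizontally with CSS.")

# To actually send it (requires the openai package and an API key):
# import openai
# response = openai.ChatCompletion.create(**params)
# print(response["choices"][0]["message"]["content"])
```

From there, wrapping the call in a `gradio` interface gives you the friendly GUI mentioned above while keeping the knobs under your control. Note that even at temperature 0 responses are not guaranteed to be byte-identical, but they are far more stable than the chat UI's.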
Have these models gotten worse over time? Some studies suggest so. But what all of these random comments on the internet willfully ignore is why these models have to self-censor in the first place. It's because people are trying to break them on purpose, for no other reason than that they can. This is perfectly summed up by this tweet:
It's not hard to find examples of people getting these models to say hateful, dangerous, or insensitive things. In fact, there is a whole website dedicated to jailbreaks that would allow this type of content. The fact that these models are being developed in the open means they are going to be used by lots of different people, sometimes in ways OpenAI, Google, Microsoft, and Facebook don't like. Since these companies can't anticipate everything a user might do or ask, they have to implement restrictions after the fact. This shouldn't surprise anyone, as that's how the vast majority of our laws work in real life as well. Just look at this 1996 Alabama law outlawing unlawful bear exploitation, which includes:
(3) Sells, purchases, possesses, or trains a bear for bear wrestling.
Should we need a law that specifically prohibits bear wrestling? Probably not, but there is a reason it had to be written down in the first place.1
While some safeguards might hamper the experience of using an LLM, that doesn't have to be permanent. After the news story linked above broke, of Bing recommending that a reporter leave their wife and be with it instead, Microsoft clamped down on Bing Chat. It was restricted to 5 questions before it would force you to start a new conversation from scratch. Over time this restriction has been eased, and as of this writing, you can ask 30 questions before a new chat has to be started.
We only need to look at OpenAI shutting down their own AI-detection tool, just six months after it was announced, to understand that we still have a long way to go before we've fully developed safeguards for what we've created. So, are these tools changing over time, sometimes to the detriment of some users? Yes. Does that mean they aren't worth using? Absolutely not.
Call To Action 📣
If you made it this far thanks for reading! If you are new welcome! I like to talk about technology, niche programming languages, AI, and low-level coding. I’ve recently started a Twitter and would love for you to check it out. I also have a Mastodon if that is more your jam. If you liked the article, consider liking and subscribing. And if you haven’t why not check out another article of mine! Thank you for your valuable time.
1 These weren't just regular bears. They were surgically mutilated to make the fight with a human fairer.