
Best Free AI ChatGPT (right now)

Apr 07, 2023

This space is moving very quickly and it's exciting to see. First, GPT stands for Generative Pre-trained Transformer, so this is a chat bot that uses a generative pre-trained transformer. It has nothing to do with OpenAI besides maybe some of the data it was trained on, but that's another discussion.

The reason I like this model is that I can run it locally on an RTX 2080 Ti, which is not possible with many LLMs (large language models). The model is Vicuna, which reportedly reaches 90% of ChatGPT's quality. You can find very detailed information about it here: https://vicuna.lmsys.org/

I will not be writing a tutorial here, but I will share the tutorials I followed. I will include both GPU and CPU versions.

First, the CPU. This is the video I used to install llama.cpp: https://youtu.be/cCQdzqAHcFk?t=604 (the link is timestamped). The other parts of the video I found a complete waste, and the result was terrible. llama.cpp runs on the CPU, so it is slow, but it works on macOS, Linux, and Windows. The output of this model was amazing, and it is very quick to get running. Here is an example of me asking about a private class in iOS.

Here I ask it how to take a snapshot of the home screen.

Needless to say, I was very impressed by the responses. It ran on the CPU though, and it was slow.
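
If you would rather call the model from a script than use llama.cpp's interactive prompt, the project also has Python bindings (llama-cpp-python). This is just a minimal sketch under my assumptions; the model filename below is hypothetical, and the Vicuna prompt template may differ from the one your download expects:

    # Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
    from llama_cpp import Llama

    # Hypothetical path: point this at whichever quantized ggml file you downloaded.
    llm = Llama(model_path="./models/ggml-vicuna-13b-q4_0.bin")

    # Vicuna-style prompt format (assumption; adjust to match your model).
    prompt = "### Human: How can I take a snapshot of the home screen on iOS?\n### Assistant:"
    output = llm(prompt, max_tokens=256, stop=["### Human:"])
    print(output["choices"][0]["text"])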

Running on a GPU

I then decided to try getting it to work on the GPU. This is tough for many models, as they simply take too much VRAM (video RAM) to run. Then I found this post: https://medium.com/@martin-thissen/vicuna-13b-best-free-chatgpt-alternative-according-to-gpt-4-tutorial-gpu-ec6eb513a717

This uses a quantized version of Vicuna, which drops the VRAM requirement from about 28 GB to about 10 GB. That was great for my 2080 Ti, though quantization does degrade the model slightly. I have this model running now, and it is very fast compared to when I ran it on the CPU. I have not noticed any degradation in my prompts.
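
To see roughly where those numbers come from, here is the back-of-the-envelope arithmetic for the weights alone; real usage adds activations, the KV cache, and CUDA overhead:

    # Approximate weight memory for a 13B-parameter model.
    params = 13e9

    fp16_gb = params * 2 / 1e9    # 16-bit floats: 2 bytes per weight -> ~26 GB
    int4_gb = params * 0.5 / 1e9  # 4-bit quantized: half a byte per weight -> ~6.5 GB

    print(f"fp16 weights: {fp16_gb:.1f} GB")   # ~26.0 GB
    print(f"4-bit weights: {int4_gb:.1f} GB")  # ~6.5 GB
    # Runtime overhead pushes these toward the ~28 GB and ~10 GB figures above.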

When trying to get this to run, I ran into issues because this computer did not have CUDA set up correctly. I would make sure CUDA is installed and working before attempting this tutorial. Through some trial and error I did get it working after the fact, but I think if I had set up CUDA beforehand it would have been much less of a headache. Note that this is a fork of https://github.com/lm-sys/FastChat. I looked over the code and the changes, and I trust the source at the time of writing this, but since it is a fork I do not expect the creator to keep it as up to date as FastChat. FastChat has many contributors, and they seem to be updating things very quickly. I will continue to track the original FastChat repo and will switch when they add support for these quantized models.
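
On the CUDA point above, a quick sanity check before starting the tutorial is asking PyTorch (which FastChat runs on) whether it can actually see the GPU:

    import torch

    print(torch.__version__)              # the installed PyTorch build
    print(torch.cuda.is_available())      # must print True before the tutorial will work
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 2080 Ti"
        print(torch.version.cuda)             # CUDA version PyTorch was built against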

If you don't have the funds to shell out for something like OpenAI's ChatGPT, or you are waiting until these models mature before putting money into them, then these are the best options I have found so far. I've tried many. I'm very thankful for those creating these open source models.

Thanks for reading! If you have questions, I have an ai-chat channel in my Discord.
