How to run AI directly on your own PC

in LeoFinance, last year

ChatGPT was OpenAI's first public chat-based AI product. By now most people have heard of it, and many have used it. It has given everyday users access to bleeding-edge artificial intelligence. You can even use it for free, although you will need to pay $20 a month for a subscription to use their latest models.

ChatGPT isn't the only game in town, though. Open source projects have been closing the gap between AI funded by big business and AI developed by the community. As I said in my previous post, I run a lot of models locally on my machine, and this post goes into detail on how you can do the same. Keep in mind that for good performance you will need a decent GPU; the faster, the better. A lot of these models will run on a CPU, but they will be much slower to respond to your requests.

Introducing Ollama

Ollama is the easiest way to start running community-provided AI models. In fact, it is so easy, I can tell you how to do it in two lines: run the install command from https://ollama.ai, then run ollama run llama2.

That's it, you are now running the latest version of the Llama2 AI model locally on your machine.

This is running the Llama2 model, which has a lot of restrictions and isn't very good. There are much better models, and some are built for specific purposes. Let's check them out.

If you head over to https://ollama.ai/library, you can find a list of the models supported by Ollama. You are not limited to these models, but these have been tested.

One I recommend checking out is Mistral OpenOrca. It is a great model that is really small and will run on most GPUs without a problem.

After pulling the model with ollama run mistral-openorca, you will be left at a chat prompt.
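Besides the interactive chat prompt, Ollama also runs a local HTTP server, so the same model can be queried from a script. The following is a minimal sketch, assuming the server is listening on its default address (http://localhost:11434) and that mistral-openorca has already been pulled. The helper names build_generate_request and ask are hypothetical, chosen only for this illustration.

```python
import json
import urllib.request

# Assumption: Ollama's local server is running on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the model's reply."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With streaming disabled, the reply text is in the "response" field.
    return body["response"]
```

With the server running, calling ask("mistral-openorca", "Explain what a token is in one sentence.") should return the model's answer as a string. Building the request separately from sending it keeps the payload easy to inspect without touching the network.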

One thing you might want to do is see how well a model is performing on your machine. If you type /set verbose at the chat prompt, you will get a timing summary at the end of each request.

With an NVIDIA RTX 3090 I tend to get a little over 100 tokens per second. This is faster than a typical user can read, and is a very acceptable speed. Many of the smaller models have 7 billion parameters, and these typically require around 8GB of VRAM to run. A common next step up is 33 billion parameters, which will require at least 16GB of VRAM.

Using a 33 billion parameter model on my NVIDIA RTX 3090, I get around 30-33 tokens/second. This is a lot slower but still usable, and pretty close to human reading speed. Here is the verbose summary from one such request:

total duration:       14.078252008s
load duration:        682.399µs
prompt eval count:    357 token(s)
prompt eval duration: 677.234ms
prompt eval rate:     527.14 tokens/s
eval count:           408 token(s)
eval duration:        13.398543s
eval rate:            30.45 tokens/s
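The rates in this summary are simply token counts divided by durations, which is easy to check by hand. Here is a small sketch that redoes the arithmetic from the figures above; the function names are hypothetical, and the duration parsing only assumes the unit suffixes visible in the summary (s, ms, µs) plus ns.

```python
import re

# Unit suffixes mapped to seconds. Both the micro sign (U+00B5) and the
# Greek letter mu (U+03BC) are accepted for microseconds.
UNIT_SECONDS = {"ns": 1e-9, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(text: str) -> float:
    """Convert a duration such as '677.234ms' or '13.398543s' to seconds."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)(\D+)", text.strip())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    return float(value) * UNIT_SECONDS[unit]


def tokens_per_second(token_count: int, duration: str) -> float:
    """Tokens processed per second over the given duration."""
    return token_count / to_seconds(duration)


# Figures copied from the summary above.
prompt_rate = tokens_per_second(357, "677.234ms")
eval_rate = tokens_per_second(408, "13.398543s")
print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")
print(f"eval rate:        {eval_rate:.2f} tokens/s")
```

The eval rate is the number to watch, since it measures how fast the reply is generated. The prompt eval rate only covers reading your input, which is why it is so much higher.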

From here, the next step would be a 70 billion parameter model, but that isn't doable on a single NVIDIA RTX 3090 with only 24GB of VRAM. There are ways to run it from system RAM, from a mix of VRAM and system RAM, or even on the CPU alone, so in theory I could get it working, but performance would be awful. This is where dedicated AI cards become critical for larger models. The cost of these cards goes up exponentially, with many starting at around $10,000, and those only have around 40GB of memory.
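A rough way to see why the 70 billion parameter model does not fit: the weights alone need about (parameters × bits per weight / 8) bytes. The sketch below assumes 4-bit quantized weights and a 20% allowance for context and runtime buffers. Both numbers are illustrative assumptions rather than measurements, and real usage varies with the model, the quantization, and the context length.

```python
def estimate_vram_gb(params_billion: float,
                     bits_per_weight: int = 4,
                     overhead: float = 1.2) -> float:
    """Rule-of-thumb memory estimate in GB (1 GB = 1e9 bytes).

    bits_per_weight=4 and overhead=1.2 are illustrative assumptions,
    not measured values.
    """
    # Billions of parameters times bytes per parameter gives GB directly.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead


def fits(params_billion: float, vram_gb: float) -> bool:
    """True if the estimate fits within the given amount of VRAM."""
    return estimate_vram_gb(params_billion) <= vram_gb


CARD_VRAM_GB = 24  # the 3090 discussed above

for size in (7, 33, 70):
    needed = estimate_vram_gb(size)
    verdict = "fits" if fits(size, CARD_VRAM_GB) else "does not fit"
    print(f"{size}B model: ~{needed:.1f} GB, {verdict} in {CARD_VRAM_GB} GB")
```

Under these assumptions the 7 and 33 billion parameter models come in under 24GB, while the 70 billion parameter model needs roughly 42GB. Treat the numbers as a lower bound: the 8GB and 16GB figures mentioned earlier leave headroom on top of an estimate like this.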

There is another aspect of open source models I haven't mentioned: training and fine-tuning. You can take existing models and tune them to domains you are interested in. Let's say you are a doctor, and you want to use AI to assist you in diagnosing patients. You can fine-tune a model on hundreds, thousands, or even millions of books and documents so it learns your specific field. Done correctly, the result will perform better for this use case than general-purpose models, even ChatGPT. This process, though, is extremely expensive and hardware dependent.

Companies are buying up hundreds of thousands of GPUs to do this. For example, Meta has disclosed they are looking to buy 350,000 H100 GPUs priced at around $30,000 each, which works out to roughly $10.5 billion. This will double their current AI infrastructure.

Tools like Ollama and LM Studio allow anyone to install and run models on their own machines. Many of these models have distinct advantages over commercial models. To really take advantage of them, though, you will need to learn how well they respond to your prompts, and potentially tailor your prompts to the specific model to get good results.

Some models punch well above their weight class, like Mistral. With others, you can easily tell you are using a small model, and you will have difficulty getting results similar to ChatGPT and larger models. Some models can come close to or exceed GPT-3.5 Turbo (the default ChatGPT model), but nothing you can download can really compete with GPT-4 at this point. It is a massive model with lots of training, but the gap is closing quickly.

A leaked internal Google document, commonly referred to as "We have no moat", made the rounds recently. In it, a Google researcher argues that Google has no "secret sauce", and neither does OpenAI, that will protect them from being surpassed by the open source community. It won't happen any time soon, but it is very likely one day. If this interests you, I highly recommend you read the document.

I love the direction you are taking your writing in. This is very informative stuff.

You provide a much deeper perspective than I have on this while not overwhelming us with too much technical jargon.

Keep these articles coming. It is easier to bring a 5,000 foot view; you are posting a 100 foot view without getting too techie. That is needed.

Thank you so much for sharing this information. This is very informative and helpful.

Thank you very much for this input. While I have been trying to keep up with the commercial offerings to work out how they fit at work, I must say I completely missed what was happening in open source. This post is most interesting and has got me excited to check out that area too.

Ever heard of GPT4All? I use it on my older but high-end laptop and it works great for me. It's cool that we have so many options and they are open source!

#WeHaveTheTools

Yes, but I like Ollama a lot better.

Good articles, dude. Also, Fuck Zuckerberg.

One day we will have our own personal AI that comes along everywhere we go, a bit like in the Beacon 23 TV show.

Wow amazing

Sounds exciting! I did play around with AI on several websites that offer it at no cost, but most of them limit usage, so I'm always looking for something better.

A MacBook Pro with the M1 Max chip and 64GB of unified RAM can handle a pretty large parameter model.

Mac Studios with 96-128GB of RAM are pretty popular for this, as you can use all but 4GB as VRAM.

This is going to be very helpful for many in the workforce. I can see the benefits already. While it will cost jobs, there are limits to what AI can currently deliver, and its output requires a manual check. But for project managers, man, this tool saves a LOT of time.

Good tutorial.

Congrats!

The AI space is wild. I did not think I would be using it, but I use ChatGPT to write up tasks at work that I don't want to do. My review was written by ChatGPT, and my manager was very impressed by the detail and effort that went into it. Little does he know it only took seconds.