Finally got a local LLM to run on my 5-year-old laptop... barely.

I've been reading about running models locally for a while, but my laptop has a GTX 1060 with 6GB of VRAM (you know, ancient by AI standards). I tried a few smaller models like Phi-3 and they were just too sluggish to be useful. Then I stumbled on quantization, where you compress the model's weights down to lower precision, in my case 4-bit. I loaded up a 7B-parameter model with that setting and it actually generates responses in under 30 seconds now. Has anyone else found a particular quantized model that works well on older hardware?
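
For anyone wondering why 4-bit makes the difference on a 6GB card, here's a rough back-of-the-envelope sketch. It only counts the weights themselves and assumes exactly 7 billion parameters; the function name is just something made up for illustration. Real usage will be higher because of the context cache, activations, and the extra scale values that quantized formats store, so treat the numbers as a lower bound.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and quantization metadata,
    so the real footprint will be somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumption: "7B" taken as exactly 7 billion parameters
VRAM_GB = 6.0   # GTX 1060 6GB

for bits in (16, 8, 4):
    gb = weight_memory_gb(N_PARAMS, bits)
    verdict = "fits" if gb < VRAM_GB else "does not fit"
    print(f"{bits:>2}-bit: ~{gb:.1f} GB of weights, {verdict} in {VRAM_GB:.0f} GB of VRAM")
```

By this estimate the 16-bit and 8-bit versions (about 14 GB and 7 GB) are too big for the card before anything else is loaded, while 4-bit comes in around 3.5 GB and leaves some headroom.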

2 comments

michael69315d ago

30 seconds is fast, everything's getting compressed these days like my grocery budget.

alice_harris3514d ago

oh man, the GTX 1060 was a beast back in the day. i ran phi-3 on my old 1050 ti and it was like watching paint dry. quantization is a lifesaver though, the 4-bit mistral 7b runs okay on mine now, maybe 20-30 seconds per response. the 2.7b phi-2 model is actually pretty snappy if you haven't tried that one yet.