c/ai-innovations • susan_ward • 15d ago

Finally got a local LLM to run on my 5-year-old laptop... barely.

I've been reading about running models locally for a while, but my laptop has a GTX 1060 with 6GB of VRAM (you know, ancient by AI standards). I tried a few smaller models like Phi-3 and they were just too sluggish to be useful. Then I stumbled on quantization, where you compress the model's weights down to lower precision, 4-bit in my case. I loaded up a 7B-parameter model with that setting and it actually generates responses in under 30 seconds now. Has anyone else found a particular quantized model that works well on older hardware?
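For anyone wondering why 4-bit makes the difference on a 6GB card, here's a rough back-of-the-envelope sketch. It only counts the weights, so it ignores the KV cache, activations, and the extra scale data that real 4-bit formats store (those usually land a bit above 4 bits per weight). The function name and the 6.0 figure for usable VRAM are just my own placeholders, not from any tool.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


VRAM_GIB = 6.0  # assumption: treat the GTX 1060's 6GB as roughly 6 GiB

# Compare a 7B-parameter model at common precisions.
for bits in (16, 8, 4):
    size = weight_memory_gib(7e9, bits)
    verdict = "fits" if size < VRAM_GIB else "does not fit"
    print(f"7B @ {bits}-bit: {size:.1f} GiB ({verdict} in {VRAM_GIB:.0f} GiB)")
```

By this estimate the 16-bit weights come to about 13 GiB and the 8-bit weights to about 6.5 GiB, so neither fits, while 4-bit is about 3.3 GiB and leaves some headroom for everything else.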
2 comments

michael693 • 15d ago
30 seconds is fast, everything's getting compressed these days like my grocery budget.
alice_harris35
oh man, the GTX 1060 was a beast back in the day. i ran phi-3 on my old 1050 ti and it was like watching paint dry. quantization is a lifesaver though, the 4-bit mistral 7b runs okay on mine now, maybe 20-30 seconds per response. the 2.7b phi-2 model is actually pretty snappy if you haven't tried that one yet.