
DeepSeek-R1-671B-Q4_K_M with 1 or 2 Arc A770 on Xeon by colorant


10 Comments

  • Post Author
    superkuh
    Posted March 6, 2025 at 1:10 am

No… this headline is incorrect. You can't do that. I think they've confused it with the performance of running one of the small distills into existing smaller models. Two Arc cards cannot fit a 4-bit k-quant of a 671B model.

    But a portable (no install) way to run llama.cpp on intel GPUs is really cool.

  • Post Author
    ryao
    Posted March 6, 2025 at 1:23 am

    Where is the benchmark data?

  • Post Author
    zamadatix
    Posted March 6, 2025 at 1:31 am

Since the Xeon alone could run the model in this setup, it'd be more interesting if they compared the performance uplift from using 0/1/2…8 Arc A770 GPUs.

    Also, it's probably better to link straight to the relevant section https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quic…

  • Post Author
    colorant
    Posted March 6, 2025 at 1:37 am

    https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quic…

    Requirements (>8 tokens/s):

    • 380 GB CPU memory
    • 1-8 Arc A770
    • 500 GB disk
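A back-of-envelope check of why the 380 GB figure is plausible, and why the weights must live in host RAM rather than on the cards (the ~4.5 bits/weight average for Q4_K_M is an assumption; the exact mix of quant types varies by tensor):

```python
# Approximate weight footprint of a Q4_K_M quant of a 671B-parameter model.
PARAMS = 671e9          # DeepSeek-R1 total parameter count
BITS_PER_WEIGHT = 4.5   # rough average for a Q4_K_M k-quant (assumption)

weight_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9
print(f"~{weight_gb:.0f} GB of weights")          # close to the stated 380 GB CPU memory
print(f"vs {2 * 16} GB VRAM on two Arc A770s")    # each A770 has 16 GB
```

So even eight A770s (128 GB) hold only a fraction of the model; the GPUs accelerate part of the compute while the bulk of the weights stream from CPU memory.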

  • Post Author
    jamesy0ung
    Posted March 6, 2025 at 1:54 am

    What exactly does the Xeon do in this situation, is there a reason you couldn't use any other x86 processor?

  • Post Author
    7speter
    Posted March 6, 2025 at 2:10 am

    I’ve been following the progress of Intel Arc support in PyTorch, at least on Linux, and it seems like if things stay on track, we may see the first version of PyTorch with full Xe/Arc support by around June. I think I’m just going to wait until then instead of dealing with anything ipex or openvino.

  • Post Author
    CamperBob2
    Posted March 6, 2025 at 2:39 am

    Article could stand to include a bit more information. Why are all the TPS figures x'ed out? What kind of performance can be expected from this setup (and how does it compare to the dual Epyc workstation recipe that was popularized recently?)

  • Post Author
    yongjik
    Posted March 6, 2025 at 3:02 am

    Did DeepSeek learn how to name their models from OpenAI?

  • Post Author
    anacrolix
    Posted March 6, 2025 at 3:07 am

    Now we just need a model that can actually code

  • Post Author
    chriscappuccio
    Posted March 6, 2025 at 3:17 am

    Better to run the Q8 model on an Epyc pair with 768GB; you'll get the same performance.
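Since CPU decode speed is memory-bandwidth bound, a rough sanity check of the "same performance" claim is possible. The numbers below are assumptions: ~37B active parameters per token (DeepSeek-R1 is a MoE), ~4.5 bits/weight for Q4_K_M, ~8.5 bits/weight for Q8_0, and ~920 GB/s theoretical bandwidth for a dual-socket 24-channel DDR5-4800 Epyc system:

```python
# Decode-throughput ceiling: each generated token streams the active
# expert weights from RAM, so tok/s <= bandwidth / bytes_per_token.
ACTIVE_PARAMS = 37e9        # active params per token for this MoE (assumption)
BANDWIDTH = 920e9           # bytes/s, dual-socket DDR5-4800 theoretical (assumption)

for name, bpw in (("Q4_K_M", 4.5), ("Q8_0", 8.5)):
    bytes_per_token = ACTIVE_PARAMS * bpw / 8
    print(f"{name}: ~{BANDWIDTH / bytes_per_token:.0f} tok/s theoretical ceiling")
```

Real-world throughput falls well short of the theoretical ceiling, but both quants leave headroom above the ~8 tok/s figure quoted upthread, which is consistent with the commenter's claim that Q8 on a big Epyc box performs comparably.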


© 2025 HackTech.info. All Rights Reserved.
