nvidia-smi reports that this model uses 15475 MiB after changing the max batch size from 32 to 8 (see the link in the post above).
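To see why lowering the batch size helps so much, here is a rough back-of-the-envelope estimate. It is only a sketch under assumptions that are not stated in this post: LLaMA-7B has roughly 6.7B parameters, 32 layers and a hidden size of 4096; example.py defaults to max_seq_len=512; weights and the key/value cache are stored in fp16 (2 bytes per value); and the cache is preallocated for max_batch_size * max_seq_len positions in every layer. The helper names are made up for illustration.

```python
# Rough GPU memory estimate for LLaMA-7B inference.
# ASSUMED config (not from the post): ~6.7B params, 32 layers, dim 4096,
# max_seq_len 512, everything in fp16.
GIB = 1024 ** 3
BYTES_FP16 = 2


def weight_bytes(n_params: int, bytes_per_elem: int = BYTES_FP16) -> int:
    """Memory needed just to hold the model weights."""
    return n_params * bytes_per_elem


def kv_cache_bytes(n_layers: int, max_batch_size: int, max_seq_len: int,
                   dim: int, bytes_per_elem: int = BYTES_FP16) -> int:
    """Preallocated key + value cache: one K and one V buffer per layer,
    each holding max_batch_size * max_seq_len vectors of size dim."""
    return 2 * n_layers * max_batch_size * max_seq_len * dim * bytes_per_elem


# Assumed LLaMA-7B hyperparameters.
N_PARAMS = 6_700_000_000
N_LAYERS = 32
DIM = 4096
MAX_SEQ_LEN = 512

for batch in (32, 8):
    weights = weight_bytes(N_PARAMS)
    cache = kv_cache_bytes(N_LAYERS, batch, MAX_SEQ_LEN, DIM)
    print(f"max_batch_size={batch}: weights {weights / GIB:.1f} GiB "
          f"+ KV cache {cache / GIB:.1f} GiB = {(weights + cache) / GIB:.1f} GiB")
```

Under those assumptions the cache shrinks from 8 GiB at batch size 32 to 2 GiB at batch size 8, which, added to roughly 12.5 GiB of weights, lands in the same ballpark as the nvidia-smi figure. The estimate ignores the CUDA context, activations and allocator overhead, so the real number comes out somewhat higher.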
As others have stated, someone may have injected unknown code into the pickled checkpoint, so I recommend running this in Docker. I use this command to run the Docker image after getting the NVIDIA Docker setup configured:
docker run --runtime=nvidia -it --mount type=bind,source=/MY_LLAMA_SOURCE_PATH,target=/llama --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04
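The risk comes from the fact that unpickling can import and call arbitrary Python callables. As a minimal sketch (standard library only, and assuming the checkpoint uses the zip-based torch.save format of PyTorch 1.6 and later, where the pickle is stored as a data.pkl entry), you can list which globals the pickle inside a .pth file would import without ever loading it. The function names are my own and hypothetical.

```python
import pickletools
import sys
import zipfile

# Opcodes that push a string literal; STACK_GLOBAL takes its module and
# name from the two most recent strings on the stack.
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
              "SHORT_BINSTRING", "BINSTRING", "STRING"}


def referenced_globals(data: bytes) -> set[str]:
    """Return the 'module name' pairs a pickle would import, without unpickling.

    Simplified: GLOBAL/INST carry the pair directly; for STACK_GLOBAL we assume
    the last two string literals are module and name, which misses strings
    fetched back from the memo.
    """
    found = set()
    recent_strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            found.add(arg)
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            found.add(f"{recent_strings[-2]} {recent_strings[-1]}")
        elif opcode.name in STRING_OPS:
            recent_strings.append(str(arg))
    return found


def checkpoint_globals(path: str) -> set[str]:
    """Scan every data.pkl entry inside a zip-format torch checkpoint."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip-format checkpoint; this sketch only handles that format")
    with zipfile.ZipFile(path) as zf:
        pkl_names = [n for n in zf.namelist() if n.endswith("data.pkl")]
        return set().union(*(referenced_globals(zf.read(n)) for n in pkl_names))


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sorted(checkpoint_globals(sys.argv[1])):
        print(name)
```

Run it against a file such as ./models/LLaMA-7B/consolidated.00.pth. On a clean checkpoint I would expect only torch and collections entries (OrderedDict and tensor-rebuilding helpers); something like os or subprocess would be a red flag. It is a simplified scan rather than a guarantee, so isolating the run in Docker still makes sense.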
Then install the necessary dependencies in that container (you could obviously write a Dockerfile for this), put your model files and the tokenizer files (from the root directory of the download) into a directory (here models/LLaMA-7B), and run this:
torchrun example.py --ckpt_dir ./models/LLaMA-7B --tokenizer_path ./models/LLaMA-7B/tokenizer.model
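If torchrun fails while loading, a quick check of the directory layout can save some time. This sketch relies on my reading of example.py, so treat it as an assumption: it loads every *.pth file in --ckpt_dir, expects exactly one per process (one for 7B, which matches torchrun's default of a single process), and reads params.json from the same directory. The function name check_llama_dir is hypothetical.

```python
import json
import sys
from pathlib import Path


def check_llama_dir(ckpt_dir: str, tokenizer_path: str, nproc: int = 1) -> list[str]:
    """Return a list of problems found (an empty list means the layout looks OK).

    ASSUMED layout: one *.pth shard per process plus params.json in ckpt_dir,
    and the tokenizer.model file at tokenizer_path.
    """
    problems = []
    ckpt = Path(ckpt_dir)
    shards = sorted(ckpt.glob("*.pth"))
    if len(shards) != nproc:
        problems.append(f"found {len(shards)} *.pth file(s) but running {nproc} process(es)")
    params_file = ckpt / "params.json"
    if not params_file.is_file():
        problems.append(f"missing {params_file}")
    else:
        try:
            json.loads(params_file.read_text())
        except json.JSONDecodeError as exc:
            problems.append(f"params.json is not valid JSON: {exc}")
    if not Path(tokenizer_path).is_file():
        problems.append(f"missing tokenizer file {tokenizer_path}")
    return problems


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: <ckpt_dir> <tokenizer_path> [nproc]
    nproc = int(sys.argv[3]) if len(sys.argv) > 3 else 1
    issues = check_llama_dir(sys.argv[1], sys.argv[2], nproc)
    print("\n".join(issues) if issues else "layout looks OK")
```

For the setup above you would pass ./models/LLaMA-7B and ./models/LLaMA-7B/tokenizer.model. If that assumption about example.py holds, it is also why the larger models, which come split into several .pth shards, will not load with a single process out of the box.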
Enjoy! From my initial testing, the 7B parameter model is not amazing. I am very curious whether larger models can be run on one GPU by loading the checkpoints sequentially, but I don't know how all of this is organized.
Example output below.
Prompt:
Please respond to this question: As a large language model, what are three things that you find most important?
Output:
To get students to analyze what they are doing in their learning
To get students to analyze what they are doing in their learning so that they can find the best practices
To get students to analyze what they are doing in their learning so that they can find the best practices and then to make changes to those practices so that they can improve their learning
To get students to analyze what they are doing in their learning so that they can find the best practices and then to make changes to those practices so that they can improve their learning
I don't think this question really made much sense because the sentence in the question is incomplete
To get students to analyze what they are doing in their learning so that they can find the best practices and then to make changes to those practices so that they can improve their learning
I don't think this question really made much sense because the sentence in the question is incomplete.
I think this question is in fact a good question, even though the initial sentence is incomplete, and I do think it makes sense.
I think this question is in fact a good question, even though the initial sentence is incomplete, and I do think it makes sense. (I think it is a good question but I am not sure it makes sense).