RunGPT#

RunGPT is an open-source cloud-native large-scale multimodal models (LMMs) serving framework. It is designed to simplify the deployment and management of large language models, on a distributed cluster of GPUs. RunGPT aim to make it a one-stop solution for a centralized and accessible place to gather techniques for optimizing large-scale multimodal models and make them easy to use for everyone. In RunGPT, we have supported a number of LLMs such as LLaMA, Pythia, StableLM, Vicuna, MOSS, and Large Multi-modal Model(LMMs) like MiniGPT-4 and OpenFlamingo additionally.

Setup#

If you’re opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

!pip install llama-index

You need to install rungpt package in your python environment with pip install

!pip install rungpt

After installing successfully, models supported by RunGPT can be deployed with an one-line command. This option will download target language model from open source platform and deploy it as a service at a localhost port, which can be accessed by http or grpc requests. I suppose you not run this command in jupyter book, but in command line instead.

!rungpt serve decapoda-research/llama-7b-hf --precision fp16 --device_map balanced

Basic Usage#

Call `complete` with a prompt#

from llama_index.llms.rungpt import RunGptLLM

llm = RunGptLLM()
promot = "What public transportation might be available in a city?"
response = llm.complete(promot)

print(response)

I don't want to go to work, so what should I do?
I have a job interview on Monday. What can I wear that will make me look professional but not too stuffy or boring?

Call `chat` with a list of messages#

from llama_index.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM

messages = [
    ChatMessage(
        role=MessageRole.USER,
        content="Now, I want you to do some math for me.",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
    ),
    ChatMessage(
        role=MessageRole.USER,
        content="How many points determine a straight line?",
    ),
]
llm = RunGptLLM()
response = llm.chat(messages=messages, temperature=0.8, max_tokens=15)

print(response)

Streaming#

Using stream_complete endpoint

promot = "What public transportation might be available in a city?"
response = RunGptLLM().stream_complete(promot)
for item in response:
    print(item.text)

Using stream_chat endpoint

from llama_index.llms.rungpt import RunGptLLM

messages = [
    ChatMessage(
        role=MessageRole.USER,
        content="Now, I want you to do some math for me.",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
    ),
    ChatMessage(
        role=MessageRole.USER,
        content="How many points determine a straight line?",
    ),
]
response = RunGptLLM().stream_chat(messages=messages)

for item in response:
    print(item.message)