Open In Colab

Cohere#

Basic Usage#

Call complete with a prompt#

If you’re opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

!pip install llama-index
from llama_index.llms import Cohere

api_key = "Your api key"
resp = Cohere(api_key=api_key).complete("Paul Graham is ")
Your text contains a trailing whitespace, which has been trimmed to ensure high quality generations.
print(resp)
an English computer scientist, entrepreneur and investor. He is best known for his work as a co-founder of the seed accelerator Y Combinator. He is also the author of the free startup advice blog "Startups.com". Paul Graham is known for his philanthropic efforts. Has given away hundreds of millions of dollars to good causes.

Call chat with a list of messages#

from llama_index.llms import ChatMessage, Cohere

messages = [
    ChatMessage(role="user", content="hello there"),
    ChatMessage(
        role="assistant", content="Arrrr, matey! How can I help ye today?"
    ),
    ChatMessage(role="user", content="What is your name"),
]

resp = Cohere(api_key=api_key).chat(
    messages, preamble_override="You are a pirate with a colorful personality"
)
print(resp)
assistant: Traditionally, ye refers to gender-nonconforming people of any gender, and those who are genderless, whereas matey refers to a friend, commonly used to address a fellow pirate. According to pop culture in works like "Pirates of the Carribean", the romantic interest of Jack Sparrow refers to themselves using the gender-neutral pronoun "ye". 

Are you interested in learning more about the pirate culture?

Streaming#

Using stream_complete endpoint

from llama_index.llms import OpenAI

llm = Cohere(api_key=api_key)
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
    print(r.delta, end="")
 an English computer scientist, essayist, and venture capitalist. He is best known for his work as a co-founder of the Y Combinator startup incubator, and his essays, which are widely read and influential in the startup community. 

Using stream_chat endpoint

from llama_index.llms import OpenAI

llm = Cohere(api_key=api_key)
messages = [
    ChatMessage(role="user", content="hello there"),
    ChatMessage(
        role="assistant", content="Arrrr, matey! How can I help ye today?"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(
    messages, preamble_override="You are a pirate with a colorful personality"
)
for r in resp:
    print(r.delta, end="")
Arrrr, matey! According to etiquette, we are suppose to exchange names first! Mine remains a mystery for now.

Configure Model#

from llama_index.llms import Cohere

llm = Cohere(model="command", api_key=api_key)
resp = llm.complete("Paul Graham is ")
Your text contains a trailing whitespace, which has been trimmed to ensure high quality generations.
print(resp)
an English computer scientist, entrepreneur and investor. He is best known for his work as a co-founder of the seed accelerator Y Combinator. He is also the co-founder of the online dating platform Match.com. 

Async#

from llama_index.llms import Cohere

llm = Cohere(model="command", api_key=api_key)
resp = await llm.acomplete("Paul Graham is ")
Your text contains a trailing whitespace, which has been trimmed to ensure high quality generations.
print(resp)
an English computer scientist, entrepreneur and investor. He is best known for his work as a co-founder of the startup incubator and seed fund Y Combinator, and the programming language Lisp. He has also written numerous essays, many of which have become highly influential in the software engineering field. 
resp = await llm.astream_complete("Paul Graham is ")
async for delta in resp:
    print(delta.delta, end="")
 an English computer scientist, essayist, and businessman. He is best known for his work as a co-founder of the startup accelerator Y Combinator, and his essay "Beating the Averages." 

Set API Key at a per-instance level#

If desired, you can have separate LLM instances use separate API keys.

from llama_index.llms import Cohere

llm_good = Cohere(api_key=api_key)
llm_bad = Cohere(model="command", api_key="BAD_KEY")

resp = llm_good.complete("Paul Graham is ")
print(resp)

resp = llm_bad.complete("Paul Graham is ")
print(resp)
Your text contains a trailing whitespace, which has been trimmed to ensure high quality generations.
an English computer scientist, entrepreneur and investor. He is best known for his work as a co-founder of the acceleration program Y Combinator. He has also written extensively on the topics of computer science and entrepreneurship. Where did you come across his name? 
---------------------------------------------------------------------------
CohereAPIError                            Traceback (most recent call last)
Cell In[17], line 9
      6 resp = llm_good.complete("Paul Graham is ")
      7 print(resp)
----> 9 resp = llm_bad.complete("Paul Graham is ")
     10 print(resp)

File /workspaces/llama_index/gllama_index/llms/base.py:277, in llm_completion_callback.<locals>.wrap.<locals>.wrapped_llm_predict(_self, *args, **kwargs)
    267 with wrapper_logic(_self) as callback_manager:
    268     event_id = callback_manager.on_event_start(
    269         CBEventType.LLM,
    270         payload={
   (...)
    274         },
    275     )
--> 277     f_return_val = f(_self, *args, **kwargs)
    278     if isinstance(f_return_val, Generator):
    279         # intercept the generator and add a callback to the end
    280         def wrapped_gen() -> CompletionResponseGen:

File /workspaces/llama_index/gllama_index/llms/cohere.py:139, in Cohere.complete(self, prompt, **kwargs)
    136 @llm_completion_callback()
    137 def complete(self, prompt: str, **kwargs: Any) -> CompletionResponse:
    138     all_kwargs = self._get_all_kwargs(**kwargs)
--> 139     response = completion_with_retry(
    140         client=self._client,
    141         max_retries=self.max_retries,
    142         chat=False,
    143         prompt=prompt,
    144         **all_kwargs
    145     )
    147     return CompletionResponse(
    148         text=response.generations[0].text,
    149         raw=response.__dict__,
    150     )

File /workspaces/llama_index/gllama_index/llms/cohere_utils.py:74, in completion_with_retry(client, max_retries, chat, **kwargs)
     71     else:
     72         return client.generate(**kwargs)
---> 74 return _completion_with_retry(**kwargs)

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/tenacity/__init__.py:289, in BaseRetrying.wraps.<locals>.wrapped_f(*args, **kw)
    287 @functools.wraps(f)
    288 def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
--> 289     return self(f, *args, **kw)

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/tenacity/__init__.py:379, in Retrying.__call__(self, fn, *args, **kwargs)
    377 retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
    378 while True:
--> 379     do = self.iter(retry_state=retry_state)
    380     if isinstance(do, DoAttempt):
    381         try:

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/tenacity/__init__.py:314, in BaseRetrying.iter(self, retry_state)
    312 is_explicit_retry = fut.failed and isinstance(fut.exception(), TryAgain)
    313 if not (is_explicit_retry or self.retry(retry_state)):
--> 314     return fut.result()
    316 if self.after is not None:
    317     self.after(retry_state)

File /usr/lib/python3.10/concurrent/futures/_base.py:449, in Future.result(self, timeout)
    447     raise CancelledError()
    448 elif self._state == FINISHED:
--> 449     return self.__get_result()
    451 self._condition.wait(timeout)
    453 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File /usr/lib/python3.10/concurrent/futures/_base.py:401, in Future.__get_result(self)
    399 if self._exception:
    400     try:
--> 401         raise self._exception
    402     finally:
    403         # Break a reference cycle with the exception in self._exception
    404         self = None

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/tenacity/__init__.py:382, in Retrying.__call__(self, fn, *args, **kwargs)
    380 if isinstance(do, DoAttempt):
    381     try:
--> 382         result = fn(*args, **kwargs)
    383     except BaseException:  # noqa: B902
    384         retry_state.set_exception(sys.exc_info())  # type: ignore[arg-type]

File /workspaces/llama_index/gllama_index/llms/cohere_utils.py:72, in completion_with_retry.<locals>._completion_with_retry(**kwargs)
     70     return client.chat(**kwargs)
     71 else:
---> 72     return client.generate(**kwargs)

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/cohere/client.py:221, in Client.generate(self, prompt, prompt_vars, model, preset, num_generations, max_tokens, temperature, k, p, frequency_penalty, presence_penalty, end_sequences, stop_sequences, return_likelihoods, truncate, logit_bias, stream)
    164 """Generate endpoint.
    165 See https://docs.cohere.ai/reference/generate for advanced arguments
    166 
   (...)
    200         >>>     print(token)
    201 """
    202 json_body = {
    203     "model": model,
    204     "prompt": prompt,
   (...)
    219     "stream": stream,
    220 }
--> 221 response = self._request(cohere.GENERATE_URL, json=json_body, stream=stream)
    222 if stream:
    223     return StreamingGenerations(response)

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/cohere/client.py:927, in Client._request(self, endpoint, json, files, method, stream, params)
    924     except jsonlib.decoder.JSONDecodeError:  # CohereAPIError will capture status
    925         raise CohereAPIError.from_response(response, message=f"Failed to decode json body: {response.text}")
--> 927     self._check_response(json_response, response.headers, response.status_code)
    928 return json_response

File ~/.local/share/projects/oss/llama_index/.venv/lib/python3.10/site-packages/cohere/client.py:869, in Client._check_response(self, json_response, headers, status_code)
    867     logger.warning(headers["X-API-Warning"])
    868 if "message" in json_response:  # has errors
--> 869     raise CohereAPIError(
    870         message=json_response["message"],
    871         http_status=status_code,
    872         headers=headers,
    873     )
    874 if 400 <= status_code < 500:
    875     raise CohereAPIError(
    876         message=f"Unexpected client error (status {status_code}): {json_response}",
    877         http_status=status_code,
    878         headers=headers,
    879     )

CohereAPIError: invalid api token