Automated Testing for LLMOps 01: Continuous Integration (CI) with CircleCI

Automated Testing for LLMOps

These are my notes from the DeepLearning.AI short course https://www.deeplearning.ai/short-courses/automated-testing-llmops/

Learn how LLM-based testing differs from traditional software testing and implement rules-based testing to assess your LLM application.

Build model-graded evaluations to test your LLM application using an evaluation LLM.

Automate your evals (rules-based and model-graded) using continuous integration tools from CircleCI.

Table of Contents

  • Automated Testing for LLMOps
  • Lesson 1: Introduction to Continuous Integration (CI)
  • Lesson 2: Overview of Automated Evals
    • Load API tokens for our 3rd party APIs.
    • Set up our github branch
    • The sample application: AI-powered quiz generator
      • Evaluations
    • Running evaluations in a CircleCI pipeline
      • The CircleCI config file
    • Run the per-commit evals

Lesson 1: Introduction to Continuous Integration (CI)

ChatGPT's introduction to continuous integration:

Continuous Integration (CI) is a software development practice in which code is merged into a shared repository frequently, with automated builds and tests run on each change so that integration errors are caught as early as possible. The main goal of CI is to reduce integration problems so that teams can deliver high-quality software faster.

In practice, developers commit their code to a shared repository (via a version control system). A CI server detects each change and triggers a series of build and test tasks: compiling the code, running unit tests, performing static code analysis, and so on. If a build or test fails, the CI system notifies the team so the problem can be fixed promptly; if everything passes, the change can be integrated (merged into the main branch).

The advantages of continuous integration include:

  1. Catching problems early: since every commit triggers automated builds and tests, integration issues are found and fixed sooner, instead of surfacing late in the development cycle.

  2. Higher software quality: frequently running tests and automated code checks keeps code quality high and reduces latent defects.

  3. Faster delivery: CI lets teams ship new features and bug fixes more quickly, because developers can change code with confidence that integration problems will be caught promptly.

  4. Better collaboration: CI encourages communication and cooperation among team members, since everyone shares responsibility for keeping the code integrable and healthy.

In short, continuous integration is an indispensable part of modern software development: through automation and frequent integration, it helps teams build high-quality software faster.
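The "run tests on every commit" idea above can be made concrete with a tiny example. The function and tests below are hypothetical (not part of the course repo); in a real CI setup, a runner would execute them with pytest on every push:

```python
# A minimal, hypothetical example of the kind of automated check a CI
# server runs on every commit: plain pytest-style assertion functions.
def slugify(title: str) -> str:
    """Turn a title into a lowercase, hyphen-separated URL slug."""
    return "-".join(title.lower().split())

def test_slugify_basic():
    assert slugify("Automated Testing for LLMOps") == "automated-testing-for-llmops"

def test_slugify_is_lowercase():
    assert slugify("Hello World").islower()

# pytest would discover and run these automatically; calling them
# directly here just shows they pass.
test_slugify_basic()
test_slugify_is_lowercase()
```

If any assertion fails, the CI job fails and the team is notified before the change reaches the main branch.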

What continuous integration means

(slide image)

The steps of continuous integration

(slide image)

The benefits of continuous integration

(slide image)

Lesson 2: Overview of Automated Evals

How testing LLM applications differs from testing traditional software

(slide image)

Model Evaluations(Evals)

(slide image)

Automating Evals

(slide image)

import warnings
warnings.filterwarnings('ignore')

Let's look at a concrete example:

(slide image)

Load API tokens for our 3rd party APIs.

from utils import get_circle_api_key
cci_api_key = get_circle_api_key()

from utils import get_gh_api_key
gh_api_key = get_gh_api_key()

from utils import get_openai_api_key
openai_api_key = get_openai_api_key()

The utils.py file is shown below:

import github
import os
import requests
import random
from dotenv import load_dotenv, find_dotenv
from yaml import safe_dump, safe_load
import time

adjectives = [
    "adoring", "affirmative", "appreciated", "available", "best-selling",
    "blithe", "brightest", "charismatic", "convincing", "dignified",
    "ecstatic", "effective", "engaging", "enterprising", "ethical",
    "fast-growing", "glad", "hardy", "idolized", "improving", "jubilant",
    "knowledgeable", "long-lasting", "lucky", "marvelous", "merciful",
    "mesmerizing", "problem-free", "resplendent", "restored", "roomier",
    "serene", "sharper", "skilled", "smiling", "smoother", "snappy",
    "soulful", "staunch", "striking", "strongest", "subsidized",
    "supported", "supporting", "sweeping", "terrific", "unaffected",
    "unbiased", "unforgettable", "unrivaled",
]

nouns = [
    "agustinia", "apogee", "bangle", "cake", "cheese", "clavicle", "client",
    "clove", "curler", "draw", "duke", "earl", "eustoma", "fireplace",
    "gem", "glove", "goal", "ground", "jasmine", "jodhpur", "laugh",
    "message", "mile", "mockingbird", "motor", "phalange", "pillow",
    "pizza", "pond", "potential", "ptarmigan", "puck", "puzzle", "quartz",
    "radar", "raver", "saguaro", "salary", "sale", "scarer", "skunk",
    "spatula", "spectacles", "statistic", "sturgeon", "tea", "teacher",
    "wallet", "waterfall", "wrinkle",
]

def inspect_config():
    with open("circle_config.yml") as f:
        print(safe_dump(safe_load(f)))

def load_env():
    _ = load_dotenv(find_dotenv())

def get_openai_api_key():
    load_env()
    openai_api_key = os.getenv("OPENAI_API_KEY")
    return openai_api_key

def get_circle_api_key():
    load_env()
    circle_token = os.getenv("CIRCLE_TOKEN")
    return circle_token

def get_gh_api_key():
    load_env()
    github_token = os.getenv("GH_TOKEN")
    return github_token

def get_repo_name():
    return "CircleCI-Learning/llmops-course"

def _create_tree_element(repo, path, content):
    blob = repo.create_git_blob(content, "utf-8")
    element = github.InputGitTreeElement(
        path=path, mode="100644", type="blob", sha=blob.sha
    )
    return element

def push_files(repo_name, branch_name, files):
    files_to_push = set(files)
    # include the config.yml file
    g = github.Github(os.environ["GH_TOKEN"])
    repo = g.get_repo(repo_name)

    elements = []
    config_element = _create_tree_element(
        repo, ".circleci/config.yml", open("circle_config.yml").read()
    )
    elements.append(config_element)
    requirements_element = _create_tree_element(
        repo, "requirements.txt", open("dev_requirements.txt").read()
    )
    elements.append(requirements_element)
    for file in files_to_push:
        print(f"uploading {file}")
        with open(file, encoding="utf-8") as f:
            content = f.read()
        element = _create_tree_element(repo, file, content)
        elements.append(element)

    head_sha = repo.get_branch("main").commit.sha

    print(f"pushing files to: {branch_name}")
    try:
        repo.create_git_ref(ref=f"refs/heads/{branch_name}", sha=head_sha)
        time.sleep(2)
    except Exception as _:
        print(f"{branch_name} already exists in the repository pushing updated changes")
    branch_sha = repo.get_branch(branch_name).commit.sha

    base_tree = repo.get_git_tree(sha=branch_sha)
    tree = repo.create_git_tree(elements, base_tree)
    parent = repo.get_git_commit(sha=branch_sha)
    commit = repo.create_git_commit("Trigger CI evaluation pipeline", tree, [parent])
    branch_refs = repo.get_git_ref(f"heads/{branch_name}")
    branch_refs.edit(sha=commit.sha)

def _trigger_circle_pipline(repo_name, branch, token, params=None):
    params = {} if params is None else params
    r = requests.post(
        f"{os.getenv('DLAI_CIRCLE_CI_API_BASE', 'https://circleci.com')}/api/v2/project/gh/{repo_name}/pipeline",
        headers={"Circle-Token": f"{token}", "accept": "application/json"},
        json={"branch": branch, "parameters": params},
    )
    pipeline_data = r.json()
    pipeline_number = pipeline_data["number"]
    print(
        f"Please visit https://app.circleci.com/pipelines/github/{repo_name}/{pipeline_number}"
    )

def trigger_commit_evals(repo_name, branch, token):
    _trigger_circle_pipline(repo_name, branch, token, {"eval-mode": "commit"})

def trigger_release_evals(repo_name, branch, token):
    _trigger_circle_pipline(repo_name, branch, token, {"eval-mode": "release"})

def trigger_full_evals(repo_name, branch, token):
    _trigger_circle_pipline(repo_name, branch, token, {"eval-mode": "full"})

## magic to write and run
from IPython.core.magic import register_cell_magic

@register_cell_magic
def write_and_run(line, cell):
    argz = line.split()
    file = argz[-1]
    mode = "w"
    if len(argz) == 2 and argz[0] == "-a":
        mode = "a"
    with open(file, mode) as f:
        f.write(cell)
    get_ipython().run_cell(cell)

def get_branch() -> str:
    """Generate a random branch name."""
    prefix = "dl-cci"
    adjective = random.choice(adjectives)
    noun = random.choice(nouns)
    number = random.randint(1, 100)
    return f"dl-cci-{adjective}-{noun}-{number}"
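For illustration, here is a trimmed-down, self-contained sketch of the branch-naming scheme that get_branch implements. The two short word lists below are stand-ins for the full lists in utils.py:

```python
import random
import re

# Stand-in word lists; utils.py uses much longer ones.
adjectives = ["brightest", "serene", "jubilant"]
nouns = ["pond", "quartz", "saguaro"]

def get_branch() -> str:
    """Generate a random branch name like 'dl-cci-brightest-pond-67'."""
    adjective = random.choice(adjectives)
    noun = random.choice(nouns)
    number = random.randint(1, 100)
    return f"dl-cci-{adjective}-{noun}-{number}"

# Every generated name matches the dl-cci-<adjective>-<noun>-<number> pattern.
assert re.fullmatch(r"dl-cci-[a-z]+-[a-z]+-\d{1,3}", get_branch())
```

The random adjective-noun-number suffix makes it very unlikely that two course participants push to the same branch.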

Set up our github branch

from utils import get_repo_name
course_repo = get_repo_name()
course_repo

Output

'CircleCI-Learning/llmops-course'
from utils import get_branch
course_branch = get_branch()
course_branch

Output: this is my branch (different people taking the course will get different branch names)

'dl-cci-brightest-pond-67'

You can check the branches in the GitHub repository:

https://github.com/CircleCI-Learning/llmops-course/branches

The sample application: AI-powered quiz generator

We are going to build an AI-powered quiz generator.

(slide image)

Create the dataset for the quiz.

human_template = "{question}"

quiz_bank = """1. Subject: Leonardo DaVinci
   Categories: Art, Science
   Facts:
    - Painted the Mona Lisa
    - Studied zoology, anatomy, geology, optics
    - Designed a flying machine

2. Subject: Paris
   Categories: Art, Geography
   Facts:
    - Location of the Louvre, the museum where the Mona Lisa is displayed
    - Capital of France
    - Most populous city in France
    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie

3. Subject: Telescopes
   Category: Science
   Facts:
    - Device to observe different objects
    - The first refracting telescopes were invented in the Netherlands in the 17th Century
    - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror

4. Subject: Starry Night
   Category: Art
   Facts:
    - Painted by Vincent van Gogh in 1889
    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence

5. Subject: Physics
   Category: Science
   Facts:
    - The sun doesn't change color during sunset.
    - Water slows the speed of light
    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.
"""

Build the prompt template.

delimiter = "####"

prompt_template = f"""
Follow these steps to generate a customized quiz for the user.
The question will be delimited with four hashtags i.e {delimiter}

The user will provide a category that they want to create a quiz for. Any questions included in the quiz
should only refer to the category.

Step 1:{delimiter} First identify the category user is asking about from the following list:
* Geography
* Science
* Art

Step 2:{delimiter} Determine the subjects to generate questions about. The list of topics are below:

{quiz_bank}

Pick up to two subjects that fit the user's category.

Step 3:{delimiter} Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.

Use the following format for the quiz:
Question 1:{delimiter} <question 1>

Question 2:{delimiter} <question 2>

Question 3:{delimiter} <question 3>
"""

Use langchain to build the prompt template.

from langchain.prompts import ChatPromptTemplate

chat_prompt = ChatPromptTemplate.from_messages([("human", prompt_template)])
# print to observe the generated object
chat_prompt

Output

ChatPromptTemplate(input_variables=[], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template="\nFollow these steps to generate a customized quiz for the user.\nThe question will be delimited with four hashtags i.e ####\n\nThe user will provide a category that they want to create a quiz for. Any questions included in the quiz\nshould only refer to the category.\n\nStep 1:#### First identify the category user is asking about from the following list:\n* Geography\n* Science\n* Art\n\nStep 2:#### Determine the subjects to generate questions about. The list of topics are below:\n\n1. Subject: Leonardo DaVinci\n   Categories: Art, Science\n   Facts:\n    - Painted the Mona Lisa\n    - Studied zoology, anatomy, geology, optics\n    - Designed a flying machine\n  \n2. Subject: Paris\n   Categories: Art, Geography\n   Facts:\n    - Location of the Louvre, the museum where the Mona Lisa is displayed\n    - Capital of France\n    - Most populous city in France\n    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie\n\n3. Subject: Telescopes\n   Category: Science\n   Facts:\n    - Device to observe different objects\n    - The first refracting telescopes were invented in the Netherlands in the 17th Century\n    - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror\n\n4. Subject: Starry Night\n   Category: Art\n   Facts:\n    - Painted by Vincent van Gogh in 1889\n    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence\n\n5. Subject: Physics\n   Category: Science\n   Facts:\n    - The sun doesn't change color during sunset.\n    - Water slows the speed of light\n    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.\n\nPick up to two subjects that fit the user's category. \n\nStep 3:#### Generate a quiz for the user. 
Based on the selected subjects generate 3 questions for the user using the facts about the subject.\n\nUse the following format for the quiz:\nQuestion 1:#### <question 1>\n\nQuestion 2:#### <question 2>\n\nQuestion 3:#### <question 3>\n\n"))])

Choose the LLM.

from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
llm

Output

ChatOpenAI(client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, temperature=0.0, openai_api_key='eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcHAiLCJzdWIiOiIxNzQ2MDYyIiwiYXVkIjoiV0VCIiwiaWF0IjoxNzA4NjAyMDk3LCJleHAiOjE3MTExOTQwOTd9.dnCBPsdZ7nf9TjS3lSwddk6JINpKRuKPB7cjfq0mWts', openai_api_base='http://jupyter-api-proxy.internal.dlai/rev-proxy', openai_organization='', openai_proxy='')

Set up an output parser in LangChain that converts the llm response into a string, then chain the prompt, model, and parser together.

from langchain.schema.output_parser import StrOutputParser
output_parser = StrOutputParser()

chain = chat_prompt | llm | output_parser
chain

Output

ChatPromptTemplate(input_variables=[], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template="\nFollow these steps to generate a customized quiz for the user.\nThe question will be delimited with four hashtags i.e ####\n\nThe user will provide a category that they want to create a quiz for. Any questions included in the quiz\nshould only refer to the category.\n\nStep 1:#### First identify the category user is asking about from the following list:\n* Geography\n* Science\n* Art\n\nStep 2:#### Determine the subjects to generate questions about. The list of topics are below:\n\n1. Subject: Leonardo DaVinci\n   Categories: Art, Science\n   Facts:\n    - Painted the Mona Lisa\n    - Studied zoology, anatomy, geology, optics\n    - Designed a flying machine\n  \n2. Subject: Paris\n   Categories: Art, Geography\n   Facts:\n    - Location of the Louvre, the museum where the Mona Lisa is displayed\n    - Capital of France\n    - Most populous city in France\n    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie\n\n3. Subject: Telescopes\n   Category: Science\n   Facts:\n    - Device to observe different objects\n    - The first refracting telescopes were invented in the Netherlands in the 17th Century\n    - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror\n\n4. Subject: Starry Night\n   Category: Art\n   Facts:\n    - Painted by Vincent van Gogh in 1889\n    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence\n\n5. Subject: Physics\n   Category: Science\n   Facts:\n    - The sun doesn't change color during sunset.\n    - Water slows the speed of light\n    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.\n\nPick up to two subjects that fit the user's category. \n\nStep 3:#### Generate a quiz for the user. 
Based on the selected subjects generate 3 questions for the user using the facts about the subject.\n\nUse the following format for the quiz:\nQuestion 1:#### <question 1>\n\nQuestion 2:#### <question 2>\n\nQuestion 3:#### <question 3>\n\n"))])
| ChatOpenAI(client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, temperature=0.0, openai_api_key='eyJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcHAiLCJzdWIiOiIxNzQ2MDYyIiwiYXVkIjoiV0VCIiwiaWF0IjoxNzA4NjAyMDk3LCJleHAiOjE3MTExOTQwOTd9.dnCBPsdZ7nf9TjS3lSwddk6JINpKRuKPB7cjfq0mWts', openai_api_base='http://jupyter-api-proxy.internal.dlai/rev-proxy', openai_organization='', openai_proxy='')
| StrOutputParser()
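The repr above shows the three stages joined by `|`. As a rough mental model (a toy sketch, not LangChain's actual Runnable implementation), each stage exposes an invoke method and overloads __or__ so that stages compose into a single invokable pipeline:

```python
class Stage:
    """Toy stand-in for a LangChain Runnable: invokable and pipe-composable."""
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # Compose: run self first, then feed its output into `other`.
        return Stage(lambda x: other.invoke(self.invoke(x)))

prompt = Stage(lambda d: f"Generate a quiz about {d['question']}.")
fake_llm = Stage(lambda p: {"content": p.upper()})  # stand-in for the chat model
parser = Stage(lambda msg: msg["content"])          # stand-in for StrOutputParser

chain = prompt | fake_llm | parser
print(chain.invoke({"question": "science"}))  # GENERATE A QUIZ ABOUT SCIENCE.
```

This is why `chat_prompt | llm | output_parser` yields one object whose invoke call formats the prompt, calls the model, and parses the response in sequence.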

Build the function 'assistant_chain' to put together all the steps above.

# taking all components and making reusable as one piece
def assistant_chain(system_message,human_template="{question}",llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),output_parser=StrOutputParser()):chat_prompt = ChatPromptTemplate.from_messages([("system", system_message),("human", human_template),])return chat_prompt | llm | output_parser

Evaluations

Create the function ‘eval_expected_words’ for the first example.

def eval_expected_words(
    system_message,
    question,
    expected_words,
    human_template="{question}",
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    output_parser=StrOutputParser(),
):
    assistant = assistant_chain(system_message, human_template, llm, output_parser)
    answer = assistant.invoke({"question": question})
    print(answer)
    assert any(
        word in answer.lower() for word in expected_words
    ), f"Expected the assistant questions to include '{expected_words}', but it did not"

Test: Generate a quiz about science.

question  = "Generate a quiz about science."
expected_words = ["davinci", "telescope", "physics", "curie"]

Create the eval.

eval_expected_words(
    prompt_template,
    question,
    expected_words,
)

Output

Step 1:#### First identify the category user is asking about from the following list:
* Geography
* Science
* Art

Step 2:#### Determine the subjects to generate questions about. The list of topics are below:

1. Subject: Telescopes
   Category: Science
   Facts:
   - Device to observe different objects
   - The first refracting telescopes were invented in the Netherlands in the 17th Century
   - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror

2. Subject: Physics
   Category: Science
   Facts:
   - The sun doesn't change color during sunset.
   - Water slows the speed of light
   - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.

Based on the selected subjects, I will generate 3 questions for your science quiz.

Question 1:#### What is the purpose of a telescope?
Question 2:#### In which country were the first refracting telescopes invented in the 17th Century?
Question 3:#### Why is the Eiffel Tower in Paris taller in the summer than the winter?
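The pass/fail logic behind this eval is a pure rules-based keyword check, which can be exercised on its own with a canned answer (no LLM call needed):

```python
def contains_expected_words(answer: str, expected_words: list) -> bool:
    """True if at least one expected keyword appears in the lowercased answer."""
    return any(word in answer.lower() for word in expected_words)

# A canned answer standing in for the model's output above.
canned_answer = "Question 1:#### What is the purpose of a telescope?"
assert contains_expected_words(canned_answer, ["davinci", "telescope", "physics", "curie"])
assert not contains_expected_words(canned_answer, ["rome"])
```

Because the check is deterministic and cheap, it is a good fit for running on every commit, unlike slower model-graded evals.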

Create the function ‘evaluate_refusal’ to define a failing test case where the app should decline to answer.

def evaluate_refusal(
    system_message,
    question,
    decline_response,
    human_template="{question}",
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    output_parser=StrOutputParser(),
):
    # Note: pass system_message first; assistant_chain expects
    # (system_message, human_template, llm, output_parser).
    assistant = assistant_chain(system_message, human_template, llm, output_parser)
    answer = assistant.invoke({"question": question})
    print(answer)
    assert decline_response.lower() in answer.lower(), \
        f"Expected the bot to decline with '{decline_response}' got {answer}"

Define a new question (which should be a bad request)

question  = "Generate a quiz about Rome."
decline_response = "I'm sorry"

Create the refusal eval.

Note: The following function call will throw an exception.

evaluate_refusal(
    prompt_template,
    question,
    decline_response,
)

Output

#### Step 1:
I would like to create a quiz about Rome.

#### Step 2:
I will choose the subjects "Paris" and "Starry Night" as they both fall under the category of Art and Geography.

#### Step 3:
Question 1:####
In which city is the Louvre located, the museum where the Mona Lisa is displayed?
a) Rome
b) Paris
c) London
d) Madrid

Question 2:####
Who painted the famous artwork "Starry Night" in 1889?
a) Leonardo DaVinci
b) Vincent van Gogh
c) Michelangelo
d) Pablo Picasso

Question 3:####
What does "Starry Night" by Vincent van Gogh capture?
a) A view of the Eiffel Tower
b) A view of van Gogh's room in Saint-Rémy-de-Provence
c) A scene from the Louvre museum
d) A landscape of Rome
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[21], line 1
----> 1 evaluate_refusal(
      2     prompt_template,
      3     question,
      4     decline_response
      5 )

Cell In[19], line 17, in evaluate_refusal(system_message, question, decline_response, human_template, llm, output_parser)
     14 answer = assistant.invoke({"question": question})
     15 print(answer)
---> 17 assert decline_response.lower() in answer.lower(), \
     18     f"Expected the bot to decline with '{decline_response}' got {answer}"

AssertionError: Expected the bot to decline with 'I'm sorry' got #### Step 1:
I would like to create a quiz about Rome.

#### Step 2:
I will choose the subjects "Paris" and "Starry Night" as they both fall under the category of Art and Geography.

#### Step 3:
Question 1:####
In which city is the Louvre located, the museum where the Mona Lisa is displayed?
a) Rome
b) Paris
c) London
d) Madrid

Question 2:####
Who painted the famous artwork "Starry Night" in 1889?
a) Leonardo DaVinci
b) Vincent van Gogh
c) Michelangelo
d) Pablo Picasso

Question 3:####
What does "Starry Night" by Vincent van Gogh capture?
a) A view of the Eiffel Tower
b) A view of van Gogh's room in Saint-Rémy-de-Provence
c) A scene from the Louvre museum
d) A landscape of Rome

Running evaluations in a CircleCI pipeline

(slide image)

Put all these steps together into files to reuse later.

Note: we fix the system_message by adding additional rules:

  • Only use explicit matches for the category, if the category is not an exact match to categories in the quiz bank, answer that you do not have information.
  • If the user asks a question about a subject you do not have information about in the quiz bank, answer “I’m sorry I do not have information about that”.
%%writefile app.py
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

delimiter = "####"

quiz_bank = """1. Subject: Leonardo DaVinci
   Categories: Art, Science
   Facts:
    - Painted the Mona Lisa
    - Studied zoology, anatomy, geology, optics
    - Designed a flying machine

2. Subject: Paris
   Categories: Art, Geography
   Facts:
    - Location of the Louvre, the museum where the Mona Lisa is displayed
    - Capital of France
    - Most populous city in France
    - Where Radium and Polonium were discovered by scientists Marie and Pierre Curie

3. Subject: Telescopes
   Category: Science
   Facts:
    - Device to observe different objects
    - The first refracting telescopes were invented in the Netherlands in the 17th Century
    - The James Webb space telescope is the largest telescope in space. It uses a gold-berillyum mirror

4. Subject: Starry Night
   Category: Art
   Facts:
    - Painted by Vincent van Gogh in 1889
    - Captures the east-facing view of van Gogh's room in Saint-Rémy-de-Provence

5. Subject: Physics
   Category: Science
   Facts:
    - The sun doesn't change color during sunset.
    - Water slows the speed of light
    - The Eiffel Tower in Paris is taller in the summer than the winter due to expansion of the metal.
"""

system_message = f"""
Follow these steps to generate a customized quiz for the user.
The question will be delimited with four hashtags i.e {delimiter}

The user will provide a category that they want to create a quiz for. Any questions included in the quiz
should only refer to the category.

Step 1:{delimiter} First identify the category user is asking about from the following list:
* Geography
* Science
* Art

Step 2:{delimiter} Determine the subjects to generate questions about. The list of topics are below:

{quiz_bank}

Pick up to two subjects that fit the user's category.

Step 3:{delimiter} Generate a quiz for the user. Based on the selected subjects generate 3 questions for the user using the facts about the subject.

Use the following format for the quiz:
Question 1:{delimiter} <question 1>

Question 2:{delimiter} <question 2>

Question 3:{delimiter} <question 3>

Additional rules:
- Only use explicit matches for the category, if the category is not an exact match to categories in the quiz bank, answer that you do not have information.
- If the user asks a question about a subject you do not have information about in the quiz bank, answer "I'm sorry I do not have information about that".
"""

"""
Helper functions for writing the test cases
"""

def assistant_chain(
    system_message=system_message,
    human_template="{question}",
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    output_parser=StrOutputParser(),
):
    chat_prompt = ChatPromptTemplate.from_messages([
        ("system", system_message),
        ("human", human_template),
    ])
    return chat_prompt | llm | output_parser

Output

Overwriting app.py

Create a new file to include the evals.

%%writefile test_assistant.py
from app import assistant_chain
from app import system_message
from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.schema.output_parser import StrOutputParser

import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

def eval_expected_words(
    system_message,
    question,
    expected_words,
    human_template="{question}",
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    output_parser=StrOutputParser(),
):
    assistant = assistant_chain(system_message)
    answer = assistant.invoke({"question": question})
    print(answer)
    assert any(
        word in answer.lower() for word in expected_words
    ), f"Expected the assistant questions to include '{expected_words}', but it did not"

def evaluate_refusal(
    system_message,
    question,
    decline_response,
    human_template="{question}",
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    output_parser=StrOutputParser(),
):
    # Note: pass system_message first; assistant_chain expects
    # (system_message, human_template, llm, output_parser).
    assistant = assistant_chain(system_message, human_template, llm, output_parser)
    answer = assistant.invoke({"question": question})
    print(answer)
    assert decline_response.lower() in answer.lower(), \
        f"Expected the bot to decline with '{decline_response}' got {answer}"

"""
Test cases
"""

def test_science_quiz():
    question = "Generate a quiz about science."
    expected_subjects = ["davinci", "telescope", "physics", "curie"]
    eval_expected_words(system_message, question, expected_subjects)

def test_geography_quiz():
    question = "Generate a quiz about geography."
    expected_subjects = ["paris", "france", "louvre"]
    eval_expected_words(system_message, question, expected_subjects)

def test_refusal_rome():
    question = "Help me create a quiz about Rome"
    decline_response = "I'm sorry"
    evaluate_refusal(system_message, question, decline_response)

Output

Overwriting test_assistant.py

The CircleCI config file

Now let’s set up our tests to run automatically in CircleCI.

For this course, we’ve created a working CircleCI config file. Let’s take a look at the configuration.

!cat circle_config.yml

Output

version: 2.1
orbs:
  # The python orb contains a set of prepackaged circleci configuration you can use repeatedly in your configuration files
  # Orb commands and jobs help you with common scripting around a language/tool
  # so you dont have to copy and paste it everywhere.
  # See the orb documentation here: https://circleci.com/developer/orbs/orb/circleci/python
  python: circleci/python@2.1.1

parameters:
  eval-mode:
    type: string
    default: "commit"

workflows:
  evaluate-commit:
    when:
      equal: [ commit, << pipeline.parameters.eval-mode >> ]
    jobs:
      - run-commit-evals:
          context:
            - dl-ai-courses
  evaluate-release:
    when:
      equal: [ release, << pipeline.parameters.eval-mode >> ]
    jobs:
      - run-pre-release-evals:
          context:
            - dl-ai-courses
  evaluate-all:
    when:
      equal: [ full, << pipeline.parameters.eval-mode >> ]
    jobs:
      - run-manual-evals:
          context:
            - dl-ai-courses
  report-evals:
    when:
      equal: [ report, << pipeline.parameters.eval-mode >> ]
    jobs:
      - store-eval-artifacts:
          context:
            - dl-ai-courses

jobs:
  run-commit-evals:  # This is the name of the job, feel free to change it to better match what you're trying to do!
    # These next lines define a docker executor: https://circleci.com/docs/2.0/executor-types/
    # You can specify an image from dockerhub or use one of the convenience images from CircleCI's Developer Hub
    # A list of available CircleCI docker convenience images is available here: https://circleci.com/developer/images/image/cimg/python
    # The executor is the environment in which the steps below will be executed - below will use a python 3.10 container
    # Change the version below to your required version of python
    docker:
      - image: cimg/python:3.10.5
    # Checkout the code as the first step. This is a dedicated CircleCI step.
    # The python orb's install-packages step will install the dependencies from a Pipfile via Pipenv by default.
    # Here we're making sure we just use the system-wide pip. By default it uses the project root's requirements.txt.
    # Then run your tests!
    # CircleCI will report the results back to your VCS provider.
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
          # app-dir: ~/project/package-directory/  # If your requirements.txt isn't in the root directory.
          # pip-dependency-file: test-requirements.txt  # if you have a different name for your requirements file, maybe one that combines your runtime and test requirements.
      - run:
          name: Run assistant evals.
          command: python -m pytest --junitxml results.xml test_assistant.py
      - store_test_results:
          path: results.xml
  run-pre-release-evals:
    docker:
      - image: cimg/python:3.10.5
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run release evals.
          command: python -m pytest --junitxml results.xml test_release_evals.py
      - store_test_results:
          path: results.xml
  run-manual-evals:
    docker:
      - image: cimg/python:3.10.5
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run end to end evals.
          command: python -m pytest --junitxml results.xml test_assistant.py test_release_evals.py
      - store_test_results:
          path: results.xml
  store-eval-artifacts:
    docker:
      - image: cimg/python:3.10.5
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Save eval to html file
          command: python save_eval_artifacts.py
      - store_artifacts:
          path: /tmp/eval_results.html
          destination: eval_results.html
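The `when: equal: [ ..., << pipeline.parameters.eval-mode >> ]` clauses route the single eval-mode pipeline parameter to exactly one workflow. A toy sketch of that routing logic (the dict below just mirrors the YAML; it is not CircleCI code):

```python
# Mirror of the eval-mode -> job routing declared in circle_config.yml.
WORKFLOW_JOBS = {
    "commit": "run-commit-evals",
    "release": "run-pre-release-evals",
    "full": "run-manual-evals",
    "report": "store-eval-artifacts",
}

def select_job(eval_mode: str = "commit") -> str:
    """Return the job the pipeline would run for a given eval-mode value."""
    return WORKFLOW_JOBS[eval_mode]

print(select_job())        # run-commit-evals
print(select_job("full"))  # run-manual-evals
```

This is how one config file serves fast per-commit checks, slower pre-release evals, full runs, and artifact reporting, depending on the parameter the trigger passes.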

Run the per-commit evals

Push files into the github repo.

from utils import push_files
push_files(course_repo, course_branch, ["app.py", "test_assistant.py"])

Output

uploading app.py
uploading test_assistant.py
pushing files to: dl-cci-brightest-pond-67

Trigger the pipeline in CircleCI.

from utils import trigger_commit_evals
trigger_commit_evals(course_repo, course_branch, cci_api_key)

Output: click the link to open the CircleCI UI and check whether the integration run passed the tests.

Please visit https://app.circleci.com/pipelines/github/CircleCI-Learning/llmops-course/3011

However, my integration test run in CircleCI failed.

(slide image)
