RLHF Training and Evaluation with Google Vertex AI and Llama 2

Reinforcement Learning from Human Feedback


Course link: https://www.deeplearning.ai/short-courses/reinforcement-learning-from-human-feedback/

Topic:

Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF), as well as the datasets needed for this technique

Fine-tune the Llama 2 model using RLHF with the open source Google Cloud Pipeline Components Library

Evaluate tuned model performance against the base model with evaluation methods

Instructor: Nikita Namjoshi

Table of Contents

  • Reinforcement Learning from Human Feedback
    • What you’ll learn in this course
  • Lesson 1: How does RLHF work
  • Lesson 2: Datasets For Reinforcement Learning Training
      • Loading and exploring the datasets
        • Preference dataset
        • Prompt dataset
  • Lesson 3: Tune an LLM with RLHF
        • Project environment setup
      • Compile the pipeline
    • Define the Vertex AI pipeline job
      • Define the location of the training and evaluation data
      • Choose the foundation model to be tuned
      • Calculate the number of reward model training steps
      • Calculate the number of reinforcement learning training steps
      • Define the instruction
      • Train with full dataset: dictionary 'parameter_values'
      • Set up Google Cloud to run the Vertex AI pipeline
    • Run the pipeline job on Vertex AI
      • Create and run the pipeline job
  • Lesson 4: Evaluate the Tuned Model
        • Project environment setup
      • Explore results with Tensorboard
      • Evaluate using the tuned and untuned model
      • Explore the results side by side in a dataframe
    • End

What you’ll learn in this course

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences.

Reinforcement Learning from Human Feedback (RLHF) is currently the main method for aligning LLMs with human values and preferences. RLHF is also used for further tuning a base LLM to align with values and preferences that are specific to your use case.

In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:

  • Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets.
  • Use the open source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
  • Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.

Lesson 1: How does RLHF work

One approach is supervised training: take {Input text, Summary} pairs as input and train the model on them.



But every labeler writes a different summary. Instead, feed the model several candidate summaries for the same input and have humans indicate which one they prefer.

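As a quick illustration, one human judgment over two candidate summaries can be packaged as a single preference record. This helper is hypothetical (not part of the course code), but it produces the {input_text, candidate_0, candidate_1, choice} schema explored in Lesson 2:

```python
def make_preference_example(prompt: str, summary_a: str, summary_b: str,
                            human_prefers_b: bool) -> dict:
    """Package one labeler judgment in the preference-dataset schema:
    two candidate completions plus an integer index for the winner."""
    return {
        "input_text": prompt,
        "candidate_0": summary_a,
        "candidate_1": summary_b,
        "choice": 1 if human_prefers_b else 0,
    }
```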

The main steps:


How the reward model differs between training and inference:


A completion is the output of the base LLM.

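At training time the reward model sees a pair of completions plus the human choice; a common way to turn that into a loss is a Bradley-Terry-style pairwise objective that penalizes the model whenever the preferred completion does not score higher. This is a minimal sketch of that idea, not the pipeline's actual implementation:

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): small when the preferred
    completion already receives the higher reward, large otherwise."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At inference time, by contrast, the reward model scores a single completion and returns just one scalar reward.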

In the RL loop, the reward model is used to fine-tune the base LLM.


The three steps at a glance:


Two papers worth reading:

Deep Reinforcement Learning from Human Preferences

Training language models to follow instructions with human feedback


Lesson 2: Datasets For Reinforcement Learning Training

Loading and exploring the datasets

“Reinforcement Learning from Human Feedback” (RLHF) requires the following datasets:

  • Preference dataset
    • Input prompt, candidate response 0, candidate response 1, choice (candidate 0 or 1)
  • Prompt dataset
    • Input prompt only, no response
Preference dataset
preference_dataset_path = 'sample_preference.jsonl'

import json

preference_data = []
with open(preference_dataset_path) as f:
    for line in f:
        preference_data.append(json.loads(line))

Contents of sample_preference.jsonl:

{"input_text": "I live right next to a huge university, and have been applying for a variety of jobs with them through their faceless electronic jobs portal (the \"click here to apply for this job\" type thing) for a few months. \n\nThe very first job I applied for, I got an interview that went just so-so. But then, I never heard back (I even looked up the number of the person who called me and called her back, left a voicemail, never heard anything).\n\nNow, when I'm applying for subsequent jobs - is it that same HR person who is seeing all my applications?? Or are they forwarded to the specific departments?\n\nI've applied for five jobs there in the last four months, all the resumes and cover letters tailored for each open position. Is this hurting my chances? I never got another interview there, for any of the positions. [summary]: ", "candidate_0": " When applying through a massive job portal, is just one HR person seeing ALL of them?", "candidate_1": " When applying to many jobs through a single university jobs portal, is just one HR person reading ALL my applications?", "choice": 1}
{"input_text": "I currently live in Texas and I plan on going to university in England, and I think I want to stay there for a while. Before I go to university, though, I wanted to plan a road trip across the US. Obviously this is going to be expensive and I plan on saving money (I already have a lot saved up), but I'm still unsure of the route. I've lived in a couple different places and I've traveled a lot inside the US, but there's still a lot that I haven't seen. I want to make the route as short as possible while still visiting the places I want. So, in your opinion, should I try and go mostly places that mean something to me from my childhood, or should I try to go mostly to places that I've never seen? [summary]: ", "candidate_0": " I want to go on a road trip from Texas to England to visit as many places as possible. Which route should I choose?", "candidate_1": " How do I plan a road trip in a way that I can see the places I want to see, but also see the places I haven't seen?", "choice": 1}
{"input_text": "Dear people on Reddit,\n\nMy husband is American and I'm a foreigner so we applied for a K1 visa which is basically \" a visa issued to the fianc\u00e9 or fianc\u00e9e of a United States citizen to enter the United States. A K-1 visa requires a foreigner to marry his or her U.S. citizen petitioner within 90 days of entry, or depart the United States.\"\n\nWith this visa I need to get married in the USA and I cannot leave USA until I adjust my status, which can takes several months. This means I can't leave USA to go to a honeymoon or to do a second wedding in my home country. \nThe thing is that I have lived in several countries and have friends and family all around the world so I don't even know how to start planning something. I had several ideas of weddings in the USA but either my fianc\u00e9 didn't like or it was too expensive. I wanted to get married in a cruise (to Alaska), fianc\u00e9 agreed but there is something called Jones act that says that every cruise must pass through foreigner ports so even if we go to Alaska, the cruise would go through Canadian waters.\n\nI really do not want a background wedding, although this would be a reasonable choice. \n\nI would like to have some ideas of really small destination wedding because if we get married only with our parents (and fiance's closest friends/family) present, it would be the best option because I  wouldn't be happy having huge a wedding where my best friends and family couldn't attend. \n\nFianc\u00e9 lives in Mississippi and I would like to go to somewhere snowy (we are planning to get married during xmas holiday)\n\nI feel like I'm going crazy trying to plan something in those circumstances. [summary]: ", "candidate_0": " I need some ideas of how to plan a really small destination wedding (with only closest family) in the USA. \nVisa says I need to get married in the US and cannot leave the US for honeymoon.", "candidate_1": " I need to get married in USA but I have no idea how to plan a wedding. I want to have a small destination wedding. I have no idea how to plan something.", "choice": 0}
{"input_text": "As a kid I started reading a book series, but I need your help in remembering what it is called.\nI was about \"magicians\" in a post apocalyptic world, who searched city ruins for, what is now, modern technology.  However they lost most knowledge of the tech in this great catasptrophy.  These magicians were identified by an earring the wore with a blue ball.  I remember it started off with some street rat sneaking into a mage's house and getting caught and the mage taking him under his wing after creating some doll to threaten the boy, then dismantling it.  Any help would be appreciated. [summary]: ", "candidate_0": " Magicians with blue earrings searching for lost modern technology after some great catastrophe, which caused them to lose all knowledge of modern technology.", "candidate_1": " What is the name of a book series of magic?", "choice": 1}
{"input_text": "Hey guys, I'm having a really frustrating time with one of my computers in my home, and I'm wondering about ways in which I can fix it.\n\nThis is the situation: I built a computer 3 years ago (April '07).  It ran perfectly with occasional hiccups due to viruses and such for two years, but for the past year or so it has been almost unbearable to use according to my family members.  It BSoD's often when it's in use, clicking can be heard at times when programs are loaded, and then if it is left idle for 5 minutes or so, it freezes completely.  The screen still shows everything that was occurring, but is completely unresponsive.\n\nNow, the BSoD's I think has to do with a hardware component of the computer failing, and the clicking leads me to believe it's the hard drive (It basically sounds like something that happens whenever the hard drive is required to start up).  I'm already looking into getting a new hard drive for it and hooking it up, which I feel would solve these two problems (potentially).\n\nThe one I have trouble with it is the random freezing.  I hate that I can't run AV scans or leave it to do anything without coming back and moving the mouse or typing something constantly.  I've tried looking for OS updates (Vista), installing new drivers for just about everything on the computer, and removing almost all of the junk that was on it, yet I'm still getting the same problem.\n\nAnyway, I was just wondering if anyone had experienced the same problem(s) before and could offer any help.  I'll be home from work in a couple of hours and can give specific details if you guys think it'd be useful. [summary]: ", "candidate_0": " Computer is freezing after inactivity for the past year; hard drive has been failing and I can't figure out why.  Help?", "candidate_1": " Computer randomly freezes randomly and I'm wondering if it's due to a hardware failure, and/or if it's the hard drive.", "choice": 1}
  • Print out to explore the preference dataset
sample_1 = preference_data[0]
print(type(sample_1))

Output

<class 'dict'>

# This dictionary has four keys
print(sample_1.keys())

Output

dict_keys(['input_text', 'candidate_0', 'candidate_1', 'choice'])
  • Key: ‘input_text’ is a prompt.
sample_1['input_text']

Output

'I live right next to a huge university, and have been applying for a variety of jobs with them through their faceless electronic jobs portal (the "click here to apply for this job" type thing) for a few months. \n\nThe very first job I applied for, I got an interview that went just so-so. But then, I never heard back (I even looked up the number of the person who called me and called her back, left a voicemail, never heard anything).\n\nNow, when I\'m applying for subsequent jobs - is it that same HR person who is seeing all my applications?? Or are they forwarded to the specific departments?\n\nI\'ve applied for five jobs there in the last four months, all the resumes and cover letters tailored for each open position. Is this hurting my chances? I never got another interview there, for any of the positions. [summary]: '
# Try other examples from the list, and discover that all prompts end the same way
preference_data[2]['input_text'][-50:]

Output

'plan something in those circumstances. [summary]: '
  • Print ‘candidate_0’ and ‘candidate_1’; these are the completions for the same prompt.
print(f"candidate_0:\n{sample_1.get('candidate_0')}\n")
print(f"candidate_1:\n{sample_1.get('candidate_1')}\n")

Output

candidate_0:
When applying through a massive job portal, is just one HR person seeing ALL of them?

candidate_1:
When applying to many jobs through a single university jobs portal, is just one HR person reading ALL my applications?
  • Print ‘choice’; this is the human labeler’s preference between the two completions (candidate_0 and candidate_1).
print(f"choice: {sample_1.get('choice')}")

choice: 1
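Since choice is just an integer index into the two candidates, a quick sanity check is to see how often each candidate wins and to pull out the winning texts. The records below are hypothetical stand-ins mirroring the sample_preference.jsonl schema:

```python
# Hypothetical records mirroring the sample_preference.jsonl schema
preference_records = [
    {"candidate_0": "a", "candidate_1": "b", "choice": 1},
    {"candidate_0": "c", "candidate_1": "d", "choice": 0},
    {"candidate_0": "e", "candidate_1": "f", "choice": 1},
]

# Fraction of examples where the labeler preferred candidate_1
frac_candidate_1 = sum(r["choice"] for r in preference_records) / len(preference_records)

# Text of the winning completion for each record
chosen_texts = [r[f"candidate_{r['choice']}"] for r in preference_records]
```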

Prompt dataset
prompt_dataset_path = 'sample_prompt.jsonl'

prompt_data = []
with open(prompt_dataset_path) as f:
    for line in f:
        prompt_data.append(json.loads(line))

# Check how many prompts there are in this dataset
len(prompt_data)

Output: 6

Note: It is important that the prompts in both datasets, the preference and the prompt, come from the same distribution.

For this lesson, all the prompts come from the same dataset of Reddit posts.

Contents of sample_prompt.jsonl:

{"input_text": "I noticed this the very first day! I took a picture of it to send to one of my friends who is a fellow redditor. Later when I was getting to know my suitemates, I asked them if they ever used reddit, and they showed me the stencil they used to spray that! Along with the lion which is his trademark. \n But [summary]: "}
{"input_text": "Nooooooo, I loved my health class! My teacher was amazing! Most days we just went outside and played and the facility allowed it because the health teacher's argument was that teens need to spend time outside everyday and he let us do that. The other days were spent inside with him teaching us how to live a healthy lifestyle. He had guest speakers come in and reach us about nutrition and our final was open book...if we even had a final.... [summary]: "}
{"input_text": "Unlike Python (and some other packages), this isn't a situation where you get to choose which version you're running. Staying up with the latest major version is vital as they address bug and security fixes. There have been some major performance gains in the last few releases as well. Additionally, they are ramping up efforts to release Puppet 4 in the near future. [summary]: "}
{"input_text": "You could be in the right or in the wrong, depending on what server it was. If it was on a community server, like a 24/7 Hightower server especially, then there is sort of an unwritten rule that you don't tryhard on Hightower. But if it was on a valve server then odds are more people than just you will want to cap, so it becomes an objective map again. However, it seems in your experience it was not a tryhard server, because you thought that it was easy to cap, and you only had one team mate helping you. This means the enemy team rarely tried to stop you, and that most of your team mates were just messing around. If you like playing the objective on Hightower, go to a valve server. You were right to ask however, I see too many people just ignorantly making everyone else have  a bad time without concern. [summary]: "}
{"input_text": "300 hrs in a month? My hrs are spread over the entire development of the mod, from beginning to end. It's been a fun ride, but constant re-looting has gotten tedious to say the least. \nYou're right about it being better than standalone, but I've moved on from this \"broken\" genre for now. \n >Especially considering the devs do it all for free. Yea they get donations, \n Get real. Donations are huge. I bought a private server during the dayz hayday. Made $800/month clear above expenses. Only had a few hundred regular players. It pays a lot more than working min wage. So don't pretend they are making the mod out of the goodness of their hearts, that's actually silly. [summary]: "}
{"input_text": "I'm getting my PhD in chemistry, and I hope to become a professor at a liberal arts college. I have been accepted to three schools. One is top tier, one is mid tier, and one is bottom tier. (All three are RU/VH.) I'm having a hard time with the decision, and I need some insight. \n \n I want to be close to family:\nBottom tier school is 30 minutes away. It's in my home state. I would be happy to teach at some of the schools in my home state. \n \n Research is not the most important thing in my life:\nI am not a workaholic. I will not stay awake thinking about my work until the wee hours in the morning. The top tier school is a pressure cooker, and I'm more likely to have mental health issues if I go there. Mid tier is slightly more laid back, but it's still grad school. 5th year grad students at the bottom tier school said that 40-50 hours a week is the norm. It is a very relaxed department. \n \n I want to like the research at least a little:\nTop school is amazing. Everything is awesome. Funding and resources are not even a little bit of an issue. \nMid school has cool stuff happening. I would be happy doing that work.\nBottom tier school has only one professor I would want to work for, and even that research wasn't that exciting to me. [summary]: "}
# Function to print the information in the prompt dataset with a better visualization
def print_d(d):
    for key, val in d.items():
        print(f"key:{key}\nval:{val}\n")
print_d(prompt_data[0])

Output

key:input_text
val:I noticed this the very first day! I took a picture of it to send to one of my friends who is a fellow redditor. Later when I was getting to know my suitemates, I asked them if they ever used reddit, and they showed me the stencil they used to spray that! Along with the lion which is his trademark. But [summary]: 
# Try with another prompt from the list 
print_d(prompt_data[1])

Output

key:input_text
val:Nooooooo, I loved my health class! My teacher was amazing! Most days we just went outside and played and the facility allowed it because the health teacher's argument was that teens need to spend time outside everyday and he let us do that. The other days were spent inside with him teaching us how to live a healthy lifestyle. He had guest speakers come in and reach us about nutrition and our final was open book...if we even had a final.... [summary]: 
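Because the prompts in the preference and prompt datasets must come from the same distribution, it is worth checking that every prompt ends with the same "[summary]: " suffix seen in the preference dataset. A minimal check, using hypothetical stand-in prompts:

```python
SUFFIX = "[summary]: "

def prompts_share_suffix(prompts) -> bool:
    """True when every prompt ends with the shared instruction suffix."""
    return all(p.endswith(SUFFIX) for p in prompts)

# Hypothetical stand-ins for entries of prompt_data
sample_prompts = [
    "Some Reddit post text [summary]: ",
    "Another Reddit post [summary]: ",
]
```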

Lesson 3: Tune an LLM with RLHF

Project environment setup

The RLHF training process has been implemented in a machine learning pipeline as part of the open-source Google Cloud Pipeline Components library. The pipeline can be run on any platform that supports Kubeflow Pipelines (an open-source framework), including Google Cloud’s Vertex AI Pipelines.

To run it locally, install the following:

!pip3 install google-cloud-pipeline-components
!pip3 install kfp

Compile the pipeline

# Import (RLHF is currently in preview)
from google_cloud_pipeline_components.preview.llm import rlhf_pipeline

# Import from KubeFlow pipelines
from kfp import compiler

This line imports the compiler module from the kfp package. KFP is Kubeflow Pipelines, an open-source platform for machine-learning workflows; the compiler module turns a workflow into a form that can be executed on Kubernetes.

Typically, kfp.compiler compiles an ML workflow defined as Python functions into a specification that Kubernetes can execute. The compiled workflow can then be deployed to run on Kubernetes and integrated with other Kubeflow Pipelines components.

About Kubeflow:

Kubeflow is an open-source machine-learning (ML) toolkit that aims to make deploying, managing, and scaling ML workflows on Kubernetes simpler. Kubernetes is an open-source platform for automating the deployment, scaling, and management of containerized applications; Kubeflow is a set of Kubernetes extensions designed specifically for ML workloads.

Kubeflow provides a set of tools and components for building end-to-end ML workflows on Kubernetes, including:

  1. Jupyter Notebooks: a hosted notebook service, so data scientists can do interactive data analysis and model training on Kubernetes.

  2. TFJob: a custom resource definition (CRD) for TensorFlow training jobs, used to run TensorFlow training tasks on a Kubernetes cluster.

  3. Pipelines: Kubeflow Pipelines is a Kubernetes-based orchestration engine for building and deploying ML workflows. It lets users define and execute end-to-end ML workflows in a repeatable, scalable way.

  4. Serving: Kubeflow Serving lets users deploy trained models as RESTful services, with model versioning, load balancing, and scaling handled automatically.

  5. Metadata: Kubeflow Metadata tracks experiment metadata and model versions, helping manage and organize ML workloads.

In short, Kubeflow aims to give ML teams a scalable, flexible, and easy-to-manage platform that simplifies developing, deploying, and managing ML workflows.

# Define a path to the yaml file
RLHF_PIPELINE_PKG_PATH = "rlhf_pipeline.yaml"

# Execute the compile function
compiler.Compiler().compile(
    pipeline_func=rlhf_pipeline,
    package_path=RLHF_PIPELINE_PKG_PATH
)

# Print the first lines of the YAML file
!head rlhf_pipeline.yaml

Output

# PIPELINE DEFINITION
# Name: rlhf-train-template
# Description: Performs reinforcement learning from human feedback.
# Inputs:
#    deploy_model: bool [Default: True]
#    eval_dataset: str
#    instruction: str
#    kl_coeff: float [Default: 0.1]
#    large_model_reference: str
#    location: str [Default: '{{$.pipeline_google_cloud_location}}']

Note: to print the whole YAML file, use the following:

!cat rlhf_pipeline.yaml

Define the Vertex AI pipeline job

Define the location of the training and evaluation data

Previously, the datasets were loaded from small JSONL files, but for typical training jobs, the datasets are much larger, and are usually stored in cloud storage (in this case, Google Cloud Storage).

Note: Make sure that the three datasets are stored in the same Google Cloud Storage bucket.

parameter_values={
        "preference_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl",
        "prompt_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl",
        "eval_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl",
        ...
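The note above says all three datasets must live in the same Google Cloud Storage bucket. A tiny helper (hypothetical, not part of the pipeline) can verify that before a job is launched:

```python
def same_bucket(*gcs_paths: str) -> bool:
    """Return True when every gs:// path points into one and the same bucket.

    A gs:// URI has the form gs://<bucket>/<object-path>, so the bucket
    name is the third slash-separated component."""
    buckets = {p.split("/")[2] for p in gcs_paths if p.startswith("gs://")}
    return len(buckets) == 1
```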

gs://vertex-ai is the address of a Google Cloud Storage (GCS) bucket used to store data, models, metadata, and other resources related to the Google Cloud Vertex AI service.

Google Cloud Storage is Google Cloud’s object storage service: highly scalable storage for keeping data in a secure, reliable, and cost-effective way. Addresses beginning with gs:// are the convention for pointing at a file path or storage location, so gs://vertex-ai means the data lives in the vertex-ai bucket on Google Cloud Storage.

Vertex AI is Google Cloud’s service for building, deploying, and managing machine-learning models. When using Vertex AI, data, models, and other resources are typically stored in Google Cloud Storage so that Vertex AI can access and use them; gs://vertex-ai is therefore a path to resources backing the Vertex AI service.

Choose the foundation model to be tuned

In this case, we are tuning the Llama 2 foundation model; the pipeline parameter that names the LLM to tune is called large_model_reference.

In this course, we’re tuning llama-2-7b, but you can also run an RLHF pipeline on Vertex AI to tune other models, such as T5X or text-bison@001.

parameter_values={
        "large_model_reference": "llama-2-7b",
        ...

Calculate the number of reward model training steps

reward_model_train_steps is the number of steps to use when training the reward model. It depends on the size of your preference dataset. For best results, we recommend training over the preference dataset for 20-30 epochs.

$stepsPerEpoch = \left\lceil \frac{datasetSize}{batchSize} \right\rceil$
$trainSteps = stepsPerEpoch \times numEpochs$

The RLHF pipeline parameters are asking for the number of training steps and not number of epochs. Here’s an example of how to go from epochs to training steps, given that the batch size for this pipeline is fixed at 64 examples per batch.

# Preference dataset size
PREF_DATASET_SIZE = 3000

# Batch size is fixed at 64
BATCH_SIZE = 64

import math
REWARD_STEPS_PER_EPOCH = math.ceil(PREF_DATASET_SIZE / BATCH_SIZE)
print(REWARD_STEPS_PER_EPOCH)  # 47

REWARD_NUM_EPOCHS = 30

# Calculate number of steps in the reward model training
reward_model_train_steps = REWARD_STEPS_PER_EPOCH * REWARD_NUM_EPOCHS
print(reward_model_train_steps)  # 1410
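The same epochs-to-steps conversion is needed again below for the reinforcement-learning steps, so it can be wrapped in a small helper. This is a convenience sketch; the function name is mine, not the pipeline's:

```python
import math

def epochs_to_train_steps(dataset_size: int, num_epochs: int,
                          batch_size: int = 64) -> int:
    """ceil(datasetSize / batchSize) * numEpochs, with the batch size
    fixed at 64 as in this pipeline."""
    return math.ceil(dataset_size / batch_size) * num_epochs
```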

Calculate the number of reinforcement learning training steps

The reinforcement_learning_train_steps parameter is the number of reinforcement learning steps to perform when tuning the base model.

  • The number of training steps depends on the size of your prompt dataset. Usually, this model should train over the prompt dataset for roughly 10-20 epochs.
  • Reward hacking: if given too many training steps, the policy model may figure out a way to exploit the reward and exhibit undesired behavior.
# Prompt dataset size
PROMPT_DATASET_SIZE = 2000

# Batch size is fixed at 64
BATCH_SIZE = 64

import math
RL_STEPS_PER_EPOCH = math.ceil(PROMPT_DATASET_SIZE / BATCH_SIZE)
print(RL_STEPS_PER_EPOCH)  # 32

RL_NUM_EPOCHS = 10

# Calculate the number of steps in the RL training
reinforcement_learning_train_steps = RL_STEPS_PER_EPOCH * RL_NUM_EPOCHS
print(reinforcement_learning_train_steps)  # 320

Define the instruction

  • Choose the task-specific instruction that you want to use to tune the foundational model. For this example, the instruction is “Summarize in less than 50 words.”
  • You can choose different instructions, for example, “Write a reply to the following question or comment.” Note that you would also need to collect your preference dataset with the same instruction added to the prompt, so that both the responses and the human preferences are based on that instruction.
# Completed values for the dictionary
parameter_values={
        "preference_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl",
        "prompt_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl",
        "eval_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl",
        "large_model_reference": "llama-2-7b",
        "reward_model_train_steps": 1410,  # result of the calculations above
        "reinforcement_learning_train_steps": 320,  # result of the calculations above
        "reward_model_learning_rate_multiplier": 1.0,
        "reinforcement_learning_rate_multiplier": 1.0,
        "kl_coeff": 0.1,  # increased to reduce reward hacking
        "instruction": "Summarize in less than 50 words"}

“Reward hacking” is a failure mode in reinforcement learning: the agent “games” or exploits the reward function by finding a policy that does not really satisfy the task objective yet still earns a high reward.

In reinforcement learning, the agent normally learns a good policy by maximizing the reward function. But if the reward function is poorly designed, or the agent discovers a loophole in it, the agent can land on a policy that scores highly without achieving the task’s true goal, so it ends up learning the wrong behavior.

For example, suppose an agent is trained to play a game whose reward function grants a reward as long as the game ends with the agent having defeated its opponent. The agent might discover a reward-hacking strategy: instead of learning to beat the opponent, it repeatedly kills itself to trigger the game-over reward, and so collects a high reward anyway.

Mitigations include designing a more accurate and complete reward function that genuinely reflects the task objective, and monitoring and adjusting the reward during training to reduce the agent’s chances of exploiting loopholes.
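In this pipeline, the kl_coeff parameter is the main guard against reward hacking: the reward fed to the RL step is shaped by a KL penalty that grows as the tuned policy drifts away from the base model. A minimal per-token sketch of the idea (my own simplification, not the pipeline's internals):

```python
def kl_shaped_reward(reward: float, logprob_policy: float, logprob_base: float,
                     kl_coeff: float = 0.1) -> float:
    """Subtract a penalty proportional to the policy's log-probability
    drift from the base model (a simple sample-based KL estimate)."""
    kl_estimate = logprob_policy - logprob_base
    return reward - kl_coeff * kl_estimate
```

Raising kl_coeff trades some reward for staying closer to the base model, which is why the dictionary above notes it was increased to reduce reward hacking.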

Train with full dataset: dictionary ‘parameter_values’

  • Adjust the settings for training with the full dataset to achieve optimal results in the evaluation (next lesson). Take a look at the new values; these results are from various training experiments in the pipeline, and the best parameter values are displayed here.
parameter_values={
        "preference_dataset": "gs://vertex-ai/generative-ai/rlhf/text/summarize_from_feedback_tfds/comparisons/train/*.jsonl",
        "prompt_dataset": "gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/train/*.jsonl",
        "eval_dataset": "gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/val/*.jsonl",
        "large_model_reference": "llama-2-7b",
        "reward_model_train_steps": 10000,
        "reinforcement_learning_train_steps": 10000,
        "reward_model_learning_rate_multiplier": 1.0,
        "reinforcement_learning_rate_multiplier": 0.2,
        "kl_coeff": 0.1,
        "instruction": "Summarize in less than 50 words"}


Set up Google Cloud to run the Vertex AI pipeline

Vertex AI is already installed in this classroom environment. If you were running this on your own project, you would install Vertex AI SDK like this:

!pip3 install google-cloud-aiplatform
# Authenticate in utils
from utils import authenticate
credentials, PROJECT_ID, STAGING_BUCKET = authenticate()

# RLHF pipeline is available in this region
REGION = "europe-west4"

The authenticate function in utils.py:

import os
from dotenv import load_dotenv
import json
import base64
from google.auth.transport.requests import Request
from google.oauth2.service_account import Credentials

def authenticate():
    # Load .env
    load_dotenv()

    # DLAI Custom Key
    return "DLAI_CREDENTIALS", "DLAI_PROJECT", "gs://gcp-sc2-rlhf"

    # Decode key and store in .JSON
    SERVICE_ACCOUNT_KEY_STRING_B64 = os.getenv('SERVICE_ACCOUNT_KEY')
    SERVICE_ACCOUNT_KEY_BYTES_B64 = SERVICE_ACCOUNT_KEY_STRING_B64.encode("ascii")
    SERVICE_ACCOUNT_KEY_STRING_BYTES = base64.b64decode(SERVICE_ACCOUNT_KEY_BYTES_B64)
    SERVICE_ACCOUNT_KEY_STRING = SERVICE_ACCOUNT_KEY_STRING_BYTES.decode("ascii")
    SERVICE_ACCOUNT_KEY = json.loads(SERVICE_ACCOUNT_KEY_STRING)

    # Create credentials based on key from service account
    # Make sure your account has the roles listed in the Google Cloud Setup section
    credentials = Credentials.from_service_account_info(
        SERVICE_ACCOUNT_KEY,
        scopes=['https://www.googleapis.com/auth/cloud-platform'])
    if credentials.expired:
        credentials.refresh(Request())

    # Set project ID according to environment variable
    PROJECT_ID = os.getenv('PROJECT_ID')
    STAGING_BUCKET = os.getenv('STAGING_BUCKET')  # 'gs://gcp-sc2-rlhf-staging'
    return credentials, PROJECT_ID, STAGING_BUCKET

Run the pipeline job on Vertex AI

Now that we have created our dictionary of values, we can create a PipelineJob. This just means that the RLHF pipeline will execute on Vertex AI. So it’s not running locally here in the notebook, but on some server on Google Cloud.

import google.cloud.aiplatform as aiplatform

aiplatform.init(project = PROJECT_ID,
                location = REGION,
                credentials = credentials)

# Look at the path for the YAML file
RLHF_PIPELINE_PKG_PATH  # 'rlhf_pipeline.yaml'

Create and run the pipeline job

  • Here is how you would create the pipeline job and run it if you were working on your own project.

  • This job takes about a full day to run with multiple accelerators (TPUs/GPUs), and so we’re not going to run it in this classroom.

  • To create the pipeline job:

job = aiplatform.PipelineJob(
    display_name="tutorial-rlhf-tuning",
    pipeline_root=STAGING_BUCKET,
    template_path=RLHF_PIPELINE_PKG_PATH,
    parameter_values=parameter_values)
  • To run the pipeline job:
job.run()
  • The content team has run this RLHF training pipeline to tune the Llama-2 model, and in the next lesson, you’ll get to evaluate the log data to compare the performance of the tuned model with the original foundational model.

Lesson 4: Evaluate the Tuned Model

Project environment setup
  • Install Tensorboard (if running locally)
!pip install tensorboard


ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation - Longest Common Subsequence) is a metric for how similar an automatically generated summary is to a reference summary. It belongs to the ROUGE family of metrics, which measure the overlap between generated text and reference text.

ROUGE-L is based on the longest common subsequence (LCS): it measures the length of the longest common subsequence between the generated text and the reference text, normalized by the length of the reference summary. The higher the ROUGE-L score, the better the generated text matches the reference.

Because it works on subsequences rather than exact repeated words or phrases, ROUGE-L leans toward content consistency: it rewards generated summaries that preserve the source’s core information while remaining reasonably coherent and complete.

ROUGE-L is widely used as an evaluation metric in NLP tasks such as automatic summarization and machine translation, to assess the similarity between generated text and human reference text.
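The recall side of ROUGE-L can be sketched directly from its longest-common-subsequence definition. This is a toy whitespace-tokenized version; real implementations also compute LCS precision and report an F-measure:

```python
def rouge_l_recall(candidate: str, reference: str) -> float:
    """Length of the longest common subsequence of tokens,
    normalized by the number of tokens in the reference."""
    c_toks, r_toks = candidate.split(), reference.split()
    # Dynamic-programming LCS table: dp[i][j] is the LCS length of
    # the first i candidate tokens and the first j reference tokens.
    dp = [[0] * (len(r_toks) + 1) for _ in range(len(c_toks) + 1)]
    for i, ct in enumerate(c_toks):
        for j, rt in enumerate(r_toks):
            if ct == rt:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1] / len(r_toks)
```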

Explore results with Tensorboard

%load_ext tensorboard

port = %env PORT1
%tensorboard --logdir reward-logs --port $port --bind_all

# Look at what this directory has
%ls reward-logs

port = %env PORT2
%tensorboard --logdir reinforcer-logs --port $port --bind_all

port = %env PORT3
%tensorboard --logdir reinforcer-fulldata-logs --port $port --bind_all
  • The dictionary of ‘parameter_values’ defined in the previous lesson
parameter_values={
        "preference_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl",
        "prompt_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl",
        "eval_dataset": "gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl",
        "large_model_reference": "llama-2-7b",
        "reward_model_train_steps": 1410,
        "reinforcement_learning_train_steps": 320,
        "reward_model_learning_rate_multiplier": 1.0,
        "reinforcement_learning_rate_multiplier": 1.0,
        "kl_coeff": 0.1,
        "instruction": "Summarize in less than 50 words"}

Note: Here, we are using “text_small” for our datasets for learning purposes. However for the results that we’re evaluating in this lesson, the team used the full dataset with the following hyperparameters:

parameter_values={
        "preference_dataset": "gs://vertex-ai/generative-ai/rlhf/text/summarize_from_feedback_tfds/comparisons/train/*.jsonl",
        "prompt_dataset": "gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/train/*.jsonl",
        "eval_dataset": "gs://vertex-ai/generative-ai/rlhf/text/reddit_tfds/val/*.jsonl",
        "large_model_reference": "llama-2-7b",
        "reward_model_train_steps": 10000,
        "reinforcement_learning_train_steps": 10000,
        "reward_model_learning_rate_multiplier": 1.0,
        "reinforcement_learning_rate_multiplier": 0.2,
        "kl_coeff": 0.1,
        "instruction": "Summarize in less than 50 words"}

Evaluate using the tuned and untuned model

(slide screenshot omitted)

Results produced by the tuned model:

import json

eval_tuned_path = 'eval_results_tuned.jsonl'
eval_data_tuned = []
with open(eval_tuned_path) as f:
    for line in f:
        eval_data_tuned.append(json.loads(line))

# Import for printing purposes
from utils import print_d

# Look at the result produced by the tuned model
print_d(eval_data_tuned[0])

The contents of eval_results_tuned.jsonl:

{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nBefore anything, not a sad story or anything. My country's equivalent to Valentine's Day is coming and I had this pretty simple idea to surprise my girlfriend and it would involve giving her some roses. The thing is, although I know she would appreciate my intention in and of itself, I don't know if she would like the actual flowers and such, so I wanted to find out if she likes roses and if she would like getting some, but without her realizing it so as not to spoil the surprise. Any ideas on how to get that information out of her? [summary]: ", "targets_pretokenized": ""}, "prediction": "My country's equivalent to Valentine's Day is coming. Want to surprise my girlfriend with roses but don't know if she would like getting some. Any ideas on how to get that information out of her without spoiling the surprise"}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nFor most of high school, I've been the go to \"computer kid.\" I'll be the first to admit that I know a lot about how computers work, and often fix things for teachers before the IT guys have a chance to get a whack at them. I worked at a computer repair shop for half a year as a technician. I've done the typical tech guy thing and drooled over new computers that come out (read: Wired/PopSci/2600 articles and centerfolds). \n\nThat said, I have NO IDEA what kind of computer to get for college. I've always wanted to buy a ton of parts and build my own \"super\" desktop, but I for sure need a laptop. If everything goes according to plan, I'm enrolling at Champlain College next Fall to double major in Computer Network Information Security and Digital Forensics. \n\nA lot of you probably already know this, but this major involves a lot of virtual desktops. Essentially I set up entire networks on my own rig and then go in and play around with them, all from one computer. It takes a lot of processing power and hard drive space.\n\nOver the years I've built up several stereotypes about different brands, operating systems, products... Recently, my mind has been opened back up and I'm willing to take a look at anything, even a Dell or an Asus.\n\nIt's recently been brought to my attention that Macs are actually pretty good. I've always been a notorious Windows-only-guy. I finally decided to bite my tongue and take a look under the hood, and my god are these things powerful... I guess it really does help making everything but the processor in-house.\n\nAnywho, what I really need is suggestions that might work well for my major, hopefully staying under a budget of $1600. Also, what are the big differences between a MacBook and  MacBook Pro? [summary]: ", "targets_pretokenized": ""}, "prediction": "Been the go to \"computer kid\" in high school. Majoring in Computer Network Information Security and Digital Forensics. Need suggestions for a laptop under 1600. What are the big differences between a MacBook and MacBook Pro"}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nSo I graduated in may 2014 in the DC area but couldn't start applying to jobs till April of this year due to family circumstances. So far I've applied to nearly 50+ places, 4 recruiters contacted me and added me to their available candidates list after a small interview (one of them got me a small office assignment but that's over) and gotten two interviews by myself (none of which panned out). I have experience in office software and am pretty well versed in excel (i'd say a little less than advanced but a lot more than proficient) and advanced mathematics. I have a GPA higher than 3 but lower than 3.5 which I don't mention in my resume. Could that be it? \n\nI don't understand what I'm doing wrong. So far I've been applying to anything remotely related with data, analysis, finance at entry level but I barely get any response. Data entry and admin assistant positions are no different. I also write cover letters for almost every place. I also don't know many people here since my family isn't really well connected enough for me to have great networking opportunities. I really don't know what to do next. I can't imagine how people applying to over 200-400 places deal with the frustrating and emotionally draining experience. I would really appreciate some advice from recent econ/finance grads or hiring personnel. [summary]: ", "targets_pretokenized": ""}, "prediction": "Graduated in may 2014. Applied to nearly 50+ places for data, analysis finance entry level. Barely get any response. Don't know what to do next. Would appreciate advice from recent econ/finance grads or hiring personnel."}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nHello /r/jobs, I graduated a few months ago and had no luck so far to get a job in marketing/sales. \n\nThere's a small local company (perhaps 30 employees) but they are actually pretty successful in what they're doing (known worldwide). I checked their website and it's awful. Looks like a website from the early 2000's. So I guess they are not pretty good in (online-)marketing. \n\nI would like to do a cold application (not sure if they are looking for a marketing guy) but I had no luck with this kind of application in the past. That's why I thought I try something different. I have good skills in photoshop, indesign and illustrator. As a teenager I also built websites using HTML, so I thought I build a dummy website fitted to their company and attach some screenshots to my application.\n\nWhat do you think? I this a feasible idea or will they be offended?\n\nThank you very much in advance. [summary]: ", "targets_pretokenized": ""}, "prediction": "Graduated a few months ago. No luck so far to get a job in marketing/sales. Thought to build a dummy website fitted to the company and attach some screenshots to my application. What do you think?"}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nHello!\n\nI hope that someone can lend me some advice for a situation that I find myself in. Here's my situation:\n\nAbout 10 years ago as I was traveling, I applied for my first credit card which was an AMEX delta card. For the years after, I traveled often, and got benefits from this card.\n\nNow, after I have recently graduated college, and landed a full time job, I no longer have time to travel as much, and therefore have no need to use the Delta card. Last year I paid off my balance with the card, and haven't used it since.\n\nRecently, I had to pay the $95 annual fee, and realized that I should perhaps consider another credit card. I thought about closing this account and applying for a cash rewards card.\n\nNow my issue is that this is my ONLY and OLDEST credit card account. I would like to close the account and apply for a Bank of America cash rewards card as my checking/savings accounts are with them. However, I am concerned with how this will affect my credit report.\n\nWill closing this account greatly affect my credit score?\n\nI even spoke with the reps at AMEX about downgrading to a cash rewards card with no annual fee, and apparently I cannot do this as my AMEX card is linked with the Delta account. I do not want this card, I'm stuck in a pickle...\n\nAny advice would be appreciated! [summary]: ", "targets_pretokenized": ""}, "prediction": "About 10 years ago applied for my first credit card AMEX delta card. No longer have time to travel as much. Thinking of closing the account and apply for a cash rewards card. Any advice would be appreciated."}

Output

--------------------------------------------------
key:inputs
val
--------------------------------------------------
key:inputs_pretokenized
val:Summarize in less than 50 words.

Before anything, not a sad story or anything. My country's equivalent to Valentine's Day is coming and I had this pretty simple idea to surprise my girlfriend and it would involve giving her some roses. The thing is, although I know she would appreciate my intention in and of itself, I don't know if she would like the actual flowers and such, so I wanted to find out if she likes roses and if she would like getting some, but without her realizing it so as not to spoil the surprise. Any ideas on how to get that information out of her? [summary]: 
--------------------------------------------------
key:targets_pretokenized
val:
--------------------------------------------------
key:prediction
val:My country's equivalent to Valentine's Day is coming. Want to surprise my girlfriend with roses but don't know if she would like getting some. Any ideas on how to get that information out of her without spoiling the surprise
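Since every prompt carries the instruction “Summarize in less than 50 words”, one quick automated sanity check is whether the predictions actually respect that limit. A minimal sketch; the `predictions` list here is a stand-in for the `prediction` fields of the `eval_data_tuned` records parsed above:

```python
# Check the "less than 50 words" constraint on sample predictions;
# in the notebook these would come from eval_data_tuned instead.
predictions = [
    "My country's equivalent to Valentine's Day is coming. Want to surprise my "
    "girlfriend with roses but don't know if she would like getting some. Any "
    "ideas on how to get that information out of her without spoiling the surprise",
]

for i, p in enumerate(predictions):
    n = len(p.split())  # crude whitespace word count
    print(f"sample {i}: {n} words, under 50: {n < 50}")
```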

Results produced by the base model:

eval_untuned_path = 'eval_results_untuned.jsonl'
eval_data_untuned = []
with open(eval_untuned_path) as f:
    for line in f:
        eval_data_untuned.append(json.loads(line))

# Look at the result produced by the untuned model
print_d(eval_data_untuned[0])

The contents of eval_results_untuned.jsonl:

{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nBefore anything, not a sad story or anything. My country's equivalent to Valentine's Day is coming and I had this pretty simple idea to surprise my girlfriend and it would involve giving her some roses. The thing is, although I know she would appreciate my intention in and of itself, I don't know if she would like the actual flowers and such, so I wanted to find out if she likes roses and if she would like getting some, but without her realizing it so as not to spoil the surprise. Any ideas on how to get that information out of her? [summary]: ", "targets_pretokenized": ""}, "prediction": "The author wants to surprise his girlfriend with roses on Valentine's Day but he doesn't know if she likes roses. He wants to find out without spoiling the surprise."}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nFor most of high school, I've been the go to \"computer kid.\" I'll be the first to admit that I know a lot about how computers work, and often fix things for teachers before the IT guys have a chance to get a whack at them. I worked at a computer repair shop for half a year as a technician. I've done the typical tech guy thing and drooled over new computers that come out (read: Wired/PopSci/2600 articles and centerfolds). \n\nThat said, I have NO IDEA what kind of computer to get for college. I've always wanted to buy a ton of parts and build my own \"super\" desktop, but I for sure need a laptop. If everything goes according to plan, I'm enrolling at Champlain College next Fall to double major in Computer Network Information Security and Digital Forensics. \n\nA lot of you probably already know this, but this major involves a lot of virtual desktops. Essentially I set up entire networks on my own rig and then go in and play around with them, all from one computer. It takes a lot of processing power and hard drive space.\n\nOver the years I've built up several stereotypes about different brands, operating systems, products... Recently, my mind has been opened back up and I'm willing to take a look at anything, even a Dell or an Asus.\n\nIt's recently been brought to my attention that Macs are actually pretty good. I've always been a notorious Windows-only-guy. I finally decided to bite my tongue and take a look under the hood, and my god are these things powerful... I guess it really does help making everything but the processor in-house.\n\nAnywho, what I really need is suggestions that might work well for my major, hopefully staying under a budget of $1600. Also, what are the big differences between a MacBook and  MacBook Pro? [summary]: ", "targets_pretokenized": ""}, "prediction": "A high school senior who is going to major in Computer Network Information Security and Digital Forensics needs a laptop that is powerful enough to run virtual desktops. He is open to suggestions of any brands, but prefers Macs. He is looking for a laptop under $1600 and wants to know the difference between MacBook and"}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nSo I graduated in may 2014 in the DC area but couldn't start applying to jobs till April of this year due to family circumstances. So far I've applied to nearly 50+ places, 4 recruiters contacted me and added me to their available candidates list after a small interview (one of them got me a small office assignment but that's over) and gotten two interviews by myself (none of which panned out). I have experience in office software and am pretty well versed in excel (i'd say a little less than advanced but a lot more than proficient) and advanced mathematics. I have a GPA higher than 3 but lower than 3.5 which I don't mention in my resume. Could that be it? \n\nI don't understand what I'm doing wrong. So far I've been applying to anything remotely related with data, analysis, finance at entry level but I barely get any response. Data entry and admin assistant positions are no different. I also write cover letters for almost every place. I also don't know many people here since my family isn't really well connected enough for me to have great networking opportunities. I really don't know what to do next. I can't imagine how people applying to over 200-400 places deal with the frustrating and emotionally draining experience. I would really appreciate some advice from recent econ/finance grads or hiring personnel. [summary]: ", "targets_pretokenized": ""}, "prediction": "Graduated in May 2014, but couldn't start applying to jobs till April of this year. Applied to nearly 50 places, but only got 2 interviews. GPA is higher than 3 but lower than 3.5. Don't know what I'm doing wrong."}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nHello /r/jobs, I graduated a few months ago and had no luck so far to get a job in marketing/sales. \n\nThere's a small local company (perhaps 30 employees) but they are actually pretty successful in what they're doing (known worldwide). I checked their website and it's awful. Looks like a website from the early 2000's. So I guess they are not pretty good in (online-)marketing. \n\nI would like to do a cold application (not sure if they are looking for a marketing guy) but I had no luck with this kind of application in the past. That's why I thought I try something different. I have good skills in photoshop, indesign and illustrator. As a teenager I also built websites using HTML, so I thought I build a dummy website fitted to their company and attach some screenshots to my application.\n\nWhat do you think? I this a feasible idea or will they be offended?\n\nThank you very much in advance. [summary]: ", "targets_pretokenized": ""}, "prediction": "Graduated a few months ago, looking for a job in marketing/sales. Found a small local company that is successful but has an awful website. I want to apply for a marketing job but not sure if they are hiring. I thought of building a dummy website for them and attach some screenshots to my application."}
{"inputs": {"inputs_pretokenized": "Summarize in less than 50 words.\n\n\nHello!\n\nI hope that someone can lend me some advice for a situation that I find myself in. Here's my situation:\n\nAbout 10 years ago as I was traveling, I applied for my first credit card which was an AMEX delta card. For the years after, I traveled often, and got benefits from this card.\n\nNow, after I have recently graduated college, and landed a full time job, I no longer have time to travel as much, and therefore have no need to use the Delta card. Last year I paid off my balance with the card, and haven't used it since.\n\nRecently, I had to pay the $95 annual fee, and realized that I should perhaps consider another credit card. I thought about closing this account and applying for a cash rewards card.\n\nNow my issue is that this is my ONLY and OLDEST credit card account. I would like to close the account and apply for a Bank of America cash rewards card as my checking/savings accounts are with them. However, I am concerned with how this will affect my credit report.\n\nWill closing this account greatly affect my credit score?\n\nI even spoke with the reps at AMEX about downgrading to a cash rewards card with no annual fee, and apparently I cannot do this as my AMEX card is linked with the Delta account. I do not want this card, I'm stuck in a pickle...\n\nAny advice would be appreciated! [summary]: ", "targets_pretokenized": ""}, "prediction": "The author has an AMEX delta card which has an annual fee. The author wants to close the account and apply for a cash rewards card. However, the author is concerned about how this will affect their credit score."}

Output

key:inputs
val
--------------------------------------------------
key:inputs_pretokenized
val:Summarize in less than 50 words.

Before anything, not a sad story or anything. My country's equivalent to Valentine's Day is coming and I had this pretty simple idea to surprise my girlfriend and it would involve giving her some roses. The thing is, although I know she would appreciate my intention in and of itself, I don't know if she would like the actual flowers and such, so I wanted to find out if she likes roses and if she would like getting some, but without her realizing it so as not to spoil the surprise. Any ideas on how to get that information out of her? [summary]: 
--------------------------------------------------
key:targets_pretokenized
val:
--------------------------------------------------
key:prediction
val:The author wants to surprise his girlfriend with roses on Valentine's Day but he doesn't know if she likes roses. He wants to find out without spoiling the surprise.

Explore the results side by side in a dataframe

# Extract all the prompts
prompts = [sample['inputs']['inputs_pretokenized']
           for sample in eval_data_tuned]

# Completions from the untuned model
untuned_completions = [sample['prediction']
                       for sample in eval_data_untuned]

# Completions from the tuned model
tuned_completions = [sample['prediction']
                     for sample in eval_data_tuned]
  • Now putting all together in one big dataframe
import pandas as pd

results = pd.DataFrame(
    data={'prompt': prompts,
          'base_model': untuned_completions,
          'tuned_model': tuned_completions})

pd.set_option('display.max_colwidth', None)

# Print the results
results

Output

(screenshot of the side-by-side results dataframe omitted)
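Beyond eyeballing the completions side by side, simple aggregate statistics can make the comparison concrete. For example, comparing completion lengths across the dataframe; the rows below are abbreviated placeholders, not the actual model outputs:

```python
import pandas as pd

# Toy stand-in for the `results` dataframe built above; the completions
# are invented placeholders for illustration only.
results = pd.DataFrame({
    'prompt': ['Summarize in less than 50 words. ...'],
    'base_model': ['The author wants to surprise his girlfriend with roses.'],
    'tuned_model': ['Want to surprise girlfriend with roses without spoiling the surprise.'],
})

# Word counts per completion, then summary statistics across all prompts.
results['base_words'] = results['base_model'].str.split().str.len()
results['tuned_words'] = results['tuned_model'].str.split().str.len()
print(results[['base_words', 'tuned_words']].describe())
```

Length is a crude proxy, but with an instruction like “Summarize in less than 50 words” it quickly shows whether tuning shifted the model toward the requested style.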

If you are interested in digging deeper into RLHF, see:

(screenshot of further-reading references omitted)

End

I started this short course on March 2, 2024 and finished it in about two hours. It gave me a concrete picture of how the code behind RLHF training is actually written, which should make the RLHF papers easier to follow.
