Inferencing with AI2’s OLMo model on AMD GPU — ROCm Blogs
April 17, 2024, by Douglas Jia.
In this blog, we show you how to generate text on AMD GPUs using AI2's OLMo model.
Introduction
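OLMo (Open Language Model), developed by the Allen Institute for AI (AI2), is a significant release in the generative AI field. It is a truly open large language model (LLM) and framework, designed to provide full access to its pre-training data, training code, model weights, and evaluation suite. This commitment to openness sets a new precedent in the LLM landscape and allows academics and researchers to collectively study and advance language models. This open approach is expected to drive innovation and development in generative AI.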
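OLMo follows the classic decoder-only Transformer architecture used by many GPT-style models. On major benchmarks, its performance matches or exceeds that of other popular models of similar size. For more details on its architecture and performance evaluation, see "OLMo: Accelerating the Science of Language Models".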
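Notably, the OLMo team used both AMD MI250X GPUs and Nvidia A100 GPUs during pre-training and compared their performance. Their study, together with two independent investigations by the Databricks team, "Training LLMs with AMD MI250 GPUs and MosaicML" and "Training LLMs at Scale with AMD MI250 GPUs", provides comprehensive third-party comparisons of AMD and Nvidia GPU performance.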
The views expressed here do not represent AMD's official views and are not endorsed by AMD.
Implementation
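The code examples in this blog were tested with ROCm 6.0, Ubuntu 20.04, Python 3.9, and PyTorch 2.1.1. For a list of supported GPUs and operating systems, see this page. For convenience and stability, we recommend that you pull and run the rocm/pytorch Docker image directly on your Linux system with the following command: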
docker run -it --ipc=host --network=host --device=/dev/kfd --device=/dev/dri \
    --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --name=olmo rocm/pytorch:rocm6.0_ubuntu20.04_py3.9_pytorch_2.1.1 /bin/bash
Once inside the Docker container, install the required packages:
pip install transformers ai2-olmo
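We then run the following code in a Python console. First, check whether PyTorch can detect the GPUs on your system. The following code block shows the number of GPU devices on the system.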
import torch
torch.cuda.device_count()
8
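Beyond the device count, it can help to confirm which GPUs were detected. The following is a minimal sketch, not part of the original walkthrough. It assumes the ROCm build of PyTorch, which exposes AMD GPUs through the `torch.cuda` API. The helper name `list_gpus` is hypothetical, and the function returns an empty list when PyTorch is not installed.

```python
import importlib.util


def list_gpus():
    """Return the names of the GPUs PyTorch can see (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        return []  # PyTorch is not installed in this environment
    import torch

    # On ROCm builds, AMD GPUs are reported through the torch.cuda namespace.
    return [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]


for index, name in enumerate(list_gpus()):
    print(f"GPU {index}: {name}")
```

If the list is empty inside the container, a likely cause is that the `--device=/dev/kfd` and `--device=/dev/dri` flags were not passed to `docker run`.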
In the code block below, we instantiate an inference pipeline that uses the OLMo-7B model. Note that OLMo models come in different sizes: 1B, 7B, and 65B.
import hf_olmo
from transformers import pipeline

# The default device is the CPU; device >= 0 selects the GPU with that index.
olmo_pipe = pipeline("text-generation", model="allenai/OLMo-7B", device=0)
Next, provide a text prompt, then generate and print the model's output.
output = olmo_pipe("Language modeling is ", max_new_tokens=100)
print(output[0]['generated_text'])
Language modeling is a branch of natural language processing that aims to understand the meaning of words and sentences. It is a subfield of computational linguistics. The goal of natural language modeling is to build a model of language that can be used to predict the next word in a sentence. This can be used to improve the accuracy of machine translation, to improve the performance of speech recognition systems, and to improve the performance of
You can also pass in multiple prompts to generate multiple responses in a single run.
input = ["Deep learning is the subject that", "There are a lot of attractions in New York", "Why the sky is blue"]
output = olmo_pipe(input, max_new_tokens=100)
print(*[i[0]['generated_text'] for i in output], sep='\n\n************************\n\n')
Deep learning is the subject that is being studied by the researchers. It is a branch of machine learning that is used to create artificial neural networks. It is a subset of deep learning that is used to create artificial neural networks. It is a subset of deep learning that is used to create artificial neural networks. It is a subset of deep learning that is used to create artificial neural networks. It is a subset of deep learning that is used to create artificial neural networks. It is a subset of deep learning that is used to create artificial

************************

There are a lot of attractions in New York City, but the most popular ones are the Statue of Liberty, the Empire State Building, and the Brooklyn Bridge. The Statue of Liberty is a symbol of freedom and democracy. It was a gift from France to the United States in 1886. The statue is made of copper and stands on Liberty Island in New York Harbor. The Empire State Building is the tallest building in the world. It was built in 1931 and stands 1,454 feet tall. The building has 102 floors and

************************

Why the sky is blue? Why the grass is green? Why the sun shines? Why the moon shines? Why the stars shine? Why the birds sing? Why the flowers bloom? Why the trees grow? Why the rivers flow? Why the mountains stand? Why the seas are blue? Why the oceans are blue? Why the stars are blue? Why the stars are white? Why the stars are red? Why the stars are yellow?
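You may have noticed that the text generated above can be very repetitive. For example, the first response repeats "It is a subset of deep learning that is used to create artificial neural networks" several times, and the third response keeps repeating the pattern "Why the xxx is xxx?". Why does this happen? The pipeline's default decoding strategy is greedy search, which always selects the token with the highest probability as the next token. This strategy works well for many tasks and for short outputs, but it can produce repetitive results when generating longer outputs. Next, we use other decoding strategies to mitigate this problem. If you are interested in this topic, see the tutorial from Hugging Face.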
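To illustrate why greedy search can get stuck in a loop, here is a small self-contained sketch. The next-token table `NEXT` is a made-up toy distribution (an assumption for illustration only, not taken from OLMo), and `greedy_decode` and `sample_decode` are hypothetical helpers. Because greedy decoding always takes the most likely continuation, it revisits the same tokens as soon as the most likely path forms a cycle, while sampling can leave that path.

```python
import random

# Hypothetical toy next-token probabilities; not derived from OLMo.
NEXT = {
    "it": {"is": 0.9, "was": 0.1},
    "is": {"a": 0.6, "used": 0.4},
    "a": {"subset": 0.7, "branch": 0.3},
    "subset": {".": 0.8, "of": 0.2},
    "branch": {".": 1.0},
    "used": {".": 1.0},
    "was": {"a": 1.0},
    "of": {"it": 1.0},
    ".": {"it": 1.0},
}


def greedy_decode(start, steps):
    """Always append the single most probable next token."""
    tokens = [start]
    for _ in range(steps):
        dist = NEXT[tokens[-1]]
        tokens.append(max(dist, key=dist.get))
    return tokens


def sample_decode(start, steps, seed=0):
    """Draw each next token at random, weighted by its probability."""
    rng = random.Random(seed)
    tokens = [start]
    for _ in range(steps):
        dist = NEXT[tokens[-1]]
        words = list(dist)
        weights = [dist[w] for w in words]
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens


print(" ".join(greedy_decode("it", 10)))  # it is a subset . it is a subset . it
print(" ".join(sample_decode("it", 10)))  # varies with the seed
```

The greedy output repeats the same five-token phrase indefinitely, which mirrors the repeated sentence in the first response above.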
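In the following code block, we show how to improve text generation with the Top-K and Top-P sampling strategies. A typical approach is to first use Top-K sampling to narrow the candidate tokens down to the K most likely options, and then apply Top-P sampling within that subset to keep the smallest set of tokens whose cumulative probability reaches the threshold P. This process balances favoring high-probability tokens (Top-K) with preserving diversity within a confidence level (Top-P). You can also use either strategy on its own.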
output = olmo_pipe(input, max_new_tokens=100, do_sample=True, top_k=40, top_p=0.95)
print(*[i[0]['generated_text'] for i in output], sep='\n\n************************\n\n')
Deep learning is the subject that deals with Artificial intelligence and machine learning. In the context of artificial intelligence, Deep learning is an emerging technology that is based on artificial neural networks. It is used in almost all fields of AI such as robotics, language translation, computer vision, and others. This technology is used in computer vision for automatic image processing and recognition tasks. It is also used for image classification, speech recognition, and text translation. With the increasing demand for artificial intelligence, the use of deep learning has also been

************************

There are a lot of attractions in New York, such as Central Park and the Brooklyn Bridge. Visiting all of these places would be quite overwhelming, so we recommend starting with the ones that you find the most interesting. The best attractions for teens are Times Square, the Statue of Liberty, The Empire State Building, Central Park, and the Brooklyn Bridge. New York City is a very busy city, so it can be challenging for a teenager to get from one place to another. This is why we recommend using public transportation, which

************************

Why the sky is blue" - it is a question that has been puzzling philosophers and scientists since time began. But the world's top physicist has unveiled the secret to the colour and says he "loves" being asked about it as it has fascinated him throughout his career. Prof Stephen Hawking, 74, of Cambridge University, said blue appears in the sky because it takes the longest wavelength of sunlight, blue, to reach the earth after it passes through the atmosphere. He added that sunlight in the sky
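The Top-K followed by Top-P filtering described earlier can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the Hugging Face implementation: the function name `top_k_top_p_filter` and the toy probabilities are hypothetical, and the sketch works on a small dictionary of probabilities rather than on model logits.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose renormalized cumulative probability reaches top_p."""
    # Step 1 (Top-K): keep only the K most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total_k = sum(p for _, p in ranked)

    # Step 2 (Top-P): walk down the ranking until the cumulative
    # probability, renormalized over the Top-K subset, reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p / total_k
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1 before sampling.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token probabilities for the prompt "The sky is".
toy_probs = {"blue": 0.5, "clear": 0.2, "dark": 0.15, "green": 0.1, "loud": 0.05}
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
print(sorted(filtered))  # ['blue', 'clear']
```

In this toy case, Top-K removes "green" and "loud", and Top-P then drops "dark" because "blue" and "clear" already cover more than 80% of the remaining probability mass. The next token would then be drawn at random from the two survivors.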
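The generated output is clearly improved: it reads more naturally and is far less repetitive. Note, however, that these responses may not be entirely accurate, because they are generated purely by the trained model and have not been fact-checked. In future blogs, we will explore ways to improve the accuracy of the responses. Stay tuned!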