This article lays out a learning path for large language model (LLM) interview preparation. It covers the following topics:

  • Roadmap
  • Prompt engineering & basics of LLM
  • Retrieval-augmented generation (RAG)
  • Chunking strategies
  • Embedding models
  • Internal working of vector DB
  • Advanced search algorithms
  • Language models internal working
  • Supervised fine-tuning of LLM
  • Preference alignment (RLHF/DPO)
  • Evaluation of LLM system
  • Hallucination control techniques
  • Deployment of LLM
  • Agent-based system
  • Prompt hacking
  • Case study & scenario-based questions


Roadmap for large language models (LLMs) interview preparation


Prompt engineering & basics of LLM

  • Question 1: What is the difference between predictive/discriminative AI and generative AI?
  • Question 2: What is an LLM, and how are LLMs trained?
  • Question 3: What is a token in a language model?
  • Question 4: How do you estimate the cost of running SaaS-based and open-source LLMs?
  • Question 5: Explain the temperature parameter and how to set it.
  • Question 6: What are the different decoding strategies for picking output tokens?
  • Question 7: What are the different ways you can define stopping criteria in a large language model?
  • Question 8: How do you use stop sequences in LLMs?
  • Question 9: Explain the basic structure of prompt engineering.
  • Question 10: Explain the types of prompt engineering.
  • Question 11: Explain in-context learning.
  • Question 12: What are some aspects to keep in mind when using few-shot prompting?
  • Question 13: What are some strategies for writing good prompts?
  • Question 14: What is hallucination, and how can it be controlled using prompt engineering?
  • Question 15: How do I improve the reasoning ability of my LLM through prompt engineering?
  • Question 16: How do you improve LLM reasoning if your chain-of-thought (CoT) prompt fails?

Retrieval-augmented generation (RAG)

  • Question 1: How do you increase accuracy and reliability and make answers verifiable in an LLM?
  • Question 2: How does retrieval-augmented generation (RAG) work?
  • Question 3: What are some of the benefits of using a RAG system?
  • Question 4: What architecture patterns do you see when you want to customize your LLM with proprietary data?
  • Question 5: When should I use fine-tuning instead of RAG?

Chunking strategies

  • Question 1: What is chunking, and why do we chunk our data?
  • Question 2: What factors influence chunk size?
  • Question 3: What are the different types of chunking methods available?
  • Question 4: How do you find the ideal chunk size?

Embedding models

  • Question 1: What are vector embeddings, and what is an embedding model?
  • Question 2: How is an embedding model used in the context of an LLM application?
  • Question 3: What is the difference between embedding short and long content?
  • Question 4: How do you benchmark embedding models on your data?
  • Question 5: Walk me through the steps of improving the sentence transformer model used for embedding.

Internal working of vector DB

  • Question 1: What is a vector DB?
  • Question 2: How is a vector DB different from traditional databases?
  • Question 3: How does a vector database work?
  • Question 4: Explain the difference between a vector index, a vector DB, and vector plugins.
  • Question 5: What are the different vector search strategies?
  • Question 6: How does clustering reduce the search space? When does it fail, and how can we mitigate these failures?
  • Question 7: Explain the random projection index.
  • Question 8: Explain the locality-sensitive hashing (LSH) indexing method.
  • Question 9: Explain the product quantization (PQ) indexing method.
  • Question 10: Compare different vector indexes. Given a scenario, which vector index would you use for a project?
  • Question 11: How would you decide on the ideal search similarity metric for a use case?
  • Question 12: Explain the different types of filtering in a vector DB and the challenges associated with them.
  • Question 13: How do you determine the best vector database for your needs?

Advanced search algorithms

  • Question 1: Why is it important to have very good search?
  • Question 2: What are the architecture patterns for information retrieval & semantic search, and what are their use cases?
  • Question 3: How can you achieve efficient and accurate search results in large-scale datasets?
  • Question 4: Explain the keyword-based retrieval method.
  • Question 5: How do you fine-tune re-ranking models?
  • Question 6: Explain the most common metric used in information retrieval and when it fails.
  • Question 7: I have a recommendation system. Which metric should I use to evaluate it?
  • Question 8: Compare different information retrieval metrics. Which one should you use when?

Language models internal working

  • Question 1: Detailed understanding of the concept of self-attention
  • Question 2: Overcoming the disadvantages of the self-attention mechanism
  • Question 3: Understanding positional encoding
  • Question 4: Detailed explanation of the Transformer architecture
  • Question 5: Advantages of using a transformer instead of an LSTM
  • Question 6: Difference between local attention and global attention
  • Question 7: Understanding the computational and memory demands of transformers
  • Question 8: Increasing the context length of an LLM
  • Question 9: How to optimize the transformer architecture for large vocabularies
  • Question 10: What is a mixture-of-experts (MoE) model?

Supervised fine-tuning of LLM

  • Question 1: What is fine-tuning, and why is it needed for LLMs?
  • Question 2: In which scenarios do we need to fine-tune an LLM?
  • Question 3: How do you decide whether to fine-tune?
  • Question 4: How do you create a fine-tuning dataset for Q&A?
  • Question 5: How do you improve the model so that it answers only when there is sufficient context to do so?
  • Question 6: How do you set hyperparameters for fine-tuning?
  • Question 7: How do you estimate the infrastructure requirements for fine-tuning an LLM?
  • Question 8: How do you fine-tune an LLM on consumer hardware?
  • Question 9: What are the different categories of PEFT methods?
  • Question 10: Explain the different reparameterized methods for fine-tuning an LLM.
  • Question 11: What is catastrophic forgetting in the context of LLMs?

Preference alignment (RLHF/DPO)

  • Question 1: At which stage would you choose a preference alignment method rather than SFT?
  • Question 2: Explain the different preference alignment methods.
  • Question 3: What is RLHF, and how is it used?
  • Question 4: Explain the reward hacking issue in RLHF.

Evaluation of LLM system

  • Question 1: How do you evaluate which LLM is best for your use case?
  • Question 2: How do you evaluate a RAG-based system?
  • Question 3: What are the different metrics that can be used to evaluate an LLM?
  • Question 4: Explain the chain of verification.

Hallucination control techniques

  • Question 1: What are the different forms of hallucinations?
  • Question 2: How do you control hallucinations at different levels?

Deployment of LLM

  • Question 1: Why does quantization not decrease the accuracy of an LLM?

Agent-based system

  • Question 1: Explain the basic concepts of an agent and the types of strategies available to implement agents.
  • Question 2: Why do we need agents, and what are some common strategies to implement them?
  • Question 3: Explain ReAct prompting with a code example, along with its advantages.
  • Question 4: Explain the Plan and Execute prompting strategy.
  • Question 5: Explain OpenAI functions with code examples.
  • Question 6: Explain the difference between OpenAI functions and LangChain Agents.

Prompt hacking

  • Question 1: What is prompt hacking, and why should we care about it?
  • Question 2: What are the different types of prompt hacking?
  • Question 3: What are the different defense tactics against prompt hacking?

Case study & scenario-based questions

  • Question 1: How do you optimize the cost of the overall LLM system?







