LazyGraphRAG test results are as follows.
Test data:
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragtest/input/book.txt

The indexing run failed.
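For context, the step that failed is presumably the standard GraphRAG indexing flow over ./ragtest (the curl line above is the one from the GraphRAG getting-started docs). A minimal sketch of that invocation, assuming the graphrag package is installed and graphrag init --root ./ragtest has already generated settings.yaml and the .env file:

import subprocess

# Kick off the indexing pipeline; this is the step that died with the
# JSON error logged below while default_chat_model was deepseek-v3-241226.
subprocess.run(["graphrag", "index", "--root", "./ragtest"], check=True)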
So frustrating!!! LazyGraphRAG is not very friendly to deepseek-V3 either, and I can't afford to run prompt tuning. Ugh.
After switching the model from DeepSeek to Doubao, the run succeeded!
I'll continue investigating and post an update tomorrow.
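Since the failure below comes down to JSON formatting, a model's JSON adherence can be spot-checked directly against the endpoint before burning another full indexing run. A minimal sketch, assuming the openai Python package and the Ark endpoint, key, and model name from the configuration at the bottom of this post; the prompt and the pass/fail criterion are illustrative, not graphrag's actual community-report prompt:

import json
import os

from openai import OpenAI

# base_url, api_key, and model mirror default_chat_model in settings.yaml
# below; GRAPHRAG_API_KEY is assumed to be exported in the environment.
client = OpenAI(
    base_url="https://ark.cn-beijing.volces.com/api/v3/",
    api_key=os.environ["GRAPHRAG_API_KEY"],
)

prompt = (
    "Return ONLY a JSON object with keys 'title' (string) and "
    "'findings' (list of strings). No markdown fences, no commentary."
)

failures = 0
for _ in range(10):
    text = client.chat.completions.create(
        model="deepseek-v3-241226",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    try:
        json.loads(text)
    except (json.JSONDecodeError, TypeError):
        failures += 1

print(f"{failures}/10 responses were not valid bare JSON")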
The main issue is that DeepSeek's adherence to the required JSON output format is still a bit weak.

Error log:
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/fnllm/base/base_llm.py", line 144, in __call__
    return await self._decorated_target(prompt, **kwargs)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/fnllm/base/services/json.py", line 77, in invoke
    return await this.invoke_json(delegate, prompt, kwargs)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/fnllm/base/services/json.py", line 100, in invoke_json
    raise FailedToGenerateValidJsonError from error
fnllm.base.services.errors.FailedToGenerateValidJsonError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/index/operations/summarize_communities/community_reports_extractor.py", line 80, in __call__
    response = await self._model.achat(
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/language_model/providers/fnllm/models.py", line 81, in achat
    response = await self.model(prompt, **kwargs)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/fnllm/openai/llm/openai_chat_llm.py", line 94, in __call__
    return await self._text_chat_llm(prompt, **kwargs)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/fnllm/openai/services/openai_tools_parsing.py", line 130, in __call__
    return await self._delegate(prompt, **kwargs)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/fnllm/base/base_llm.py", line 148, in __call__
    await self._events.on_error(
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/language_model/providers/fnllm/events.py", line 26, in on_error
    self._on_error(error, traceback, arguments)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/language_model/providers/fnllm/utils.py", line 45, in on_error
    callbacks.error("Error Invoking LLM", error, stack, details)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/callbacks/workflow_callbacks_manager.py", line 64, in error
    callback.error(message, cause, stack, details)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/site-packages/graphrag/callbacks/file_workflow_callbacks.py", line 37, in error
    json.dumps(
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/zli/miniconda3/envs/graphrag/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ModelMetaclass is not JSON serializable
22:53:43,292 graphrag.callbacks.file_workflow_callbacks INFO Community Report Extraction Error details=None
22:53:43,293 graphrag.index.operations.summarize_communities.strategies WARNING No report found for community: 8.0
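Two failures are stacked in that trace. The primary one is fnllm's FailedToGenerateValidJsonError: the model's reply could not be parsed as the JSON the community-report step requires. The secondary one is the TypeError at the bottom: while logging the first error, graphrag's file_workflow_callbacks calls json.dumps on an error-details dict that apparently contains a Pydantic model class itself (hence ModelMetaclass), which the stdlib encoder cannot serialize, so the log entry crashes too. A minimal sketch of that second failure mode, with a hypothetical Report class standing in for whatever schema object ends up in the details:

import json

from pydantic import BaseModel


class Report(BaseModel):  # hypothetical stand-in for the response schema
    title: str


# Passing the class itself (not an instance) reproduces the logged error:
try:
    json.dumps({"json_model": Report})
except TypeError as err:
    print(err)  # Object of type ModelMetaclass is not JSON serializable

# A defensive fallback such as default=str would keep the log entry intact:
print(json.dumps({"json_model": Report}, default=str))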
The configuration (GraphRAG settings.yaml) is as follows:
models:
  default_chat_model:
    type: openai_chat # or azure_openai_chat
    api_base: https://ark.cn-beijing.volces.com/api/v3/
    # api_version: 2024-05-01-preview
    auth_type: api_key # or azure_managed_identity
    api_key: ${GRAPHRAG_API_KEY} # set this in the generated .env file
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: deepseek-v3-241226
    # deployment_name: <azure_model_deployment_name>
    encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0 # set to 0 to disable rate limiting
    requests_per_minute: 0 # set to 0 to disable rate limiting
  default_embedding_model:
    type: openai_embedding # or azure_openai_embedding
    api_base: http://localhost:11434/v1/
    # api_version: 2024-05-01-preview
    # auth_type: api_key # or azure_managed_identity
    # type: openai_chat
    api_key: ollama
    # audience: "https://cognitiveservices.azure.com/.default"
    # organization: <organization_id>
    model: bge-m3
    # deployment_name: <azure_model_deployment_name>
    encoding_model: cl100k_base # automatically set by tiktoken if left undefined
    model_supports_json: true # recommended if this is available for your model.
    concurrent_requests: 25 # max number of simultaneous LLM requests allowed
    async_mode: threaded # or asyncio
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0 # set to 0 to disable rate limiting
    requests_per_minute: 0 # set to 0 to disable rate limiting

vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: True

embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store

### Input settings ###

input:
  type: file # or blob
  file_type: text # [csv, text, json]
  base_dir: "input"

chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]

### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided

cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"

reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"

output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"

### Workflow settings ###

#extract_graph:
# model_id: default_chat_model
# prompt: "prompts/extract_graph.txt"
# entity_types: [organization,person,geo,event]
# max_gleanings: 1

summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500

extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]

extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1

community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 8000
  max_input_length: 4000

cluster_graph:
  max_cluster_size: 10

embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes

umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)

snapshots:
  graphml: false
  embeddings: false

### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query

local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"

global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"

drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"

basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
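Since default_embedding_model points at a local ollama server, that endpoint is also worth a quick sanity check before re-running the pipeline. A minimal sketch, assuming ollama is running with bge-m3 already pulled and serving the OpenAI-compatible API configured above:

from openai import OpenAI

# base_url, api_key, and model mirror default_embedding_model above.
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

resp = client.embeddings.create(model="bge-m3", input=["sanity check"])
print(len(resp.data[0].embedding))  # bge-m3 should report 1024 dimensions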