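0. Overview
This walkthrough uses the Cypher query language on a worked example to get a feel for how Neo4j is used. The official guide describes the purpose of the example as follows:
The Northwind Graph demonstrates how to migrate from a relational database to Neo4j (in other words, convert a complex multi-table relational database into an equivalent graph, where it shows a clear advantage at query time). The transformation is iterative and deliberate, emphasizing the conceptual shift from relational tables to the nodes and relationships of a graph.
This guide will show you how to:
- Load: create data from external CSV files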
- Index: index nodes based on label
- Relate: transform foreign key references into data relationships
- Promote: transform join records into relationships
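1. Finding the guide
:play start
Running this command in the Neo4j Browser opens the start page.
Click "Jump into code" in the middle of the page to reach the list of example guides.
Two examples are offered there, Movie Graph and Northwind Graph. This walkthrough covers the second.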
2. Load: create data from external products CSV files
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row
CREATE (n:Product)
SET n = row,
  n.unitPrice = toFloat(row.unitPrice),
  n.unitsInStock = toInteger(row.unitsInStock),
  n.unitsOnOrder = toInteger(row.unitsOnOrder),
  n.reorderLevel = toInteger(row.reorderLevel),
  n.discontinued = (row.discontinued <> "0")
-------------------------------------------------------------------------
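1. What exactly is row? Running LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/products.csv" AS row RETURN row shows that each row is a map from column header to string value:
{"reorderLevel": "10","unitsInStock": "39","unitPrice": "18.00","supplierID": "1","productID": "1","discontinued": "0","quantityPerUnit": "10 boxes x 20 bags","categoryID": "1","unitsOnOrder": "0","productName": "Chai"}
{"reorderLevel": "25","unitsInStock": "17","unitPrice": "19.00","supplierID": "1","productID": "2","discontinued": "0","quantityPerUnit": "24 - 12 oz bottles","categoryID": "1","unitsOnOrder": "40","productName": "Chang"}
... (more rows follow)
2. CREATE (n:Product) creates one node per row, 77 nodes in total, all sharing the single label Product, and SET n = row copies the row's columns onto the node as properties. In relational terms, this inserts 77 records using the structure and data shown above.
3. n.unitPrice = toFloat(row.unitPrice): LOAD CSV reads every field as a string, so numeric and boolean properties have to be converted explicitly to Neo4j types to avoid errors later.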
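To confirm that the conversions took effect, a quick check along the following lines can help. This is only a sketch: the aliases are illustrative, and it assumes the products load above has been run exactly once.

```cypher
// Count the Product nodes and aggregate over the converted properties.
// avg() and sum() operate on numeric values, so this relies on
// unitPrice and unitsInStock having been converted with toFloat()/toInteger().
MATCH (p:Product)
RETURN count(p) AS products,
       avg(p.unitPrice) AS avgUnitPrice,
       sum(p.unitsInStock) AS totalUnitsInStock
```

With a single load, products should come back as 77. Properties that were not converted, such as productID and categoryID, remain strings, which matters later when they are compared with other values read from CSV.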
3. Load: create data from external categories CSV files
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/categories.csv" AS row
CREATE (n:Category)
SET n = row
4. Load: create data from external suppliers CSV files
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/suppliers.csv" AS row
CREATE (n:Supplier)
SET n = row
5. Create indexes
// Run the following three statements separately
CREATE INDEX ON :Product(productID)
CREATE INDEX ON :Category(categoryID)
CREATE INDEX ON :Supplier(supplierID)
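The CREATE INDEX ON :Label(property) form above is the one the guide used at the time of writing. More recent Neo4j releases (4.x onward) offer a different syntax, and as far as I know the older form no longer works in Neo4j 5. A sketch of the equivalent statements follows; the index names (product_id and so on) are chosen here purely for illustration.

```cypher
// Newer index syntax; run each statement separately.
// IF NOT EXISTS makes a statement safe to re-run.
CREATE INDEX product_id IF NOT EXISTS FOR (p:Product) ON (p.productID);
CREATE INDEX category_id IF NOT EXISTS FOR (c:Category) ON (c.categoryID);
CREATE INDEX supplier_id IF NOT EXISTS FOR (s:Supplier) ON (s.supplierID);
```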
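6. Create data relationships
// Create PART_OF relationships between Product and Category. One Category can contain many Products,
// so PART_OF is a natural membership relationship. Note how the WHERE clause joins the two on categoryID.
MATCH (p:Product),(c:Category)
WHERE p.categoryID = c.categoryID
CREATE (p)-[:PART_OF]->(c)
// Query the PART_OF relationships just created
MATCH p=()-[r:PART_OF]->() RETURN p LIMIT 80000
The numbers in the resulting graph are easy to sanity-check: all 77 Products found the Category they belong to, and there are 8 Categories in total, so the graph consists of 8 clusters and 77 + 8 = 85 nodes.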
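The same counts can also be checked with a query rather than by reading the visualization. A minimal sketch, with illustrative aliases:

```cypher
// Number of Products per Category, following the PART_OF relationships
MATCH (p:Product)-[:PART_OF]->(c:Category)
RETURN c.categoryName AS category, count(p) AS products
ORDER BY products DESC
```

If the data matches the counts described above, this returns 8 rows whose products values add up to 77.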
// Create the relationships between Products and Suppliers
MATCH (p:Product),(s:Supplier)
WHERE p.supplierID = s.supplierID
CREATE (s)-[:SUPPLIES]->(p)
// Query the SUPPLIES relationships just created
MATCH p=()-[r:SUPPLIES]->() RETURN p LIMIT 2555
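7. Query using patterns
// Query the connected nodes
MATCH (s:Supplier)-->(:Product)-->(c:Category)
RETURN s.companyName as Company, collect(distinct c.categoryName) as Categories
// collect(distinct c.categoryName) removes duplicates: one Supplier can supply
// many Products, and each Product belongs to exactly one Category, so two different
// Products from the same Supplier may map to the same Category.
The result is a table with one row per company name, each listing the categories that company supplies.
A table like this is not very intuitive. Now that two kinds of relationships connect three kinds of nodes, what does the data look like as a graph?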
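// Query the nodes connected by both PART_OF and SUPPLIES relationships
MATCH p=()-[r1:PART_OF]-()-[r2:SUPPLIES]-() RETURN p LIMIT 2500
// One Supplier can supply many Products
// One Product belongs to one Category
// Find the Suppliers that supply Products in the "Produce" category, with duplicates removed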
MATCH (c:Category {categoryName:"Produce"})<--(:Product)<--(s:Supplier)
RETURN DISTINCT s.companyName as ProduceSuppliers
8. Load: create data from external customers and orders CSV files
// Run each of the following statements separately
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/customers.csv" AS row
CREATE (n:Customer)
SET n = row

LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/orders.csv" AS row
CREATE (n:Order)
SET n = row

CREATE INDEX ON :Customer(customerID)

CREATE INDEX ON :Order(orderID)
9. Create data relationships
MATCH (c:Customer),(o:Order)
WHERE c.customerID = o.customerID
CREATE (c)-[:PURCHASED]->(o)
// Query the PURCHASED relationships just created
MATCH p=()-[r:PURCHASED]->() RETURN p LIMIT 25
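10. Load: create data from external order-details CSV files
// By this step there are quite a few entity types. So far we have:
// Supplier-[:SUPPLIES]->Product-[:PART_OF]->Category
// Customer-[:PURCHASED]->Order
// These two subgraphs are not connected to each other, so next we create (o:Order)-[details:ORDERS]->(p:Product),
// which links all the entities together. Note that running this statement more than once does not overwrite anything:
// it creates additional ORDERS relationships with the same type, each a separate relationship with its own internal id.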
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/order-details.csv" AS row
MATCH (p:Product), (o:Order)
WHERE p.productID = row.productID AND o.orderID = row.orderID
CREATE (o)-[details:ORDERS]->(p)
SET details = row,
details.quantity = toInteger(row.quantity)
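One way to avoid the duplicate relationships mentioned in the comments above is to use MERGE in place of CREATE. The following is a sketch of that variation, not the form the official guide uses; it assumes the Product and Order nodes have already been loaded as in the earlier steps.

```cypher
// Variant of the load above that can be re-run without duplicating relationships
LOAD CSV WITH HEADERS FROM "http://data.neo4j.com/northwind/order-details.csv" AS row
MATCH (p:Product {productID: row.productID})
MATCH (o:Order {orderID: row.orderID})
// MERGE reuses an existing ORDERS relationship from o to p, or creates one if none exists
MERGE (o)-[details:ORDERS]->(p)
SET details = row,
    details.quantity = toInteger(row.quantity)
```

Matching on the indexed productID and orderID properties inside the node patterns expresses the same lookup as the WHERE clause in the original statement.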
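11. Query using patterns
// Matching in a graph database is structural: a query matches a shape of nodes and relationships, not just property values.
// For example, (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product)-
// [:PART_OF]->(c:Category {categoryName:"Produce"}) is one such structure.
// The query below finds the names of the Customers who bought products in the Produce category
// and, for each of them, sums the quantity of Produce items across their orders (a quantity total, not a price total).
// This kind of question comes up all the time in e-commerce. In a relational database it would need joins across
// several tables (customers, orders, order details, products, categories); here it is a single path pattern.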
MATCH (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product)-[:PART_OF]->(c:Category {categoryName:"Produce"})
RETURN DISTINCT cust.contactName as CustomerName, SUM(o.quantity) AS TotalProductsPurchased
// The official guide writes the query as follows, which is equivalent to the one above
MATCH (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product),(p)-[:PART_OF]->(c:Category {categoryName:"Produce"})
RETURN DISTINCT cust.contactName as CustomerName, SUM(o.quantity) AS TotalProductsPurchased
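The queries above total quantities. If a monetary total is wanted instead, something like the following could work. It is a sketch resting on an assumption not stated in this walkthrough: that order-details.csv has unitPrice and discount columns, which SET details = row would have copied onto each ORDERS relationship as strings. The alias TotalSpent is illustrative.

```cypher
// Assumption: each ORDERS relationship carries unitPrice and discount as strings,
// while quantity was already converted to an integer during the load
MATCH (cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(:Product)-[:PART_OF]->(:Category {categoryName:"Produce"})
RETURN cust.contactName AS CustomerName,
       sum(toFloat(o.unitPrice) * o.quantity * (1 - toFloat(o.discount))) AS TotalSpent
ORDER BY TotalSpent DESC
```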
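12. What does our graph look like?
// To display it, split the graph pattern into two linear paths, r1 and r2, which join at the shared Product node p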
MATCH r1=(cust:Customer)-[:PURCHASED]->(:Order)-[o:ORDERS]->(p:Product),r2=(sup:Supplier)-[:SUPPLIES]-(p)-[:PART_OF]->(c:Category {categoryName:"Produce"})
RETURN r1, r2 limit 1