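The articles http://t.csdnimg.cn/9sS23 and http://t.csdnimg.cn/0wa6h analyzed the basic implementation of RCU. When reading kernel code, however, we frequently run into the function kfree_rcu(). So how exactly is kfree tied to RCU?
This analysis is based on Linux kernel 4.19.195.
Let's go straight to the code.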
/**
 * kfree_rcu() - kfree an object after a grace period.
 * @ptr: pointer to kfree
 * @rcu_head: the name of the struct rcu_head within the type of @ptr.
 *
 * Many rcu callbacks functions just call kfree() on the base structure.
 * These functions are trivial, but their size adds up, and furthermore
 * when they are used in a kernel module, that module must invoke the
 * high-latency rcu_barrier() function at module-unload time.
 *
 * The kfree_rcu() function handles this issue. Rather than encoding a
 * function address in the embedded rcu_head structure, kfree_rcu() instead
 * encodes the offset of the rcu_head structure within the base structure.
 * Because the functions are not allowed in the low-order 4096 bytes of
 * kernel virtual memory, offsets up to 4095 bytes can be accommodated.
 * If the offset is larger than 4095 bytes, a compile-time error will
 * be generated in __kfree_rcu(). If this error is triggered, you can
 * either fall back to use of call_rcu() or rearrange the structure to
 * position the rcu_head structure into the first 4096 bytes.
 *
 * Note that the allowable offset might decrease in the future, for example,
 * to allow something like kmem_cache_free_rcu().
 *
 * The BUILD_BUG_ON check must not involve any function calls, hence the
 * checks are done in macros here.
 */
#define kfree_rcu(ptr, rcu_head) \
	__kfree_rcu(&((ptr)->rcu_head), offsetof(typeof(*(ptr)), rcu_head))
The comment states it clearly: kfree_rcu() frees the given object with kfree() once a grace period (GP) has elapsed.
/*
 * Helper macro for kfree_rcu() to prevent argument-expansion eyestrain.
 */
#define __kfree_rcu(head, offset) \
	do { \
		BUILD_BUG_ON(!__is_kfree_rcu_offset(offset)); \
		kfree_call_rcu(head, (rcu_callback_t)(unsigned long)(offset)); \
	} while (0)

/*
 * Queue an RCU callback for lazy invocation after a grace period.
 * This will likely be later named something like "call_rcu_lazy()",
 * but this change will require some way of tagging the lazy RCU
 * callbacks in the list of pending callbacks. Until then, this
 * function may only be called from __kfree_rcu().
 */
void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
{
	__call_rcu(head, func, rcu_state_p, -1, 1);
}
EXPORT_SYMBOL_GPL(kfree_call_rcu);
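In the end, the work is still done by __call_rcu().
What looks odd is the second argument of __kfree_rcu(): it is offsetof(typeof(*(ptr)), rcu_head), and this value is cast to rcu_callback_t and handed to __call_rcu(). It is clearly an offset, not a function address, so invoking it as a callback would certainly panic the system. How does the kernel recognize and handle such a value, and how does it end up calling kfree() on the object? Nowhere in this path is kfree passed in.
The path that invokes RCU callbacks eventually reaches __rcu_reclaim(). Its implementation is shown below.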
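The BUILD_BUG_ON() in __kfree_rcu() rejects, at compile time, any structure whose rcu_head sits 4096 bytes or more from the start. A rough user-space sketch of the same idea follows, using C11 _Static_assert in place of BUILD_BUG_ON. The names cb_head, small_obj, big_obj, OFFSET_LIMIT and offset_fits are hypothetical stand-ins for illustration, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Same bound that __is_kfree_rcu_offset() checks against. */
#define OFFSET_LIMIT 4096

/* The head is 64 bytes in: well inside the limit. */
struct small_obj {
	char data[64];
	struct cb_head rcu;
};

/* The head is at least 8192 bytes in: too far for the offset encoding. */
struct big_obj {
	char data[8192];
	struct cb_head rcu;
};

/* Compiles cleanly, like kfree_rcu() on a small structure. */
_Static_assert(offsetof(struct small_obj, rcu) < OFFSET_LIMIT,
	       "head offset must be below 4096");

/*
 * Enabling the assertion below breaks the build, which is the
 * user-space analogue of the BUILD_BUG_ON failure:
 *
 * _Static_assert(offsetof(struct big_obj, rcu) < OFFSET_LIMIT,
 *	       "head offset must be below 4096");
 */

/* Run-time version of the same predicate, handy for checking by hand. */
static int offset_fits(size_t offset)
{
	return offset < OFFSET_LIMIT;
}
```

As the kernel comment suggests, a structure like big_obj would either have to move its head toward the front or fall back to call_rcu() with a real callback.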
/*
 * Does the specified offset indicate that the corresponding rcu_head
 * structure can be handled by kfree_rcu()?
 */
#define __is_kfree_rcu_offset(offset) ((offset) < 4096)

/*
 * Reclaim the specified callback, either by invoking it (non-lazy case)
 * or freeing it directly (lazy case). Return true if lazy, false otherwise.
 */
static inline bool __rcu_reclaim(const char *rn, struct rcu_head *head)
{
	unsigned long offset = (unsigned long)head->func;

	rcu_lock_acquire(&rcu_callback_map);
	if (__is_kfree_rcu_offset(offset)) {
		RCU_TRACE(trace_rcu_invoke_kfree_callback(rn, head, offset);)
		kfree((void *)head - offset);
		rcu_lock_release(&rcu_callback_map);
		return true;
	} else {
		RCU_TRACE(trace_rcu_invoke_callback(rn, head);)
		head->func(head);
		rcu_lock_release(&rcu_callback_map);
		return false;
	}
}
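So the special case is handled in __rcu_reclaim(). It reads head->func as an integer. If __is_kfree_rcu_offset(offset) is true, the value is an offset: the function subtracts it from the rcu_head address to get the start of the enclosing object and passes that to kfree(). Otherwise the value is a genuine function pointer, and it is called as the callback to perform whatever reclamation it implements.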
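The dispatch can be imitated in user space. The following is only a minimal sketch under stated assumptions: cb_head, cb_fn, session, queue_free, reclaim and count_callback are hypothetical names, free() stands in for kfree(), and there is no grace period or callback list at all. It also assumes a GCC/Clang-style compiler (for __typeof__ and integer-to-function-pointer casts) and a platform where no function lives in the lowest 4096 bytes of the address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

typedef void (*cb_fn)(struct cb_head *head);

/* Same test as __is_kfree_rcu_offset(). */
#define IS_FREE_OFFSET(off) ((off) < 4096)

/* An object whose head is deliberately not the first member. */
struct session {
	int id;
	char name[32];
	struct cb_head rcu;
};

/* Like kfree_rcu(): store the member offset where a function pointer goes. */
#define queue_free(ptr, member) \
	((ptr)->member.func = \
		(cb_fn)(uintptr_t)offsetof(__typeof__(*(ptr)), member))

static int normal_callbacks_run;

/* An ordinary callback, used to exercise the non-offset branch. */
static void count_callback(struct cb_head *head)
{
	(void)head;
	normal_callbacks_run++;
}

/*
 * Like __rcu_reclaim(): a small value in ->func is an offset, so free
 * the enclosing object; anything else is a real function, so call it.
 * Returns 1 when the object was freed directly, 0 otherwise.
 */
static int reclaim(struct cb_head *head)
{
	uintptr_t offset = (uintptr_t)head->func;

	if (IS_FREE_OFFSET(offset)) {
		free((char *)head - offset);
		return 1;
	}
	head->func(head);
	return 0;
}
```

A caller would allocate a struct session with malloc(), call queue_free(s, rcu), and later pass &s->rcu to reclaim(); the object is released without any per-type callback ever being written.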
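This raises a question: why does the kernel add a dedicated if/else to implement kfree_rcu(), instead of simply having kfree_rcu() pass kfree as the callback function? Wouldn't that be more elegant, and save the cost of a branch?
In fact, the kernel cannot get rid of this if/else. The RCU callback list links struct rcu_head objects, and a callback receives only a pointer to its rcu_head. The address that kfree() needs is that of the enclosing object, which in general is not the address of the embedded struct rcu_head, since many kernel structures implement their RCU logic by embedding a struct rcu_head somewhere inside them. With the "elegant" approach, kfree() would be handed the rcu_head address rather than the object's address, which is correct only when the rcu_head happens to be the first member. The alternative is a separate wrapper callback for every type, which is exactly the code-size and rcu_barrier() burden that the kfree_rcu() comment sets out to avoid. So the kernel records the offset instead and implements kfree_rcu() in this less elegant way.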
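The address mismatch is easy to see in a small user-space sketch. The types first_member and later_member, the stand-in cb_head, and the helper object_from_head are all hypothetical and exist only for illustration; the helper does the same pointer arithmetic as the kfree((void *)head - offset) line in __rcu_reclaim().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Head is the first member: its address equals the object's address. */
struct first_member {
	struct cb_head rcu;
	int payload;
};

/* Head comes later: its address is NOT the object's address. */
struct later_member {
	int payload;
	struct cb_head rcu;
};

/*
 * Recover the enclosing object from a pointer to its embedded head,
 * given the head's offset within the object.
 */
static void *object_from_head(struct cb_head *head, size_t offset)
{
	return (char *)head - offset;
}
```

For a later_member object, freeing the head pointer directly would hand the allocator an address in the middle of the allocation; only after subtracting the offset does the pointer match what the allocator originally returned.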