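Cache
Cache line:
The larger the cache line, the better it exploits spatial locality, but the slower each line is to read
The smaller the cache line, the worse it exploits spatial locality, but the faster each line is to read
A compromise is chosen; most current CPUs use:
64 bytes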
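To make the 64-byte figure concrete, the small sketch below works out how many long values share one line and which line a given array index falls on. It assumes a 64-byte line and, for simplicity, that element 0 of the array starts exactly on a line boundary, which the JVM does not guarantee. The class and method names are made up for illustration.

```java
class CacheLineMath {
    // Assumed line size in bytes; typical for current x86 CPUs, not queried from the hardware.
    static final int CACHE_LINE_BYTES = 64;

    // Number of long values (8 bytes each) that fit in one cache line.
    static int longsPerLine() {
        return CACHE_LINE_BYTES / Long.BYTES;
    }

    // Index of the cache line holding element i of a long[],
    // assuming element 0 starts on a line boundary.
    static int lineOfLongIndex(int i) {
        return (i * Long.BYTES) / CACHE_LINE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("longs per line: " + longsPerLine());
        System.out.println("line of arr[1]: " + lineOfLongIndex(1));
        System.out.println("line of arr[8]: " + lineOfLongIndex(8));
    }
}
```

Under these assumptions arr[0] and arr[1] land on the same line while arr[0] and arr[8] do not, which is the difference the two programs that follow measure.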
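public class CacheLinePadding { // runs in about 4 s
    // Note: volatile applies to the array reference only, not to its elements
    public volatile static long[] arr = new long[2];

    public static void main(String[] args) throws Exception {
        // arr[0] and arr[1] are adjacent, so they almost certainly share one cache line
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < 10_0000_0000L; i++) { // 1 billion iterations
                arr[0] = i;
            }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < 10_0000_0000L; i++) {
                arr[1] = i;
            }
        });
        // Take the start time before starting the threads so that all of their work is measured
        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        final long end = System.nanoTime();
        System.out.println((end - start) / 1000000); // elapsed milliseconds
    }
}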
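public class T02_CacheLinePadding { // runs in about 2 s
    // Note: volatile applies to the array reference only, not to its elements
    public volatile static long[] arr = new long[16];

    public static void main(String[] args) throws Exception {
        // arr[0] and arr[8] are 64 bytes apart, so they cannot share a 64-byte cache line
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < 10_0000_0000L; i++) { // 1 billion iterations
                arr[0] = i;
            }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < 10_0000_0000L; i++) {
                arr[8] = i;
            }
        });
        // Take the start time before starting the threads so that all of their work is measured
        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        final long end = System.nanoTime();
        System.out.println((end - start) / 1000000); // elapsed milliseconds
    }
}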
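Cache line alignment: some especially hot values are accessed by several threads under heavy contention. To make sure false sharing does not occur, such values can be cache-line aligned (padded so that each one sits on its own line)
In JDK 7, much code uses long padding to improve efficiency
e.g.: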
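The notes give no listing for the long padding approach, so the following is only a minimal sketch of the idea, not code taken from any JDK 7 library. It assumes 64-byte cache lines and relies on HotSpot laying out superclass fields before subclass fields, which is why the padding is spread over a small class hierarchy. All class, field, and method names here are hypothetical.

```java
class LongPaddingSketch {
    // Within a single class the JVM may reorder fields, so the padding is split
    // across a hierarchy: 56 bytes before the hot field and 56 bytes after it.
    static class LeftPad { long p1, p2, p3, p4, p5, p6, p7; }
    static class Value extends LeftPad { volatile long value; }
    static class PaddedLong extends Value { long q1, q2, q3, q4, q5, q6, q7; }

    // Two independent counters, each written by its own thread.
    static final PaddedLong[] counters = { new PaddedLong(), new PaddedLong() };

    // Runs two writer threads for the given number of iterations
    // and returns the elapsed time in milliseconds.
    static long run(long iterations) {
        Thread t1 = new Thread(() -> {
            for (long i = 0; i < iterations; i++) counters[0].value = i;
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < iterations; i++) counters[1].value = i;
        });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println(run(100_000_000L) + " ms");
    }
}
```

The padding fields are never read or written; they exist only to keep each value field away from any other hot field. Whether this helps, and by how much, depends on the CPU and the JVM's actual object layout.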
JDK 8 added the @Contended annotation (experimental). To use it in application code, start the JVM with: -XX:-RestrictContended
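import sun.misc.Contended; // JDK 8 location; in JDK 9+ it is jdk.internal.vm.annotation.Contended

public class T03_CacheLinePading {
    @Contended
    volatile long x;
    @Contended
    volatile long y;

    public static void main(String[] args) throws Exception { // about 0.6 s
        T03_CacheLinePading t3 = new T03_CacheLinePading();
        Thread t1 = new Thread(() -> {
            // 100 million iterations, 10x fewer than the array examples above
            for (long i = 0; i < 1_0000_0000L; i++) {
                t3.x = i;
            }
        });
        Thread t2 = new Thread(() -> {
            for (long i = 0; i < 1_0000_0000L; i++) {
                t3.y = i;
            }
        });
        // Take the start time before starting the threads so that all of their work is measured
        final long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        final long end = System.nanoTime();
        System.out.println((end - start) / 1000000); // elapsed milliseconds
    }
}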
Out-of-order execution
/**
 * CPU out-of-order execution
 */
public class DisorderTest {
    private static int x = 0, y = 0;
    private static int a = 0, b = 0;

    // Sample run: iteration 44448 produced (0,0)
    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        for (; ; ) {
            i++;
            x = 0;
            y = 0;
            a = 0;
            b = 0;
            Thread one = new Thread(new Runnable() {
                public void run() {
                    // Thread one starts first, so this call makes it wait briefly for thread two.
                    // Readers can adjust the wait to suit their own machine.
                    shortWait(100000);
                    a = 1;
                    x = b;
                }
            });
            Thread other = new Thread(new Runnable() {
                public void run() {
                    b = 1;
                    y = a;
                }
            });
            one.start();
            other.start();
            one.join();
            other.join();
            String result = "Iteration " + i + " (" + x + "," + y + ")";
            // (0,0) can only appear if, in some thread, the load took effect before the preceding store
            if (x == 0 && y == 0) {
                System.err.println(result);
                break;
            } else {
                //System.out.println(result);
            }
        }
    }

    // Busy-waits for roughly `interval` nanoseconds
    public static void shortWait(long interval) {
        long start = System.nanoTime();
        long end;
        do {
            end = System.nanoTime();
        } while (start + interval >= end);
    }
}
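Preventing reordering
CPU level: on Intel, fence instructions (mfence, lfence, sfence) or locking the bus
JVM level: 8 happens-before rules and 4 memory barriers (LoadLoad, LoadStore, StoreLoad, StoreStore)
as-if-serial: whatever order the hardware actually uses, the result of single-threaded execution does not change, so the code appears to run serially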
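At the Java level, one way to ask for these guarantees is to declare the shared fields volatile. The sketch below is a hypothetical variant of the experiment above, with a and b made volatile; x and y can stay plain because they are only read after join(). The class and method names are assumptions for illustration.

```java
class OrderedTest {
    // volatile forbids reordering the store to one field with the following load of the other
    private static volatile int a, b;
    // Read by the main thread only after join(), which already orders these writes
    private static int x, y;

    // Runs the two-thread experiment `rounds` times and returns
    // how many rounds ended with (x, y) == (0, 0).
    static int countZeroZero(int rounds) {
        int zeroZero = 0;
        for (int i = 0; i < rounds; i++) {
            x = 0;
            y = 0;
            a = 0;
            b = 0;
            Thread one = new Thread(() -> { a = 1; x = b; });
            Thread other = new Thread(() -> { b = 1; y = a; });
            one.start();
            other.start();
            try {
                one.join();
                other.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ex);
            }
            if (x == 0 && y == 0) zeroZero++;
        }
        return zeroZero;
    }

    public static void main(String[] args) {
        System.out.println("(0,0) results: " + countZeroZero(10_000));
    }
}
```

Under the Java Memory Model, accesses to volatile fields behave as if they happen in a single global order, so at least one thread must see the other's store and the count is expected to stay at 0 however many rounds are run.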
Write combining
Write Combining Buffer (WC buffer)
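Only a few exist per core (a commonly cited figure is 4 usable at a time on some Intel CPUs), each the size of a cache line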
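Because the ALU is so much faster than the memory hierarchy, a write goes into a WC buffer at the same time as it is written to L1; once the buffer is full, its contents are pushed straight to L2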
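A common way to probe this effect is to compare one loop that writes to six arrays per iteration with two loops that write to three arrays each, so that each loop touches fewer distinct cache lines than the assumed number of WC buffers. The sketch below is a hypothetical micro-benchmark along those lines; the array size, iteration count, and all names are assumptions, and it makes no claim about which version is faster on a given machine.

```java
class WriteCombiningSketch {
    static final int ITEMS = 1 << 20;   // size of each array (a power of two)
    static final int MASK = ITEMS - 1;  // wraps the loop index into the array
    static final byte[] a = new byte[ITEMS];
    static final byte[] b = new byte[ITEMS];
    static final byte[] c = new byte[ITEMS];
    static final byte[] d = new byte[ITEMS];
    static final byte[] e = new byte[ITEMS];
    static final byte[] f = new byte[ITEMS];

    // One loop writing to six arrays: six distinct cache lines per iteration.
    static long oneLoop(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            int slot = i & MASK;
            byte v = (byte) i;
            a[slot] = v; b[slot] = v; c[slot] = v;
            d[slot] = v; e[slot] = v; f[slot] = v;
        }
        return System.nanoTime() - start;
    }

    // Two loops writing to three arrays each: three distinct cache lines per iteration.
    static long twoLoops(int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            int slot = i & MASK;
            byte v = (byte) i;
            a[slot] = v; b[slot] = v; c[slot] = v;
        }
        for (int i = 0; i < iterations; i++) {
            int slot = i & MASK;
            byte v = (byte) i;
            d[slot] = v; e[slot] = v; f[slot] = v;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 100_000_000;
        System.out.println("one loop : " + oneLoop(iterations) / 1_000_000 + " ms");
        System.out.println("two loops: " + twoLoops(iterations) / 1_000_000 + " ms");
    }
}
```

Both methods leave the arrays in the same state; only the grouping of the writes differs. Results vary with the CPU and with JIT warm-up, so a serious measurement would repeat each run several times or use a benchmark harness.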