
Extended Asm

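Today I came across something really interesting on Maimai ( •̀ ω •́ )✧

Oh, after a long stretch of reading dizzying documentation, looking at this is quite refreshing!

First I googled it: this is GCC's extension for using assembly instructions. See

The documentation behind that link is fairly outdated; for the current version, see this one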

With extended asm you can read and write C variables from assembler and perform jumps from assembler code to C labels. Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
In short: extended asm lets assembly code read and write C variables and jump to C labels, and colons separate the operand sections that follow the assembler template.

asm asm-qualifiers ( AssemblerTemplate 
                 : OutputOperands 
                 [ : InputOperands
                 [ : Clobbers ] ])

asm asm-qualifiers ( AssemblerTemplate 
                      : 
                      : InputOperands
                      : Clobbers
                      : GotoLabels)
where in the last form, asm-qualifiers contains goto (and in the first form, not).
(So the second form is the one whose asm qualifiers include goto; the first form does not.)

The asm keyword is a GNU extension. When writing code that can be compiled with -ansi and the various -std options, use __asm__ instead of asm (see Alternate Keywords).
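As a minimal sketch of the first form (template, outputs, inputs, clobbers), here is a small addition written with __asm__. The function name add_ints is made up for illustration, the instruction is x86-specific (AT&T syntax), and other targets fall back to plain C so the file still compiles.

```c
#include <assert.h>

/* Hypothetical helper: returns a + b.
 * Example: assert(add_ints(2, 3) == 5); */
static int add_ints(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__("addl %[src], %[dst]"   /* AT&T order: dst += src */
            : [dst] "+r"(a)         /* OutputOperands: a is read and written */
            : [src] "r"(b)          /* InputOperands: b in a general register */
            : "cc");                /* Clobbers: addl changes the flags */
#else
    a += b;                         /* fallback for non-x86 targets */
#endif
    return a;
}
```

The square-bracket names are only symbolic labels for the operands; the C expressions they bind to are the ones in parentheses.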

Qualifiers

volatile
The typical use of extended asm statements is to manipulate input values to produce output values. However, your asm statements may also produce side effects. If so, you may need to use the volatile qualifier to disable certain optimizations. See Volatile.
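Extended asm typically turns input values into output values, but an asm statement may also have side effects, and in that case the volatile qualifier disables certain optimizations. (Put simply: the compiler may not delete the statement or move it out of a loop, because it might have side effects.)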

inline
If you use the inline qualifier, then for inlining purposes the size of the asm statement is taken as the smallest size possible (see Size of an asm).
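(This is not about shortening the instructions: it only makes the inliner count the asm statement as the smallest possible size when estimating code size for inlining.)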

goto
This qualifier informs the compiler that the asm statement may perform a jump to one of the labels listed in the GotoLabels. See GotoLabels.
This is the goto mentioned earlier: the asm may jump out to one of the C labels listed in GotoLabels.

Parameters	(read together with the descriptions linked here, the code in question becomes easy to understand)
AssemblerTemplate
This is a literal string that is the template for the assembler code. It is a combination of fixed text and tokens that refer to the input, output, and goto parameters. See AssemblerTemplate.

OutputOperands
A comma-separated list of the C variables modified by the instructions in the AssemblerTemplate. An empty list is permitted. See OutputOperands.

InputOperands
A comma-separated list of C expressions read by the instructions in the AssemblerTemplate. An empty list is permitted. See InputOperands.

Clobbers
A comma-separated list of registers or other values changed by the AssemblerTemplate, beyond those listed as outputs. An empty list is permitted. See Clobbers and Scratch Registers.

GotoLabels
When you are using the goto form of asm, this section contains the list of all C labels to which the code in the AssemblerTemplate may jump. See GotoLabels.

asm statements may not perform jumps into other asm statements, only to the listed GotoLabels. GCC’s optimizers do not know about other jumps; therefore they cannot take account of them when deciding how to optimize.

The total number of input + output + goto operands is limited to 30.
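To make the goto form concrete, here is a small sketch that jumps from assembly to a C label. The function is_negative and the label negative are invented for this example; the instructions are x86-specific, with a plain C fallback on other targets.

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if x < 0, else 0.
 * Example: assert(is_negative(-5) == 1); */
static int is_negative(int x)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ goto("testl %[v], %[v]\n\t"   /* set the sign flag from x */
                 "js %l[negative]"        /* jump to the C label if negative */
                 :                        /* no OutputOperands */
                 : [v] "r"(x)             /* InputOperands */
                 : "cc"                   /* Clobbers: flags */
                 : negative);             /* GotoLabels */
    return 0;
negative:
    return 1;
#else
    return x < 0;                         /* fallback for non-x86 targets */
#endif
}
```

Every label the template can reach has to appear in the last section; that list is how the optimizer learns about the extra control-flow edge.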

So, let's go through what the code I ran into actually means.

asm volatile("0:\n"                                           // local label "0"; volatile: do not delete or hoist this asm
             "ldrex %[newValue], [%[_q_value]]\n"             // exclusive load: newValue = *(&_q_value)
             "add %[newValue], %[newValue], #1\n"             // newValue += 1
             "strex %[result], %[newValue], [%[_q_value]]\n"  // exclusive store to _q_value; result = 0 on success, 1 on failure
             "teq %[result], #0\n"                            // test the status
             "bne 0b"                                         // branch if not equal: back ("b") to the nearest label "0" and try again
             : [newValue] "=&r" (newValue),                   // output operands
               [result] "=&r" (result),
               "+m" (_q_value)                                // '+': read-write operand, the asm both reads and modifies _q_value
             : [_q_value] "r" (&_q_value)                     // input operand: the address of _q_value, in a register
             : "cc", "memory");                               // clobbers: the condition flags and memory are changed
Now for the meaning of the constraint symbols. OK, a blind guess first: r means register!
‘m’
A memory operand is allowed, with any kind of address that the machine supports in general. Note that the letter used for the general memory constraint can be re-defined by a back end using the TARGET_MEM_CONSTRAINT macro.
A memory operand is allowed (plus one caveat).

‘r’
A register operand is allowed provided that it is in a general register.
Yes, a register operand!
    
‘=’
Means that this operand is written to by this instruction: the previous value is discarded and replaced by new data.
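Replace: a new value is written. In other words, the register that [newValue] names is written back to the C variable (newValue), from symbolic name to C expression.
(Without '=', result and newValue in the code above would not be treated as outputs. The C code never reads result afterwards, but the asm still needs it as the register that receives the strex status = =)
(@warning: so result cannot simply be dropped; it acts as a scratch value for the loop.)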
 
‘&’
Means (in a particular alternative) that this operand is an earlyclobber operand, which is written before the instruction is finished using the input operands. Therefore, this operand may not lie in a register that is read by the instruction or as part of any memory address.

‘&’ applies only to the alternative in which it is written. In constraints with multiple alternatives, sometimes one alternative requires ‘&’ while others do not. See, for example, the ‘movdf’ insn of the 68000.

A operand which is read by the instruction can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written. Adding alternatives of this form often allows GCC to produce better code when only some of the read operands can be affected by the earlyclobber. See, for example, the ‘mulsi3’ insn of the ARM.

Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it’s used.

‘&’ does not obviate the need to write ‘=’ or ‘+’. As earlyclobber operands are always written, a read-only earlyclobber operand is ill-formed and will be rejected by the compiler.
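(emm... even translated this stays rather dense, so in my own words: '&' marks an operand as "earlyclobber", meaning it is written before the asm has finished reading its inputs, so the compiler must not place it in a register that also holds an input. Think of newValue: ldrex writes it while the address in %[_q_value] is still needed by the later strex. That is the short version; work through the text above if you are interested.)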
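A two-instruction template shows why '&' matters. In this sketch (the name add_via_copy is made up, x86 only with a C fallback) the output is written by the first instruction while one input is still waiting to be read by the second.

```c
#include <assert.h>

/* Hypothetical helper: returns a + b.
 * Example: assert(add_via_copy(2, 3) == 5); */
static int add_via_copy(int a, int b)
{
    int out;
#if defined(__x86_64__) || defined(__i386__)
    __asm__("movl %[a], %[out]\n\t"   /* out is written here ... */
            "addl %[b], %[out]"       /* ... but b is only read here */
            : [out] "=&r"(out)        /* '&': never share a register with a or b */
            : [a] "r"(a), [b] "r"(b)
            : "cc");                  /* addl changes the flags */
#else
    out = a + b;                      /* fallback for non-x86 targets */
#endif
    return out;
}
```

Without the '&', the compiler would be free to give out and b the same register, and the movl would overwrite b before addl gets to read it.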
    
"cc"
The "cc" clobber indicates that the assembler code modifies the flags register. On some machines, GCC represents the condition codes as a specific hardware register; "cc" serves to name this register. On other machines, condition code handling is different, and specifying "cc" has no effect. But it is valid no matter what the target.
The asm changes the flags register (here, through teq).
    
"memory"
The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.
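The asm reads or writes memory beyond its listed operands, so the compiler must not keep memory values cached in registers across it.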
    
Note that this clobber does not prevent the processor from doing speculative reads past the asm statement. To prevent that, you need processor-specific fence instructions.
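The "memory" clobber is often used on its own, with an empty template, as a pure compiler barrier. The sketch below assumes two hypothetical globals, payload and ready, and a helper named compiler_barrier; none of these come from the original code.

```c
#include <assert.h>

static int payload;   /* hypothetical data to publish */
static int ready;     /* hypothetical "data is valid" flag */

/* Empty template: emits no instruction, only constrains the compiler. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

static void publish(int value)
{
    payload = value;      /* this store must be emitted first ... */
    compiler_barrier();   /* ... the compiler may not move it below this point */
    ready = 1;
}
```

As the note above says, this orders only what the compiler emits. On weakly ordered hardware another core can still observe the two stores out of order unless a real fence instruction is added.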

The second snippet is essentially the same, with add replaced by sub. ヾ(≧▽≦*)o easy!

I'm planning to read through the whole GCC manual now（；´д｀）ゞ wish me luck!

Continued

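About the code above: judging by the function name, its author considers the operation atomic. At first glance that looks wrong, since the sequence is visibly several instructions rather than one.

There is an add between the LOAD (ldrex) and the STREX. The bne 0b that follows is what handles exactly this: strex writes 0 to result only if nothing else has written to the location since the ldrex, and otherwise the loop starts over = =.

So consider multiple threads, with the instructions interleaved as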

Thread A    Thread B
LOAD        LOAD
ADD         ADD
STREX       STREX
or
Thread A    Thread B
LOAD
ADD         LOAD; ADD
STREX       STREX

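(asm volatile by itself does not protect against other threads; it only constrains the compiler. The protection comes from the exclusive load/store pair: in both interleavings at most one STREX succeeds, and the other thread gets a nonzero result and retries ← ←)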

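So the ARM loop already works as intended, but on other targets the same thing can be written more compactly.

LOAD; SUB/ADD;STREX

can be collapsed into a single instruction. On x86 that is xadd with the lock prefix; a plain atomic store, by comparison, shows up as mov + mfence. See the following code: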

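The image above compares a simple C++ atomic template with the assembly generated for it, which shows how gcc implements atomic operations: it is not a plain LOAD, ADD, STORE.

For the corresponding ARM instructions, see

If it is still unclear, see also

(Explaining it all would be too much trouble, so I decline = =)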

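Finally, I tried implementing the atomic operation myself in my test environment.

The result was 4, as hoped. The data is modified by a single xadd instruction, and since everything it needs lives on the stack, nothing should go wrong here.

This can be wrapped in a function.

This way inc should be atomic and work well even with multiple threads (on a multi-core machine that holds only if the xadd carries the lock prefix).

Because the argument is passed as an address, the increment operates directly on the memory at that address, with no intermediate state.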

(But I don't recommend doing this: the atomic template is far more concise and also comes with memory order options, so this is only a test case.)