read/IteratorsandReverseIterators

Iterators and Reverse Iterators

You can convert normal iterators into reverse iterators. Naturally, the iterators must be bidirectional
iterators, but note that the logical position of an iterator is moved during the conversion. Consider
the following program:

可以将普通的迭代器转换为反向迭代去(当然, 这个迭代器必须是双向的)

请记住, 转换后迭代器的逻辑位置会发生改变, 参考以下程序 :

// iter/reviter2.cpp
#include <iterator>
#include <iostream>
#include <vector>
#include <algorithm>

using namespace std;

int main()
{
    // create list with elements from 1 to 9
    vector<int> coll = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

    // find position of element with value 5
    vector<int>::const_iterator pos;
    pos = find (coll.cbegin(), coll.cend(),
    5);

    // print value to which iterator pos refers
    cout << "pos: " << *pos << endl;

    // convert iterator to reverse iterator rpos
    vector<int>::const_reverse_iterator rpos(pos);

    // print value to which reverse iterator rpos refers
    cout << "rpos: " << *rpos << endl;
}

This program has the following output:
pos: 5
rpos: 4
Thus, if you print the value of an iterator and convert the iterator into a reverse iterator, the value has
changed. This is not a bug; it’s a feature! This behavior is a consequence of the fact that ranges are
half open. To specify all elements of a container, you must use the position after the last argument.
However, for a reverse iterator, this is the position before the first element. Unfortunately, such a
position may not exist. Containers are not required to guarantee that the position before their first
element is valid. Consider that ordinary strings and arrays might also be containers, and the language
does not guarantee that arrays don’t start at address zero.

因此, 更改后迭代器的值发生了变化. 这不是一个 bug, 而是一个特性. 这是一个因半开区间而产生的结果.

要指定容器的所有元素, 需要使用最后一个元素后的位置. 然而, 对于反向迭代器而言, 就是一个在第一个元素之前的位置, 不幸的是, 这样的位置是不存在的. 容器不保证他们第一个位置之前的元素是有效的.

参考原始字符串和数组可能也会是容器, 语言不保证数组不从地址0开始

As a result, the designers of reverse iterators use a trick: They “physically” reverse the “half-open
principle.” Physically, in a range defined by reverse iterators, the beginning is not included, whereas
the end is. However, logically, they behave as usual. Thus, there is a distinction between the physical
position that defines the element to which the iterator refers and the logical position that defines the
value to which the iterator refers (Figure 9.3). The question is, what happens on a conversion from
an iterator to a reverse iterator? Does the iterator keep its logical position (the value) or its physical
position (the element)? As the previous example shows, the latter is the case. Thus, the value is
moved to the previous element (Figure 9.4)

结论是, 反向迭代器设计用了一个方法 : 它们物理性地反向半开区间. 物理性地, 在反向迭代器的定义中, 开始的元素不包含在内, 而尾端被包含. 然而, 逻辑上, 它们的行为和普通的迭代器一样. 因此, 定义迭代器引用的元素的物理位置和定义迭代器引用的值的逻辑位置是有区别的.

问题是, 当普通迭代器转换为反向迭代器的时候, 会发生什么? 迭代器会保持它的逻辑位置, 还是物理位置? 在上面的案例显示中, 属于后者. 因此, 值移动到了之前的元素位置.

内部如何处理的

以上, 就是这篇笔记的原因, 我想看看反向迭代器它的内部是如何工作的

这次的目标是解决如下疑问

普通迭代器转换成反向迭代器会做什么?
反向迭代器如何迭代的?
反向迭代器如何保证不会越界?

首先, 参考一下 C++ 代码 :

int main() {
	array<int, 9> v{1, 2, 3, 4, 5, 6, 7, 8, 9};
	auto it = v.begin();
	cout << *it << endl;
	array<int,9>::const_reverse_iterator rit(it);
	cout << *rit << endl;
}

然后是汇编代码 :

main:
.LFB4368:
▹   .cfi_startproc
▹   pushq▹  %rbp
▹   .cfi_def_cfa_offset 16
▹   .cfi_offset 6, -16
▹   movq▹   %rsp, %rbp
▹   .cfi_def_cfa_register 6
▹   subq▹   $64, %rsp
▹   movl▹   $1, -64(%rbp)
▹   movl▹   $2, -60(%rbp)
▹   movl▹   $3, -56(%rbp)
▹   movl▹   $4, -52(%rbp)
▹   movl▹   $5, -48(%rbp)
▹   movl▹   $6, -44(%rbp)
▹   movl▹   $7, -40(%rbp)
▹   movl▹   $8, -36(%rbp)
▹   movl▹   $9, -32(%rbp)
▹   leaq▹   -64(%rbp), %rax
▹   movq▹   %rax, %rdi
▹   call▹   _ZNSt5arrayIiLm9EE5beginEv	// 上述是 array 的初始化代码
▹   movq▹   %rax, -8(%rbp)	// 构造的返回值即是 begin
▹   movq▹   -8(%rbp), %rax
▹   movl▹   (%rax), %eax	// *it
▹   movl▹   %eax, %esi
▹   movl▹   $_ZSt4cout, %edi
▹   call▹   _ZNSolsEi
▹   movl▹   $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
▹   movq▹   %rax, %rdi
▹   call▹   _ZNSolsEPFRSoS_E
▹   movq▹   -8(%rbp), %rdx	// -8(%rbp) 即 it
▹   leaq▹   -16(%rbp), %rax	// 这是一块未使用的内存
▹   movq▹   %rdx, %rsi
▹   movq▹   %rax, %rdi
▹   call▹   _ZNSt16reverse_iteratorIPKiEC1ES1_	// 这应当就是 reverse_iterator 的 construct
▹   leaq▹   -16(%rbp), %rax
▹   movq▹   %rax, %rdi
▹   call▹   _ZNKSt16reverse_iteratorIPKiEdeEv	// 在解引用之前, 会调用这个函数
▹   movl▹   (%rax), %eax	// 返回值解引用, 即 *rit
▹   movl▹   %eax, %esi
▹   movl▹   $_ZSt4cout, %edi
▹   call▹   _ZNSolsEi
▹   movl▹   $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
▹   movq▹   %rax, %rdi
▹   call▹   _ZNSolsEPFRSoS_E
▹   movl▹   $0, %eax
▹   leave
▹   .cfi_def_cfa 7, 8
▹   ret

可以看出, 关键点在于 “ZNSt16reverse_iteratorIPKiEC1ES1“ 和 “_ZNKSt16reverse_iteratorIPKiEdeEv” 将是关键

其中, ZNSt16reverse_iteratorIPKiEC1ES1 的代码如下所示 (没有错, 就是这个)

_ZNSt16reverse_iteratorIPKiEC2ES1_:
.LFB4445:
▹   .cfi_startproc
▹   pushq▹  %rbp
▹   .cfi_def_cfa_offset 16
▹   .cfi_offset 6, -16
▹   movq▹   %rsp, %rbp
▹   .cfi_def_cfa_register 6
▹   subq▹   $16, %rsp
▹   movq▹   %rdi, -8(%rbp)
▹   movq▹   %rsi, -16(%rbp)
▹   movq▹   -8(%rbp), %rax
▹   movq▹   %rax, %rdi
▹   call▹   _ZNSt8iteratorISt26random_access_iterator_tagilPKiRS1_EC2Ev
▹   movq▹   -8(%rbp), %rax
▹   movq▹   -16(%rbp), %rdx
▹   movq▹   %rdx, (%rax)
▹   leave
▹   .cfi_def_cfa 7, 8
▹   ret

代码除 _ZNSt8iteratorISt26random_access_iterator_tagilPKiRS1_EC2Ev 外, 就是简单地将 %rsi(-8(%rbp) 即 it) 的地址放入 %rdi(-16(%rbp) 我提到过的那块未使用的内存)

_ZNSt8iteratorISt26random_access_iterator_tagilPKiRS1_EC2Ev 代码如下所示

_ZNSt8iteratorISt26random_access_iterator_tagilPKiRS1_EC2Ev:
.LFB4443:
▹   .cfi_startproc
▹   pushq▹  %rbp
▹   .cfi_def_cfa_offset 16
▹   .cfi_offset 6, -16
▹   movq▹   %rsp, %rbp
▹   .cfi_def_cfa_register 6
▹   movq▹   %rdi, -8(%rbp)
▹   popq▹   %rbp
▹   .cfi_def_cfa 7, 8
▹   ret
▹   .cfi_endproc

好像… 什么也没做 = = , 这个函数没有栈帧开辟, movq▹ %rdi, -8(%rbp) 也大概率是个无用的语句

那么, 可以暂时得出结论, 反向迭代器是一个二级指针, 它保存了普通迭代器的地址

那么, 来看另外一个函数 _ZNKSt16reverse_iteratorIPKiEdeEv

_ZNKSt16reverse_iteratorIPKiEdeEv:
.LFB4447:
▹   .cfi_startproc
▹   pushq▹  %rbp
▹   .cfi_def_cfa_offset 16
▹   .cfi_offset 6, -16
▹   movq▹   %rsp, %rbp
▹   .cfi_def_cfa_register 6
▹   movq▹   %rdi, -24(%rbp)
▹   movq▹   -24(%rbp), %rax
▹   movq▹   (%rax), %rax
▹   movq▹   %rax, -8(%rbp)
▹   subq▹   $4, -8(%rbp)
▹   movq▹   -8(%rbp), %rax
▹   popq▹   %rbp
▹   .cfi_def_cfa 7, 8
▹   ret
▹   .cfi_endproc

一句话概括: 二级指针解引用 -4. 那么就和文档所说的吻合了, 反向迭代器会往前移动一个位置

这个函数发生在反向迭代器解引用时, 大概可以猜出它的行为

更进一步

疑问 1 和疑问 2 解决了, 但是疑问 3 还未解决

我原来想通过更改内存值来看的, 但是貌似有什么防范操作

Breakpoint 1, main () at t.cpp:13
13	    array<int, 9> v{1, 2, 3, 4, 5, 6, 7, 8, 9};
(gdb) n
14	    auto it = v.begin();
(gdb) p {int}0x7fffffffe480	// 这里就是 it
$16 = 1
(gdb) set {int}0x7fffffffe480 = 100	// 更改 it 中的值
(gdb) p {int}0x7fffffffe480	// 成功更改
$17 = 100
(gdb) n
15	    cout << *it << endl;
(gdb) n	// 本来该打印 1 的, 更改后打印了 100 
100
16	    array<int,9>::const_reverse_iterator rit(it);
(gdb) p v	// 再次验证更改是有效的
$18 = {_M_elems = {100, 2, 3, 4, 5, 6, 7, 8, 9}}
(gdb) p {int}0x7fffffffe47b	// 根据推算, 这个就是 rend 的地址
$19 = 0
(gdb) set {int}0x7fffffffe47b = 200	// 更改 rend ( 在程序崩溃的边缘疯狂试探 :) )
(gdb) p {int}0x7fffffffe47b
$20 = 200
(gdb) n
17	    cout << *rit << endl;
(gdb) n	// 结果让人很失望
0
18	}
(gdb) p {int}0x7fffffffe47b // 谁让你改回去的!?
$21 = 0

我之后在 end 上做了同样的测试, 发现 end 也有防范操作

但我不知道具体是哪里防范了我这个操作, 应该是在解引用时, 但是里面代码不会更改到.

我还看了 objdump 的汇编代码, 依旧没有发现什么特别之处, 和编译器生成的汇编代码是一样的…

算了, 也算有所收获吧…