When giving examples of universal gate sets in the paper Qudits and High-Dimensional Quantum Computing, the authors first define the transformation that maps any given qudit state to $|d-1\rangle$:
$$U_d(\alpha): \sum_{l=0}^{d-1}\alpha_l|l\rangle \rightarrow |d-1\rangle, \ \alpha = (\alpha_0, \alpha_1,\dots,\alpha_{d-1})$$
This can be decomposed into $d-1$ unitary transformations
$$U_d = X_d^{(d-1)}(a_{d-1},b_{d-1}),\dots,X_d^{(1)}(a_1,b_1), \ a_l = \alpha_l,\ b_l = \sqrt{\sum_{i=0}^{l-1}\alpha_i^2}$$
with
$$X_d^{(l)}(x,y)=\begin{bmatrix}\mathbb{I}_{l-1} & & & \\& \frac{x}{\sqrt{|x|^2 + |y|^2}}& \frac{-y}{\sqrt{|x|^2 + |y|^2}}& \\& \frac{y^*}{\sqrt{|x|^2 + |y|^2}} & \frac{x^*}{\sqrt{|x|^2 + |y|^2}} & \\& & & \mathbb{I}_{d-l-1} \\\end{bmatrix}$$
So in the qutrit case, the unitary transformations are
$$U_3 = X_3^{(2)}(a_2,b_2),X_3^{(1)}(a_1,b_1)$$
where each transformation is a $3\times3$ matrix.
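For concreteness, substituting $d=3$ into the block form of $X_d^{(l)}(x,y)$ should give the two matrices below. This is only a direct instantiation of the definition; the shorthand $N=\sqrt{|x|^2+|y|^2}$ is introduced here for readability and is not notation from the paper.

$$X_3^{(1)}(x,y)=\begin{bmatrix}\frac{x}{N} & \frac{-y}{N} & 0 \\ \frac{y^*}{N} & \frac{x^*}{N} & 0 \\ 0 & 0 & 1\end{bmatrix},\qquad X_3^{(2)}(x,y)=\begin{bmatrix}1 & 0 & 0 \\ 0 & \frac{x}{N} & \frac{-y}{N} \\ 0 & \frac{y^*}{N} & \frac{x^*}{N}\end{bmatrix}$$

So $X_3^{(1)}$ only mixes $|0\rangle$ and $|1\rangle$, while $X_3^{(2)}$ only mixes $|1\rangle$ and $|2\rangle$.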
Considering the example of mapping the state $|0\rangle \rightarrow |2\rangle$, we have $\alpha_0 = 1, \ \alpha_1 = \alpha_2 = 0 = a_1 = a_2$ and $b_1 = b_2 = 1$. Hence we see
$$X_3^{(2)}(0,1)X_3^{(1)}(0,1)|0\rangle = |2\rangle$$
However, when mapping $|1\rangle \rightarrow |2\rangle$ we see
$$X_3^{(1)}(1,1)X_3^{(2)}(0,1)|1\rangle = |2\rangle$$
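The two products above can be checked numerically. The following is a minimal NumPy sketch, not code from the paper: the helper names `x_gate`, `basis`, `params` and `u_fixed_order` are made up for illustration, and `params` assumes that $b_l$ is meant to be $\sqrt{\sum_{i=0}^{l-1}|\alpha_i|^2}$. The last check applies the gates in the fixed order $X^{(d-1)}\cdots X^{(1)}$ with parameters computed from that formula, so the computed $(a_l, b_l)$ can be compared against the hand-derived values.

```python
import numpy as np

def x_gate(d, l, x, y):
    """Return X_d^{(l)}(x, y) as a d x d matrix, for l = 1, ..., d-1."""
    norm = np.sqrt(abs(x) ** 2 + abs(y) ** 2)  # assumes (x, y) != (0, 0)
    gate = np.eye(d, dtype=complex)
    # The 2x2 block acts on the basis states |l-1> and |l>.
    gate[l - 1:l + 1, l - 1:l + 1] = np.array(
        [[x, -y], [np.conj(y), np.conj(x)]]
    ) / norm
    return gate

def basis(d, k):
    """Return the computational basis state |k> in dimension d."""
    vec = np.zeros(d, dtype=complex)
    vec[k] = 1
    return vec

def params(alpha):
    """Return [(a_1, b_1), ..., (a_{d-1}, b_{d-1})] under the assumed formula."""
    alpha = np.asarray(alpha, dtype=complex)
    return [
        (alpha[l], np.sqrt(np.sum(np.abs(alpha[:l]) ** 2)))
        for l in range(1, len(alpha))
    ]

def u_fixed_order(alpha):
    """Multiply the gates in the fixed order X^{(d-1)} ... X^{(1)}."""
    d = len(alpha)
    u = np.eye(d, dtype=complex)
    for l, (a, b) in enumerate(params(alpha), start=1):
        u = x_gate(d, l, a, b) @ u  # later gates are applied on the left
    return u

# The two products written out in the question.
out0 = x_gate(3, 2, 0, 1) @ x_gate(3, 1, 0, 1) @ basis(3, 0)
out1 = x_gate(3, 1, 1, 1) @ x_gate(3, 2, 0, 1) @ basis(3, 1)
print(np.allclose(out0, basis(3, 2)), np.allclose(out1, basis(3, 2)))

# Parameters and fixed-order product for the input state |1>.
print(params([0, 1, 0]))
out1_fixed = u_fixed_order([0, 1, 0]) @ basis(3, 1)
print(np.allclose(out1_fixed, basis(3, 2)))
```

Note that `x_gate` divides by $\sqrt{|x|^2+|y|^2}$, so it is undefined for $x=y=0$; none of the cases above hit that.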
It is not intuitively clear to me why the order of the gates has to be switched in the second case (or where am I going wrong?). More importantly, given $U_d$, is there any method to determine the appropriate sequence for applying the gates $X_d^{(d-1)}(a_{d-1},b_{d-1}),\dots,X_d^{(1)}(a_1,b_1)$?