[논문리뷰] Diffeomorphic Image Registration with Neural Velocity Field (WACV 2023)

htyvv 2024. 7. 1. 12:08

2024. 7. 1. 12:08

✔️ Paper link : https://ieeexplore.ieee.org/document/10030312

Diffeomorphic Image Registration with Neural Velocity Field

Diffeomorphic image registration, offering smooth transformation and topology preservation, is required in many medical image analysis tasks. Traditional methods impose certain modeling constraints on the space of admissible transformations and use optimiz

ieeexplore.ieee.org

➡️ Preliminaries

Deformable image registraion
- 몇 가지 regularization constraints를 지키면서 moving image를 fixed image와 최대한 유사하도록 warping하는 방식
Displacement field
- Registration 과정에서 산출되는 displacement field는 moving image 위의 점들과 일치하는 fixed image 위의 점들 사이의 dense mapping을 정의함
일반적인 registration 표현식
- $\phi^*$ 는 displacement field ${\phi}$ 의 optimal 값이고, ${I}_f$ 와 ${I}_m$ 은 fixed와 moving 이미지를 뜻함
- ${ \phi \circ I_m }$ 는 displacement field에 의해 warping 된 moving 이미지를 뜻함
- ${ \mathcal{L}_{\text{sim}} }$ 는 이미지 사이의 유사도를 측정하는 것이고, $\mathcal{L}_{\text{reg}}$ 는 smooth한 displacement 예측을 위한 regularization 임
- $ \phi^* = \arg\min_{\phi} \mathcal{L}_{\text{sim}}(I_f, \phi \circ I_m) + \mathcal{L}_{\text{reg}}(\phi) \tag{1} $
Diffeomorphic registration
- diffeomorphic을 보장하는 registration은 단순히 두 이미지의 유사도가 최대가 하도록 하는 일치 뿐만 아니라 양방향의 변환이 가능하도록하는 성질도 함께 고려함으로써 topology를 유지할 수 있다는 특징이 있음
- diffeomophic한 변형을 계산하기 위해서 velocity field ${v}$ 가 사용됨
  - $\frac{\partial \phi^{(t)}}{\partial t} = v(\phi^{(t)}) \tag{2} $
- $ \phi^{(0)} = I $ 이고, 해당 논문에서는 velocity field ${v}$ 는 ${t} = [0, 1]$에서 일정한 값을 갖는 SVF (stationary vector field)이며 최종 deformation은 ${\phi^{(1)}}$ 임

➡️ Method

${I_f}$ 와 ${I_m}$ 은 각각 fixed image와 보정이 필요한 moving image임
해당 논문에서 다루는 이미지들은 3D 차원 ${\Omega} \subset \mathcal{R}^3$ 에서 정의되고, 전처리 단계에서 이미 affine register를 마친 상태이므로 이후 모델을 통한 비선형적인 displacement 값만 계산하면 됨

DNVF 는 파라미터로 ${\theta}$를 갖는 MLP를 활용해서 neural velocity field ${v}_\theta $를 산출해내는 최적화 기반의 모델이다. 기존의 다른 registration 방법론들과 다르게, DNVF는 이미지의 intensity 대신에 3D 공간 좌표를 입력으로 받는다. 각 공간 위치점 $\mathbf{p} \in \Omega \ $ 마다 ${v}_\theta$를 통해 해당하는 위치의 velocity vector ${\mathbf{v} = v_\theta(\mathbf{p})}$ 를 얻을 수 있다. Diffeomorphic deformation $\phi_\theta$는 neural velocity field ${v}_\theta$ 를 걸친 적분을 통해 구해지는데, 이 과정은 이후에 자세히 설명한다. DNVF 에서는 Scaling and Squaring (SS) 를 사용해서 적분이 수행되는데 이 과정도 이후에 자세히 설명한다. 또한 아래와 같은 loss function 식을 최소화 함으로써 neural velocity field의 파라미터 ${\theta}$ 를 최적화하고 optimal한 $\hat{\theta}$ 를 찾게된다.

$ \hat{\theta} = \arg\min_{\theta} \mathcal{L}_{\text{sim}}(I_f, \phi_{\theta} \circ I_m) + \mathcal{L}_{\text{reg}}(\phi_{\theta}) \tag{3} $

DNVF를 기반으로 저자는 cascaded framework Cas-DNVF를 추가로 제안했다. 이는 learning-based 방법론의 이점을 DNVF와 결합한 것이다. 첫 번째로 주어진 이미지 쌍에 대한 inital deformation을 얻을 수 있도록 하는 함수 $g_{\beta} (I_f, I_m) \phi^{init} $ 를 모델링하기 위해서 fully convolutional neural network (FCN) 를 학습시켰다. 해당 network를 학습하기 위해서 아래와 같은 loss function을 사용했는데 이는 위의 loss function $(3)$와 유사한 구조를 가진다.

$ \hat{\beta} = \arg\min_{\beta} \mathbb{E}_{(I_f,I_m)\sim\mathcal{D}}[\mathcal{L}_{\text{sim}}(I_f, g_{\beta}(I_f,I_m) \circ I_m) + \\ \mathcal{L}_{\text{reg}}(g_{\beta}(I_f,I_m))] \tag{4} $

위 식에서 $\mathcal{D}$ 는 데이터셋의 분포이고 ${\beta}$ 는 FCN의 파라미터이다. 추론이 진행되는 동안 입력 이미지 쌍 $(I_f, I_m)$ 에 대해 사전에 학습된 (pretrained) ${g}_\beta$를 통과 시켜서 $ \phi^{init} $ 를 예측하게 된다. 다음 단계에서는 DNVF를 통해서 residual deformation $\phi^{res}_{\theta}$ 를 최적화 하게되고, 최종적으로 spatial transformer layer를 통해 $\phi^{init}$ 와 $\phi^{res}_{\theta}$ 를 합쳐서 전체적인 deformation을 계산하게 된다.

$ \hat{\theta} = \arg\min_{\theta} \mathcal{L}_{\text{sim}}(I_f, \phi_{\theta}^{\text{res}} \circ \phi^{\text{init}} \circ I_m) + \mathcal{L}_{\text{reg}}(\phi_{\theta}^{\text{res}} \circ \phi^{\text{init}}) \tag{5} $

저자가 실험을 진행해본 결과, 위와 같은 initial deforamtion 정보를 사용하는 것은 DNVF의 학습이 더 빨리 수렴할 수 있도록 돕는 역할을 한다고 주장한다.

✔️ Neural Velocity Field Representation

3D 공간 좌표에 대한 함수 $\mathbf{v} = v_\theta (\mathbf{p}) $는 일치하는 neural velocity vector $\mathbf{v}$를 출력한다. 의료 데이터는 복잡한 생물학적 구조를 가지는 특성이 있는데, 이러한 구조의 deformation을 정확하고 세밀하게 matching하기 위해서는 high-frequency function을 모델링할 수 있어야 한다. 하지만 전통적인 MLP는 "spectral bias"라는 한계점으로 인해서 low-frequency function에 치중해서 학습하는 경향이 있다. 따라서 본 논문에서는 전통적인 ReLU activation function 대신 periodic sinusoidal function을 사용해서 high-frequency function을 잘 학습할 수 있도록 했다.

$ \begin{align*} f_0(\mathbf{x}_0) &= \mathbf{W}_0\mathbf{x}_0 + \mathbf{b}_0 \mapsto \mathbf{x}_1 \\ \ f_i(\mathbf{x}_i) &= \mathbf{W}_i \sin(\mathbf{x}_i) + \mathbf{b}_i \mapsto \mathbf{x}_{i+1} \end{align*} \tag{6} $

3차원 공간 좌표 $\mathbf{p}$ 는 첫 layer ${f}_0 : \mathbb{R}^3 \mapsto \mathbb{R}^N$ 을 통해 고차원으로 임베딩된다. 그리고 ${i}^{th}$ 번째 layer ${f}_i$ 는 학습가능한 파라미터 $\mathbf{W}_i$ 와 $\mathbf{b}_i$ 를 통한 푸리에 주파수 매핑으로 해석할 수 있다. 이와 같이 네트워크에 주파수 정보를 포함시킴으로써 neural tangent kernel 로의 역할을 수행하게 되어 high-frequency 영역의 자세한 정보들을 통해 복잡한 deformation $\phi_{\theta}$ 를 모델링하게 된다. 저자는 512의 hidden unit을 갖는 5개 층의 MLP를 사용했다고 한다.

✔️ Diffeomorphic Deformation as Integration

DNVF 내부에서 neural velocity field ${v}_\theta$ 를 정의하고난 다음 diffeomorphic deformation $\phi_{\theta}$ 는 식 $(2)$ 에 따라 전체 velocity field를 적분하여 계산하게 된다. 이때 velocity field는 시간에 따라 변하지 않는 stationary한 성질을 띈다고 가정을하고 이로 인해서 아래에 설명하는 scaling and squaring 방식으로 효과적인 적분 연산을 수행할 수 있다.

◾ Scaling and Squaring

velocity field가 stationary 일 때, exponential map $ \phi^{(t)} = \text{exp}(v_\theta) $ 은 diffeomorphism을 만족하는 한 subgroup의 parameter로 정의된다. 최종 deformation ${\phi}^{(1)} = \text{exp}(v_\theta) $ 는 group action을 활용해서 더 효율적으로 도출할 수 있다. 구체적으로, initial deformation은 ${T}$ 가 전체 time step일 때 $ \phi^{(1/2^T)} = \mathbf{p} + v_{\theta}(\mathbf{p})/2^T $ 로 나타낼 수 있다. 그 이후에 회귀적으로 spatial tranformer layer를 통해서 $ \phi^{(1/2^{t-1})} = \phi^{(1/2^t)} \circ \phi^{(1/2^t)} $ 를 계산하고, 최종적인 deformation $ {\phi}^{(1)} $ 는 $ {\phi}^{(1)} = {\phi}^{(1/2)} \circ {\phi}^{(1/2)} $ 와 같이 구할 수 있다. 이와 같은 회귀적인 계산 도중에는 별다른 파라미터가 학습되지 않는다. deformation은 fixed grid 위에서 linear interpolation을 통해 계산된다. 이러한 계산 방식을 통해 전체 step에 대한 미분가능성을 확보할 수 있으며, 논문의 저자는 총 7개의 step을 사용했다.

하지만, 모든 3D 좌표를 DNVF의 입력으로 사용하기는 GPU 메모리의 한계로 인해 불가능함으로, 논문의 저자는 원본 좌표 grid 를 1/3 크기로 downsample 하고, 결과 velocity field 가 전체 resolution에 대해 표현될 수 있도록 velocity integration을 통해 upsample 해줬다고 한다. (Figure 1)

✔️ Cascaded Registration

◾ Initial Deformation

Figure 1에서 소개하는 것과 같이, fully convolutional neural network (FCN), scaling and squaring layer, spatial transform layer 를 통해서 함수 ${g_\beta}(I_f, I_m) = \phi^{init} $ 을 파라미터화한다. FCN은 Unet 구조 (Figure 2) 를 기반으로 구성되어있으며, moving image와 fixed image를 합친 데이터를 입력으로 받아서 직접적으로 velocity field를 출력한다. 이후 FCN은 warping 된 moving image와 기존 fixed image 사이의 유사도(similarity)가 최대가 되는 방향으로 initial deformation ${\phi}^{init}$을 예측할 수 있게 학습된다.

◾Optimization of Residual Deformation

학습된 ${g_\theta}$ 를 통해 예측된 initial deformation 을 기반으로, DNVF 를 사용해서 이미지 쌍에 대한 residual deformation ${\phi}_{\theta}^{res}$ 를 최적화하게 된다. 최종 전체 deformation ${\phi}$ 는 ${\phi}^{init}$ 와 ${\phi}_{\theta}^{res}$ 가 spatial transformer layer를 통한 결합으로 계산된다. 일반적으로 예측된 ${\phi}^{init}$ 는 large deformation 에 대해서 양질의 mapping 을 수행할 수 있음으로, DNVF 구조는 학습기반의 구조에서 잘 표현하지 못하는 local small deformation 을 찾는데 집중한다. ${v}_\theta$ 에서 출력된 velocity field $\mathbf{v}$ 는 warping된 moving image ${\phi}^{init} \circ I_m$ 와 fixed image ${I_f}$ 사이의 local small deformation을 설명한다. 논문의 저자느 두 deformation을 직접적으로 합치기보다는, 실험적으로 최적화 과정의 안정성 확보를 위해 DNVF의 velocity vector $\mathbf{v}$ 를 0.1배로 재조정해서 사용했다고 한다.

✔️ Optimization

loss function $\mathcal{L}$ 는 아래와 같이 크게 두 가지 항으로 구성되어있다.

$\mathcal{L}_{sim}$ : 유사도항
- warping 된 moving image (${I_w}$) 와 fixed image (${I}_f$) 사이의 불일치도에 대해 패널티를 부여하는 항
- 유사도에 대한 측정은 local normalized cross-correlation (NCC) 을 통해 진행
- $ NCC(I_w, I_f) = \sum_{\mathbf{p}\in\Omega} \frac{\left(\sum_{\mathbf{p}_i} \left(I_w(\mathbf{p}_i) - \bar{I}_w(\mathbf{p})\right) \left(I_f(\mathbf{p}_i) - \bar{I}f(\mathbf{p})\right)\right)^2} {\sum{\mathbf{p}_i} \left(I_w(\mathbf{p}_i) - \bar{I}w(\mathbf{p})\right)^2 \sum{\mathbf{p}_i} \left(I_f(\mathbf{p}_i) - \bar{I}_f(\mathbf{p})\right)^2} \tag{7} $
- $\mathbf{p}_i$ 는 $\mathbf{p}$ 근처에 위치한 ${n}^3$의 크기를 갖는 local window 위의 점들을 뜻하고, $\bar{I}_w (\mathbf{p})$ 와 $\bar{I}_f (\mathbf{p})$ 는 local window 내의 이미지 intensity의 평균값
- NCC 값이 클 수록 두 이미지가 정밀하게 일치한다는 의미이기 때문에 유사도 loss 계산을 위해서 음의 NCC 값을 사용했으며, window size ${n}$ 으로는 9를 사용했다.
- $\mathcal{L}_{sim} = -\text{NCC}()$
$\mathcal{L}_{reg}$ : 정규화항
- $\mathcal{L}_{Jdet}$
  - local orientation consistency를 보장하기 위해서, selective Jacobian determinant regularization 항을 사용
  - 주어진 좌표 $\mathbf{p}$ 에서의 Jacobian determinant 가 양수라면 deformation field 는 $\mathbf{p}$ 근처의 orientation을 잘 보존하는 것이고, 음수라면 orientation이 뒤집히고 topology가 무너진 것으로 해석
  - 음의 Jacobian determinant 과 ReLU를 함께 사용해서 local region에 대한 패널티를 부여
  - $ \mathcal{L}_{Jdet} = \frac{1}{N} \sum_{\mathbf{p}\in\Omega} relu(-|J_\phi(\mathbf{p})|) \tag{8} $
    - $ J_\phi(\mathbf{p}) = \begin{bmatrix} \frac{\partial\phi_x(\mathbf{p})}{\partial x} & \frac{\partial\phi_x(\mathbf{p})}{\partial y} & \frac{\partial\phi_x(\mathbf{p})}{\partial z} \\ \frac{\partial\phi_y(\mathbf{p})}{\partial x} & \frac{\partial\phi_y(\mathbf{p})}{\partial y} & \frac{\partial\phi_y(\mathbf{p})}{\partial z} \\ \frac{\partial\phi_z(\mathbf{p})}{\partial x} & \frac{\partial\phi_z(\mathbf{p})}{\partial y} & \frac{\partial\phi_z(\mathbf{p})}{\partial z} \end{bmatrix} \tag{9} $
- $\mathcal{L}_{smooth}$
  - 이상하게 치우친 deformation이 예측되는 경우를 방지하기 위해서, spatial gradient 자체를 loss 의 항으로 추가함으로써 부드러운 deformation field를 예측할 수 있게 함
  - 큰 spatial gradient 는 특정한 local area에서 deformation field의 값이 급변한다는 의미이기 때문에 registration 문제에 적합하지 않음
  - $ \mathcal{L}_{smooth} = \sum_{\mathbf{p}\in\Omega} {\left\| \nabla \phi(\mathbf{p})) \right\|}^2 \tag{10} $

최종 loss 는 위와 같은 항들을 결합하여 사용했는데, 추가로 ${\lambda}_1$ 와 ${\lambda}_2$ 를 각 항의 weight로 사용했다.

$ \mathcal{L} = \mathcal{L}_{sim} + \mathcal{L}_{reg} = \mathcal{L}_{sim} + {\lambda}_1 \mathcal{L}_{Jdet} + {\lambda}_2 \mathcal{L}_{smooth} \tag{11} $

'Paper' 카테고리의 다른 글

[논문리뷰] CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos (2)	2024.10.21
[논문리뷰] CoTracker: It is Better to Track Together (0)	2024.05.31
[논문리뷰] EchoTracker: Advancing Myocardial Point Tracking in Echocardiography (MICCAI 2024) (0)	2024.05.20
[논문리뷰] PIPsUS: Self-Supervised Dense Point Tracking in Ultrasound (0)	2024.05.17
[논문리뷰] Temporal-consistent Segmentation of Echocardiography with Co-learning from Appearance and Shape (MICCAI 2020) (0)	2024.05.05

MarvelCode