Convergence Properties of Score-Based Models for Linear Inverse Problems Using Graduated Optimisation (2024)

Abstract

The incorporation of generative models as regularisers within variational formulations for inverse problems has proven effective across numerous image reconstruction tasks. However, the resulting optimisation problem is often non-convex and challenging to solve. In this work, we show that score-based generative models (SGMs) can be used in a graduated optimisation framework to solve inverse problems. We show that the resulting graduated non-convexity flow converges to stationary points of the original problem and provide a numerical convergence analysis on a 2D toy example. We further provide experiments on computed tomography image reconstruction, where we show that this framework is able to recover high-quality images independently of the initial value. The experiments highlight the potential of using SGMs in graduated optimisation frameworks. The code is available at https://github.com/alexdenker/GradOpt-SGM.

Index Terms— graduated optimisation, score-based generative models, optimisation, inverse problems

1 Introduction

Many problems in image reconstruction can be formulated as linear inverse problems, where the goal is to recover an image $\mathbf{x} \in \mathcal{X}$ given noisy measurements $\mathbf{y}^{\delta}$ that are related via

$$\mathbf{y}^{\delta} = \mathbf{A}\mathbf{x} + \epsilon. \tag{1}$$

Here $\mathbf{A}: \mathcal{X} \to \mathcal{Y}$ is a linear forward operator and $\epsilon \in \mathcal{Y}$ the noise. Inverse problems are often ill-posed and require regularisation to stabilise the reconstruction. Variational regularisation is a popular framework to address ill-posedness [1]. It formulates the reconstruction as an optimisation problem

$$\hat{\mathbf{x}} \in \operatorname*{arg\,min}_{\mathbf{x}} \left\{ \mathcal{D}(\mathbf{A}\mathbf{x}, \mathbf{y}^{\delta}) + \alpha \mathcal{R}(\mathbf{x}) \right\}, \tag{2}$$

where π’Ÿ:𝒴×𝒴→ℝβ‰₯0:π’Ÿβ†’π’΄π’΄subscriptℝabsent0\mathcal{D}:\mathcal{Y}\times\mathcal{Y}\to\mathbb{R}_{\geq 0}caligraphic_D : caligraphic_Y Γ— caligraphic_Y β†’ blackboard_R start_POSTSUBSCRIPT β‰₯ 0 end_POSTSUBSCRIPT quantifies fitness to the measurements, β„›:𝒳→ℝβ‰₯0:ℛ→𝒳subscriptℝabsent0\mathcal{R}:\mathcal{X}\to\mathbb{R}_{\geq 0}caligraphic_R : caligraphic_X β†’ blackboard_R start_POSTSUBSCRIPT β‰₯ 0 end_POSTSUBSCRIPT is a regulariser, and Ξ±β‰₯0𝛼0\alpha\geq 0italic_Ξ± β‰₯ 0 balances the two terms.For additive Gaussian noise the datafit is typically chosen as π’Ÿβ’(𝐀𝐱,𝐲δ)=12β’β€–π€π±βˆ’π²Ξ΄β€–22π’Ÿπ€π±superscript𝐲𝛿12superscriptsubscriptnorm𝐀𝐱superscript𝐲𝛿22\mathcal{D}({\mathbf{A}}{\mathbf{x}},{\mathbf{y}}^{\delta})=\frac{1}{2}\|{%\mathbf{A}}{\mathbf{x}}-{\mathbf{y}}^{\delta}\|_{2}^{2}caligraphic_D ( bold_Ax , bold_y start_POSTSUPERSCRIPT italic_Ξ΄ end_POSTSUPERSCRIPT ) = divide start_ARG 1 end_ARG start_ARG 2 end_ARG βˆ₯ bold_Ax - bold_y start_POSTSUPERSCRIPT italic_Ξ΄ end_POSTSUPERSCRIPT βˆ₯ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT. Finding a suitable regulariser is a difficult task.In recent years a wide variety of deep learning methods have been developed that aim to learn a regulariser from a given dataset, see [2] for a review.

The statistical perspective on inverse problems identifies the regulariser with the negative log-likelihood of a prior distribution $\pi(\mathbf{x})$, i.e. $\mathcal{R}(\mathbf{x}) = -\log \pi(\mathbf{x})$. In this context, learning a prior can be formulated as learning a parametrised distribution $p_{\theta}(\mathbf{x})$ as a proxy for $\pi(\mathbf{x})$ [3]. If we have access to the likelihood, we can consider the optimisation problem

$$\hat{\mathbf{x}} \in \operatorname*{arg\,min}_{\mathbf{x}} \left\{ \mathcal{D}(\mathbf{A}\mathbf{x}, \mathbf{y}^{\delta}) - \log p_{\theta}(\mathbf{x}) \right\}, \tag{3}$$

as an instance of variational regularisation. Generative regularisers of this form have been extensively studied for various inverse problems [3]. However, as the negative log-likelihood of the generative model is often highly non-convex, (3) is a challenging optimisation problem with many local minima, and the solution strongly depends on the initialisation.

We propose to tackle this problem using score-based generative models (SGMs) [4] as regularisers. SGMs learn a sequence of gradually perturbed distributions, starting at the data distribution and terminating in pure noise. It was recently observed that this sequence can be used in annealed Langevin sampling [5, 6] and graduated optimisation [7]. Graduated optimisation [8], also known as a continuation method [9], is a heuristic for non-convex problems in which the objective function is replaced by a convex surrogate that can be minimised efficiently. The surrogate objective is then gradually transformed back into the original non-convex objective.

In this work, we exploit the connection between SGMs and graduated optimisation to solve the underlying non-convex optimisation problem and avoid local minima. We showcase our algorithms on computed tomography.

2 Background

2.1 Graduated Optimisation

Graduated non-convexity is a heuristic global optimisation method for solving non-convex minimisation problems, which creates a sequence of surrogate optimisation problems that approximate the original problem through gradual levels of smoothing or convexification [8]. Namely, for a non-convex function $f: \mathcal{X} \to \mathbb{R}_{\geq 0}$ let $F: \mathcal{X} \times [t_{\text{min}}, t_{\text{max}}] \to \mathbb{R}_{\geq 0}$ be such that $F(\mathbf{x}, t_{\text{min}}) = f(\mathbf{x})$ and $F(\mathbf{x}, t_{\text{max}})$ is convex with a unique minimiser. The optimisation protocol consists of $I \in \mathbb{N}$ iterations over $t_{\text{max}} = t_1 > \dots > t_I = t_{\text{min}}$. In step $i$ we find the minimiser $\mathbf{x}_i$ of $F(\mathbf{x}, t_i)$, using $\mathbf{x}_{i-1}$ as the starting value. The process is terminated by minimising the original function $f(\mathbf{x})$ using $\mathbf{x}_{I-1}$ as initialisation.

The success of graduated optimisation strongly depends on the construction of the embedding function $F(\mathbf{x}, t)$. Under specific constraints on the type of embedding and the class of non-convex functions under consideration, it is possible to obtain global convergence results [10].

Algorithm 1: Graduated non-convexity flow

1: Initialise: $\mathbf{x}_1 \in \mathbb{R}^n$, $t_{\text{max}} = t_1 > \dots > t_I = t_{\text{min}}$, $I \in \mathbb{N}$, $c \in (0,1)$
2: for $i = 1, \dots, I-1$ do
3:   $\mathbf{d}_i = -t_i \nabla_{\mathbf{x}} F(\mathbf{x}_i, t_i)$
4:   Find $\lambda_i$ s.t. $f(\mathbf{x}_i + \lambda_i \mathbf{d}_i) \leq f(\mathbf{x}_i) + c \lambda_i \langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), \mathbf{d}_i \rangle_{\mathbb{R}^n}$
5:   $\mathbf{x}_{i+1} = \mathbf{x}_i + \lambda_i \mathbf{d}_i$
6: end for
7: Output: $\mathbf{x}_I$
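For concreteness, a minimal Python/NumPy sketch of the listing above; the gradient `grad_F(x, t)` of the smoothed objective is assumed to be supplied by the user, and step 4 (the line search on $f$) is replaced here by a constant step size, the variant used later for the 2D toy example.

```python
import numpy as np

def graduated_flow(x1, grad_F, ts, lam=1.0):
    """Graduated non-convexity flow (Algorithm 1, sketch) with a constant step size.

    x1     : initial point, shape (n,)
    grad_F : callable (x, t) -> gradient of the smoothed objective F(., t)
    ts     : decreasing schedule t_max = ts[0] > ... > ts[-1] = t_min
    """
    x = np.asarray(x1, dtype=float).copy()
    for t in ts:
        d = -t * grad_F(x, t)   # step 3: scaled gradient of the surrogate
        x = x + lam * d         # step 5, here with fixed lambda_i = lam
    return x
```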

Algorithm 2: Gradient-like method

1: Initialise: $\mathbf{x}_1 \in \mathbb{R}^n$, $I \in \mathbb{N}$, $c \in (0,1)$, $\beta \in (0,1)$, $\varepsilon > 0$, $i = 1$
2: while $1 \leq i \leq I$ and $\|\nabla_{\mathbf{x}} f(\mathbf{x}_i)\| > \varepsilon$ do
3:   Determine $\mathbf{d}_i \in \mathbb{R}^n$ s.t. $\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), \mathbf{d}_i \rangle_{\mathbb{R}^n} < 0$
4:   Determine $\lambda_i \coloneqq \max\{\beta^{\ell} \mid \ell = 0, 1, 2, 3, \dots\}$ s.t. $f(\mathbf{x}_i + \lambda_i \mathbf{d}_i) \leq f(\mathbf{x}_i) + c \lambda_i \langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), \mathbf{d}_i \rangle_{\mathbb{R}^n}$
5:   $\mathbf{x}_{i+1} \leftarrow \mathbf{x}_i + \lambda_i \mathbf{d}_i$
6:   $i \leftarrow i + 1$
7: end while
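Analogously, a sketch of Algorithm 2 in Python/NumPy, assuming a user-supplied routine that returns a descent direction in step 3; the default constants $c$, $\beta$ and $\varepsilon$ are illustrative choices, not values from the paper.

```python
import numpy as np

def gradient_like_descent(x1, f, grad_f, find_direction,
                          I=1000, c=1e-4, beta=0.5, eps=1e-6):
    """Gradient-like descent with Armijo backtracking (Algorithm 2, sketch).

    f, grad_f      : original objective and its gradient
    find_direction : callable (x, i) -> d_i with <grad_f(x), d_i> < 0
    """
    x, i = np.asarray(x1, dtype=float).copy(), 1
    while i <= I and np.linalg.norm(grad_f(x)) > eps:
        d = find_direction(x, i)          # step 3: any descent direction
        slope = grad_f(x) @ d             # negative by construction
        lam = 1.0                         # step 4: lambda_i = max beta^l s.t. Armijo holds
        while f(x + lam * d) > f(x) + c * lam * slope:
            lam *= beta
        x = x + lam * d                   # step 5
        i += 1                            # step 6
    return x
```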

2.2 Score-based Generative Models

Score-based generative models (SGMs) have emerged as a powerful tool for generative modelling [4, 11]. SGMs consist of a (predefined) forward process, during which noise is gradually added, and a learnable reverse process, which transforms noise into samples. The forward process is prescribed by an Itô stochastic differential equation (SDE)

$$d\mathbf{x}_t = \mathbf{f}(\mathbf{x}_t, t)\, dt + g(t)\, d\mathbf{w}_t, \qquad \mathbf{x}_0 \sim p_0 := \pi. \tag{4}$$

The drift function $\mathbf{f}: \mathcal{X} \times \mathbb{R} \to \mathcal{X}$ and the diffusion function $g: \mathbb{R} \to \mathbb{R}$ are chosen such that the terminal distribution approximates a Gaussian, i.e. $p_T \approx \mathcal{N}(0, \mathbf{I})$. Under certain conditions there exists a reverse diffusion process

$$d\mathbf{x}_t = \left[ \mathbf{f}(\mathbf{x}_t, t) - g(t)^2 \nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t) \right] dt + g(t)\, d\mathbf{w}_t, \tag{5}$$

which runs backwards in time [12]. SGMs approximate the score function $\nabla_{\mathbf{x}_t} \log p_t(\mathbf{x}_t)$ by a neural network $s_{\theta}(\mathbf{x}_t, t)$, which can be trained using denoising score matching [13]

$$\min_{\theta} \mathbb{E}_{t \sim U(0,1),\, \mathbf{x}_0 \sim \pi,\, \mathbf{x}_t \sim p_{t|0}(\mathbf{x}_t | \mathbf{x}_0)} \left[ \| s_{\theta}(\mathbf{x}_t, t) - \nabla_{\mathbf{x}_t} \log p_{t|0}(\mathbf{x}_t | \mathbf{x}_0) \|_2^2 \right].$$

For an affine drift $\mathbf{f}$, the transition kernel $p_{t|0}$ is given as $p_{t|0}(\mathbf{x}_t | \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t; \gamma_t \mathbf{x}_0, \nu_t^2 \mathbf{I})$ with parameters $\gamma_t, \nu_t > 0$ [14]. The SGM learns all intermediate distributions, i.e.

$$p_{\theta}(\mathbf{x}_t, t) \approx p_t(\mathbf{x}_t) = \int \pi(\mathbf{x}_0)\, p_{t|0}(\mathbf{x}_t | \mathbf{x}_0)\, d\mathbf{x}_0. \tag{6}$$

The intermediate distributions are closely related to the Gaussian homotopy [9], since an SDE without drift gives $p_t(\cdot) = (\pi * \mathcal{N}(0, \nu_t^2 \mathbf{I}))(\cdot)$.
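As an illustration, the denoising score matching objective for one minibatch might be implemented as below in PyTorch. The variance-exploding choice $\gamma_t = 1$, $\nu_t = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$ and the signature of `score_net` are our assumptions for concreteness, not the exact parametrisation used in the paper's experiments.

```python
import torch

def dsm_loss(score_net, x0, sigma_min=0.01, sigma_max=50.0):
    """Denoising score matching loss for one batch (sketch).

    Assumes a variance-exploding kernel p_{t|0}(x_t | x_0) = N(x_t; x_0, nu_t^2 I),
    i.e. gamma_t = 1.  score_net(x, t) takes the batch and a vector of times.
    """
    t = torch.rand(x0.shape[0], device=x0.device)            # t ~ U(0, 1)
    nu_t = sigma_min * (sigma_max / sigma_min) ** t
    nu_t = nu_t.view(-1, *([1] * (x0.dim() - 1)))            # broadcast to image shape
    noise = torch.randn_like(x0)
    xt = x0 + nu_t * noise                                   # sample x_t ~ p_{t|0}(. | x_0)
    target = -noise / nu_t                                   # = grad_{x_t} log p_{t|0}(x_t | x_0)
    residual = score_net(xt, t) - target
    return residual.flatten(1).pow(2).sum(dim=1).mean()
```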

3 Methods

3.1 Graduated non-convexity Flow

Using the SGM framework we define negative log-likelihoods

$$\mathcal{R}(\mathbf{x}_t, t) \coloneqq -\log p_t(\mathbf{x}_t). \tag{7}$$

We expect that $\mathcal{R}(\cdot, t_{\text{min}})$ is highly non-convex, so that the corresponding optimisation problem

$$\min_{\mathbf{x}} \left\{ f(\mathbf{x}) \coloneqq \mathcal{D}(\mathbf{A}\mathbf{x}, \mathbf{y}^{\delta}) + \alpha_{t_{\text{min}}} \mathcal{R}(\mathbf{x}, t_{\text{min}}) \right\}, \tag{8}$$

is challenging to solve. However, due to the convolution with a Gaussian in (6), there exists a $\tilde{t} < \infty$ such that $\mathcal{R}(\cdot, t)$ is convex for all $t > \tilde{t}$ [7]. Since $\mathcal{D}(\mathbf{A}\mathbf{x}, \mathbf{y}^{\delta})$ is convex, the resulting optimisation problem is also convex at $t_{\text{max}}$ and can be solved using common convex optimisation methods. This motivates using SGMs as an embedding in a graduated optimisation scheme, giving a sequence of optimisation problems

$$\hat{\mathbf{x}}_{i+1} \in \operatorname*{arg\,min}_{\mathbf{x}} \left\{ F(\mathbf{x}, t_i) \coloneqq \mathcal{D}(\mathbf{A}\mathbf{x}, \mathbf{y}^{\delta}) + \alpha_{t_i} \mathcal{R}(\mathbf{x}, t_i) \right\}, \tag{9}$$

where $\alpha_{t_i} > 0$ is a regularisation parameter and the optimisation is initialised with $\hat{\mathbf{x}}_i$. Instead of solving (9) exactly in each step, we perform one step of gradient descent,

$$\begin{split} \mathbf{x}_{i+1} &= \mathbf{x}_i - \lambda t_i \nabla_{\mathbf{x}} F(\mathbf{x}_i, t_i) \\ &= \mathbf{x}_i - \lambda t_i \mathbf{A}^{*}(\mathbf{A}\mathbf{x}_i - \mathbf{y}^{\delta}) + \lambda t_i \alpha_{t_i} s_{\theta}(\mathbf{x}_i, t_i), \end{split} \tag{10}$$

where $s_{\theta}(\mathbf{x}_i, t_i) \approx \nabla_{\mathbf{x}_i} \log p_{t_i}(\mathbf{x}_i)$ is a pre-trained score-based model. This corresponds to Algorithm 2 in [7] with a predefined smoothing schedule.
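Written out, one iteration of (10) with a pretrained score network could look as follows; the callables `A`, `At` and the signature of `score_net` are assumptions about the surrounding code.

```python
import torch

@torch.no_grad()
def graduated_sgm_step(x, t, alpha_t, lam, A, At, y, score_net):
    """One step of the graduated non-convexity flow, eq. (10).

    A, At     : callables applying the forward operator and its adjoint
    score_net : s_theta(x, t), approximating grad_x log p_t(x) (pretrained, frozen)
    """
    data_grad = At(A(x) - y)                                  # A*(A x - y^delta)
    t_vec = torch.full((x.shape[0],), t, device=x.device)     # time as a batch vector
    d = -t * (data_grad - alpha_t * score_net(x, t_vec))      # = -t * grad_x F(x, t)
    return x + lam * d                                        # x_{i+1}
```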

3.2 Gradient-like Methods

The critical issue in surrogate optimisation methods is ensuring convergence to an optimum of the original problem. For the graduated non-convexity flow of Section 3.1 this boils down to the update directions used in (10), which are not guaranteed to be descent directions for the original objective. However, convergence of iterations of the form $\mathbf{x}_{i+1} = \mathbf{x}_i + \lambda_i \mathbf{d}_i$ can still be shown provided $\mathbf{d}_i$ is a descent direction, i.e. $\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), \mathbf{d}_i \rangle_{\mathbb{R}^n} < 0$. Iterative methods of this type are also known as gradient-like methods [15, Section 8.3, p. 75] or sufficient descent methods [16].

We consider the case in which the maximal number of iterations $I$ (i.e. the fineness of the smoothing schedule) tends to infinity. Let $\{t_i\}_{i \in \mathbb{N}} \subset [t_{\text{min}}, t_{\text{max}}]$ be such that

$$t_1 = t_{\text{max}}, \qquad t_{i+1} \leq t_i \quad \forall i \in \mathbb{N}, \qquad \lim_{i \to \infty} t_i = t_{\text{min}}. \tag{11}$$

To show the convergence of $\{\mathbf{x}_i\}_{i \in \mathbb{N}}$ we use the following notion of gradient-like directions.

Definition 3.1 (Gradient-like Directions [15]).

Let $f: \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable and $\{\mathbf{x}_i\}_{i \in \mathbb{N}} \subseteq \mathbb{R}^n$. A sequence $\{\mathbf{d}_i\}_{i \in \mathbb{N}} \subseteq \mathbb{R}^n$ is called gradient-like with respect to $f$ and $\{\mathbf{x}_i\}_{i \in \mathbb{N}}$ if, for every subsequence $\{\mathbf{x}_{i_j}\}_{j \in \mathbb{N}}$ converging to a non-stationary point of $f$, there exist $\varepsilon_1, \varepsilon_2 > 0$ and $N^{*} \in \mathbb{N}$ such that

(a) $\|\mathbf{d}_{i_j}\| \leq \varepsilon_1$ for all $j \in \mathbb{N}$, and

(b) $\langle \nabla_{\mathbf{x}} f(\mathbf{x}_{i_j}), \mathbf{d}_{i_j} \rangle_{\mathbb{R}^n} \leq -\varepsilon_2$ for $j \geq N^{*}$.

Under these conditions every limit point of iterates produced by a gradient-like algorithm is a stationary point of $f$ (see [15, Theorem 8.9]). In order to show convergence we also require $\nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}, \cdot)$ to be Lipschitz continuous, which ensures that the search directions $\{\mathbf{d}_i\}_{i \in \mathbb{N}}$ defined in (12) are gradient-like with respect to $f$ in (8) and the iterates $\{\mathbf{x}_i\}_{i \in \mathbb{N}}$. With this, we have the following convergence result.

Theorem 3.2.

Let $f$ and $F$ be given by (8) and (9), respectively, with $F(\cdot, t)$ continuously differentiable for $t \in [t_{\text{min}}, t_{\text{max}}]$ and $t_{\text{min}} > 0$. Assume $\{t_i\}_{i \in \mathbb{N}}$ satisfies (11). Let $\{\mathbf{d}_i\}_{i \in \mathbb{N}}, \{\mathbf{x}_i\}_{i \in \mathbb{N}} \subseteq \mathbb{R}^n$ be sequences generated by Algorithm 2, where $\mathbf{d}_i$ is determined using the adaptive smoothing schedule. Moreover, assume that $\alpha_{\cdot} \nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}, \cdot)$ is Lipschitz continuous with a global Lipschitz constant $L \geq 0$ for all $\mathbf{x} \in \mathbb{R}^n$. Then every limit point of the sequence $\{\mathbf{x}_i\}_{i \in \mathbb{N}}$ is a stationary point of $f$.

In the case of the graduated non-convexity flow for the objective (9) and the iteration (10), the directions $\mathbf{d}_i$ correspond to

$$\mathbf{d}_i = -t_i \left( \mathbf{A}^{*}(\mathbf{A}\mathbf{x}_i - \mathbf{y}^{\delta}) + \alpha_{t_i} \nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}_i, t_i) \right), \tag{12}$$

which do not necessarily satisfy the descent condition

$$\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), \mathbf{d}_i \rangle_{\mathbb{R}^n} < 0, \tag{13}$$

needed for convergence. We can address this issue through an adaptive smoothing schedule by selecting, in each iteration, the largest smoothing parameter $t_{j^{*}}$ such that the descent condition is satisfied. Such an index $j^{*}$ always exists as long as $\mathbf{x}_i$ is not a stationary point of $f$, since the maximal index $j^{*} = I$ in Algorithm 2 leads to $t_{j^{*}} = t_{\text{min}}$ and

$$\left\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), -t_{j^{*}} \nabla_{\mathbf{x}} F(\mathbf{x}_i, t_{j^{*}}) \right\rangle_{\mathbb{R}^n} = -t_{\text{min}} \|\nabla_{\mathbf{x}} f(\mathbf{x}_i)\|^2 < 0.$$

Theorem 3.2 can be shown using this adaptive smoothing schedule and by ensuring the properties (a) and (b) in Definition 3.1 hold.
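A sketch of this adaptive choice of the smoothing parameter: starting from the current schedule index, the largest remaining $t_j$ is kept for which the surrogate direction satisfies the descent condition (13); the fallback at $t_{\text{min}}$ recovers $-t_{\text{min}} \nabla_{\mathbf{x}} f(\mathbf{x}_i)$. The function and argument names are ours.

```python
import numpy as np

def adaptive_direction(x, ts, j_start, grad_F, grad_f):
    """Adaptive smoothing schedule (sketch): direction from the largest remaining
    smoothing parameter t_j that satisfies the descent condition (13).
    ts is a decreasing schedule with ts[-1] = t_min."""
    g = grad_f(x)
    for j in range(j_start, len(ts)):
        d = -ts[j] * grad_F(x, ts[j])
        if g @ d < 0:                 # descent condition <grad f, d> < 0
            return d, j
    # At t_min we have F(., t_min) = f, so -t_min * grad_f(x) is a descent
    # direction whenever x is not stationary.
    return -ts[-1] * g, len(ts) - 1
```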

3.3 Energy-based Parametrisation

Adaptive step-size iterative methods require evaluating the objective function in each iteration to ensure convergence. However, traditional SGMs approximate only the score function $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ and do not offer access to the likelihood. An alternative are energy-based models (EBMs) [17], where the probabilistic model is parametrised as $p_{\theta}(\mathbf{x}) = \frac{e^{g_{\theta}(\mathbf{x})}}{Z(\theta)}$ for a scalar-valued neural network $g_{\theta}$ and a normalisation constant $Z(\theta)$. EBMs can be trained using the score matching objective by defining $s_{\theta}(\mathbf{x}_t, t) = \nabla_{\mathbf{x}_t} g_{\theta}(\mathbf{x}_t, t)$ [18], which can be evaluated using common automatic differentiation libraries. This comes with a higher computational cost than the score parametrisation. Following Du et al. [19], we parametrise the energy as

$$g_{\theta}(\mathbf{x}_t, t) = -\frac{1}{2} \| h_{\theta}(\mathbf{x}_t, t) \|_2^2, \tag{14}$$

where $h_{\theta}$ is implemented as a time-conditional U-Net [11]. The perturbed cost function is then given as

$$F(\mathbf{x}, t) = \frac{1}{2} \| \mathbf{A}\mathbf{x} - \mathbf{y}^{\delta} \|_2^2 + \frac{\alpha_t}{2} \| h_{\theta}(\mathbf{x}, t) \|_2^2. \tag{15}$$
Remark 1.

The likelihood in the EBM framework is given by

$$\log p_{\theta}(\mathbf{x}_t, t) = g_{\theta}(\mathbf{x}_t, t) - \log Z(\theta, t). \tag{16}$$

This introduces a time-step dependency into the (intractable) normalisation constant $Z(\theta, t)$. However, to compute the step sizes we only need to evaluate the objective up to additive constants, so $Z(\theta, t)$ never has to be computed. The energy-based parametrisation was used in a similar fashion in [19] to compute Metropolis correction probabilities.
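A sketch of the energy-based parametrisation (14)-(15): the score is obtained from the scalar energy by automatic differentiation, and $F(\mathbf{x}, t)$ is evaluated up to the additive constant $\log Z(\theta, t)$, which is all the line search needs. Here `h_net` stands in for the time-conditional U-Net and is assumed to be given.

```python
import torch

def energy(h_net, x, t):
    """g_theta(x, t) = -0.5 * ||h_theta(x, t)||_2^2, cf. (14)."""
    return -0.5 * h_net(x, t).flatten(1).pow(2).sum(dim=1)

def score(h_net, x, t):
    """s_theta(x, t) = grad_x g_theta(x, t), via autograd."""
    x = x.detach().requires_grad_(True)
    g = energy(h_net, x, t).sum()
    return torch.autograd.grad(g, x)[0]

def objective_up_to_const(h_net, x, t, alpha_t, A, y):
    """F(x, t) as in (15); the intractable log Z(theta, t) is not needed here."""
    datafit = 0.5 * (A(x) - y).flatten(1).pow(2).sum(dim=1)
    return datafit - alpha_t * energy(h_net, x, t)
```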

4 Experiments

In this section we investigate the numerical performance of Algorithm 2. We start with a toy example, where the data distribution is given by a Gaussian mixture model and the score can be computed analytically. For the second experiment we investigate computed tomography reconstruction on two datasets with a trained SGM. We use the PSNR and SSIM [20] to evaluate the reconstruction quality.

4.1 2D Toy Example

To illustrate the behaviour and the convergence properties, we consider a Gaussian mixture model consisting of five Gaussians. The density at time step $t$ is a Gaussian mixture model with a perturbed covariance matrix $\Sigma_k^t = \Sigma_k + \frac{\sigma^{2t} - 1}{2 \log \sigma} \mathbf{I}$. The diffusion process is given by the forward SDE $d\mathbf{x}_t = \sigma^t d\mathbf{w}_t$ with perturbation kernel $p_{t|0}(\mathbf{x}_t | \mathbf{x}_0) = \mathcal{N}(\mathbf{x}_t | \mathbf{x}_0, \frac{\sigma^{2t} - 1}{2 \log \sigma} \mathbf{I})$. Furthermore, we consider a simple two-dimensional inverse problem with the forward operator $\mathbf{A} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$ and clean measurements $\mathbf{y} = (2, 0)^{\intercal}$. We choose a constant regularisation parameter $\alpha_t = 5$ and adjust one mean of the Gaussian mixture model to the position $\mathbf{x}^{*} = (1, 1)^{\intercal}$ in order to ensure that the global minimum of the cost function $f$ with $t_{\text{min}} = 10^{-3}$ is at $\mathbf{x}^{*}$. We evaluate the graduated non-convexity flow (Algorithm 1) with a constant step size $\lambda_i = 1$ as well as the gradient-like method (Algorithm 2) with the adaptive smoothing schedule. The values $t_i$ are evenly spaced between $t_{\text{min}}$ and $t_{\text{max}}$.
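For this toy example the perturbed density is available in closed form, so the exact score can be used in place of a network. A minimal sketch of the perturbed mixture score is given below; the value of $\sigma$ and the equal component weights are our assumptions for brevity.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_score(x, t, means, covs, sigma=25.0):
    """Exact score grad_x log p_t(x) of an equally weighted Gaussian mixture
    perturbed by the VE-SDE dx_t = sigma^t dw_t: each component covariance
    becomes Sigma_k + (sigma^{2t} - 1) / (2 log sigma) * I."""
    var_t = (sigma ** (2 * t) - 1.0) / (2.0 * np.log(sigma))
    dens, grads = [], []
    for mu, Sigma in zip(means, covs):
        Sigma_t = Sigma + var_t * np.eye(len(x))
        p_k = multivariate_normal.pdf(x, mean=mu, cov=Sigma_t)
        dens.append(p_k)
        grads.append(-p_k * np.linalg.solve(Sigma_t, x - mu))  # gradient of the k-th component
    return np.sum(grads, axis=0) / np.sum(dens)
```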
The goal is to analyse the algorithms in terms of their convergence properties with respect to the initialisation $\mathbf{x}_1$, the initial smoothing parameter $t_{\text{max}}$ and the iteration number. To do so, we run the algorithms with 1300 iterations for $10^4$ equally spaced initial points $\mathbf{x}_1$ on $[-10, 10]^2$ as well as 100 different values of $t_{\text{max}} \in [10^{-2}, 10]$, which are evenly distributed on a log scale. The resulting rates of trajectories converging to stationary points and to the global minimum are shown as a pseudocolor plot in Fig. 1. The dependence on the initialisation is shown in the left column, where isolines display the loss landscape of $f$ and orange paths show exemplary trajectories of iterates for different $\mathbf{x}_1$ and $t_{\text{max}}$. The dependence on the iteration number and $t_{\text{max}}$ is shown in the middle and right columns.

4.2 Computed Tomography

We evaluate our approach on two datasets: Ellipses [21] and AAPM [22]. Ellipses contains $128 \times 128$ px images of a varying number of ellipses with random orientation and size. The AAPM dataset consists of CT images from 10 patients. We use the images of 9 patients to train the SGM, resulting in 2824 training images, and use the remaining patient (id C027) for evaluation. We simulate measurements using a parallel-beam Radon transform with 60 angles and 10% and 5% relative Gaussian noise, respectively. For both datasets, we train SGMs using the VPSDE [4]. We use a backtracking line search with the Armijo-Goldstein condition to determine a suitable step size, where the Barzilai-Borwein method [23] provides the candidate step size. We choose the regularisation parameter as $\alpha_t = \alpha \nu_t / \gamma_t$, where $\nu_t$ is the standard deviation and $\gamma_t$ the mean scaling of the perturbation kernel. The value $\alpha$ was set according to a coarse sweep over one example image. In Fig. 2 we show the results of the gradient-like method on one AAPM example, including the PSNR, objective value, step size and gradient-like condition from (13). Fig. 3 shows example reconstructions for both datasets.
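A sketch of this step-size rule: a Barzilai-Borwein candidate computed from the previous iterate, then reduced by backtracking until the Armijo-Goldstein condition holds. The specific BB variant (BB1) and the safeguards are our choices, not details taken from the paper.

```python
import numpy as np

def bb_armijo_step(x, x_prev, g, g_prev, d, f, c=1e-4, beta=0.5, lam_max=1e3):
    """Barzilai-Borwein candidate step followed by Armijo backtracking (sketch).

    x, x_prev : current and previous iterate;  g, g_prev : their gradients
    d         : current descent direction;     f         : objective callable
    """
    s, z = x - x_prev, g - g_prev
    lam = (s @ s) / (s @ z) if abs(s @ z) > 1e-12 else 1.0   # BB1 candidate step
    lam = float(np.clip(lam, 1e-12, lam_max))                # safeguard
    slope = g @ d                                            # < 0 for a descent direction
    while f(x + lam * d) > f(x) + c * lam * slope:           # Armijo-Goldstein backtracking
        lam *= beta
    return lam
```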


Initial value

The gradient-like method (Alg. 1) is deterministic, except for the choice of the initialisation $\mathbf{x}_1$. We study the effect of $\mathbf{x}_1$ for both Ellipses and AAPM (see Table 1) by comparing Alg. 1 with gradient descent applied to the objective function at $t_{\text{min}}$, see (8). For both datasets we use 300 steps with a logarithmic time schedule for Alg. 1. We evaluate 10 random initialisations $\mathbf{x}_1$ for 10 images. We compute the mean PSNR and SSIM values over all reconstructions. Further, we compute the mean standard deviation over images. For both Ellipses and AAPM the mean PSNR and mean SSIM are higher, while the mean standard deviation is lower. That is, the gradient-like method achieves better reconstructions more consistently.

Table 1: Effect of the initial value. Mean PSNR/SSIM over all reconstructions and mean standard deviation over images (10 random initialisations, 10 images).

|          |      |      | Gradient Descent | Alg. 1 |
|----------|------|------|------------------|--------|
| Ellipses | PSNR | mean | 28.96            | 32.93  |
|          |      | std  | 0.43             | 0.10   |
|          | SSIM | mean | 0.771            | 0.933  |
|          |      | std  | 0.019            | 0.001  |
| AAPM     | PSNR | mean | 35.89            | 37.33  |
|          |      | std  | 0.032            | 0.028  |
|          | SSIM | mean | 0.899            | 0.913  |
|          |      | std  | 0.032            | 0.0004 |
Table 2: Comparison of constant and adaptive step sizes.

|                    | AAPM PSNR | AAPM SSIM | Ellipses PSNR | Ellipses SSIM |
|--------------------|-----------|-----------|---------------|---------------|
| Constant step size | 32.93     | 0.748     | 30.82         | 0.742         |
| Adaptive step size | 37.08     | 0.896     | 33.21         | 0.935         |

Adaptive step size

The energy-based parametrisation comes with a higher computational cost. However, it allows using adaptive step sizes for Alg. 2. In Table 2 we show that this leads to better reconstructions than a fixed step size. Compare also the results in Fig. 2, which show that the computed step size changes by several orders of magnitude over the iterations. Moreover, we need fewer iterations (300 vs. 1200) to obtain convincing reconstructions.


5 Discussion

In the toy example in Section 4.1 the gradient-like condition in Definition 3.1 appears to hinder the ability of the algorithm to reach the global optimum, as the search direction $\mathbf{d}_i$ always points into the same halfspace as $-\nabla_{\mathbf{x}} f(\mathbf{x}_i)$. One possible approach to circumvent this is to enforce the condition only after a set number of iterations, which still ensures convergence to a stationary point of $f$. However, for the high-dimensional imaging experiments the gradient-like condition does not seem to be too restrictive. As we see in Fig. 2, the condition is satisfied in all iterations and we still achieve a good reconstruction. Further, the adaptive step size rule leads to an improvement in terms of both PSNR and SSIM while using fewer iterations, which justifies the additional computational cost of the step size search.

6 Conclusion

In this work we have shown that SGMs can be used in a graduated optimisation framework to solve inverse problems. Further, we propose to use an energy-based parametrisation, which enables the use of line search methods for finding a suitable step size. Using the theory of gradient-like directions, we prove convergence to stationary points. We hypothesise that this method is also better able to escape saddle points and converge to local minima; investigating this question is left for future work.

References

  • [1] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, vol. 167, Springer, 2009.
  • [2] S. Arridge, P. Maass, O. Öktem, and C.-B. Schönlieb, "Solving inverse problems using data-driven models," Acta Numerica, vol. 28, pp. 1–174, 2019.
  • [3] M. Duff, N. Campbell, and M. J. Ehrhardt, "Regularising inverse problems with generative machine learning models," J. Math. Imaging Vis., vol. 66, no. 1, pp. 37–56, 2024.
  • [4] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, "Score-based generative modeling through stochastic differential equations," in ICLR, 2021.
  • [5] Y. Song and S. Ermon, "Generative modeling by estimating gradients of the data distribution," NeurIPS, vol. 32, 2019.
  • [6] Y. Sun, Z. Wu, Y. Chen, B. T. Feng, and K. L. Bouman, "Provable probabilistic imaging using score-based generative priors," arXiv preprint arXiv:2310.10835, 2023.
  • [7] E. Kobler and T. Pock, "Learning gradually non-convex image priors using score matching," arXiv preprint arXiv:2302.10502, 2023.
  • [8] A. Blake and A. Zisserman, Visual Reconstruction, MIT Press, 1987.
  • [9] H. Mobahi and J. W. Fisher, "On the link between Gaussian homotopy continuation and convex envelopes," in EMMCVPR 2015, Springer, 2015, pp. 43–56.
  • [10] E. Hazan, K. Y. Levy, and S. Shalev-Shwartz, "On graduated optimization for stochastic non-convex problems," in ICML, PMLR, 2016, pp. 1833–1841.
  • [11] P. Dhariwal and A. Nichol, "Diffusion models beat GANs on image synthesis," NeurIPS, vol. 34, pp. 8780–8794, 2021.
  • [12] B. Anderson, "Reverse-time diffusion equation models," Stoch. Proc. Appl., vol. 12, no. 3, pp. 313–326, 1982.
  • [13] P. Vincent, "A connection between score matching and denoising autoencoders," Neural Computation, vol. 23, no. 7, pp. 1661–1674, 2011.
  • [14] S. Särkkä and A. Solin, Applied Stochastic Differential Equations, vol. 10, Cambridge University Press, 2019.
  • [15] C. Geiger and C. Kanzow, Numerische Verfahren zur Lösung unrestringierter Optimierungsaufgaben, Springer-Verlag, 2013.
  • [16] X. An, S. Li, and Y. Xiao, "Sufficient descent directions in unconstrained optimization," Comput. Optim. Appl., vol. 48, no. 3, pp. 515–532, 2011.
  • [17] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.
  • [18] T. Salimans and J. Ho, "Should EBMs model the energy or the score?," in Energy Based Models Workshop, ICLR, 2021.
  • [19] Y. Du et al., "Reduce, reuse, recycle: Compositional generation with energy-based diffusion models and MCMC," in ICML, PMLR, 2023, pp. 8489–8510.
  • [20] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.
  • [21] R. Barbano et al., "An educated warm start for deep image prior-based micro CT reconstruction," IEEE Trans. Comput. Imaging, vol. 8, pp. 1210–1222, 2022.
  • [22] C. H. McCollough et al., "Low-dose CT for the detection and classification of metastatic liver lesions: results of the 2016 Low Dose CT Grand Challenge," Medical Physics, vol. 44, no. 10, pp. e339–e352, 2017.
  • [23] J. Barzilai and J. M. Borwein, "Two-point step size gradient methods," IMA J. Numer. Anal., vol. 8, no. 1, pp. 141–148, 1988.

Appendix

Proof of Theorem 3.2

Theorem 3.2.

Let F𝐹Fitalic_F and f𝑓fitalic_f be given as in (8) and (9) and let F⁒(β‹…,t)𝐹⋅𝑑F(\cdot,t)italic_F ( β‹… , italic_t ) be continuously differentiable for t∈[tmin,tmax]𝑑subscript𝑑minsubscript𝑑maxt\in[{t_{\text{min}}},{t_{\text{max}}}]italic_t ∈ [ italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT max end_POSTSUBSCRIPT ] with tmax>tmin>0subscript𝑑maxsubscript𝑑min0{t_{\text{max}}}>{t_{\text{min}}}>0italic_t start_POSTSUBSCRIPT max end_POSTSUBSCRIPT > italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT > 0. Furthermore, let {ti}iβˆˆβ„•subscriptsubscript𝑑𝑖𝑖ℕ\{t_{i}\}_{i\in\mathbb{N}}{ italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i ∈ blackboard_N end_POSTSUBSCRIPT be as in (11) and {𝐝i}iβˆˆβ„•βŠ†β„nsubscriptsubscript𝐝𝑖𝑖ℕsuperscriptℝ𝑛\{{\mathbf{d}}_{i}\}_{i\in\mathbb{N}}\subseteq\mathbb{R}^{n}{ bold_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i ∈ blackboard_N end_POSTSUBSCRIPT βŠ† blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT as well as {𝐱i}iβˆˆβ„•βŠ†β„nsubscriptsubscript𝐱𝑖𝑖ℕsuperscriptℝ𝑛\{{\mathbf{x}}_{i}\}_{i\in\mathbb{N}}\subseteq\mathbb{R}^{n}{ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i ∈ blackboard_N end_POSTSUBSCRIPT βŠ† blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT be the sequences generated based on the update rules of Algorithm 2, where 𝐝isubscript𝐝𝑖{\mathbf{d}}_{i}bold_d start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is determined by using the adaptive smoothing schedule. Moreover, we assume that Ξ±β‹…β’βˆ‡π±β„›β’(𝐱,β‹…)subscript𝛼⋅subscriptβˆ‡π±β„›π±β‹…\alpha_{\cdot}\nabla_{\mathbf{x}}\mathcal{R}({\mathbf{x}},\cdot)italic_Ξ± start_POSTSUBSCRIPT β‹… end_POSTSUBSCRIPT βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT caligraphic_R ( bold_x , β‹… ) is Lipschitz continuous with a global Lipschitz constant Lβ‰₯0𝐿0L\geq 0italic_L β‰₯ 0 for all π±βˆˆβ„n𝐱superscriptℝ𝑛{\mathbf{x}}\in\mathbb{R}^{n}bold_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT, i.e. there exists a constant Lβ‰₯0𝐿0L\geq 0italic_L β‰₯ 0 such that

β€–Ξ±tβ’βˆ‡π±β„›β’(𝐱,t)βˆ’Ξ±t~β’βˆ‡π±β„›β’(𝐱,t~)‖≀L⁒|tβˆ’t~|normsubscript𝛼𝑑subscriptβˆ‡π±β„›π±π‘‘subscript𝛼~𝑑subscriptβˆ‡π±β„›π±~𝑑𝐿𝑑~𝑑\|\alpha_{t}\nabla_{\mathbf{x}}\mathcal{R}({\mathbf{x}},t)-\alpha_{\tilde{t}}%\nabla_{\mathbf{x}}\mathcal{R}({\mathbf{x}},\tilde{t})\|\leq L|t-\tilde{t}|βˆ₯ italic_Ξ± start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT caligraphic_R ( bold_x , italic_t ) - italic_Ξ± start_POSTSUBSCRIPT over~ start_ARG italic_t end_ARG end_POSTSUBSCRIPT βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT caligraphic_R ( bold_x , over~ start_ARG italic_t end_ARG ) βˆ₯ ≀ italic_L | italic_t - over~ start_ARG italic_t end_ARG |

βˆ€t,t~∈[tmin,tmax],βˆ€π±βˆˆβ„n.formulae-sequencefor-all𝑑~𝑑subscript𝑑minsubscript𝑑maxfor-all𝐱superscriptℝ𝑛\forall t,\tilde{t}\in[{t_{\text{min}}},{t_{\text{max}}}],\ \forall{\mathbf{x}%}\in\mathbb{R}^{n}.βˆ€ italic_t , over~ start_ARG italic_t end_ARG ∈ [ italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT max end_POSTSUBSCRIPT ] , βˆ€ bold_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT .Then every limit point of the sequence {𝐱i}iβˆˆβ„•subscriptsubscript𝐱𝑖𝑖ℕ\{{\mathbf{x}}_{i}\}_{i\in\mathbb{N}}{ bold_x start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i ∈ blackboard_N end_POSTSUBSCRIPT is a stationary point of f.𝑓f.italic_f .

Proof.

As described in Section 3.2, the adaptive smoothing schedule selects, in the $i$-th iteration of Algorithm 2, the largest smoothing parameter $t_{j^*}$ such that the descent condition in equation (13) holds with $\mathbf{d}_i = -\tilde{t}_i \nabla_{\mathbf{x}} F(\mathbf{x}_i, \tilde{t}_i)$ and $\tilde{t}_i \coloneqq t_{j^*}$. In other words, we choose $j^*$ to be

\[
j^* \coloneqq \min\Big\{ j \in \mathbb{N} \ \Big|\ \big\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), -t_j \nabla_{\mathbf{x}} F(\mathbf{x}_i, t_j) \big\rangle_{\mathbb{R}^n} < 0 \ \land\ t_j \leq \tilde{t}_{i-1} \Big\}.
\]

Such an index always exists as long as $\mathbf{x}_i$ is not a stationary point of $f$ (which is ensured by Algorithm 2 due to the stopping criterion in Line 2), as we have

\[
\lim_{j \to \infty} \big\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), -t_j \nabla_{\mathbf{x}} F(\mathbf{x}_i, t_j) \big\rangle_{\mathbb{R}^n} = -t_{\text{min}} \|\nabla_{\mathbf{x}} f(\mathbf{x}_i)\|^2 < 0,
\]

where the equality follows from the continuity of $\nabla_{\mathbf{x}} F(\mathbf{x}_i, \cdot)$. Hence, there exists an $N \in \mathbb{N}$ such that

\[
\big\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), -t_j \nabla_{\mathbf{x}} F(\mathbf{x}_i, t_j) \big\rangle_{\mathbb{R}^n} < 0
\]

for all $j \geq N$. The above choice of $\mathbf{d}_i$ ensures that $\langle \nabla_{\mathbf{x}} f(\mathbf{x}_i), \mathbf{d}_i \rangle_{\mathbb{R}^n} < 0$ for all $i \in \mathbb{N}$, which is a needed criterion for the gradient-like algorithm to converge according to [15]. The sequence $\{\tilde{t}_i\}_{i \in \mathbb{N}}$ is a subsequence of $\{t_i\}_{i \in \mathbb{N}}$ and fulfills $\tilde{t}_{i+1} \leq \tilde{t}_i$ as well as $\lim_{i \to \infty} \tilde{t}_i = t_{\text{min}}$.

Proof by contradiction: Let $\mathbf{x}^* \in \mathbb{R}^n$ be a non-stationary point of $f$ and let $\{\mathbf{x}_{i_j}\}_{j \in \mathbb{N}}$ be a subsequence which converges to $\mathbf{x}^*$. The aim is to show both properties (a) and (b) of Definition 3.1. We would then obtain that $\{\mathbf{d}_i\}_{i \in \mathbb{N}}$ is gradient-like with respect to $f$ and $\{\mathbf{x}_i\}_{i \in \mathbb{N}}$. Hence, by [15, Theorem 8.9], we would obtain that every accumulation point of $\{\mathbf{x}_i\}_{i \in \mathbb{N}}$ is a stationary point of $f$, which would give the desired contradiction to the assumption that $\nabla_{\mathbf{x}} f(\mathbf{x}^*) \neq 0$.

We first show that $\|\mathbf{d}_{i_j}\| \leq \varepsilon_1$ for all $j \in \mathbb{N}$, which corresponds to property (a) of Definition 3.1. It holds that

\begin{align*}
\|\mathbf{d}_{i_j}\| &= \big\|-\tilde{t}_{i_j} \nabla_{\mathbf{x}} F(\mathbf{x}_{i_j}, \tilde{t}_{i_j})\big\| \\
&= \Big\|\tilde{t}_{i_j} \big(\mathbf{A}^*(\mathbf{A}\mathbf{x}_{i_j} - \mathbf{y}^{\delta}) + \alpha_{\tilde{t}_{i_j}} \nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}_{i_j}, \tilde{t}_{i_j})\big)\Big\| \\
&\leq t_{\text{max}} \Big( \|\nabla_{\mathbf{x}} f(\mathbf{x}_{i_j})\| + \big\|\alpha_{\tilde{t}_{i_j}} \nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}_{i_j}, \tilde{t}_{i_j}) - \alpha_{t_{\text{min}}} \nabla_{\mathbf{x}} \mathcal{R}(\mathbf{x}_{i_j}, t_{\text{min}})\big\| \Big) \\
&\leq t_{\text{max}} \big( \|\nabla_{\mathbf{x}} f(\mathbf{x}_{i_j})\| + L |\tilde{t}_{i_j} - t_{\text{min}}| \big) \\
&\leq t_{\text{max}} \big( \|\nabla_{\mathbf{x}} f(\mathbf{x}_{i_j})\| + L (t_{\text{max}} - t_{\text{min}}) \big).
\end{align*}

As f𝑓fitalic_f is continuously differentiable and with limjβ†’βˆžπ±ij=π±βˆ—,subscript→𝑗subscript𝐱subscript𝑖𝑗superscript𝐱\lim_{j\to\infty}{\mathbf{x}}_{i_{j}}={\mathbf{x}}^{*},roman_lim start_POSTSUBSCRIPT italic_j β†’ ∞ end_POSTSUBSCRIPT bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT = bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT , we have that βˆ‡π±f⁒(𝐱ij)β†’βˆ‡π±f⁒(π±βˆ—)β†’subscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗subscriptβˆ‡π±π‘“superscript𝐱\nabla_{{\mathbf{x}}}f({\mathbf{x}}_{i_{j}})\to\nabla_{{\mathbf{x}}}f({\mathbf%{x}}^{*})βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) β†’ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT ) for jβ†’βˆžβ†’π‘—j\to\inftyitalic_j β†’ ∞ and the sequence {βˆ‡π±f⁒(𝐱ij)}jβˆˆβ„•subscriptsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗𝑗ℕ\{\nabla_{{\mathbf{x}}}f({\mathbf{x}}_{i_{j}})\}_{j\in\mathbb{N}}{ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) } start_POSTSUBSCRIPT italic_j ∈ blackboard_N end_POSTSUBSCRIPT is bounded, i.e.there exists a constant c~β‰₯0~𝑐0\tilde{c}\geq 0over~ start_ARG italic_c end_ARG β‰₯ 0 such that β€–βˆ‡π±f⁒(𝐱ij)‖≀c~βˆ€jβˆˆβ„•formulae-sequencenormsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗~𝑐for-all𝑗ℕ\|\nabla_{{\mathbf{x}}}f({\mathbf{x}}_{i_{j}})\|\leq\tilde{c}\ \ \forall j\in%\mathbb{N}βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) βˆ₯ ≀ over~ start_ARG italic_c end_ARG βˆ€ italic_j ∈ blackboard_N.Therefore, we have

‖𝐝ijβ€–normsubscript𝐝subscript𝑖𝑗\displaystyle\|{\mathbf{d}}_{i_{j}}\|βˆ₯ bold_d start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT βˆ₯≀tmax⁒(c~+L⁒(tmaxβˆ’tmin)),absentsubscript𝑑max~𝑐𝐿subscript𝑑maxsubscript𝑑min\displaystyle\leq{t_{\text{max}}}\left(\tilde{c}+L({t_{\text{max}}}-{t_{\text{%min}}})\right),≀ italic_t start_POSTSUBSCRIPT max end_POSTSUBSCRIPT ( over~ start_ARG italic_c end_ARG + italic_L ( italic_t start_POSTSUBSCRIPT max end_POSTSUBSCRIPT - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT ) ) ,

which shows property (a) of Defintion 3.1. To prove property (b), we compute

βŸ¨βˆ‡π±f⁒(𝐱ij),𝐝ijβŸ©β„nsubscriptsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗subscript𝐝subscript𝑖𝑗superscriptℝ𝑛\displaystyle\langle\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j}}),{\mathbf{d}}_{i%_{j}}\rangle_{\mathbb{R}^{n}}⟨ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , bold_d start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟩ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT
=βˆ’t~ijβŸ¨βˆ‡π±f(𝐱ij),βˆ‡π±f(𝐱ij)+Ξ±t~ijβˆ‡π±β„›(𝐱ij,t~ij)\displaystyle=-\tilde{t}_{i_{j}}\langle\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j%}}),\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j}})+\alpha_{\tilde{t}_{i_{j}}}%\nabla_{\mathbf{x}}\mathcal{R}({\mathbf{x}}_{i_{j}},\tilde{t}_{i_{j}})= - over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ⟨ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) , βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) + italic_Ξ± start_POSTSUBSCRIPT over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT caligraphic_R ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT , over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT )
βˆ’Ξ±tminβˆ‡π±β„›(𝐱ij,tmin)βŸ©β„n\displaystyle\hskip 116.24963pt-\alpha_{{t_{\text{min}}}}\nabla_{\mathbf{x}}%\mathcal{R}({\mathbf{x}}_{i_{j}},{t_{\text{min}}})\rangle_{\mathbb{R}^{n}}- italic_Ξ± start_POSTSUBSCRIPT italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT end_POSTSUBSCRIPT βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT caligraphic_R ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT ) ⟩ start_POSTSUBSCRIPT blackboard_R start_POSTSUPERSCRIPT italic_n end_POSTSUPERSCRIPT end_POSTSUBSCRIPT
β‰€βˆ’t~ijβ’β€–βˆ‡π±f⁒(𝐱ij)β€–2+t~ijβ’β€–βˆ‡π±f⁒(𝐱ij)β€–β‹…L⁒|t~ijβˆ’tmin|absentsubscript~𝑑subscript𝑖𝑗superscriptnormsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗2β‹…subscript~𝑑subscript𝑖𝑗normsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗𝐿subscript~𝑑subscript𝑖𝑗subscript𝑑min\displaystyle\leq-\tilde{t}_{i_{j}}\|\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j}}%)\|^{2}+\tilde{t}_{i_{j}}\|\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j}})\|\cdot L%|\tilde{t}_{i_{j}}-{t_{\text{min}}}|≀ - over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) βˆ₯ β‹… italic_L | over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT |
β‰€βˆ’tminβ’β€–βˆ‡π±f⁒(𝐱ij)β€–2+tmax⁒c~⁒L⁒|t~ijβˆ’tmin|≕sij.absentsubscript𝑑minsuperscriptnormsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗2subscript𝑑max~𝑐𝐿subscript~𝑑subscript𝑖𝑗subscript𝑑min≕subscript𝑠subscript𝑖𝑗\displaystyle\leq-{t_{\text{min}}}\|\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j}})%\|^{2}+{t_{\text{max}}}\tilde{c}L|\tilde{t}_{i_{j}}-{t_{\text{min}}}|\eqqcolons%_{i_{j}}.≀ - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + italic_t start_POSTSUBSCRIPT max end_POSTSUBSCRIPT over~ start_ARG italic_c end_ARG italic_L | over~ start_ARG italic_t end_ARG start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT | ≕ italic_s start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT .

As βˆ‡π±fsubscriptβˆ‡π±π‘“\nabla_{\mathbf{x}}fβˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f is continuous and 𝐱ijβ†’π±βˆ—β†’subscript𝐱subscript𝑖𝑗superscript𝐱{\mathbf{x}}_{i_{j}}\to{\mathbf{x}}^{*}bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT β†’ bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT for jβ†’βˆžβ†’π‘—j\to\inftyitalic_j β†’ ∞ with π±βˆ—superscript𝐱{\mathbf{x}}^{*}bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT being a non-stationary point of f,𝑓f,italic_f , we have that

limjβ†’βˆžβ€–βˆ‡π±f⁒(𝐱ij)β€–2=β€–βˆ‡π±f⁒(π±βˆ—)β€–2>0.subscript→𝑗superscriptnormsubscriptβˆ‡π±π‘“subscript𝐱subscript𝑖𝑗2superscriptnormsubscriptβˆ‡π±π‘“superscript𝐱20\lim_{j\to\infty}\|\nabla_{\mathbf{x}}f({\mathbf{x}}_{i_{j}})\|^{2}=\|\nabla_{%\mathbf{x}}f({\mathbf{x}}^{*})\|^{2}>0.roman_lim start_POSTSUBSCRIPT italic_j β†’ ∞ end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT = βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT > 0 .

The sequence sijsubscript𝑠subscript𝑖𝑗s_{i_{j}}italic_s start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT converges and it follows

limjβ†’βˆžsij=βˆ’tminβ’β€–βˆ‡π±f⁒(π±βˆ—)β€–2<0.subscript→𝑗subscript𝑠subscript𝑖𝑗subscript𝑑minsuperscriptnormsubscriptβˆ‡π±π‘“superscript𝐱20\lim_{j\to\infty}s_{i_{j}}=-{t_{\text{min}}}\|\nabla_{\mathbf{x}}f({\mathbf{x}%}^{*})\|^{2}<0.roman_lim start_POSTSUBSCRIPT italic_j β†’ ∞ end_POSTSUBSCRIPT italic_s start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT = - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT < 0 .

By definition of the convergence, it holds βˆ€Ξ΅>0β’βˆƒN⁒(Ξ΅)β‰₯0::for-allπœ€0π‘πœ€0absent\forall\varepsilon\!>\!0\ \exists N(\varepsilon)\geq 0:βˆ€ italic_Ξ΅ > 0 βˆƒ italic_N ( italic_Ξ΅ ) β‰₯ 0 :

|sijβˆ’(βˆ’tminβ’β€–βˆ‡π±f⁒(π±βˆ—)β€–2)|<Ξ΅βˆ€jβ‰₯N⁒(Ξ΅).formulae-sequencesubscript𝑠subscript𝑖𝑗subscript𝑑minsuperscriptnormsubscriptβˆ‡π±π‘“superscript𝐱2πœ€for-allπ‘—π‘πœ€|s_{i_{j}}-(-{t_{\text{min}}}\|\nabla_{\mathbf{x}}f({\mathbf{x}}^{*})\|^{2})|<%\varepsilon\quad\forall j\geq N(\varepsilon).| italic_s start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT - ( - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) | < italic_Ξ΅ βˆ€ italic_j β‰₯ italic_N ( italic_Ξ΅ ) .

By choosing Ρ≔tminβ’β€–βˆ‡π±f⁒(π±βˆ—)β€–2/2>0,β‰”πœ€subscript𝑑minsuperscriptnormsubscriptβˆ‡π±π‘“superscript𝐱220\varepsilon\coloneqq{t_{\text{min}}}\|\nabla_{\mathbf{x}}f({\mathbf{x}}^{*})\|%^{2}/2>0,italic_Ξ΅ ≔ italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 > 0 , we obtain in particular

sij<βˆ’tminβ’β€–βˆ‡π±f⁒(π±βˆ—)β€–2/2<0βˆ€jβ‰₯N⁒(Ξ΅),formulae-sequencesubscript𝑠subscript𝑖𝑗subscript𝑑minsuperscriptnormsubscriptβˆ‡π±π‘“superscript𝐱220for-allπ‘—π‘πœ€s_{i_{j}}<-{t_{\text{min}}}\|\nabla_{\mathbf{x}}f({\mathbf{x}}^{*})\|^{2}/2<0%\quad\forall j\geq N(\varepsilon),italic_s start_POSTSUBSCRIPT italic_i start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_POSTSUBSCRIPT < - italic_t start_POSTSUBSCRIPT min end_POSTSUBSCRIPT βˆ₯ βˆ‡ start_POSTSUBSCRIPT bold_x end_POSTSUBSCRIPT italic_f ( bold_x start_POSTSUPERSCRIPT βˆ— end_POSTSUPERSCRIPT ) βˆ₯ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT / 2 < 0 βˆ€ italic_j β‰₯ italic_N ( italic_Ξ΅ ) ,

which shows property (b) of Definition 3.1.∎
