Unified smoothing approach for best hyperparameter selection problem using a bilevel optimization strategy.

Bibliographic Details
Title: Unified smoothing approach for best hyperparameter selection problem using a bilevel optimization strategy.
Authors: Alcantara, Jan Harold [1] (AUTHOR) janharold.alcantara@riken.jp; Nguyen, Chieu Thanh [2] (AUTHOR) ntchieu@vnua.edu.vn; Okuno, Takayuki [1,3] (AUTHOR) takayuki-okuno@st.seikei.ac.jp; Takeda, Akiko [1,4] (AUTHOR) takeda@mist.i.u-tokyo.ac.jp; Chen, Jein-Shan [5] (AUTHOR) jschen@math.ntnu.edu.tw
Source: Mathematical Programming. Jul 2025, Vol. 212, Issue 1, p479-518. 40p.
Subjects: Bilevel programming, Computational mathematics, Smoothness of functions, Mathematical functions, Problem solving
Abstract: Strongly motivated from applications in various fields including machine learning, the methodology of sparse optimization has been developed intensively so far. Especially, the advancement of algorithms for solving problems with nonsmooth regularizers has been remarkable. However, those algorithms suppose that weight parameters of regularizers, called hyperparameters hereafter, are pre-fixed, but it is a crucial matter how the best hyperparameter should be selected. In this paper, we focus on the hyperparameter selection of regularizers related to the ℓ_p function with 0 < p ≤ 1 and apply a bilevel programming strategy, wherein we need to solve a bilevel problem, whose lower-level problem is nonsmooth, possibly nonconvex and non-Lipschitz. Recently, for solving a bilevel problem for hyperparameter selection of the pure ℓ_p (0 < p ≤ 1) regularizer Okuno et al. discovered new necessary optimality conditions, called SB(scaled bilevel)-KKT conditions, and further proposed a smoothing-type algorithm using a specific smoothing function. However, this optimality measure is loose in the sense that there could be many points that satisfy the SB-KKT conditions. In this work, we propose new bilevel KKT conditions, which are new necessary optimality conditions tighter than the ones proposed by Okuno et al. Moreover, we propose a unified smoothing approach using smoothing functions that belong to the Chen-Mangasarian class, and then prove that generated iteration points accumulate at bilevel KKT points under milder constraint qualifications. Another contribution is that our approach and analysis are applicable to a wider class of regularizers. Numerical comparisons demonstrate which smoothing functions work well for hyperparameter optimization via bilevel optimization approach. [ABSTRACT FROM AUTHOR]
Copyright of Mathematical Programming is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
ISSN: 0025-5610
DOI: 10.1007/s10107-024-02113-z