
Publications of year 2018
Thesis
  1. Gregoire Pichon. On the use of low-rank arithmetic to reduce the complexity of parallel sparse linear solvers based on direct factorization techniques. PhD thesis, Université de Bordeaux, November 2018. Keyword(s): Low-Rank compression.
    Abstract:
    Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For those reasons, optimizing their performance on modern architectures is critical. However, memory requirements and time-to-solution limit the use of direct methods for very large matrices. For other approaches, such as iterative methods, general black-box preconditioners that can ensure fast convergence for a wide range of problems are still missing. In the first part of this thesis, we present two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of a supernodal sparse direct solver. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems. The proposed solver can be used either as a direct solver at a lower precision or as a very robust preconditioner. The first approach, called Minimal-Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. In the second part, we present a reordering strategy that increases the block granularity to better take advantage of data locality on multicore architectures and to provide larger tasks to GPUs. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, but it does not impact the number of operations required to solve the problem. Building on this approach, we propose in the third part of this manuscript a new low-rank clustering technique designed to cluster unknowns within a separator to obtain the BLR partition, and we demonstrate its assets with respect to widely used clustering strategies. Both the reordering and the clustering were designed for the flat BLR representation, but they are also a first step toward hierarchical formats. In the last part of this thesis, we investigate a modified nested dissection strategy that aligns separators with respect to their parent to obtain more regular data structures.
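    The BLR technique summarized above rests on the observation that many off-diagonal blocks of the factors have a numerically low rank, so each can be stored as two thin factors instead of a dense matrix. A minimal sketch of one such compression step, using a plain truncated SVD (illustrative only; the function name and tolerance handling are ours, not the PaStiX API):

    ```python
    import numpy as np

    def blr_compress(block, tol=1e-8):
        """Compress a dense block with a truncated SVD, keeping the
        low-rank form only when it actually saves memory."""
        U, s, Vt = np.linalg.svd(block, full_matrices=False)
        # numerical rank: singular values above tol * largest one
        k = int(np.sum(s > tol * s[0]))
        m, n = block.shape
        if k * (m + n) >= m * n:       # compression would not pay off
            return block                # keep the block full-rank
        return U[:, :k] * s[:k], Vt[:k]  # store the two thin factors

    # A 100x100 block assembled from 3 outer products has rank 3, so its
    # 10000 entries shrink to 3*(100+100) = 600 stored values.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 100))
    Us, Vt = blr_compress(A)
    assert Us.shape == (100, 3) and Vt.shape == (3, 100)
    assert np.allclose(Us @ Vt, A)
    ```

    The `k * (m + n) < m * n` test mirrors the memory criterion behind the Minimal-Memory strategy: a block is kept compressed only when the two factors are cheaper to store than the dense block.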

    @phdthesis{pichon:tel-01953908,
    TITLE = {{On the use of low-rank arithmetic to reduce the complexity of parallel sparse linear solvers based on direct factorization techniques}},
    AUTHOR = {Pichon, Gregoire},
    URL = {https://hal.inria.fr/tel-01953908},
    NUMBER = {2018BORD0249},
    SCHOOL = {{Universit{\'e} de Bordeaux}},
    YEAR = {2018},
    MONTH = Nov,
    KEYWORDS = {Low-Rank compression},
    TYPE = {Theses},
    PDF = {https://hal.inria.fr/tel-01953908v2/file/PICHON_GREGOIRE_2018.pdf},
    HAL_ID = {tel-01953908},
    HAL_VERSION = {v2},
    ABSTRACT = { Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For those reasons, optimizing their performance on modern architectures is critical. However, memory requirements and time-to-solution limit the use of direct methods for very large matrices. For other approaches, such as iterative methods, general black-box preconditioners that can ensure fast convergence for a wide range of problems are still missing. In the first part of this thesis, we present two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of a supernodal sparse direct solver. This flat, non-hierarchical compression method exploits the low-rank property of the blocks appearing during the factorization of sparse linear systems. The proposed solver can be used either as a direct solver at a lower precision or as a very robust preconditioner. The first approach, called Minimal-Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. In the second part, we present a reordering strategy that increases the block granularity to better take advantage of data locality on multicore architectures and to provide larger tasks to GPUs. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, but it does not impact the number of operations required to solve the problem. Building on this approach, we propose in the third part of this manuscript a new low-rank clustering technique designed to cluster unknowns within a separator to obtain the BLR partition, and we demonstrate its assets with respect to widely used clustering strategies. Both the reordering and the clustering were designed for the flat BLR representation, but they are also a first step toward hierarchical formats. In the last part of this thesis, we investigate a modified nested dissection strategy that aligns separators with respect to their parent to obtain more regular data structures. }
    }
    


Articles in journal, book chapters
  1. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse supernodal solver using block low-rank compression: Design, performance and analysis. Journal of Computational Science, 27:255-270, July 2018. Keyword(s): Low-rank compression.
    @article{pichon:hal-01824275,
    TITLE = {{Sparse supernodal solver using block low-rank compression: Design, performance and analysis}},
    AUTHOR = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
    URL = {https://hal.inria.fr/hal-01824275},
    JOURNAL = {{Journal of Computational Science}},
    PUBLISHER = {{Elsevier}},
    VOLUME = {27},
    PAGES = {255 - 270},
    YEAR = {2018},
    MONTH = Jul,
    DOI = {10.1016/j.jocs.2018.06.007},
    KEYWORDS = {Low-rank compression},
    PDF = {https://hal.inria.fr/hal-01824275/file/blr.pdf},
    HAL_ID = {hal-01824275},
    HAL_VERSION = {v1},
    
    }
    


Conference articles
  1. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Supernodes ordering to enhance Block Low-Rank compression in sparse direct solvers. In PMAA 2018 - 10th International Workshop on Parallel Matrix Algorithms and Applications, Zurich, Switzerland, June 2018. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we present new ordering heuristics to perform block low-rank clustering within supernodes issued from nested dissection. As k-way partitioning within supernodes does not take into account interactions between supernodes, there is room to improve compression rates. We combine k-way partitioning with a reordering strategy that aims at minimizing the number of off-diagonal blocks in the symbolic structure, and we show that both methods are limited. In addition, we propose a selection of some non-compressible vertices to handle the corresponding blocks in full rank and reduce the burden of managing low-rank blocks with high ranks.
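    The effect such a reordering has on the number of off-diagonal blocks can be illustrated with a toy model (a hypothetical sketch in our own notation, not the heuristic of the paper): each row of a separator touches a set of supernodes, and grouping rows with identical sets merges their scattered contributions into fewer, larger blocks.

    ```python
    def count_blocks(order, supports):
        """Count contiguous row runs per supernode: each run becomes
        one off-diagonal block in the symbolic structure."""
        blocks = 0
        seen = {s for i in supports for s in supports[i]}
        for s in seen:
            in_run = False
            for i in order:
                hit = s in supports[i]
                if hit and not in_run:
                    blocks += 1
                in_run = hit
        return blocks

    # rows 0..3 of a separator and the supernodes each one touches
    supports = {0: {"A"}, 1: {"B"}, 2: {"A"}, 3: {"B"}}
    natural = [0, 1, 2, 3]
    # group rows with identical supports: a simple stand-in for the
    # reordering strategy described in the abstract
    reordered = sorted(natural, key=lambda i: sorted(supports[i]))
    assert count_blocks(natural, supports) == 4    # A,B,A,B: 4 blocks
    assert count_blocks(reordered, supports) == 2  # A,A,B,B: 2 blocks
    ```

    Fewer, larger off-diagonal blocks both improve BLAS kernel efficiency and give the low-rank compression bigger blocks to work on, which is the motivation stated in the abstract.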

    @inproceedings{pichon:hal-01956960,
    TITLE = {{Supernodes ordering to enhance Block Low-Rank compression in sparse direct solvers}},
    AUTHOR = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
    URL = {https://hal.inria.fr/hal-01956960},
    BOOKTITLE = {{PMAA 2018 - 10th International Workshop on Parallel Matrix Algorithms and Applications}},
    ADDRESS = {Zurich, Switzerland},
    YEAR = {2018},
    MONTH = Jun,
    PDF = {https://hal.inria.fr/hal-01956960/file/pmaa18.pdf},
    HAL_ID = {hal-01956960},
    HAL_VERSION = {v1},
    KEYWORDS = {Low-rank compression},
    ABSTRACT = { In this talk, we present new ordering heuristics to perform block low-rank clustering within supernodes issued from nested dissection. As k-way partitioning within supernodes does not take into account interactions between supernodes, there is room to improve compression rates. We combine k-way partitioning with a reordering strategy that aims at minimizing the number of off-diagonal blocks in the symbolic structure, and we show that both methods are limited. In addition, we propose a selection of some non-compressible vertices to handle the corresponding blocks in full rank and reduce the burden of managing low-rank blocks with high ranks. } 
    }
    


Internal reports
  1. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Supernodes ordering to enhance Block Low-Rank compression in sparse direct solvers. Research Report RR-9238, Inria Bordeaux Sud-Ouest, December 2018. Keyword(s): Low-rank compression.
    @techreport{pichon:hal-01961675,
    TITLE = {{Supernodes ordering to enhance Block Low-Rank compression in sparse direct solvers}},
    AUTHOR = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
    URL = {https://hal.inria.fr/hal-01961675},
    TYPE = {Research Report},
    NUMBER = {RR-9238},
    PAGES = {1-31},
    INSTITUTION = {{Inria Bordeaux Sud-Ouest}},
    YEAR = {2018},
    MONTH = Dec,
    KEYWORDS = {Low-rank compression},
    PDF = {https://hal.inria.fr/hal-01961675/file/RR-9238.pdf},
    HAL_ID = {hal-01961675},
    HAL_VERSION = {v1},
    
    }
    


Miscellaneous
  1. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Utilisation de la compression Block Low-Rank pour accélérer un solveur direct creux supernodal. COMPAS 2018 - SOLHAR final meeting, Toulouse, France, July 2018. Keyword(s): Low-rank compression.
    @misc{pichon:hal-01956959,
    TITLE = {{Utilisation de la compression Block Low-Rank pour acc{\'e}l{\'e}rer un solveur direct creux supernodal}},
    AUTHOR = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
    URL = {https://hal.inria.fr/hal-01956959},
    HOWPUBLISHED = {{COMPAS 2018} - {SOLHAR final meeting}, Toulouse, France},
    YEAR = {2018},
    MONTH = Jul,
    PDF = {https://hal.inria.fr/hal-01956959/file/slides.pdf},
    HAL_ID = {hal-01956959},
    HAL_VERSION = {v1},
    KEYWORDS = {Low-rank compression} 
    }
    


  2. G. Pichon, M. Faverge, P. Ramet, and J. Roman. Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX. JCAD 2018 - Journées Calcul et Données, Lyon, France, October 2018. Keyword(s): Low-rank compression.
    @misc{pichon:hal-01956928,
    TITLE = {{Utilisation de la compression low-rank pour r{\'e}duire la complexit{\'e} du solveur PaStiX}},
    AUTHOR = {Pichon, G. and Faverge, M. and Ramet, P. and Roman, J.},
    URL = {https://hal.inria.fr/hal-01956928},
    HOWPUBLISHED = {{JCAD 2018 - Journ{\'e}es Calcul et Donn{\'e}es}, Lyon, France},
    YEAR = {2018},
    MONTH = Oct,
    PDF = {https://hal.inria.fr/hal-01956928/file/3_Faverge.pdf},
    HAL_ID = {hal-01956928},
    HAL_VERSION = {v1},
    KEYWORDS = {Low-rank compression} 
    }
    


  3. P. Ramet. Heterogeneous architectures, Hybrid methods, Hierarchical matrices for Sparse Linear Solvers. Seminar at Stanford, April 2018.
    Abstract:
    Sparse direct solvers are a time-consuming kernel required by many scientific applications to simulate physical problems. Because of their high overall cost, many studies have tried to optimize the time-to-solution of those solvers on multi-core and distributed architectures. In this talk, we will present recent advances on PaStiX (https://gitlab.inria.fr/solverstack/pastix), a supernodal sparse direct solver that has been enhanced by the introduction of Block Low-Rank compression. We will compare the numerical stability and the performance, in terms of memory consumption and time-to-solution, of the different approaches obtained by selecting when the compression of the factorized matrix occurs. Many works have also targeted heterogeneous architectures to exploit accelerators such as GPUs or Intel Xeon Phi, with interesting speedups. The new implementation on top of runtime systems (PaRSEC, StarPU) will be compared with the static scheduling used in previous experiments. Among the preprocessing steps of a sparse direct solver, reordering and block-symbolic factorization are two major steps to reach a suitable granularity. In this talk, we will present a reordering strategy to increase off-diagonal block sizes. It enhances BLAS kernels and makes it possible to handle larger tasks, reducing runtime overhead. Finally, in order to improve the efficiency of the sparse update kernel for both BLR (block low-rank) and HODLR (hierarchically off-diagonal low-rank) formats, we are currently investigating the BDLR (boundary distance low-rank) method to preselect rows and columns in the low-rank approximation algorithm.
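    The nested dissection preprocessing mentioned above, which produces the supernodes that the reordering then refines, can be sketched on the simplest possible graph, a 1-D chain of unknowns (illustrative only; real orderings come from tools such as Metis or Scotch, and the function name is ours):

    ```python
    def nested_dissection(vertices):
        """Recursively split a chain of unknowns: remove a middle
        separator, order both halves first, and number the separator
        last, which is what creates the dense supernodes."""
        if len(vertices) <= 2:           # small enough: leaf supernode
            return list(vertices)
        mid = len(vertices) // 2
        sep = [vertices[mid]]            # the separator of a chain is one vertex
        left = nested_dissection(vertices[:mid])
        right = nested_dissection(vertices[mid + 1:])
        return left + right + sep        # separator numbered last

    order = nested_dissection(list(range(7)))
    assert sorted(order) == list(range(7))  # a permutation of the unknowns
    assert order[-1] == 3                   # the top-level separator comes last
    ```

    Numbering separators last confines fill-in to the separator blocks, and it is within those blocks that the BLR and hierarchical compression schemes discussed in the talk operate.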

    @Misc{c:LaBRI::stanford18,
    author = {Ramet, P.},
    title = {Heterogeneous architectures, Hybrid methods, Hierarchical matrices for Sparse Linear Solvers},
    howpublished = {Seminar at Stanford},
    month = apr,
    year = 2018,
    ABSTRACT = { Sparse direct solvers are a time-consuming kernel required by many scientific applications to simulate physical problems. Because of their high overall cost, many studies have tried to optimize the time-to-solution of those solvers on multi-core and distributed architectures. In this talk, we will present recent advances on PaStiX (https://gitlab.inria.fr/solverstack/pastix), a supernodal sparse direct solver that has been enhanced by the introduction of Block Low-Rank compression. We will compare the numerical stability and the performance, in terms of memory consumption and time-to-solution, of the different approaches obtained by selecting when the compression of the factorized matrix occurs. Many works have also targeted heterogeneous architectures to exploit accelerators such as GPUs or Intel Xeon Phi, with interesting speedups. The new implementation on top of runtime systems (PaRSEC, StarPU) will be compared with the static scheduling used in previous experiments. Among the preprocessing steps of a sparse direct solver, reordering and block-symbolic factorization are two major steps to reach a suitable granularity. In this talk, we will present a reordering strategy to increase off-diagonal block sizes. It enhances BLAS kernels and makes it possible to handle larger tasks, reducing runtime overhead. Finally, in order to improve the efficiency of the sparse update kernel for both BLR (block low-rank) and HODLR (hierarchically off-diagonal low-rank) formats, we are currently investigating the BDLR (boundary distance low-rank) method to preselect rows and columns in the low-rank approximation algorithm. } 
    }
    







Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

The documents in these directories are made available by the authors who contributed to them, to ensure the timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all other rights are retained by the authors and by the copyright holders, notwithstanding that they have offered their works here in electronic form. Persons copying this information must adhere to the terms and constraints covered by each author's copyright. These works may not be made available elsewhere without the explicit permission of the copyright holder.




Last modified: Tue Apr 4 11:58:35 2023
Author: ramet.


This document was translated from BibTeX by bibtex2html