In previous work, we described an efficient static scheduling scheme, based on a mixed 1D/2D block distribution with local aggregation, for a parallel supernodal version of sparse $LL^T$ factorization. In this paper, we present new algorithms suited to architectures based on clusters of SMP nodes, together with techniques that preserve good memory scalability. These algorithms are implemented in the PaStiX library, which achieves high performance: a system with $26 \times 10^6$ unknowns was solved on 192 quad-processor ES45 nodes at 35 percent of peak performance.