Optimized Implementation of the Laplacian Algorithm for the FT-M7002 Platform

    Abstract: To exploit the strengths of the domestically developed FT high-performance processor, this work parallelizes and optimizes the Laplacian algorithm for the FT-M7002 platform. Building on data migration, it uses the DMA data transfer mechanism to address three problems: transposition of the array (matrix) data, non-contiguous data access, and idle gaps between data transfers. This improves program performance and exploits the program's data-level and instruction-level parallelism. Experimental results show that the optimized, vectorized parallel algorithm runs 2.02 to 2.55 times faster than the unoptimized version. Compared with the TMS320C6678 processor, the optimized algorithm on FT reaches 1.48 to 2.56 times the performance.
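
    As a rough illustration of the staging idea mentioned in the abstract, the sketch below computes a Laplacian over a 2D grid one block of rows at a time. Each block is first copied into a small local buffer, and the arithmetic then reads only that buffer. Everything here is an assumption made for illustration and is not taken from the paper: the 5-point stencil, the function names, the tile_rows parameter, and the use of a plain Python copy to stand in for a DMA transfer into on-chip memory.

```python
def laplacian_direct(img):
    """5-point Laplacian on interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1]
                         - 4 * img[y][x])
    return out


def laplacian_tiled(img, tile_rows=2):
    """Same result as laplacian_direct, computed one block of rows at a time.

    Each block is copied, with one halo row above and below, into a local
    buffer. The copy is a stand-in (assumption) for a DMA transfer; the
    inner loops then touch only the local buffer.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y0 in range(1, h - 1, tile_rows):
        y1 = min(y0 + tile_rows, h - 1)  # exclusive end of this block's interior rows
        local = [row[:] for row in img[y0 - 1:y1 + 1]]  # staged copy incl. halo rows
        for ly in range(1, len(local) - 1):
            for x in range(1, w - 1):
                out[y0 + ly - 1][x] = (local[ly - 1][x] + local[ly + 1][x]
                                       + local[ly][x - 1] + local[ly][x + 1]
                                       - 4 * local[ly][x])
    return out
```

    On real hardware the copy of the next block would typically be issued while the current block is being processed, which is one way to hide the transfer gaps the abstract refers to. This sketch performs the two steps sequentially and only checks that the tiled form matches the direct one.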

     
