Assignment 3: Potential of SVE2 Application in FFmpeg

osd600


SVE is the Scalable Vector Extension for Arm AArch64. In contrast to the Neon architecture extension, which works on fixed-width 128-bit registers, it allows SIMD vectorization with flexible-length vectors. SVE2 has all the features of SVE and adds its own improved versions of Neon features.

The scalable vector registers Z0-Z31 can be implemented with any length from 128 bits up to 2048 bits, in 128-bit increments.

This is very useful for HPC (high-performance computing) and ML (machine learning), which operate on large amounts of data.

Scalable Vector registers Z0-Z31

SVE2 has new instructions to further accelerate algorithms in areas such as computer vision, multimedia (for example DSP), genomics, in-memory databases, web serving, and other general-purpose software.

SVE2 enables optimizations for emerging applications beyond the HPC market, for example, in Machine Learning (ML) (UDOT instruction), Computer Vision (TBL and TBX instructions), baseband networking (CADD and CMLA instructions), genomics (BDEP and BEXT instructions), and server (MATCH and NMATCH instructions).

In the previous blog post about ffmpeg we saw that it uses vectorization in its DSP (digital signal processing) code, for example in the h264dsp_neon.S file. SVE2 offers better DSP support. Perhaps something like the gather-load and scatter-store feature, which SVE2 inherits from SVE, could speed up video processing for H.264.

LDFF1D {<Zt>.D}, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]

Where:

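Zt is the destination vector register, one of Z0-Z31
.D is the element size (64-bit doubleword): the element type is known, but the number of elements depends on the implemented vector length
Pg is the governing predicate, taken from the predicate registers P0-P15 (this instruction form accepts only P0-P7)
/Z selects zeroing predication: inactive elements of the destination are set to zero
Xn|SP is the scalar base address, held in a general-purpose register or the stack pointer
Zm is the vector of offsets used for gather addressing: each element is shifted left by 3 (LSL #3, i.e. multiplied by 8) and added to the base address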
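
To make the addressing concrete, here is a minimal scalar sketch in C of what a gather load with zeroing predication computes per lane. It is only a model: the function name gather_ld1d_model and the lanes parameter are made up for illustration, and the first-fault behaviour that the "FF" in LDFF1D refers to is not modeled at all.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 64-bit gather load with zeroing predication.
 * Hypothetical helper, not an ffmpeg or Arm API.
 *   lanes : number of 64-bit lanes (depends on the hardware vector length)
 *   pg    : per-lane predicate, true = active
 *   base  : base address (Xn), modeled as a pointer to 64-bit elements
 *   zm    : per-lane element offsets; indexing a uint64_t array scales the
 *           offset by 8 bytes, which is the effect of LSL #3
 *   zt    : destination lanes */
static void gather_ld1d_model(size_t lanes, const bool *pg,
                              const uint64_t *base, const uint64_t *zm,
                              uint64_t *zt)
{
    for (size_t i = 0; i < lanes; i++) {
        /* /Z: inactive lanes are written as zero, not left untouched */
        zt[i] = pg[i] ? base[zm[i]] : 0;
    }
}
```

Each active lane reads from its own address, so one instruction can collect values scattered across a table, which is the part that might matter for lookup-heavy video code.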

Also, wherever ffmpeg uses the ADD instruction to add fixed-size vectors, it could switch to variable-size Z vectors with:

ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Where:

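Zdn is both the first source and the destination vector
T is the element size (B, H, S or D)
/M selects merging predication: inactive elements of the destination keep their previous values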
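
As a rough illustration of merging predication, the predicated ADD above can be modeled lane by lane in plain C. This is a sketch for byte elements only, and add_merging_model is a hypothetical name, not something from ffmpeg or the Arm toolchain.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of ADD <Zdn>.B, <Pg>/M, <Zdn>.B, <Zm>.B.
 * Hypothetical helper for illustration only.
 *   lanes : number of byte lanes (depends on the hardware vector length)
 *   pg    : per-lane predicate, true = active
 *   zdn   : first source and destination
 *   zm    : second source */
static void add_merging_model(size_t lanes, const bool *pg,
                              uint8_t *zdn, const uint8_t *zm)
{
    for (size_t i = 0; i < lanes; i++) {
        if (pg[i]) {
            /* plain ADD does not saturate: the byte result wraps modulo 256 */
            zdn[i] = (uint8_t)(zdn[i] + zm[i]);
        }
        /* /M: inactive lanes are left exactly as they were */
    }
}
```

Compare this with the /Z form used by the gather load: there, inactive lanes would be cleared instead of preserved.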

Or, even simpler, take UADDW (a Neon instruction) in h264dsp_neon.S (permalink to the exact line on git) and examine whether it could be changed to UADDLB (an SVE2 instruction, unsigned add long, bottom).

UADDL vs UADDLB
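
Before swapping one for the other, it helps to see which lanes each instruction pairs up. The sketch below models the 8H/8B form of Neon UADDW and the .H/.B form of SVE2 UADDLB in scalar C. Both function names are invented for this post. As far as I can tell, the two are not drop-in equivalents: UADDW adds a narrow vector to an already wide one, while UADDLB widens two narrow vectors and only looks at their even-numbered lanes, so the data layout would have to change too. SVE2 also lists UADDWB and UADDWT (unsigned add wide, bottom and top), which may be the closer match.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of Neon UADDW Vd.8H, Vn.8H, Vm.8B: wide + narrow, lanes in order.
 * Hypothetical helper for illustration only. */
static void uaddw_model(uint16_t d[8], const uint16_t n[8], const uint8_t m[8])
{
    for (size_t i = 0; i < 8; i++) {
        d[i] = (uint16_t)(n[i] + m[i]);   /* wraps modulo 65536 */
    }
}

/* Model of SVE2 UADDLB Zd.H, Zn.B, Zm.B: narrow + narrow, using only the
 * even-numbered ("bottom") byte lanes of each source.
 *   out_lanes : number of 16-bit result lanes; each source supplies
 *               2 * out_lanes bytes, of which the odd ones are ignored */
static void uaddlb_model(size_t out_lanes, uint16_t *d,
                         const uint8_t *n, const uint8_t *m)
{
    for (size_t i = 0; i < out_lanes; i++) {
        d[i] = (uint16_t)(n[2 * i] + m[2 * i]);   /* cannot overflow 16 bits */
    }
}
```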

SVE2 provides a Vector Length Agnostic (VLA) instruction set: developers do not have to know the vector length implemented on their system.
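
Here is a small sketch of what "vector length agnostic" means for a loop, written as a scalar model in C. The lanes argument stands in for the hardware vector length, which real SVE code would ask the CPU for at run time (for example with the CNTB instruction), and the inner bounds check plays the role of a WHILELT-style predicate. The name vla_add_model is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* dst[i] = a[i] + b[i] for n bytes, processed in chunks of `lanes` elements.
 * Hypothetical model: `lanes` must be greater than zero and represents the
 * number of byte lanes in one vector on the current machine. */
static void vla_add_model(size_t lanes, size_t n, uint8_t *dst,
                          const uint8_t *a, const uint8_t *b)
{
    for (size_t i = 0; i < n; i += lanes) {          /* one "vector" per step */
        for (size_t lane = 0; lane < lanes; lane++) {
            /* predicate: a lane is active only while i + lane < n, so the
             * last partial chunk needs no separate scalar tail loop */
            if (i + lane < n) {
                dst[i + lane] = (uint8_t)(a[i + lane] + b[i + lane]);
            }
        }
    }
}
```

Calling it with lanes set to 16 (a 128-bit vector) or 256 (a 2048-bit vector) gives the same output array. The loop body never hard-codes the width, which is the property that would let one ffmpeg routine run on any SVE2 implementation.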

Source