Xilinx provides a FIR Compiler, and Vivado can apparently generate the code you want from it. The documentation is here:
https://docs.xilinx.com/r/en-US/pg149-fir-compiler/Coefficient-Padding?tocId=8Msr0aIMrXPOYpxOCiIYjQ
However, just as a software compiler does not always produce the smallest possible code, a generator for FPGA logic does not always produce the smallest possible design. Output from an off-the-shelf compiler is unlikely to fit in an xc6slx9.
I personally build the filter by hand from individual memories and multipliers, which is probably what olo111 does as well. The author of the blog below has published code for 2x oversampling, which may be instructive. The coefficients appear to be 16 bits wide:
https://audio-diy.hatenablog.com/entry/FIR_x2_howtouse
Essentially, a memory is used as a ring buffer that holds the most recent input samples, and each stored sample is multiplied by its filter coefficient. When oversampling, the multiplications for the inserted zeros can be skipped, which reduces the work per output sample. However, the output rate rises by the same factor, so the computation required per unit of time stays the same.
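To make the ring buffer idea concrete, here is a minimal software sketch in Python (not HDL, and not the code from the blog above). The function names, the even coefficient count, and the integer arithmetic are my own assumptions for illustration. It computes a 2x oversampling FIR by skipping the inserted zeros, and a plain zero-stuffing reference is included for comparison.

```python
def interpolate_2x_ring(samples, coeffs):
    """2x oversampling FIR that skips the inserted zeros.

    samples: input samples (integers, as in fixed-point hardware).
    coeffs:  FIR coefficients; an even count is assumed in this sketch.
    Returns two output samples for every input sample.
    """
    half = len(coeffs) // 2
    ring = [0] * half           # ring buffer of the most recent inputs
    head = 0                    # index of the newest sample
    out = []
    for x in samples:
        head = (head + 1) % half
        ring[head] = x          # overwrite the oldest sample
        for phase in (0, 1):    # even coefficients, then odd coefficients
            acc = 0
            for k in range(half):
                # Only real samples are multiplied; zeros never enter the sum.
                acc += coeffs[2 * k + phase] * ring[(head - k) % half]
            out.append(acc)
    return out


def interpolate_2x_reference(samples, coeffs):
    """Reference: insert a zero after each sample, then run a plain FIR."""
    stuffed = []
    for x in samples:
        stuffed += [x, 0]
    out = []
    for n in range(len(stuffed)):
        acc = 0
        for j, c in enumerate(coeffs):
            if n - j >= 0:
                acc += c * stuffed[n - j]
        out.append(acc)
    return out
```

Each output sample here costs len(coeffs) / 2 multiplications instead of len(coeffs), but two outputs are produced per input, which is the trade-off described above.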
For a standard FIR filter, the principle remains similar.
It can be implemented with four multipliers, a few memories, and some adders. There is not necessarily any need to use a half-band filter. Individual implementations will differ slightly depending on the input and output sample rates, but the underlying principle is the same.
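As a rough illustration of sharing the work between four multipliers, the Python sketch below splits the taps into four contiguous slices, accumulates each slice separately, and adds the partial sums at the end. The slicing scheme, the function names, and the single shared ring buffer are assumptions of mine. In real hardware each multiplier would more likely read from its own memory, and the loop over `m` would run in parallel rather than in sequence.

```python
def fir_four_multipliers(samples, coeffs, num_mult=4):
    """Standard FIR with the taps shared between num_mult multipliers."""
    taps = len(coeffs)
    per_mult = -(-taps // num_mult)   # ceiling division: taps per multiplier
    ring = [0] * taps                 # ring buffer of the most recent inputs
    head = 0                          # index of the newest sample
    out = []
    for x in samples:
        head = (head + 1) % taps
        ring[head] = x
        partial = [0] * num_mult      # one accumulator per multiplier
        for step in range(per_mult):  # one step models one clock cycle
            for m in range(num_mult): # parallel in hardware
                j = m * per_mult + step
                if j < taps:
                    partial[m] += coeffs[j] * ring[(head - j) % taps]
        out.append(sum(partial))      # final adders combine the partial sums
    return out


def fir_direct(samples, coeffs):
    """Reference: direct-form FIR with one multiplication per tap."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for j, c in enumerate(coeffs):
            if n - j >= 0:
                acc += c * samples[n - j]
        out.append(acc)
    return out
```

With this split, each output sample takes about len(coeffs) / 4 multiply steps, so the achievable tap count depends on how many clock cycles fit between output samples.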