Resource utilization of different multiplexer strategies

Introduction

In Felix, many multiplexers are used to concentrate data inside the FPGA, and the code contains several different mux coding styles. The purpose of this study is to determine whether one VHDL description results in a more efficient logic implementation than another.

Toplevel

For this investigation, a top-level VHDL file instantiates four multiplexers with the same behaviour and the same port map. Only the generic muxtype determines which internal 8:1 mux implementation is used. This 16-bit 8:1 mux is used in several places in the Felix Central Router.

entity MUX8_16bit_sync is
Generic(
    muxtype: integer := 0
    );
Port ( 
    clk      : in  std_logic;
    data_rdy : in  std_logic_vector(7 downto 0);
    data0    : in  std_logic_vector(15 downto 0);
    data1    : in  std_logic_vector(15 downto 0);
    data2    : in  std_logic_vector(15 downto 0);
    data3    : in  std_logic_vector(15 downto 0);
    data4    : in  std_logic_vector(15 downto 0);
    data5    : in  std_logic_vector(15 downto 0);
    data6    : in  std_logic_vector(15 downto 0);
    data7    : in  std_logic_vector(15 downto 0);
    ---------
    sel      : in  std_logic_vector(2 downto 0);
    ---------
    data_out : out std_logic_vector(15 downto 0);
    data_out_rdy : out std_logic
    );
end MUX8_16bit_sync;
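
The way the generic could select between the implementations can be sketched with if-generate statements. This is only a sketch under assumptions: the architecture name "sketch", the generate labels g_mux0 to g_mux3 and g_reg, the muxtype encoding 0 to 3 (taken from the MUX0 to MUX3 naming on this page), and the final assignment of data_out are all hypothetical and not taken from the Felix sources. It relies on the MUX8 architectures listed further down this page. The data_out_rdy logic is not shown on this page and is left undriven here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: names and muxtype encoding are assumptions.
architecture sketch of MUX8_16bit_sync is
    signal data_out_p0_s : std_logic_vector(15 downto 0); -- combinatorial mux result
    signal data_out_p1_s : std_logic_vector(15 downto 0); -- registered mux result
begin

    -- muxtype = 0: one clocked 16-bit case statement (MUX0).
    g_mux0: if muxtype = 0 generate
        mux8x: process(clk)
        begin
            if clk'event and clk = '1' then
                case sel is
                    when "000" => data_out_p1_s <= data0;
                    when "001" => data_out_p1_s <= data1;
                    when "010" => data_out_p1_s <= data2;
                    when "011" => data_out_p1_s <= data3;
                    when "100" => data_out_p1_s <= data4;
                    when "101" => data_out_p1_s <= data5;
                    when "110" => data_out_p1_s <= data6;
                    when "111" => data_out_p1_s <= data7;
                    when others => null;
                end case;
            end if;
        end process;
    end generate;

    -- muxtype = 1, 2, 3: sixteen 1-bit muxes; only the bound architecture differs.
    g_mux16x8: for i in 0 to 15 generate
        g_mux1: if muxtype = 1 generate
            mux: entity work.MUX8(behavioral1)
                port map (sel => sel,
                          data0 => data0(i), data1 => data1(i), data2 => data2(i), data3 => data3(i),
                          data4 => data4(i), data5 => data5(i), data6 => data6(i), data7 => data7(i),
                          data_out => data_out_p0_s(i));
        end generate;
        g_mux2: if muxtype = 2 generate
            mux: entity work.MUX8(behavioral2)
                port map (sel => sel,
                          data0 => data0(i), data1 => data1(i), data2 => data2(i), data3 => data3(i),
                          data4 => data4(i), data5 => data5(i), data6 => data6(i), data7 => data7(i),
                          data_out => data_out_p0_s(i));
        end generate;
        g_mux3: if muxtype = 3 generate
            mux: entity work.MUX8(low_level_MUX8)
                port map (sel => sel,
                          data0 => data0(i), data1 => data1(i), data2 => data2(i), data3 => data3(i),
                          data4 => data4(i), data5 => data5(i), data6 => data6(i), data7 => data7(i),
                          data_out => data_out_p0_s(i));
        end generate;
    end generate;

    -- Pipeline register so that the combinatorial variants match MUX0.
    g_reg: if muxtype /= 0 generate
        process(clk)
        begin
            if clk'event and clk = '1' then
                data_out_p1_s <= data_out_p0_s;
            end if;
        end process;
    end generate;

    -- Assumption: the registered result drives the output port directly.
    data_out <= data_out_p1_s;

end sketch;
```

Only one of the generate branches is elaborated for a given muxtype, so data_out_p1_s has a single driver in every configuration.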

Mux instantiations

MUX0: parallel clocked mux

This is the simplest mux: a single clocked process selects one of the eight 16-bit data words depending on sel[2:0] and registers it. This is also the implementation currently used in Felix.

mux8x: process(clk)
begin
    if clk'event and clk = '1' then
        case sel is 
            when "000" => data_out_p1_s <= data0;
            when "001" => data_out_p1_s <= data1;
            when "010" => data_out_p1_s <= data2;
            when "011" => data_out_p1_s <= data3;
            when "100" => data_out_p1_s <= data4;
            when "101" => data_out_p1_s <= data5;
            when "110" => data_out_p1_s <= data6;
            when "111" => data_out_p1_s <= data7;
            when others => null;  -- metavalue in sel (simulation only): keep the previous value
        end case;
    end if;
end process;

MUX1, MUX2 and MUX3: different architectures of a 1-bit 8:1 mux inside a for-generate loop

The 1-bit mux is instantiated 16 times in a for-generate loop, shown here for the architecture behavioral1. To match the behaviour of the clocked MUX0 described above, a pipeline register stage is added after the combinatorial muxes.

g_mux16x8: for i in 0 to 15 generate
begin
    mux: entity work.MUX8(behavioral1)
    port map (
        sel    => sel,
        data0   => data0(i),
        data1   => data1(i),
        data2   => data2(i),
        data3   => data3(i),
        data4   => data4(i),
        data5   => data5(i),
        data6   => data6(i),
        data7   => data7(i),
        data_out => data_out_p0_s(i)
    );    
end generate;

process(clk)
begin
    if clk'event and clk = '1' then
       data_out_p1_s   <= data_out_p0_s;
    end if;
end process;
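
The declaration of the MUX8 entity itself is not listed on this page. Judging from the port map above, it could look like the following sketch. The port types and the port order are inferred from the instantiation and are assumptions, not taken from the Felix sources. The UNISIM context clause is only needed by the low-level architecture, which instantiates the Xilinx LUT6 and MUXF7 primitives.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Needed only by the architecture that instantiates LUT6 and MUXF7.
library UNISIM;
use UNISIM.VComponents.all;

-- Assumed declaration of the 1-bit 8:1 mux, inferred from its port map.
entity MUX8 is
    port (
        sel      : in  std_logic_vector(2 downto 0);
        data0    : in  std_logic;
        data1    : in  std_logic;
        data2    : in  std_logic;
        data3    : in  std_logic;
        data4    : in  std_logic;
        data5    : in  std_logic;
        data6    : in  std_logic;
        data7    : in  std_logic;
        data_out : out std_logic
    );
end MUX8;
```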

The different architectures of MUX8 are given here:

MUX1: behavioral in a process

architecture behavioral1 of MUX8 is
begin

process(data0,data1,data2,data3,data4,data5,data6,data7,sel)
begin

    case sel is 
        when "000" => data_out <= data0;
        when "001" => data_out <= data1;
        when "010" => data_out <= data2;
        when "011" => data_out <= data3;
        when "100" => data_out <= data4;
        when "101" => data_out <= data5;
        when "110" => data_out <= data6;
        when "111" => data_out <= data7;
        when others => null;  -- metavalue in sel (simulation only): data_out keeps its value
    end case;

end process;

end behavioral1;

MUX2: behavioral using with select

architecture behavioral2 of MUX8 is
begin
  with sel select
    data_out <= data0 when "000",
                data1 when "001",
                data2 when "010",
                data3 when "011",
                data4 when "100",
                data5 when "101",
                data6 when "110",
                data7 when others;
end behavioral2;

MUX3: low-level MUX8 using LUT6 and MUXF7 primitives

architecture low_level_MUX8 of MUX8 is

signal lut0_out, lut1_out : std_logic;

begin

---------------
lut0_inst: LUT6
generic map (INIT => X"FF00F0F0CCCCAAAA")
port map(     
    I0 => data0,
    I1 => data1,
    I2 => data2,
    I3 => data3,
    I4 => sel(0),
    I5 => sel(1),
    O  => lut0_out
    );
---------------
lut1_inst: LUT6
generic map (INIT => X"FF00F0F0CCCCAAAA")
port map(     
    I0 => data4,
    I1 => data5,
    I2 => data6,
    I3 => data7,
    I4 => sel(0),
    I5 => sel(1),
    O  => lut1_out
    );
---------------
combiner0_muxf7: MUXF7
port map(     
    I0 => lut0_out,
    I1 => lut1_out,
    S  => sel(2),
    O  => data_out
    );
---------------

end low_level_MUX8;
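
The INIT value X"FF00F0F0CCCCAAAA" is the truth table of a 4:1 mux: bit n of INIT is the LUT output for the input combination whose binary value I5 I4 I3 I2 I1 I0 equals n. With I5 and I4 as select lines, the four 16-bit quarters of the table simply copy I0 (AAAA), I1 (CCCC), I2 (F0F0) and I3 (FF00). The following simulation-only sketch recomputes the constant and compares it with the value used above. The entity name lut6_mux4_init_check and the function name mux4_init are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only check of the LUT6 INIT constant.
entity lut6_mux4_init_check is
end entity;

architecture sim of lut6_mux4_init_check is

    -- Build the 64-entry truth table of a 4:1 mux whose select lines are
    -- I5 (MSB) and I4 (LSB) and whose data inputs are I0..I3.
    function mux4_init return bit_vector is
        variable init : bit_vector(63 downto 0);
        variable sel  : natural range 0 to 3;
    begin
        for addr in 0 to 63 loop
            sel := addr / 16;                    -- address bits 5..4 = I5, I4
            -- The output equals data input I<sel>, which is bit <sel> of addr.
            if (addr / 2**sel) mod 2 = 1 then
                init(addr) := '1';
            else
                init(addr) := '0';
            end if;
        end loop;
        return init;
    end function;

begin

    -- Fails at the start of simulation if the constant is not a 4:1 mux table.
    assert mux4_init = X"FF00F0F0CCCCAAAA"
        report "INIT does not describe a 4:1 mux"
        severity failure;

end architecture;
```

The MUXF7 then selects between the two LUT outputs with sel(2), which completes the 8:1 mux.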

Conclusion

Name                               | Slice LUTs | Slice Registers | F7 Muxes
Mux0: Clocked parallel process     | 35         | 51              | 16
Mux1: 1-bit combinatorial process  | 35         | 51              | 16
Mux2: 1-bit with select            | 35         | 51              | 16
Mux3: low level instantiation      | 35         | 51              | 16

The bottom line is that, regardless of how the 16-bit 8:1 mux is written, Vivado recognizes it as a mux and creates exactly the same logic cells from it after implementation.

The schematic representation of the different architectures after Vivado implementation also looks exactly the same:
Figure: MUX16 8:1 comparison after implementation
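
The claim that all four variants behave identically can also be checked in simulation. The testbench below is a minimal sketch under assumptions: the testbench name, the signal names and the stimulus values are hypothetical, it assumes that muxtype takes the values 0 to 3 and that data_out appears one clock cycle after sel is sampled, and it does not check data_out_rdy. Simulating the low-level variant additionally requires the Xilinx UNISIM simulation library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench comparing the four muxtype variants.
entity tb_mux8_16bit_sync is
end entity;

architecture sim of tb_mux8_16bit_sync is
    type slv16_array is array (natural range <>) of std_logic_vector(15 downto 0);

    signal clk      : std_logic := '0';
    signal sel      : std_logic_vector(2 downto 0) := "000";
    signal data_rdy : std_logic_vector(7 downto 0) := (others => '0');
    signal data_in  : slv16_array(0 to 7);
    signal data_out : slv16_array(0 to 3);       -- one result per muxtype
    signal rdy_out  : std_logic_vector(0 to 3);  -- connected but not checked
    signal done     : boolean := false;
begin

    -- 100 MHz clock that stops once the stimulus process has finished.
    clk <= not clk after 5 ns when not done else '0';

    -- One instance per muxtype, all driven by the same inputs.
    g_dut: for t in 0 to 3 generate
        dut: entity work.MUX8_16bit_sync
            generic map (muxtype => t)
            port map (
                clk => clk, data_rdy => data_rdy,
                data0 => data_in(0), data1 => data_in(1),
                data2 => data_in(2), data3 => data_in(3),
                data4 => data_in(4), data5 => data_in(5),
                data6 => data_in(6), data7 => data_in(7),
                sel => sel,
                data_out => data_out(t),
                data_out_rdy => rdy_out(t)
            );
    end generate;

    stim: process
    begin
        -- Give every input a distinct word: 0x1111, 0x2222, ..., 0x8888.
        for k in 0 to 7 loop
            data_in(k) <= std_logic_vector(to_unsigned(16#1111# * (k + 1), 16));
        end loop;

        for s in 0 to 7 loop
            sel <= std_logic_vector(to_unsigned(s, 3));
            wait until rising_edge(clk);   -- inputs are sampled on this edge
            wait until falling_edge(clk);  -- registered outputs are stable here
            for t in 0 to 3 loop
                assert data_out(t) = data_in(s)
                    report "muxtype " & integer'image(t) &
                           " gives a wrong word for sel = " & integer'image(s)
                    severity error;
            end loop;
        end loop;

        done <= true;
        wait;
    end process;

end architecture;
```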
