When I was checking the source code of ARM7-TDMI, a very unusual state machine caught my attention. It was used in the codes of the ALU's multipliers. After doing some researches, I finally figured out that it was the so-called 'One-hot State Machine'. I have heard about this kind of state machine when I was in school, but all I had known about it was nothing beyond a coding style of state machine, somehow just like what Gray Code is to Binary Code, nothing special but different. However, such superficial knowledge was soon overturned after I had read volumes of technical resources about it.
I. What is One-hot Encoding
One-hot Encoding is not mysterious at all. For instance, 4'b0001, 4'b0010, 4'b0100 and 4'b1000 form up a very basic group of One-hot codes. The common feature of this encoding style is that, only one bit in the code is 1, while the rest bits are 0s. It is genuinely self-explaning, right?
II. Why do we need One-hot Encoding
First of all, obviously, this encoding method is extremely resource consuming. Let's make it clearer with a simple example. Provided that we are going to design a state machine with 16 states in all. By using normal Binary Encoding, we need 4 bits in total for encoding, where 4 bits reflects 4 registers in term of physical resources. Similar result should be expected when it comes to the Gray Code. By contrast, with One-hot Code applied, 16 registers have to be used to signal 16 states, 4 times of that taken by Binary or Gray Code. Predictably, when the amount of states climbs up, the number of registers that wasted (if take Binary Code as the resource-tight solution) by One-hot State Machine would surge exponentially, and thus became an absurdly unpractical method when compared with its two counterparts. So far, the knot remains: why we still need One-hot Code after all?
To answer this question, let's ask another question instead: Why did the Binary Code save that many of registers? Still remember the first example, in which we used only 4 registers to indicate 16 states. How did it make it? This is because that very complex and lumpy combinational logics are wired in betweens of these 4 registers to decode the very complicated transition relationships. More specifically, in regards to any bit of 1, the possible target are 8 state. Thus, the combinational logic before the input (or D port in standard register model) must cover 8 different cases in total. When states increased or similar patterns reduced, the complexity of the combinational logics became even severer. Result from this effect, the latency between registers boost up, and eventually the maximum frequency of the design crashed. The story of One-hot Code is reversed. For in the case of One-hot Code, each bit of 1 only reflects to 1 state. As a result, the decoding parts are much simpler and not influenced by the number of states. In short, this kind of design could be extremely fast at the cost of size and other things. You should be always very aware of using it, because a tradeoff between speed, power and size are to make when you want an optimal outcome.
III. How to design One-hot State Machine Correctly
Now we are going to talk about the correct coding style for a real One-hot State Machine. Shall we just replace the normal binary encoded states with one-hot encoded states? Of course NOT!!! I have seen people doing such wrong thing in industrial standard of designs, but such wrong work usually performs even poorer than normal Binary Code State Machine--It wastes many times of registers and delivers unpredictable performance (all depending on the optimisation level of the synthesis tools). So, it is very important to master the correct coding style. Let's start with the following standard verilog source code.
[source lang="verilog"]
localparam IDLE = 0;
localparam RUN = 1;
localparam DONE = 2;
reg [2:0] state;
always @ (posedge clk or negedge rst_n) begin
if (!rst_n) begin
state <= 3'd1;
end
else begin
state <= 3'd0;
case (1'd1) // synthesis parallel_case full_case
state[IDLE]: begin
if (go) state[RUN] <= 1'd1;
else state[IDLE] <= 1'd1;
end
state[RUN]: begin
if (finished) state[DONE] <= 1'd1;
else state[RUN] <= 1'd1;
end
state[DONE]: begin
state[IDLE] <= 1'd1;
end
default: begin
state[IDLE] <= 1'd1;
end
endcase
end
end
[/source]
To make the purpose clearer, the statement ‘state <= 3’ is added to the very beginning. Actually, 3'd0 is the only illegal state (Assertion monitoring such state is strongly recommended) for a One-hot State Machine, thus 1 bit in the 3 must be set to 1'd1 in the following case statement. As there is no need to set priority to these cases, a synthesis control statement "// synthesis parallel_case" should be added to reach the best result.
You might also be confused by the ‘case(1'd1)’ statement. To make your life easier, you could just memorise it. If you really need to know why, my answer would be: it is just an expression. For normal case(signal) statement, we use case(signal) 1'd0: event_a; 1'd1: event_b; endcase to tell the compiler that our purpose is to compare the value of the signal to 1'd0 and 1'd1, then make corresponding event happen when any case matches. Put the same logic onto the phrase case(1'd1) signal_a: event_a; signal_b: event_b; endcase, the compiler is told to compare signal_a and signal_b's values against 1'b1 respectively, and then make the chosen event happen when any signal matches.
And, last but not the least, the tool provider has optimised their software regarding to such 'standard expression', please just follow it, your creativity should never be wasted at this.
第十二行的case(1’d1)是什么意思?
@muhu,
普通状态机的典型写法是
case(signal_name)
1’d0: begin end
1’d1: begin end
表示,当signal_name = 1’d0时怎样,signal_name = 1’d1时怎样
而独热码是反过来的
case(1’d1)
signal_name[0]: begin end
signal_name[1]: begin end
表示,当为1’d1的信号是signal_name[0]时怎样,当为1’d1的信号是signal_name[1]时怎样。由于独热码的编码结构决定了signal_name[0]和[1]不可能同时为1,因此他是可以正常工作的。
请问是不是写成两段式的状态机这种编码方式才有效。一段式的话,如何确保在状态跳转时,之前的状态索引位清0了?例如状态从IDLE跳到RUN,那么state[IDLE]这时候还是1吗
这个跟一段式还是两段式关系不大,只要always块里用的是<=赋值,就可以保证状态跳转之后,前一个状态的索引位被清零。else begin下面的第一句state <= 3'd0是用来完成清零的,而所有的位的下一个状态都会在下一个时钟沿同时生效,不管是从0到1还是从1到0。
谢谢你的回答,原来在非阻塞里,state <= 3'd0也可以实现清零。^_^