How does the second flip-flop in a naive synchronizer "prevent a metastable state from propagating"?

Question

In this very nice answer it's explained that, fundamentally, a two flip-flop synchronizer's basic operation is to prevent a metastable state (effectively, an invalid logic level) from propagating down a system. I still don't totally follow how this actually works though.

Suppose the worst happens and FF1 enters a metastable state. Its output is floating somewhere near mid-rail but (slowly) moving one way or the other. Now based on the required MTBF we arrange for some \$T_c\$ in the receiving system such that it is very likely that the output has settled before launching the FF1 output to the FF2 output.

I have two questions:

Q1) If we didn't have FF2, what would go wrong? Is the issue that driving combinational logic with a potentially invalid input burns power? That is, I just want to be clear that the role of a synchronizer is not (necessarily) to ensure that the correct value is captured so much as that some valid logic value is captured.

Q2) Since the argument above is probabilistic, it's possible that FF1's output will not have settled when FF2 launches and we thus get an invalid D2 input when FF2 launches. Does this basically mean FF2's Q2 output will be (potentially) metastable/mid-rail and thus we're led to whatever problem answers Q1? If this is correct, is the precise statement to be made that FF2 "prevent[s] a metastable state from propagating [with a high degree of probability]", whereas a single-flip-flop solution would be guaranteed to have a metastable state propagate?

Give a moment to this page. Especially as it applies to Cliff Cummings. — periblepsis, Jan 14 '24 at 00:24

score 2 · Accepted Answer · answered Jan 14 '24 at 01:48

Q1 -- likely the additional power from the signal being in the metastable state is not significant. The main unwanted effect of metastability is that the perceived value (by a subsequent logic gate) may change between clock cycles which is not what is intended by using a DFF in the first place. This may upset the performance (logic correctness) of the downstream circuits.

Similar to the above, different logic gates being driven by the same metastable signal may have different opinions on what the logic value is -- one may think it's TRUE, while the other may consider it FALSE. Which of these happens depends on the electrical 1/0 voltage threshold of that individual gate sample (and this also depends on the gate type -- inverter, NOR, ... etc.)
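As a rough illustration of this fan-out problem, the sketch below models two receiving gates as ideal comparators with slightly different switching thresholds. The threshold values, the gate names, and the `read_logic_level` helper are all made up for illustration; real gate inputs are analog circuits with gain and noise, not ideal comparators.

```python
VCC = 1.0  # normalized supply voltage (assumption: everything is expressed as a fraction of VCC)

# Hypothetical per-gate switching thresholds; real values vary with gate type and sample.
gate_thresholds = {"inverter": 0.45 * VCC, "nor": 0.55 * VCC}


def read_logic_level(v_in: float, threshold: float) -> int:
    """Idealized gate input: report 1 if the voltage is above this gate's threshold, else 0."""
    return 1 if v_in > threshold else 0


def fanout_readings(v_in: float) -> dict:
    """Logic value that each gate in the fan-out perceives for the same input voltage."""
    return {name: read_logic_level(v_in, th) for name, th in gate_thresholds.items()}


metastable = fanout_readings(0.50 * VCC)  # mid-rail, as from a metastable flip-flop
valid_high = fanout_readings(0.90 * VCC)  # a properly settled HIGH

print(metastable)  # the two gates disagree
print(valid_high)  # the two gates agree
```

With a settled input both gates agree; with the mid-rail input they do not, which is the 'illogical' condition the downstream logic was never designed for.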

What DFF2 does is reduce (but not 100.0 % eliminate) the probability of a metastable state propagating. Suppose a single DFF reduces the probability by a factor of 10^9; then two cascaded ones will reduce it by another factor of 10^9, for a total probability of 10^-18. So a 1 GHz signal clocked continuously would then have an MTBF of about 10^9 s -- over 30 years.
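The arithmetic behind that figure can be checked with a few lines. This is only a sketch of the simplified picture used above: it assumes every clock edge is a potential metastability event and that each stage contributes an independent factor of 10^-9. The function and variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mtbf_seconds(event_rate_hz: float, p_fail_per_event: float) -> float:
    """Mean time between failures when each event fails independently with the given probability."""
    return 1.0 / (event_rate_hz * p_fail_per_event)


event_rate_hz = 1e9       # 1 GHz signal clocked continuously (figure from the answer)
per_stage_factor = 1e-9   # per-flip-flop reduction (illustrative figure from the answer)

one_stage_s = mtbf_seconds(event_rate_hz, per_stage_factor)
two_stage_s = mtbf_seconds(event_rate_hz, per_stage_factor ** 2)
two_stage_years = two_stage_s / SECONDS_PER_YEAR

print(f"one stage:  MTBF ~ {one_stage_s:.3g} s")
print(f"two stages: MTBF ~ {two_stage_s:.3g} s ~ {two_stage_years:.1f} years")
```

Under these assumptions a single stage would fail about once per second, while two stages push the MTBF out to roughly 10^9 s.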

jp314

And cryptographers use 3 or 4 because they know it's good to have several orders of magnitude of safety margin :) 10^-18 events happen once per second somewhere in the world. – user253751 Jan 14 '24 at 08:42
Thank you for this very nice answer. Two quick follow ups before I accept: 1) I'm not sure I totally follow what you mean by "The main unwanted effect of metastability is that the perceived value (by a subsequent logic gate) may change between clock cycles which is not what is intended by using a DFF in the first place. " Would it be possible to give a quick example? That is, let's suppose we only use FF1 and it captures a metastable output state. Now the combinational circuit behind FF1 will see this invalid level at its (the combinational circuit's) inputs and you say that this will... – EE18 Jan 14 '24 at 14:57
...potentially ruin the logical correctness of the outputs of this combinational circuit. But how is this any different than having FF2 which, we acknowledged, may or may not capture the intended logic level? That is, I'm still not sure I follow what the "problem" is (if not power) if whether we get an invalid logic level or a valid logic level, we may not have captured the intended logic level? 2) I think I perfectly understand your answer here but just want to confirm -- is the idea that if we only have FF1 then our probability of a metastable output given a metastable output of FF1... – EE18 Jan 14 '24 at 15:00
... is exactly 1, whereas if we add FF2 (or FF3 etc. as you say) then that conditional probability (again given a metastable output of FF1) is, say, $10^{-9}$ (or better)? – EE18 Jan 14 '24 at 15:01
If you allow a metastable state to propagate (say by only using a single DFF), then if that signal drives (e.g. fan-out) to separate logic gates, then EACH of those may have a different interpretation of the logic level. This 'illogical' condition may not have been expected in the logic design and can cause erroneous behavior. – jp314 Jan 14 '24 at 18:05
Got it, thank you! – EE18 Jan 14 '24 at 23:00
@EE18 An example. The classic problem is an interrupt arriving at an MCU. Should the next instruction address come from the program counter, or the interrupt unit? If those two units read an interrupt latch output differently due to metastability, then neither, or both, might attempt to drive the address bus, resulting in a crash. This happened a lot to early PCs before it was 'fixed'. – Neil_UK Jan 15 '24 at 09:44
@Neil_UK Fantastic, thank you so much for that example. I've noted it away in my text here :) – EE18 Jan 15 '24 at 16:08
Hi Neil, hope all is well. Just returning to this here. Would it be possible to confirm with respect to your example why exactly two units might read the metastable latch output (i.e. the output in the "forbidden zone") differently. Both see the same (analog) voltage right? Are you just saying that it's possible for some reason or another that propagation of that input (analog) voltage could resolve to a 1 or 0 (HIGH or LOW voltage) in each unit? @Neil_UK – EE18 Feb 15 '24 at 18:42
@EE18 There are two ways this can happen: voltage, and time. Say the output stays at 50% VCC for a while. This is precisely where different receiving gates can have different decision thresholds. It's why '1' and '0' have such a margin between them, CMOS only guaranteeing >70% and <30% respectively. It would be unremarkable for one gate to switch around 45% and another 55%. Then time. Say the metastable output resolves to a solid logic level, after another gate has read it in the other state. That way, even the same receiver can read it differently on two different clock cycles. – Neil_UK Feb 15 '24 at 19:49
Crystal clear, thank you so much as always! @Neil_UK – EE18 Feb 15 '24 at 23:13

Neil_UK · Answer 2 · 2024-01-15T09:37:38.570

How does the second flip-flop in a naive synchronizer "prevent a metastable state from propagating"?

It doesn't. It just makes it more unlikely, usually much more unlikely.

If you have the potential for metastability (i.e. you can't meet your setup and hold times, or your input logic levels), then the only thing you can do is wait long enough that the probability of metastability propagating into your system is low enough to be ignored. You might set your acceptability threshold at one event per the age of the universe.

One flip-flop by itself is not really worthy of the name synchroniser. The badly timed edge could appear at any time before the clock pulse. On average, it will appear halfway between clock pulses. In the worst case, it will appear right at the clock edge, resulting in zero time to allow the situation to resolve. One flip-flop by itself offers no minimum guaranteed waiting time.

If you add a second flip-flop after the first, then you have guaranteed a whole clock cycle to reduce the probability of a metastability event propagating.
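One common first-order way to quantify that waiting time, taken from the model usually quoted in vendor application notes rather than from anything stated in this thread, is that the chance of a flip-flop still being metastable decays exponentially with the resolution time \$t_r\$ it is given:

$$
P(\text{unresolved after } t_r) \propto e^{-t_r/\tau},
\qquad
\mathrm{MTBF} \approx \frac{e^{t_r/\tau}}{T_0 \, f_c \, f_d}
$$

Here \$\tau\$ (the resolution time constant) and \$T_0\$ (the width of the vulnerable window around the clock edge) are device-dependent parameters, \$f_c\$ is the clock frequency and \$f_d\$ is the rate of asynchronous data transitions; all four symbols are assumptions of this model. With the second flip-flop in place, the guaranteed resolution time is roughly one clock period less the setup time of that flip-flop, \$t_r \approx T_c - t_{su}\$, so every extra stage multiplies the MTBF by about another \$e^{T_c/\tau}\$.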

It turns out that most flip-flop families are fast enough internally that they can reduce the probability of metastability propagation by many, many orders of magnitude in one cycle of their fastest usable clock. Often a single extra flip-flop is enough to satisfy a casual designer that they have 'fixed the problem'.

If you now go looking for metastable events, you should be able to find them at measurable rates if you stress the system. Rather than having edges arrive randomly, control them using delay feedback to the tiny fraction of time around the clock edge where they will cause trouble. This will increase the probability of seeing an event by orders of magnitude. Secondly, increase the clock rate of the system to reduce the resolution time. Plot the rate of bad events versus the system clock frequency to quantify how quickly metastability is resolving in your system. This will allow you to extrapolate a measurable rate down to a ridiculously low target such as once in the age of the universe.
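The extrapolation step might look something like the sketch below. It assumes the failure rate falls exponentially with the available resolution time, approximates that time as one clock period (ignoring setup time), and treats the prefactor as constant. The "measurements" are synthetic, generated from that same assumed model with made-up parameters, purely so that the script runs; they are not data from any real device.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up model parameters used only to generate synthetic stand-in measurements.
TRUE_TAU_S = 50e-12        # hypothetical resolution time constant
TRUE_PREFACTOR_HZ = 1e6    # hypothetical failure rate at zero resolution time

stress_clocks_hz = [1.0e9, 1.2e9, 1.5e9, 2.0e9]          # over-clocked stress runs
resolution_times_s = [1.0 / f for f in stress_clocks_hz]  # ~ one clock period each
measured_rates_hz = [
    TRUE_PREFACTOR_HZ * math.exp(-t / TRUE_TAU_S) for t in resolution_times_s
]

# ln(rate) is a straight line in resolution time: intercept = ln(prefactor), slope = -1/tau.
intercept, slope = fit_line(resolution_times_s, [math.log(r) for r in measured_rates_hz])
tau_est_s = -1.0 / slope

# Extrapolate to the (slower) clock the real system will run at.
target_clock_hz = 500e6
predicted_rate_hz = math.exp(intercept + slope * (1.0 / target_clock_hz))
predicted_mtbf_years = 1.0 / predicted_rate_hz / (365.25 * 24 * 3600)

print(f"estimated tau: {tau_est_s * 1e12:.1f} ps")
print(f"predicted MTBF at {target_clock_hz / 1e6:.0f} MHz: {predicted_mtbf_years:.3g} years")
```

With real measurements the points will scatter around the line, so the quality of the fit, and how far you are extrapolating beyond the measured range, determine how much to trust the predicted MTBF.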

If after adding a second flip-flop, you find that some problem events are getting through, add a third, or a fourth. It only costs you latency.

Thank you for this very nice and helpful answer. I've accepted the other just because it came first, but this was very useful for me. — EE18, Jan 14 '24 at 23:04
