1.We simply move from using a linear-chain factor graph to a more general factor graph, and from forward-backward to more general (perhaps approximate) inference algorithms(compute Z).
2.Inference:The are two inference problems that arise. First, after we have trained the model, we often predict the labels of a new input x using the most likely labeling y* = arg max(y) p(y|x). Second, as will be seen in Chapter 4, estimation of the parameters typically requires that we compute the marginal distribution for each edge p(yt , yt−1 |x), and also the normalizing function Z(x).