Natural Language Processing (NLP) Fundamentals: Finite State Transducers (FSTs)
This blog post is part of a series, titled Natural Language Processing (NLP) Fundamentals.
(Disclaimer: I am a developer at Amazon.com for their Alexa technologies, on devices such as Amazon Echo. As a consequence, there may be hints of Amazon sprinkled throughout this blog series. However, I should note that this blog series is not sponsored by Amazon.com, and represent my opinions only, and not those of the company.)
Do you like graphs? So do I! High five! Let’s ease into NLP Fundamentals by talking about Finite State Transducers (FSTs).
Wikipedia describes Finite State Transducers (FSTs) as “a finite state machine with two tapes: an input tape and an output tape.” If that makes sense to you, great! Feel free to stop reading. If you were just as confused as I was, then awesome! Read on.
Let’s start with a visual, because I like images. Here’s an example of a finite state transducer:
I like to think of Finite State Transducers (FSTs) as directed graphs, with a few special properties:
- Edges (“Transitions”) have input/output labels
- Sometimes, labels are “empty” (indicated here as lowercase epsilon ε)
- Traversing through to the end of an FST implies the generation (or indication of) a set of string relations
- There is a defined “initial state” (indicated here in green) and 1 or more “final states” (indicated here in gray)
FSTs are used for a variety of different applications:
- Word Inflections. For example, pluralizing words (cat -> cats, dog -> dogs, goose -> geese, etc.).
- Morphological Parsing; i.e., extracting the “properties” of a word (e.g., computers -> computer + [Noun] + [Plural])
- Simple Word Translation, e.g., translating US English to UK English.
- Simple commands made to a computer
Wait, “commands made to a computer”?
“Simple commands made to a computer” is not as vexing as it might sound. Imagine that I used my iPhone to ask Apple’s Siri to call someone on my contacts list: “Hey Siri, call Guybrush” (Guybrush Threepwood happens to be a dude from this awesome game series). Using my contacts, Siri may have built FSTs to identify my intention to call someone, and specifically, that I want to call Guybrush Threepwood, a contact on my iPhone.
First, Siri might have a giant FST that distinguishes various commands:
We start this FST by branching out to different commands, like “Call Someone”, “Get the Weather”, or “Nickname Myself”. We use the empty label epsilon (ε) to branch out, which is kind of like expressing, “Any of these commands are possible, from the start of the interaction, before the user says anything”. Let’s look at what an FST to “Call Someone” might look like:
Multiple contacts, like LeChuck and Elaine, would branch out to different end states.
I’ll briefly note here that FSTs can be transformed using different operations, like unions, concatenations, and replacements. I won’t talk about them in-depth, but just know that FSTs can be built off of smaller FSTs, via several different operations.
You might be wondering, “this is great Nade, but how does this get expressed in code?” OpenFst is an open source library that is used for “constructing, combining, optimizing, and searching weighted finite-state transducers (FSTs)”. The website has some great documentation on getting started.
We could express the “Call Someone” FST using C++:
// Create a simple mutable FST (a "vector FST") StdVectorFst fstCallSomeone; // Adds state 0 to the initially empty FST and make it the start state. fstCallSomeone.AddState(); // 1st state will be state 0 (returned by AddState) fstCallSomeone.SetStart(0); // Set state 0 as the start state // Adds one arc exiting state 0. // Arc constructor args: input label, output label, weight (0 for no weight), destination state ID. StdArc arcCallOther(3, 1, 0, 1); // Label 3: "Call", Label 1: "Other" fstCallSomeone.AddArc(0, arcCallOther); // From state 0, add the Call:Other arc // Add arcs for each contact - Guybrush:Contact, LeChuck:Contact, and Elaine:Contact StdArc arcGuybrushContact(4, 2, 0, 2); // Label 4: "Guybrush", Label 2: "Contact" StdArc arcLeChuckContact(5, 2, 0, 3); // Label 5: "LeChuck" StdArc arcElaineContact(6, 2, 0, 4); // Label 6: "Elaine" // Adds Contact arcs after "Call". fstCallSomeone.AddArc(1, arcGuybrushContact); fstCallSomeone.AddState(); // Add a state (#2) after the arc for Guybrush fstCallSomeone.SetFinal(2, 0); // Set state #2 as a final state with no weight (0) fstCallSomeone.AddArc(1, arcLeChuckContact); fstCallSomeone.AddState(); // Add a state (#3) after the arc for LeChuck fstCallSomeone.SetFinal(3, 0); // Set state #3 as a final state with no weight (0) fstCallSomeone.AddArc(1, arcElaineContact); fstCallSomeone.AddState(); // Add a state (#4) after the arc for Elaine fstCallSomeone.SetFinal(4, 0); // Set state #4 as a final state with no weight (0)
Alternatively (and this might look nicer to most!), we could use a text file to express this FST:
# arc format: sourceID destinationID input output # final state format: stateID $ cat >callSomeone.fst <<EOF 0 1 Call Other 1 2 Guybrush Contact 2 1 3 LeChuck Contact 3 1 4 Elaine Contact 4 EOF
That’s all from me, folks! If you’re hungry for more on FSTs, you might be ready for a more formal treatment on the subject. So here are a couple of resources to get you started:
(Book) Speech and Language Processing, Jurafsky and Martin, 2nd Ed. (Chapter 3, “Words and Transducers”)
(Software) OpenFst Library
(Software) Thrax (grammars as FSTs, using OpenFst)
Please feel free to leave a question or a comment below. Also, I’d be happy to correct any errata here (I am still a newbie, after all!). Thanks everyone for reading!