Because I am interested in Conversational User Interfaces and work with them both in my job and in my university studies, I wanted to start a series about them on my blog. This first part covers the basics and some initial definitions.
A conversational user interface, or CUI for short, is a type of user interface in which the user and the system communicate with one another much like two humans would. The two primary modalities for CUIs are text and voice, but there are also various multi-modal variants.
The most prominent CUIs are Amazon Alexa, Google Assistant, and Apple’s Siri. These three commercial offerings are present on millions of devices, represent the category of so-called “smart assistants”, and are multi-modal CUIs. Other categories include chatbots, which you interact with on websites or in messenger apps, and Voice User Interfaces, which focus on voice-based interactions, e.g. over a phone call.
General CUI Architecture
I would divide a CUI into the following components, which can be found in one form or another in basically any CUI.

In the following, I describe the individual components and their functions from my point of view.
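To make the flow between these components more concrete, here is a minimal Python sketch of a single conversation turn. It is only an illustration under my own assumptions: every component is modelled as a plain function, and the function `run_turn`, its parameters, and the stub intents and actions in the usage example are hypothetical rather than taken from any CUI framework. The Channel Gateway is assumed to have already received the text and to call `run_turn`.

```python
from typing import Callable, Dict, List, Tuple


# Each component is modelled as a plain function; all names and signatures are illustrative assumptions.
def run_turn(
    text: str,
    nlu: Callable[[str], dict],
    dialogue_control: Callable[[dict], str],
    custom_code: Dict[str, Callable[[dict], dict]],
    response_generation: Callable[[str, dict], str],
    events: List[Tuple[str, str]],
) -> str:
    """Process one user message that the Channel Gateway has already received."""
    events.append(("user_message", text))          # Events: record the incoming message
    parsed = nlu(text)                              # NLU: intent and entities
    action = dialogue_control(parsed)               # Dialogue Control: decide what to do next
    slots = dict(parsed["entities"])
    if action in custom_code:                       # Custom Code: call external systems if needed
        slots.update(custom_code[action](slots))
    reply = response_generation(action, slots)      # Response Generation: build the answer
    events.append(("bot_response", reply))          # Events: record the outgoing response
    return reply


# Usage with stub components:
events: List[Tuple[str, str]] = []
reply = run_turn(
    "What is the status of order 1?",
    nlu=lambda t: {"intent": "order_status", "entities": {"order_id": "1"}},
    dialogue_control=lambda p: "check_order_status" if p["intent"] == "order_status" else "fallback",
    custom_code={"check_order_status": lambda slots: {"status": "shipped"}},
    response_generation=lambda a, s: f"Order {s['order_id']} is {s['status']}." if a == "check_order_status" else "Sorry?",
    events=events,
)
print(reply)  # Order 1 is shipped.
```

Passing the components in as functions keeps the sketch independent of any particular framework; in a real system each of them would be a larger module or even a separate service.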
Channel Gateway
The Channel Gateway is the entry point of a CUI. It is responsible for handling the communication between the user and the Dialogue Control of the system.
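One way to picture this is a gateway that converts channel-specific payloads into one common message format before handing them to the Dialogue Control. The following Python sketch assumes that approach; the `Message` class, the adapter functions, and the payload field names (`sessionId`, `message`, `caller`, `transcript`) are hypothetical, since every real channel defines its own format.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Message:
    """Hypothetical channel-independent message format used inside the CUI."""
    channel: str
    user_id: str
    text: str


def from_web_chat(payload: dict) -> Message:
    # Hypothetical payload of a website chat widget.
    return Message(channel="web", user_id=payload["sessionId"], text=payload["message"])


def from_phone_call(payload: dict) -> Message:
    # Hypothetical telephony payload; assumes speech-to-text already produced a transcript.
    return Message(channel="phone", user_id=payload["caller"], text=payload["transcript"])


ADAPTERS: Dict[str, Callable[[dict], Message]] = {
    "web": from_web_chat,
    "phone": from_phone_call,
}


def handle_incoming(channel: str, payload: dict, dialogue_control: Callable[[Message], str]) -> str:
    """Normalize the payload, hand it to the Dialogue Control, and return its reply."""
    message = ADAPTERS[channel](payload)
    # A real gateway would also convert the reply back into the channel's format (e.g. text-to-speech).
    return dialogue_control(message)


# Usage with a trivial stand-in for the Dialogue Control:
echo = lambda m: f"[{m.channel}] You said: {m.text}"
print(handle_incoming("web", {"sessionId": "abc", "message": "Hi"}, echo))  # [web] You said: Hi
```

With this design, the rest of the CUI only ever sees `Message` objects, so adding a new channel means writing one more adapter.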
Dialogue Control
The Dialogue Control is the heart of a CUI: it determines how the CUI should react to the user input. This can be done in various ways, ranging from rule-based approaches and machine learning (ML) models to hybrids that combine rules and ML. The first step is always to classify what the user wants to achieve with his or her input. This step is usually called Intent Classification. Depending on the use case, entities may be extracted as well (an entity could be, e.g., a number or a street name). These results are then used to determine how the system should respond to the user.
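As a small example of the rule-based end of this spectrum, here is a Python sketch of a policy that takes an already classified intent plus extracted entities and decides on the next action. The intents (`greet`, `order_pizza`, `inform`), the entity `size`, and the action names are hypothetical; an ML-based Dialogue Control would replace the hand-written rules with a trained model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NluResult:
    intent: str
    entities: Dict[str, str] = field(default_factory=dict)


@dataclass
class DialogueState:
    active_intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


# Hypothetical rules: the entities each intent needs before the CUI can act on it.
REQUIRED_ENTITIES: Dict[str, List[str]] = {
    "greet": [],
    "order_pizza": ["size"],
}


def next_action(state: DialogueState, nlu: NluResult) -> str:
    """Rule-based policy that picks the next action from the NLU result and the dialogue state."""
    # "inform" means the user only supplied information, e.g. answering a follow-up question,
    # so we keep working on the intent that is already active.
    if nlu.intent != "inform":
        state.active_intent = nlu.intent
    state.slots.update(nlu.entities)

    if state.active_intent not in REQUIRED_ENTITIES:
        return "utter_fallback"
    missing = [e for e in REQUIRED_ENTITIES[state.active_intent] if e not in state.slots]
    if missing:
        return f"ask_{missing[0]}"      # ask for the first missing entity
    return f"do_{state.active_intent}"  # everything is known, so act on the intent


# Usage: a two-turn conversation.
state = DialogueState()
print(next_action(state, NluResult("order_pizza")))                # ask_size
print(next_action(state, NluResult("inform", {"size": "large"})))  # do_order_pizza
```

Keeping a small dialogue state between turns is what lets the CUI ask a follow-up question and then continue with the original intent.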
NLU
Natural Language Understanding (NLU) is a subfield of Natural Language Processing (NLP) that, in the context of CUIs, focuses on the tasks of Intent Classification and Named Entity Extraction. The NLU component of a CUI is responsible for handling these two tasks.
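To show what the output of these two tasks looks like, here is a deliberately simple Python sketch that classifies intents by keyword overlap and extracts entities with regular expressions. This is a toy rule-based stand-in, not how production NLU usually works (that is typically a trained classifier); the intents, keywords, and entity patterns are all hypothetical.

```python
import re
from typing import Dict, Set

# Hypothetical keyword lists per intent.
INTENT_KEYWORDS: Dict[str, Set[str]] = {
    "greet": {"hello", "hi", "hey"},
    "order_pizza": {"pizza", "order"},
}

# Hypothetical entity patterns: a pizza size and a plain number.
ENTITY_PATTERNS = {
    "size": re.compile(r"\b(small|medium|large)\b", re.IGNORECASE),
    "number": re.compile(r"\b\d+\b"),
}


def classify_intent(text: str) -> str:
    """Pick the intent whose keywords overlap most with the input, or "fallback" if none match."""
    tokens = set(re.findall(r"\w+", text.lower()))
    best_intent, best_score = "fallback", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(tokens & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


def extract_entities(text: str) -> Dict[str, str]:
    """Return the first match of every entity pattern found in the input."""
    entities = {}
    for name, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(text)
        if match:
            entities[name] = match.group(0).lower()
    return entities


def parse(text: str) -> dict:
    return {"intent": classify_intent(text), "entities": extract_entities(text)}


print(parse("I want to order 2 large pizzas"))
# {'intent': 'order_pizza', 'entities': {'size': 'large', 'number': '2'}}
```

Whatever technique is used internally, the result an NLU component hands to the Dialogue Control usually has this shape: one intent plus a set of entities.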
Response Generation
As the name suggests, the Response Generation component of a CUI is responsible for generating the responses that the CUI returns to the user. The literature often calls this the NLG component, which stands for Natural Language Generation. However, because most systems use static, manually defined responses instead of “actual” NLG as it is defined in research, I prefer the term Response Generation.
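The static, manually defined responses mentioned above are often templates with placeholders. Here is a minimal Python sketch of that idea, assuming responses are keyed by a hypothetical action name and filled with entity values via `str.format`; the action names and texts are made up for illustration.

```python
import random
from typing import Dict, List, Optional

# Hypothetical, manually written response templates keyed by action name.
RESPONSES: Dict[str, List[str]] = {
    "utter_greet": ["Hello! How can I help you?", "Hi there! What can I do for you?"],
    "ask_size": ["Which size would you like: small, medium or large?"],
    "do_order_pizza": ["Alright, one {size} pizza is on its way."],
}


def generate_response(action: str, slots: Dict[str, str], rng: Optional[random.Random] = None) -> str:
    """Pick one of the static templates for the action and fill in the slot values."""
    rng = rng or random.Random()
    # Choosing between several variants makes static responses sound less repetitive.
    template = rng.choice(RESPONSES[action])
    return template.format(**slots)


print(generate_response("do_order_pizza", {"size": "large"}))  # Alright, one large pizza is on its way.
```

Passing in an explicit `random.Random` instance makes the variant selection reproducible in tests.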
Custom Code
A CUI also needs some way to interact with other software components, e.g. to fetch data from a database or to call a REST API. By the Custom Code component, I mean the ability to create such functionality. Different CUI frameworks have different names for this feature; for example, Rasa calls it custom actions and Dialogflow calls it fulfillment.
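As an illustration, here is a Python sketch of custom code registered for an action, which looks up an order status in a database and returns the result as new slot values. The decorator-based registry, the action name `check_order_status`, and the `orders` table are hypothetical; an in-memory SQLite database stands in for a real backend, and frameworks provide their own mechanisms for this.

```python
import sqlite3
from typing import Callable, Dict

# Registry that maps action names to custom code.
CUSTOM_ACTIONS: Dict[str, Callable[[sqlite3.Connection, dict], dict]] = {}


def custom_action(name: str):
    """Decorator that registers a function as the custom code for an action."""
    def register(func):
        CUSTOM_ACTIONS[name] = func
        return func
    return register


@custom_action("check_order_status")
def check_order_status(conn: sqlite3.Connection, slots: dict) -> dict:
    # Look up the order the user asked about and return new slot values for the response.
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (int(slots["order_id"]),)
    ).fetchone()
    return {"order_status": row[0] if row else "unknown"}


def create_demo_db() -> sqlite3.Connection:
    # Hypothetical order table in an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany("INSERT INTO orders (id, status) VALUES (?, ?)", [(1, "shipped"), (2, "processing")])
    conn.commit()
    return conn


conn = create_demo_db()
print(CUSTOM_ACTIONS["check_order_status"](conn, {"order_id": "1"}))  # {'order_status': 'shipped'}
```

Returning plain slot values rather than finished sentences keeps the custom code separate from the wording, which stays in the Response Generation.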
Events
Events are a good way to monitor the interactions between your CUI and its users and to measure its performance, e.g. with KPIs. In my view, access to events is essential if you want to run a CUI in a production environment.
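To show how events can feed into KPIs, here is a Python sketch of a simple event log with two example metrics: a fallback rate (the share of user messages the CUI could not handle) and a containment rate (the share of conversations that ended without a handover to a human agent). The event types and these exact KPI definitions are my own assumptions for illustration; which KPIs matter depends on the use case.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Event:
    conversation_id: str
    event_type: str  # hypothetical types: "user_message", "fallback", "handover"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventLog:
    def __init__(self) -> None:
        self.events: List[Event] = []

    def emit(self, conversation_id: str, event_type: str) -> None:
        self.events.append(Event(conversation_id, event_type))

    def fallback_rate(self) -> float:
        """Share of user messages the CUI could not handle."""
        counts = Counter(e.event_type for e in self.events)
        messages = counts["user_message"]
        return counts["fallback"] / messages if messages else 0.0

    def containment_rate(self) -> float:
        """Share of conversations that ended without a handover to a human agent."""
        conversations = {e.conversation_id for e in self.events}
        handed_over = {e.conversation_id for e in self.events if e.event_type == "handover"}
        return 1 - len(handed_over) / len(conversations) if conversations else 0.0


log = EventLog()
for conversation_id, event_type in [
    ("c1", "user_message"), ("c1", "user_message"), ("c1", "fallback"),
    ("c2", "user_message"), ("c2", "handover"),
]:
    log.emit(conversation_id, event_type)
print(f"Fallback rate: {log.fallback_rate():.2f}")        # Fallback rate: 0.33
print(f"Containment rate: {log.containment_rate():.2f}")  # Containment rate: 0.50
```

In production, such events would usually be written to a database or an analytics pipeline instead of an in-memory list, but the KPI calculations stay the same.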
Sources and Resources
Michael F. McTear, “The Conversational Interface”, https://link.springer.com/book/10.1007/978-3-319-32967-3
Cathy Pearl, “Designing Voice User Interfaces”, https://www.cathypearl.com/book