The field of spoken language processing (SLP) typically treats speech as a stimulus-response process, and there is consequently strong interest in the SLP community in applying the latest machine learning techniques to estimate the assumed static transforms. This is especially true at present, as evidenced by the rapid growth of research using deep neural networks. In reality, however, speech is not a static process: it is a sophisticated joint behaviour that emerges from actively managed dynamic coupling between speakers, listeners and their respective environments. Multiple layers of feedback control play a crucial role in maintaining the necessary communicative stability, which means that contemporary SLP approaches overlook significant dependencies. This talk will address these issues in the wider context of intentional behaviour, and will offer insight into the implications of such a perspective for the next generation of computational models for spoken language processing.