This session will be about intuitive grammar. Our speech continues to change, and from time to time grammarians try to fit its current state into a framework. One such framework was for many centuries the prescriptive grammar used for learning Latin. It still works very well for some purposes, such as PIE syntax. In session eight we found that Aspect had to be added to Tense and Modality to describe English verbs properly. Modern grammar theories are attempts to tease out our preconscious syntax rules, for example by deliberately breaking a rule.
The Latin idea of “tense” as part of a verb sometimes works extremely badly. Japanese has adjectives that conjugate. Arabic treats the “imperfect” as meaning present and future. Chinese has no tenses at all, so no conjugations, and no morphology, so no declensions. The traditional eight parts of speech can be described, but Chinese words often cross their boundaries. “Shàng” is basically “up”, and my tutor describes it as a “preposition”, though it is often a postposition. More problematically, it is often a verb or noun! “Shàng kè” means “start a class”. Trying to impose a Latin-based grammar is of little use here. If our ancestral language had been Mandarin, we would never have heard of conjugations or declensions.
At Manchester I was offered many competing grammar theories, with Chomsky’s Minimalism the most favoured. It is a rigorous theory of syntax, but it can set semantics aside; “Colourless green ideas sleep furiously”, for example, is syntactically fine. Halliday’s Systemic Functional Linguistics is a theory of meanings, but a large amount of terminology is needed before it can be understood. I remain perplexed about grammar theories, but favour approaches that work on speech, not text.
Text uses sentences: they can be scanned repeatedly; boundaries are marked with spaces, a capital letter and a full stop; each sentence must have a subject and a verb; standard spelling is required.
Speech uses utterances: sound that rapidly fades from a seven-item working memory; boundary markers are absent; phrases may lack a subject and verb; phonology is variable.
The incoming speech stream has to be processed by a tiny working memory and transferred to semantic memory, as the diagram below shows.
[Diagram: speech stream → working memory → semantic memory]
Speech comprehension in the brain occurs partly in Wernicke’s area. It may work like a submarine’s sound detectors, which compare incoming ocean sounds against a library without assuming any internal structure: the submarine computer only needs to select the best fit, e.g. “incoming torpedo”, “seals mating” or “toilet flushing”. Automatic speech recognition (ASR) grew out of recognising enemy submarines and does not assume any grammar capability in the listener. It seems more likely, though, that Wernicke’s area picks out the 44 phonemes of English from the acoustic stream rather than matching whole utterances.
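To make the “best fit against a library” idea concrete, here is a minimal sketch in Python. The three labels and their feature vectors are invented for illustration; a real detector would compare rich spectral signatures rather than three numbers.

```python
# A minimal sketch of "best fit against a library" recognition, in the spirit
# of the submarine-detector analogy: no internal structure is assumed, the
# incoming sound is simply compared with stored templates. The three labels
# and their feature vectors are invented for illustration.

import math

LIBRARY = {
    "incoming torpedo": [0.9, 0.1, 0.0],
    "seals mating":     [0.2, 0.8, 0.1],
    "toilet flushing":  [0.1, 0.3, 0.9],
}

def best_fit(sound, library):
    """Return the label whose stored template is closest to the observed sound."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(library, key=lambda label: distance(sound, library[label]))

observed = [0.85, 0.15, 0.05]        # a new recording, reduced to three numbers
print(best_fit(observed, LIBRARY))   # -> incoming torpedo
```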

Phonological processing has to contend with approximations to the phoneme. Schwa never appears in the written form, yet it replaces around half of the vowels in speech. The lack of word boundaries in speech is what allows this well-known piece of humour: “we’ll need fork ‘andles, cos we’ll be digging and it’ll be dark”.
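A tiny sketch of why the missing boundaries matter: given a boundary-free phoneme string and a toy lexicon, more than one segmentation survives. The rough IPA entries below are my own illustrative assumptions, not a real pronunciation dictionary.

```python
# A minimal sketch of boundary ambiguity: a boundary-free phoneme string is
# carved into words using a toy lexicon. The rough IPA entries below are
# illustrative assumptions, not a real pronunciation dictionary.

LEXICON = {
    "fɔː": "four",
    "fɔːk": "fork",
    "kændəlz": "candles",
    "ændəlz": "'andles",
}

def segmentations(phonemes):
    """Return every way of carving the string into known lexicon entries."""
    if not phonemes:
        return [[]]
    results = []
    for i in range(1, len(phonemes) + 1):
        head = phonemes[:i]
        if head in LEXICON:
            for rest in segmentations(phonemes[i:]):
                results.append([LEXICON[head]] + rest)
    return results

# The same boundary-free stream supports both hearings of the order.
print(segmentations("fɔːkændəlz"))
# [['four', 'candles'], ['fork', "'andles"]]
```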
Distinctive Feature Theory proposes that recognition applies below the phoneme level. We are familiar with Voicing as a distinctive feature: a velar stop can be either +VOICE, in which case we hear “g”, or −VOICE, so we hear “k”. DFT treats every sound feature as either present or absent, e.g. [+/− ROUND], [+/− SYLLABIC], [+/− NASAL]. There is a lecture on this at http://www.youtube.com/watch?v=8Pa0pm8a_dY.
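One way to picture DFT is as data: each phoneme is a bundle of binary features, and recognition is a matter of matching the bundle detected in the acoustic stream. The feature set below is deliberately tiny and simplified; it is a sketch, not a claim about the full English inventory.

```python
# A minimal sketch of Distinctive Feature Theory as data: each phoneme is a
# bundle of binary features, and recognition amounts to matching the bundle
# detected in the acoustic stream. The feature set here is deliberately tiny
# and simplified; it is not the full English inventory.

PHONEMES = {
    "g": {"VELAR": True, "STOP": True,  "VOICE": True,  "NASAL": False},
    "k": {"VELAR": True, "STOP": True,  "VOICE": False, "NASAL": False},
    "ŋ": {"VELAR": True, "STOP": False, "VOICE": True,  "NASAL": True},
}

def match(features):
    """Return the phonemes consistent with the detected feature values."""
    return [p for p, bundle in PHONEMES.items()
            if all(bundle.get(f) == v for f, v in features.items())]

# A velar stop that is [+VOICE] is heard as "g"; [-VOICE] gives "k".
print(match({"VELAR": True, "STOP": True, "VOICE": True}))   # ['g']
print(match({"VELAR": True, "STOP": True, "VOICE": False}))  # ['k']
```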
Assembling phonemes into candidate words is the next step up in decoding an utterance. Speech processing in the brain may be like a speech-to-text processor. https://machinelearning.apple.com/research/siri-voices shows how a computer can “deep learn” acoustic sonograms and statistically predict what sound will come next. Its algorithm is a Finite State Machine (see e.g. http://www.gleech.org/accommodation). An FSM for checking whether a phoneme string is English-like is shown in the diagram below: it shows the allowed English phoneme transitions.
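Here is a minimal sketch of such a checker. The states and the handful of allowed transitions are illustrative assumptions, nothing like the full set of English phoneme transitions in the diagram; the point is only the mechanism: follow the transition table left to right and reject as soon as a transition is not allowed.

```python
# A minimal sketch of a finite state machine that accepts or rejects phoneme
# strings according to a transition table. The states and the handful of
# allowed transitions below are illustrative assumptions, not the full set
# of English phoneme transitions shown in the diagram.

# TRANSITIONS[state][phoneme] -> next state
TRANSITIONS = {
    "start":     {"b": "onset", "s": "s_cluster"},
    "s_cluster": {"t": "onset", "p": "onset"},
    "onset":     {"ɪ": "vowel", "æ": "vowel"},
    "vowel":     {"n": "coda", "t": "coda"},
    "coda":      {},
}
ACCEPTING = {"vowel", "coda"}

def is_english_like(phonemes):
    """Run the FSM left to right; reject on any disallowed transition."""
    state = "start"
    for p in phonemes:
        if p not in TRANSITIONS[state]:
            return False
        state = TRANSITIONS[state][p]
    return state in ACCEPTING

print(is_english_like(["s", "t", "æ", "n"]))  # True: an English-like "stæn"
print(is_english_like(["æ", "s", "t"]))       # False: "æ" cannot begin a string in this toy table
```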

The seven-item working-memory buffer probably means that utterances are processed before they are complete. This contrasts with Latin or PIE text, where we have to wait for the final verb. The brain may run parallel candidate utterances. Most of what we are likely to say is predictable: if the phoneme string seems to be “bread ‘n . . .”, the most likely candidate will probably be “. . . butter”. However, “. . . Marmite” and “. . . marmalade” have to run alongside it until they can be excluded to get a best fit. This left-to-right, competing-hypotheses approach is called a Markov process.
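A minimal sketch of that ranking, using a bigram (first-order Markov) model. The counts are invented for illustration; a real model would be estimated from a large speech corpus.

```python
# A minimal sketch of left-to-right candidate ranking with a bigram
# (first-order Markov) model. The counts are invented for illustration;
# a real model would be estimated from a large speech corpus.

from collections import Counter

# How often each continuation follows "bread and" in our imaginary corpus.
CONTINUATIONS = Counter({"butter": 90, "marmalade": 7, "Marmite": 3})

def ranked_candidates(counts):
    """Return the continuations ordered by probability, best first."""
    total = sum(counts.values())
    return [(word, n / total) for word, n in counts.most_common()]

# All three hypotheses run in parallel until further evidence
# lets the listener exclude the losers and settle on a best fit.
for word, prob in ranked_candidates(CONTINUATIONS):
    print(f"bread and {word}: p = {prob:.2f}")
```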
A “garden-path” sentence is grammatically correct but starts in such a way that the reader’s most likely interpretation is wrong; the reader is lured into a parse that turns out to be a dead end or yields a clearly unintended meaning:
“The old man the boat.”
“The complex houses married and single soldiers and their families.”
“The horse raced past the barn fell.”
If a “garden path” is detected, other candidate utterances have to be considered. The preconscious speech area keeps working until it can present the conscious brain with “what the speaker said”.
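As a toy simulation, not a real parser, the sketch below keeps two hand-written analyses of “The horse raced past the barn fell” and drops any analysis that cannot absorb the next word; the preferred main-verb reading survives until “fell” arrives, and the processor must then fall back on the alternative.

```python
# A toy simulation of garden-path recovery, not a real parser: the listener
# keeps more than one analysis of the words heard so far and drops any
# analysis that cannot absorb the next word. Both "readings" below are
# hand-written assumptions about "The horse raced past the barn fell."

SENTENCE = "the horse raced past the barn fell".split()

def main_verb_reading(words):
    # "raced" as the main verb: the sentence is complete at "barn",
    # so a trailing "fell" cannot be attached.
    return "fell" not in words

def reduced_relative_reading(words):
    # "The horse [that was] raced past the barn fell": stays viable here.
    return True

READINGS = {"main verb": main_verb_reading,
            "reduced relative": reduced_relative_reading}

heard = []
for word in SENTENCE:
    heard.append(word)
    viable = [name for name, still_ok in READINGS.items() if still_ok(heard)]
    print(f"after {word!r}: viable readings = {viable}")
# The preferred "main verb" reading survives until "fell" arrives, at which
# point the processor must fall back on the alternative.
```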