Large Language Models: A Calculator for Words

Darveen Vijayan
The Modern Scientist
8 min read · Jul 25, 2023


In the early 17th century, a mathematician and astronomer named Edmund Gunter faced an astronomical challenge like no other. Calculating the intricate movements of the planets and predicting eclipses demanded more than intuition; it called for mastery of complex logarithmic and trigonometric equations. So, as any savvy innovator would, Gunter built his own tool from scratch: an analog calculating device that would eventually become known as the slide rule.

A rectangular wooden block about 30 cm long, the slide rule consists of two components: a fixed frame and a sliding portion. The fixed frame holds the stationary logarithmic scales, while the sliding portion houses the movable scale. To use one, you needed to understand the basic principles of logarithms and how to align the scales for multiplication, division, and other operations. You had to slide the movable portion to set the numbers in alignment, read off the result, and account for the decimal point placement yourself. Yikes, that’s really complicated!

A slide rule

Approximately 300 years later, the first electronic desktop calculator, the “ANITA Mk VII,” was introduced by the Bell Punch Company in 1961. Over the next couple of decades, electronic calculators became more sophisticated with additional features. Jobs that previously required extensive manual calculations experienced a significant reduction in labor hours, allowing employees to focus on more analytical and creative aspects of their work. As a result, the modern electronic calculator not only reshaped job roles but also paved the way for increased problem-solving capabilities.

The calculator was a step change in how math is done. What about language?

Think about how you generate sentences. You first need to have an idea. Next, you need to know a bunch of words (vocabulary). Then you need to be able to put them in a proper sentence (grammar). Yikes, again, pretty complicated!

The way we generate language has stayed fairly consistent for roughly 50,000 years, around the time modern Homo sapiens first developed language.

It’s fair to say we’re still in Gunter’s slide-rule era when it comes to generating sentences!

If you think about it, using proper vocabulary and grammar is basically just adhering to rules. The rules of language.

This is similar to math, which is also filled with rules. That’s why I can be certain that 1+1=2, and why calculators work!

What we need is a calculator but for words!

Yes, different languages follow different rules, but the rules must be followed, at least loosely, for a sentence to be comprehensible. One obvious difference between language and mathematics is that math has fixed answers, whereas the number of plausible words that can fit into a sentence can be large.

Try completing the following sentence: I ate a _________. Imagine the possible words that could come next. There are roughly 1 million words in English. Many of them could be used here, but definitely not all.

Answering “black hole” would be the equivalent of saying 2+2=5. But answering “apple” wouldn’t be right either. Why? Grammar: “I ate a apple” needs “an”, not “a”.
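We’ll meet the machine that can actually compute this distribution in a moment. As a preview, here is a minimal sketch of asking a language model which words plausibly come next. It assumes PyTorch, the Hugging Face transformers library, and the small GPT-2 model, none of which this article names; any causal language model would do.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical illustration: small GPT-2 stands in for a "calculator for words".
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("I ate a", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]   # one score per vocabulary entry

probs = torch.softmax(logits, dim=-1)   # turn scores into probabilities
top = torch.topk(probs, 10)             # the ten most plausible next words
for p, i in zip(top.values, top.indices):
    print(f"{p.item():.3f}  {tokenizer.decode(i.item())!r}")
```

You’d expect food-like words such as “sandwich” near the top of that list, while “black hole” should barely register.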

In the last couple of months, Large Language Models (LLMs) have taken the world by storm. Some call them a breakthrough in natural language processing, while others see them as the dawn of a new era in artificial intelligence (AI).

LLMs have proven to be remarkably adept at generating human-like text, raising the bar for language-based AI applications. With a vast knowledge base and contextual understanding, LLMs can be employed across diverse fields, from language translation and content generation to virtual assistants and customer support chatbots.

The question is: are we currently at the same inflection point with LLMs as we were in the 1960s with the electronic calculator?

Before we answer that question, how do LLMs work? LLMs are built on transformer neural networks, which are trained to predict the word that best follows a given sequence. Building a powerful transformer requires enormous amounts of text, and this is why the “predict the next word/token” approach has worked so well: there is plenty of easily obtained training data. An LLM takes the entire sequence of words as input and predicts which word is most likely to come next. To learn what is most likely to come next, it swallows all of Wikipedia as a warmup exercise, before moving on to piles of books and finally the whole internet.
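In code, that “predict, append, repeat” loop is remarkably small. Here is a minimal sketch, again assuming GPT-2 via Hugging Face transformers (my choice for illustration, not the article’s), generating text greedily one token at a time:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The slide rule was", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(12):                        # extend the text by 12 tokens
        logits = model(ids).logits[0, -1]      # scores for the next token only
        next_id = logits.argmax()              # greedy: take the single most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

Real chatbots usually sample from the probability distribution instead of always taking the argmax, which is why their answers vary from run to run.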

We established earlier that language contains rules and patterns. The model implicitly learns these rules by churning through all of these sentences, and it then uses them to accomplish the task of predicting the next word.

Deep Neural Network

After a singular noun, there’s an increased probability that the next word will be a verb ending in “s”. Similarly, when reading Shakespeare, there’s an elevated chance of seeing words like “doth” and “wherefore”.

During training, the model learns these patterns in language, eventually becoming an expert!
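To see what “learning patterns” means in miniature, here is a hypothetical toy example (mine, not the article’s): a bigram model that picks up the singular-noun-then-verb-ending-in-“s” pattern purely by counting, the crudest possible version of what an LLM does at scale.

```python
from collections import Counter, defaultdict

# A made-up four-sentence "corpus"; real LLMs train on trillions of tokens.
corpus = "the dog barks . the dog runs . the cat runs . a dog barks .".split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# The learned "pattern": after the singular noun "dog",
# verbs ending in "s" dominate.
total = sum(follows["dog"].values())
for word, count in follows["dog"].most_common():
    print(f"P({word!r} | 'dog') = {count / total:.2f}")
```

An LLM does nothing so crude; its transformer layers learn far richer, longer-range statistics. But the spirit, predicting the next word from patterns in the data, is the same.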

But is that enough? Is learning linguistic rules enough?

Language is complicated: a single word can mean multiple things depending on context.

Ergo, self-attention. In simple terms, self-attention is a technique used by LLMs to understand the relationships between different words in a sentence or a piece of text. Just like you pay attention to different parts of the story to make sense of it, self-attention allows LLMs to give more importance to certain words in a sentence while processing information. This way, the model can better understand the overall meaning and context of the text, rather than blindly predicting the next word solely based on linguistic rules.
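Under the hood, that “paying attention” is just a few matrix multiplications. Here is a minimal NumPy sketch of scaled dot-product self-attention, with made-up dimensions and random weights (real models learn these weights during training):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project each word three ways
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # how strongly word i attends to word j
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)      # softmax: each row sums to 1
    return w @ V                               # blend all words by attention weight

rng = np.random.default_rng(0)
seq_len, d = 4, 8                              # a 4-word "sentence", 8-dim embeddings
X = rng.normal(size=(seq_len, d))              # stand-in word embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)     # -> (4, 8): one context-aware vector per word
```

Each output row is a weighted blend of every word in the sentence, which is exactly the “paying attention to different parts of the story” described above.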

Self-attention mechanism

If LLMs are calculators for words, just predicting the next word, how can they answer all my questions?

When you ask a Large Language Model to do something clever — and it works — there is a really good chance that you have asked it to do something that it has seen thousands of examples of. And even if you come up with something really unique like:

“Write me a poem about an orca eating a chicken”

Amidst the waves, a sight unseen, An orca hunts, swift and keen, In ocean’s realm, the dance begins, As a chicken’s fate, the orca wins.

With mighty jaws, it strikes the prey, Feathers float, adrift away, In nature’s way, a tale is spun, Where life and death become as one.

~ ChatGPT

Pretty good huh? Thanks to its self-attention mechanism, it can effectively blend and match relevant information to construct a plausible and coherent response.

During the training process, LLMs learn to recognize patterns, associations, and relationships between words and phrases in the data they are exposed to. As a consequence of this extensive training and fine-tuning, LLMs can exhibit emergent properties, such as the ability to perform language translation, summarization, question-answering, and even creative writing. These capabilities often go beyond what was explicitly programmed into the model and can be amazingly good!

Are Large Language Models intelligent?

The electronic calculator has been around for more than six decades now. The tool itself has improved by leaps and bounds, but it was never deemed intelligent. Why?

The Turing Test is a deceptively simple method of determining whether a machine exhibits human-like intelligence: if a machine can engage in a conversation with a human in a manner indistinguishable from a human, it is considered to possess human intelligence.

The calculator was never subjected to the Turing Test because it doesn’t communicate in the language humans do, only in the language of mathematics. LLMs, on the other hand, generate human language; their entire training process revolves around mimicking human language. So it should be no surprise that they can “engage in a conversation with a human in a manner indistinguishable from a human”.

So, it’s a bit tricky to describe LLMs using the word “intelligent”, because there’s no clear agreement on what intelligence really means. One way to judge whether something is intelligent is whether it does things that are interesting, useful, and not obvious. LLMs certainly fall into that category. I, however, completely disagree with this definition.

I define intelligence as the ability to expand the frontiers of knowledge.

As of the time of this writing, a machine trained to predict the next token/word is still not able to expand the frontiers of knowledge.

It can, however, interpolate within the data it has been trained on. It has no explicit understanding of the logic behind the words, nor of the knowledge tree they belong to. As a result, it will never be able to generate outlier ideas or make leaps of insight. It will always provide coherent answers that hover around the average response.

So, what does this mean for us humans?

We should treat LLMs more like a calculator for words. Never completely outsource your thinking to a language model.

At the same time, as these models get exponentially better, we may feel increasingly overwhelmed and insignificant. An antidote to that would be to always be curious about seemingly disconnected ideas. Ideas that seem incoherent on the surface but make sense based on our interactions with the environment around us. The goal is to live at the edge of knowledge, creating and connecting new dots.

If you function at this level, every form of technology, whether a calculator or a large language model, becomes a tool at your disposal rather than an existential threat you need to worry about.
