Posted in Feature Dazed MENA issue 00

Algorithms and computational linguistics

What is the language of tomorrow? Communication has changed, and not in the ways you might expect. Diving into algospeak, we explore the systems behind how and why.

Text Dee Sharma

Language is like an egg yolk. Like the yolk, its shape-shifting nature is a direct result of its material conditions. The yolk is also the most unpredictable part of an egg. Its properties can range from fluid to jelly-like to rigid, and its texture changes significantly according to how one cooks it. Similarly, language changes meaning under the influence of factors such as censorship, surveillance or any terms and conditions. What temperature is to the egg yolk, freedom of expression is to language. If we want to imagine what the future of our languages looks like, we must explore the effects of their digitisation.

Defining what language is might seem like a simple task. For most of us, it is primarily a means of communication. Of expression. It’s a gathering place to share a piece of our minds with people. To share our stories and imaginations. Our love, grief and suffering too. But what happens when this fundamental communicative nature of language is restricted? Just as barbed wire restricts access to lands near and far, terms and conditions on language, especially on digital platforms, bind our collective capacity to share.

Terms and conditions on social media platforms are like carceral systems. If the algorithm doesn’t agree with the language someone is using, it places them in the non-space of being ‘shadowbanned.’ On the surface, this might seem like a preventative measure to safeguard against hate speech but, as many users across different platforms have reported, it is frequently used to maliciously suppress dissent against the political classes and the platforms we use. At a time when most of our communication happens via online forums and platforms, words and grammar sometimes have to be twisted to really share what one means. Just like the yolk, for some language to seep through the iron hands of the algorithm, it has to shift forms.

The answer to this bottleneck is algospeak. The newest dialect on the block, it’s a melange of digital counter-semantics in which words are not spelled the way the dictionary teaches us. Instead, they take on creative alphanumeric properties. Sometimes they are not even words but visual codes, like the fruit emojis that now symbolise sovereign nation-states.

Listed below are some examples of algospeak iterations of words which often face the wrath of the algorithmic gods.

Genocide → G3n0cid3

Algorithm → Al G0 Rhythm

LGBTQ → Leg Booty Queue

Autism → Tism

White → YT

Kill/To be killed → Unalive
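
To see why such respellings slip through, consider a minimal sketch of keyword filtering. Everything in it is hypothetical: the blocklist, the digit-to-letter map and both function names are assumptions made for illustration, and no platform is known to moderate in exactly this way.

```python
import re

# Hypothetical blocklist for illustration only; not any platform's real list.
BLOCKLIST = {"genocide", "algorithm"}

# Assumed map of common digit-for-letter substitutions.
LEET_MAP = str.maketrans({"3": "e", "0": "o", "1": "i", "4": "a", "5": "s", "7": "t"})


def naive_flag(text: str) -> bool:
    """Flag text only when a blocklisted word appears as an exact token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


def normalised_flag(text: str) -> bool:
    """Strip spacing and undo digit substitutions before matching.

    Substring matching like this also risks flagging innocent words
    that happen to contain a blocklisted term.
    """
    collapsed = re.sub(r"[^a-z0-9]", "", text.lower()).translate(LEET_MAP)
    return any(word in collapsed for word in BLOCKLIST)


print(naive_flag("Genocide"))           # True: the exact word is caught
print(naive_flag("G3n0cid3"))           # False: the respelling slips through
print(normalised_flag("G3n0cid3"))      # True: digits are mapped back to letters
print(normalised_flag("Al G0 Rhythm"))  # False: the phonetic respelling still evades
```

In this sketch the exact-match filter misses the alphanumeric spelling, a stricter filter catches it, and the phonetic respelling escapes both, which is one way to picture why algospeak keeps shifting form.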

Not only does the algorithm challenge the way we disseminate language, but it also automates its expression, affecting our very thinking patterns. In other words, the algorithm chooses our words and thoughts, masquerading as autofill or the For You Page. It rejects new expressions if they do not conform to the language models it considers acceptable. It wants to systematise and categorise our language, our thoughts and our imaginations. It wants to be the father figure in controlling our desires. It wants to be Oedipus. What the printing press was to language at the peak of the Industrial Revolution, the algorithm is to the future of our languages at present.

That said, algospeak is not novel. It has roots in the resistance we once saw amongst writers of the printing press generations. One such example is the bestselling American novel The Naked and the Dead by Norman Mailer. After finishing the manuscript, Mailer realised he had used the word ‘fuck’ quite a lot throughout the book. To make sure the book would be published, he changed the word ‘fuck’ to ‘fug.’ The book was not only published and circulated widely, but also earned him critical acclaim. One could say Mailer was indulging in algospeak. We see such modifications of words across social media platforms every day. Many users on TikTok have reported a blanket algorithmic suppression of the word ‘sex’ on the grounds of possible community guideline violations. This has resulted in users bypassing the suppression by repeating what Mailer did many decades ago: changing the phonetics of the surveilled word. The word ‘seggs’ has now been adopted by everyone on social media, from doctors and educators to comedians and NSFW content makers. Not only is there a stark similarity in this phonetic manipulation, but the semantic process is quite similar too. Users online know instantly what ‘seggs’ means, just as readers knew what ‘fug’ meant. It’s quite remarkable to see these overlapping instances where linguistic manipulation has helped bypass media and literary censorship.

The only difference is that the algorithm was a few people in some offices back then. Now it’s a cybernetic entity quite literally programmed to register and process as much data as possible. The large language models that power this software are constantly being trained to categorise and sort data. In Silicon Valley, they consider our digital footprints to be more valuable than gold. Which is precisely why algospeak now changes commonly used words, every day. It forces many of us to change how we perceive languages, every day.

These linguistic changes are a direct attempt to crash this surveillance system. Moreover, to bypass systemic surveillance on social media platforms, many users worldwide resort to unseating English in order to disseminate important information. Many activists and organisers, such as 7amleh, a Haifa-based data analysis network, rely on digital platforms for communication. They mostly use local dialects to share communiqués online, and also report on the many instances of algorithmic suppression these dialects face. The intentional unseating of English as a ‘global’ language is bolstering anti-imperial sentiments amongst people of the Global Majority.

From the active reclamation of these dialects emerges a new hope for languages to bypass algorithmic obstructions. It is inspiring data scientists and linguists alike to excavate the possibilities of such a divestment. There are now software tools (such as arabic-services.ml) dedicated to designing alternative typefaces and fonts, where the sole purpose of the modification is to achieve freedom of expression through one’s native language. These changes have direct and lasting implications for fonts and typography in general: omitting certain vowels and consonants from words, so as not to be flagged by the algorithms, can morph these words forever, online or in real life, via text or handwritten letters.

The ever-expanding networked and digital future of language also holds a unique shift in our perception of self. It is an abdication of the self from a singular ‘I’. The ‘I’ now exists in the deep trenches of online cores and communities. The language one uses in daily ‘IRL’ speech is a signifier of the online ‘in-group’ one subscribes to. In 1996, Jacques Derrida gave an interview titled Word Processing. In his analysis of the ‘World Wide Web’, he postulated that there is a departure from understanding how these digital systems of language operate. The historical newness of language being disseminated online gives us plenty of room to speculate about its impact on our linguistic futures. Meaning morphs and bends like signals inside our computer chips. The algorithm stares right back at us through our screens every day like a partially conscious deity, intelligent enough to conceal its presence but clumsy enough (for now, anyway) to fall victim to algospeak.

What would our digital language look like if it didn’t follow a European alphanumeric dataset? What if the backend of our social media application software was written in the native language of the region? The colonial remnants of the lingua franca have a strong hold on machine learning models, resulting in an epistemological violence that often gets overlooked. Computer programs share many obvious and superficial characteristics with western ideas of printed texts (for example: being composed of discrete, alphanumeric characters; being read, like books in the west, from left to right and top to bottom; and containing, as in the English language, symbols such as “print,” “and,” “list”).

There’s also a latent sense of permanence behind the digital expression of language. What is displayed (or not) on our screens does not correspond to the ‘real.’ An erased phrase on a piece of paper implies the material erasure of ink. That erasure is permanent. However, a deleted phrase on a computer is not necessarily erased from the disk for good. The ‘backspaced’ sentence lives on, maybe even multiplying across different storage disks. Erasure, in this sense, no longer nullifies; it bears witness. Backspace is a site of non-erasure. What was deleted retains the possibility of resurfacing. This non-erasure is used to train algorithms or to flag the intent to conceal certain language or sentiments. These dynamic materialities of the digital inscription of language open up these networks to wide-ranging political consequences.

These algorithmic biases are sugar-coated and hidden like Easter eggs in a platform’s user guidelines. The guidelines section of many platforms reads just like a political communiqué. It maintains the notion of being a non-political entity, all the while suppressing the speech and language of particular groups of people whom western governments deem insignificant. It’s a condition where two meanings of code, as governance and as machine instruction, coincide. Code equals code. Code is political.

The future of languages is anything but static. It is just like the egg yolk, changing form and texture depending on how it is prepared. The membrane of languages is only getting more permeable due to these digital ramifications. The more surveilled and censored our digital opinions become, the more our lived realities will reflect such sentiments in our languages. It is already evident how algospeak slips into our daily interactions. The real/digital dichotomy of linguistic expression is shrinking, and we can observe these interventions in how spoken language has started co-opting its digitised counterparts. Words emerge online on our screens first, then make their way into our ‘real’ social interactions.

Is the interconnected nature of our social realities and languages a mirage? Is algospeak an unprecedented upheaval of our linguistic traditions? Is the ‘For You Page’ just state-backed violence wrapped up in sugar-coated UI/UX? Does language have a future? Does the future have a language?  If language is subjected to digitisation, will its preservation be manipulated? So are coders the new linguists hiding behind the frameworks of our screens?

Originally published in Dazed MENA Issue 00
