Train your AI’s, people!

Posted by Khushman Patel in best, tech

The world of information is big and scary. It has beautifully illuminated corners, but there are also rumors of niches where, if you wander in by mistake and choose to struggle on, you’re greeted by mountains of incomprehensible terminology and sentences of slippery, Schrödingerish meaning that seem to elude you while being right within your grasp. Many people face this monster at the bright young age of Math, while some face it when colliding with unknown territory far into the future.

But over time I have realized that there are very few true monsters alive. And if you’re battling them, I salute you. You know the mountain you’ve chosen to climb, and I can help you with more oxygen, but the climb is all yours. Most of the monsters that you or I will meet today are distortions of monsters that were slain before, artificially erected for a purpose, nefarious or not.

And I don’t accept these monsters. These shadows are but weak illusions which can easily be pierced by depth. “Why?” and “How?” are two powerful weapons I have come to respect immensely. And they usually help cut through all of the bullshlaka.

One such field, which I am lucky enough to understand and whose inner workings a lot of people don’t know, is statistical methods. Specifically, machine learning and its derivatives. While the average person in my network (STEM academics/professionals, designers, product managers, mathematicians) has some vague understanding of how it functions, the average person I talk to outside my network has no idea how any of these systems even breathe.

And that’s completely OK. Why would you want to learn so much advanced math when smart people will set up these systems to help you out already? YouTube has an amazing recommendation system, despite its flaws. Amazon has found ways to entrench itself in almost every basic flow with which I shop. And these are just some of the big, obvious ones we know about. What is concerning is when the smart people’s purpose differs from your benefit, and suddenly those resources are used against you. Because often you won’t even know these systems are getting to know you and impacting your life. What about the one at an ad company quietly aggregating all your varied digital footprints into one cohesive whole so that they can target political ads at you? Or the one at the credit bureau which is learning to better identify which parameters of your life influence how beneficial you might be to the company, and which might reject you in the future because, e.g., you don’t meet the one variable it cares about: history of previous debt?

One of my bigger problems with having statistical models define so many parts of our lives is that, by design, they will often lead to the best statistical outcome (often for the group) while losing the essence of individuality. But I cannot argue that these systems do not help our world. Only that maybe they can be made better.

This post today is about how we can stop being just mute observers to these statistical systems (or SS, as I will refer to them from now on; laypeople often equate this term with AI). There are some fundamentals which most of these systems function on. Most importantly:

1 – ALL OF THEM NEED DATA.

A typical SS. The output is one of: a statistical prediction based on new data, an agent that has learnt some behaviour from that data, a classifier that tells you what class some data belongs to, or a generator that produces new patterns based on the data it has learned from.

They might require data in different forms, to do different things. They might require data to directly process and predict something, or they might need data to create a model where an SS could train itself to perform certain behaviors. It doesn’t matter. All of them require data, and you are providing it to them every minute you carelessly interact with these systems. Stop doing that.
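To make "all of them need data" concrete, here is a toy sketch of the classifier flavour of SS. It is my own illustration, not how any real platform works: the features (minutes watched, videos skipped) and the labels are made up, and the method is a plain nearest-centroid classifier. The point is only that without the examples, the model has nothing to say.

```python
from collections import defaultdict


def train(examples):
    """Learn one centroid (average point) per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the new data point."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical data: (minutes watched per day, videos skipped per day) -> made-up label
examples = [
    ((120, 2), "binge watcher"),
    ((90, 5), "binge watcher"),
    ((10, 20), "casual browser"),
    ((15, 30), "casual browser"),
]
model = train(examples)
print(classify(model, (100, 4)))  # prints: binge watcher
```

Every pair in `examples` is something a user handed over by using the service. Feed it different behaviour and the label changes.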

Every interaction you make with an app/website/system provides them with a juicy little tidbit about you. And it’s not always bad. It’s just bad when you forget about the things that you have given up. Would you like it if Chrome knew your dark, dirty secrets despite using incognito mode? I don’t even want to alarm you about the amazing things you can find from the data you’re leaking (here’s an example from 1997, and they just use large scale demographic data). Just think about all the things you don’t want known to any entity out there.

But we’re not helpless. What we can do is run large-scale black-box experiments on these AIs. Sure, we don’t know how these AIs function internally. We don’t know what they’re finding out about us. We don’t even know what they’re doing with our data after figuring out whatever they figure out about us.

But we control the source.

Everything we do online is converted into a data point and fed into these systems, and you know what other system works in that manner? Society. Everything we do in public is continuously being scrutinized by other humans around us, though they might not care or actively notice it. We actively behave differently in public spaces, as compared to when we’re comfortable or vulnerable. How is being online any different? Except that we aren’t being tracked by humans, we’re being tracked by super-tracker-humans (SS), who know more about us every time we are in their sights, and keep getting better at getting to know us. Even though we feel like we’re accessing the internet through a dark tube in our safe room, we’ve forgotten about the spy flies flying around slyly exhuming our safe cocoon.

But don’t we know the solution to this already? What do we do in public? We wear a mask (I mean metaphorically, though this is a pun these days). You might be an absolute psychopath otherwise, but you will behave appropriately in front of a crowd so as to be left unharmed after the encounter. So why aren’t we wearing a mask online? I guess I kind of know why. Our phones are very much an extension of ourselves these days. With so much of our time spent online, it is but natural that certain parts of our identity will be intimately intertwined with our SS’es.

It’s me and my phone against the world. Or is it?

A long time back, you used to own your phone. Your phone communicated with the outside world using your phone’s network, the OS was completely closed off to random apps, and only your network provider had a chance to begin any maliciousness. But in the current scenario, the network provider is just a pipeline, as it should rightly be, and it is the services we subscribe to and choose that get to listen in on us. But why should my choosing to use YouTube to look at a particular video mean that they get to know my browsing history of today (as an example)? The cost of using YouTube? Sure. But then I should be able to negotiate that cost, or at least have some modicum of control over it. But I don’t.

So I have to find ways to control what data YouTube gets from me so that:

I can get these systems to do what I want them to do (or at least try)
They don’t know absolutely everything about me (this is too complex a topic and I don’t know where to begin breaking it down)

This brings me to our primary manipulation method: controlling what you do on these platforms. Wear a mask online. I don’t know the specifics of each company, but I know that user behaviour is tracked in extreme detail (I don’t want to go down the path of explaining this, because I know what’s possible from a technical perspective, and if I can imagine making a system that does this, I can assure you, someone out there has made it and is using it): where your mouse moved to, what you hovered over, how long you paused over something during a scroll, how fast you were scrolling, what you clicked on, etc.
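To give a feel for how little it takes to turn that kind of tracking into a profile, here is a minimal sketch. Everything in it is hypothetical: the event names, the fields, and the aggregation are my own invention for illustration, not any company's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str       # "hover", "scroll_pause", or "click" (hypothetical event names)
    target: str     # identifier of the item on the page
    seconds: float  # how long the interaction lasted (0 for clicks)


def summarize(events):
    """Aggregate raw events into per-item attention features."""
    profile = {}
    for event in events:
        item = profile.setdefault(event.target, {"dwell_seconds": 0.0, "clicks": 0})
        if event.kind == "click":
            item["clicks"] += 1
        else:
            item["dwell_seconds"] += event.seconds
    return profile


# One made-up browsing session
session = [
    Event("hover", "cat_video", 1.5),
    Event("scroll_pause", "cat_video", 4.0),
    Event("click", "cat_video", 0.0),
    Event("scroll_pause", "political_ad", 0.5),
]
profile = summarize(session)
print(profile)
```

Each of those inputs (what you hover over, where you pause, what you click) is something you can deliberately change, which is what makes the experiments described next possible.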

Those are elements you can control: the individual aspects of your behaviour online. Once you know what inputs they require from you, you can start seeing what happens when you change or mess with those inputs. And when you can see some effect of a particular input that you’re messing with, you start to understand what function or power that input holds. Using this method of interacting with the systems, and the scientific method, I have successfully trained my Android keyboard’s predictor to type English words in English and Hindi words in Devanagari script, often intermixed in the same sentence, which is how I usually chat with my subcontinental friends. My YouTube feed is almost never filled with garbage, and when it is, I can pinpoint almost exactly what behaviour of mine led to the recommendation, and what I might need to do to get rid of it permanently. An external example of this is my friend who scrolled through his Instagram only at an even pace for a few weeks, which widened the range of content he was shown, as the algorithm couldn’t figure out what would attract and keep his attention. I have a Facebook profile with no information on it that I keep just for logins, and I have watched the “People You May Know” feature go crazy over the years as it keeps expanding the range of people it thinks I might befriend. Play with whatever you’ve got and mess with their messing around with you.
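Here is a toy version of the even-pace scrolling experiment. I have no idea how Instagram actually ranks content; this sketch simply assumes dwell time per category is the only signal, and the categories, numbers, and `margin` threshold are all made up.

```python
def interest_scores(dwell_by_category):
    """Turn seconds spent per category into a share of total attention."""
    total = sum(dwell_by_category.values())
    if total == 0:
        return {category: 0.0 for category in dwell_by_category}
    return {category: t / total for category, t in dwell_by_category.items()}


def has_clear_favourite(scores, margin=0.1):
    """True if the top category beats the runner-up by more than `margin`."""
    ranked = sorted(scores.values(), reverse=True)
    return len(ranked) > 1 and ranked[0] - ranked[1] > margin


# Hypothetical seconds of dwell time per category
natural = {"memes": 40.0, "news": 5.0, "cooking": 5.0}   # lingering on memes
even = {"memes": 10.0, "news": 10.0, "cooking": 10.0}    # scrolling at an even pace

print(has_clear_favourite(interest_scores(natural)))  # True: memes dominate
print(has_clear_favourite(interest_scores(even)))     # False: nothing stands out
```

Under that assumption, a system with no clear favourite has little choice but to keep sampling widely, which matches what my friend saw.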

Don’t let their messing grow to be untangleable.

1.1 – Use privacy protections

I know, I know, it’s such a hassle to think about privacy and all the annoying things you have to do to be private these days, but at the bare minimum, you can install ad blockers, or privacy protectors. You can rest easy that some of the data about what you’re accessing is being blocked from being sent out.

2 – WHY ARE YOU JUST ONE PERSON?

You’re not just one person in real life. If we take my example, I’m a software engineer guy, a wannabe designer guy, a machine learning guy, sometimes a poetry guy, and sometimes a finance guy. And these are all guys I have been in the past week. In real life, there is often very little overlap between these different facets of my life, though all those things are intrinsically related in my head. My poetry people probably don’t know what kind of a nerd I am on the weekends, poring over statistical convergence proofs because my bot is not working properly. Conversely, my software engineering friends who bitch about designers on a daily basis probably don’t know too much about my discussions with designers on why they think design shouldn’t be a function of form. And I think it works. It allows me some self-respect in each individual circle, and allows me to thrive as that persona in that field. (sorry guys)

Why have we forgone that distinction in ourselves online? Our ad SS’s know ALL aspects of us at once, unless you’ve been taking precautions to keep those lives separate. It’s like incognito mode: those occasions when you don’t want the stain of whatever you were looking up to stick to your permanent self. You don’t even know who’s watching when you do it. You’re just worried that someone might be. But you gotta do this with all your selves. I’m gonna refer to each of your selves as a persona.

Firefox’s Multi-Account Containers have been amazing at helping me separate out different facets of my life. Each of my personas gets a different browser (almost; it gets a container) where it can exist independently.

My personal container contains my personal Gmail, WhatsApp, and other super data-sucking websites. I rarely use it except for those sites, and those URLs are usually auto-redirected to this container, so data leakage is minimized.

My work container contains, well, work.

My finance container contains all my banking and finance information, and all of those operations are carried out in this container. Again, it has auto-redirects.

My leisure container gets all my memeboards and other crap.

My developer container gets access to my GitHub, GitLab, and other servers that I maintain. Yes, I protect myself from myself: the only true security available nowadays.

And I have an uncontained container which is my default browsing persona online. My uncontained persona is a mix of an audiophile and an open-source-library-hunting, programming-error-searching, Dota 2-watching freak. But that’s OK. Because that’s the persona I am OK with sharing uncontained, and it is not directly tied to any of my actual accounts, which tie into my real life identity.
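The auto-redirect idea from the list above boils down to a lookup from site to persona. The sketch below is only an illustration of that logic in plain Python. It is not Firefox's API or how the extension stores its rules, and the rule table (including the placeholder bank domain) is hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical rules: which persona (container) a site always opens in
CONTAINER_RULES = {
    "mail.google.com": "personal",
    "web.whatsapp.com": "personal",
    "github.com": "developer",
    "gitlab.com": "developer",
    "mybank.example": "finance",  # placeholder domain
}


def container_for(url, default="uncontained"):
    """Pick the container for a URL, falling back to the uncontained persona."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    # Try the full host first, then each parent domain (gist.github.com -> github.com).
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in CONTAINER_RULES:
            return CONTAINER_RULES[candidate]
    return default


print(container_for("https://github.com/some/repo"))  # developer
print(container_for("https://www.dota2.com/"))        # uncontained
```

Anything not listed lands in the default persona, which is exactly the one I am comfortable sharing.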

E.g. blue is work, yellow is personal (yes, they are reversed), pink is learning, and uncoloured tabs are uncontained in the image below.

Update: Firefox now has Total Cookie Protection, which I HIGHLY recommend you enable, along with strict mode.

You could take this idea forward and make one for news. I have often thought about how to navigate the extremely polarized news spaces on the internet. The bubble effect is often strong enough that you don’t even know what you’re missing. One of the experiments I’m gonna do in the near future is look at the news from two different personas, one left-wing, and one right-wing to see if that provides a full story of the news.

Of course, all of this could be a figment of my imagination and the ad networks might have caught up and know all my secrets, but the recommendations and experience I get online say otherwise. They could possibly be smart enough to know this and behave with each persona innocuously while knowing that they are interacting with the same identity, but I highly doubt that. Or in any case, I am helpless in the face of such technology for now.

3 – YOU ARE BEING GAMIFIED

Yup, you heard it right. These SS and models created around you would have been fine if they were just targeting advertising for products. Unfortunately, the jump between persuading you to buy a product, and to buy into an idea, is tiny. When you have been reduced to a statistical data point which a model is learning to manipulate most effectively, you are left susceptible to being “tweaked” on your parameters without knowing that you are being biased.

To give you an example of what’s possible: the infamous Trump election campaign, which used extremely advanced SS, relied on it to choose whom to target. After that, it was traditional advertising through modern channels that let them influence this demographic. The SS could pick out, from their data points, what kind of people were on the fence about their political stance in that election. Now imagine if this system was taken one step further and trained on your behaviour data. It might figure out that you are prone to watching binges for an hour after work every evening, and that your choice in content gets more volatile the longer you binge, which means you are more susceptible to extreme ideas later in the binge, which is where it would choose to show you more politically subversive material. Such a system already exists, e.g. in how YouTube chooses which ads to show you, how many, and how long they should be. They are trying to optimize for converting your view into a click. Which is the same way I’d try to optimize how many gold coins I collect in a role-playing video game.
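The gold-coin comparison is closer to the truth than it sounds. A minimal sketch of "optimize for clicks" is an epsilon-greedy bandit: mostly show whatever has earned the most clicks so far, and occasionally try something else. I don't know what YouTube or any ad network actually runs; the `AdPicker` class, the variant names, and the click probabilities below are all assumptions for illustration.

```python
import random


class AdPicker:
    """Epsilon-greedy choice between ad variants, rewarded by clicks."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.epsilon = epsilon              # fraction of the time spent exploring
        self.rng = random.Random(seed)
        self.shows = {v: 0 for v in variants}
        self.clicks = {v: 0 for v in variants}

    def click_rate(self, variant):
        shows = self.shows[variant]
        return self.clicks[variant] / shows if shows else 0.0

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))  # explore a random variant
        return max(self.shows, key=self.click_rate)   # exploit the best so far

    def record(self, variant, clicked):
        self.shows[variant] += 1
        if clicked:
            self.clicks[variant] += 1


# Simulated viewer with made-up click probabilities per variant
true_click_chance = {"short_ad": 0.02, "long_ad": 0.10}
picker = AdPicker(true_click_chance, epsilon=0.1, seed=0)
viewer = random.Random(1)
for _ in range(5000):
    ad = picker.choose()
    picker.record(ad, viewer.random() < true_click_chance[ad])
print(picker.shows)  # how often each variant ended up being shown
```

The picker never needs to understand the viewer. It only needs the viewer to keep reacting, which is why changing how you react changes what it learns.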

Of course, this is a small glimpse into what’s possible. I don’t have the vision to envision what SS’es as massive as big tech’s are capable of doing. On a broader scale, all I know is that I am being served in the hopes that I myself will serve, and that is something I’d rather not do. I can take action, and it is as simple as behaving online the way I behave in public.

At its most basic, this fight is about how you can reduce your data footprint whenever you do something. Because the less you let out, the less they know you intimately, and the less they can influence your behaviour. It’s not possible for most of us to go completely off the grid, and phones are a notorious data suck, but why not take control of the parts we can control? Some browsers (**cough cough** not Chrome) have been working in the right direction generally, and that might be an easy switch to make. I know it’s such a BIG decision, but really, what’re you gonna lose? Some of your history? Some logins you might have to do again?

But these ideas should help you begin your journey towards no longer being a puppet of these complex systems which are intertwining with us evermore. This is a cat-and-mouse game akin to the ones cybersecurity professionals and crackers play with each other on a daily basis, except that we’re playing it against the very institutions that we thought we were the beneficiaries of. And they don’t always hold your best interest at heart. Which is why, again: TRAIN YOUR AI’S, PEOPLE!

The last private sanctuary of humans has become their own mind, and I hope you are able to protect yours :).

I for one, love mine.

P.S. If you find any flaws or holes, please let me know. If you have done any more experiments with a network and have seen an effect, let me know and I’ll add it to the list here. If we have enough, I’ll make a separate list.