Essential Machine Learning Vocabulary for Designers
Common AI and machine learning definitions and roles a product designer should know.
I identified what I think are the most common AI/ML terms a designer should be familiar with. Being able to talk confidently about AI and ML matters because you want your team to see you as competent. I keep my explanations as close to layman's language as possible.
Basic Terms
Traditional programming: A programmer writes explicit instructions that tell the software exactly what to do in each situation. AI and ML systems are not given those rules directly; instead, they derive their behavior from input data.
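Artificial Intelligence (AI): The broad field of building machines that perform tasks that normally require human intelligence, such as understanding language, recognizing images, or making decisions. The ambitious goal is a system whose work you could not tell apart from a person's.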
Machine Learning (ML): A subset of AI that builds a decision model by finding patterns in the data you provide. Instead of following hand-written rules, it learns those patterns from examples and can improve as it sees more data.
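Statistics: The branch of mathematics that deals with collecting, analyzing, and interpreting data. Many ML methods, such as regression analysis, are built on statistical techniques.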
Roles
Machine Learning Engineer: A software engineer who understands both software development and ML techniques. An ML engineer might create data models, but their main responsibility is building and maintaining the development environment and pipeline. This includes the infrastructure to handle data and algorithms, train models, and integrate them into large systems.
Data Engineer: Data engineers focus on data collection, storage, and management. They design and maintain systems for efficiently handling and accessing large amounts of data. This role is important because you cannot build usable data models without good data to train and test an algorithm.
Machine Learning Scientist: An ML Scientist primarily concentrates on developing new algorithms and techniques in machine learning and artificial intelligence. Typically, they work in research-oriented roles.
Data Scientist: Data scientists analyze complex data to help organizations make informed decisions. Product managers and product designers typically work with a data scientist to draw insights from the analyzed data.
Data
Input data: Refers to the information you feed into a computer system for processing and learning. This data is often collected into datasets, which are large collections of specific examples.
- X variables: Represent the input data. This is the data you have or gather to predict, understand, or influence something else. For example, in a study of how studying time affects exam scores, the number of hours a student studies (X variable) is the input data.
Output data: Output data is the result or prediction made by a machine learning model after analyzing the input data.
- Y variables: Represent the outcome, or the output data you want to predict or explain. In the same example, the exam score is the Y variable, as it's the result you're trying to understand or predict based on the studying time.
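Using the study-time example, the sketch below fits a straight line that maps the X variable (hours studied) to the Y variable (exam score) with ordinary least squares, one common regression technique. The numbers are made up purely for illustration, and the helper names are hypothetical.

```python
# Hypothetical data for illustration only.
hours_studied = [1, 2, 3, 4, 5]      # X variable (input)
exam_scores = [55, 60, 65, 70, 75]   # Y variable (output)

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(hours_studied, exam_scores)

def predict_score(hours: float) -> float:
    """Use the learned line to predict Y from a new X."""
    return slope * hours + intercept

print(predict_score(6))  # → 80.0
```

Here the model learns that each extra hour adds about 5 points, so it predicts 80 for a student who studies 6 hours. Real data is noisier, and the fitted line would only approximate it.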
Dataset: A dataset is a collection of information, usually organized so that computers and programs can read and use it. Think of it as a giant spreadsheet: each row is one example (for instance, one person), and each column is one kind of information about that example (for instance, height). In machine learning, the computer learns from a dataset to perform tasks or make predictions.
- Data points: A data point is a single piece of information in a dataset. In the spreadsheet analogy, it's one cell. For example, in a dataset of people's heights and weights, each person's height and each person's weight is an individual data point. Data points give the computer examples of what it needs to learn, and each one helps it understand patterns or differences in the dataset.
Data scrubbing: Before you can train a model on a dataset, you must scrub it: clean it up by modifying, editing, or removing incomplete or incorrectly formatted data. Many algorithms cannot process text-based data, so text values must be converted to numbers before the dataset will work.
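The dataset ideas above can be sketched as a small table in plain Python: each dictionary is a row (one example), each key is a column, and each individual value is a data point. The records, field names, and cleaning rules below are hypothetical; real projects usually rely on a data library such as pandas for this work.

```python
# Hypothetical raw dataset: each dict is one row, each key is one column.
raw_rows = [
    {"height_cm": 170, "weight_kg": 65, "plan": "free"},
    {"height_cm": 182, "weight_kg": None, "plan": "pro"},   # incomplete row
    {"height_cm": "165", "weight_kg": 58, "plan": "pro"},   # number stored as text
    {"height_cm": 158, "weight_kg": 52, "plan": "free"},
]

# A single data point: the height value in the first row.
first_height = raw_rows[0]["height_cm"]

PLAN_CODES = {"free": 0, "pro": 1}  # hypothetical mapping from text labels to numbers

def scrub(rows: list[dict]) -> list[dict]:
    """Drop incomplete rows, fix number formats, and encode text as numbers."""
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # remove incomplete data
        cleaned.append({
            "height_cm": float(row["height_cm"]),  # "165" becomes 165.0
            "weight_kg": float(row["weight_kg"]),
            "plan": PLAN_CODES[row["plan"]],       # "pro" becomes 1
        })
    return cleaned

clean_rows = scrub(raw_rows)
print(len(clean_rows))  # → 3
```

After scrubbing, every remaining row is complete and every value is numeric, which is the form most algorithms expect.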
Data model: A data model (often just called a model) in machine learning is a structure the computer builds from training data and then uses to make predictions or decisions. You then check the accuracy of those predictions using the test data. If the model performs acceptably on data it has never seen, you have a functional data model.
- Training data: The initial, usually larger, portion of your data that you use to develop a data model. Once you are satisfied with the accuracy of its predictions, you evaluate the model on the test data you kept in reserve.
- Test data: Test data is a small, separate part of your data that you don't use for training the data model. You use it later to check how well the data model has learned to predict an outcome.
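A minimal sketch of the train/test workflow in plain Python: shuffle the data, hold back a test portion, build a model from the training portion only, then measure its error on the held-back examples. The 80/20 split, the fixed random seed, the made-up data, and the deliberately simple "model" (the average ratio of Y to X) are illustrative assumptions, not a recommended method.

```python
import random

# Hypothetical dataset of (x, y) pairs where y happens to be exactly 2 * x.
data = [(x, 2 * x) for x in range(1, 11)]

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold back a fraction as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training data, test data)

train, test = train_test_split(data)

# "Train" a very simple model: the average ratio of y to x, using training data only.
ratio = sum(y / x for x, y in train) / len(train)

def predict(x):
    return ratio * x

# Evaluate on test data the model has never seen: mean absolute error.
mae = sum(abs(predict(x) - y) for x, y in test) / len(test)
print(len(train), len(test), mae)  # → 8 2 0.0
```

The error here is zero only because the made-up data follows a perfect pattern; with real data, the test error is the honest estimate of how the model will do on new inputs.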
Algorithms
Machine learning is commonly divided into three categories, and algorithms are grouped by category. Each category determines how the input data (X variables) and output data (Y variables) are treated. Here is a quick overview of the three categories.
Supervised learning: Learns from examples where both the input (X variables) and the correct output (Y variables) are known, and uses the relationship between them to predict outputs for new inputs. It imitates our own ability to extract patterns from known examples (for example, reverse engineering a car by taking apart a competitor's car).
- Common algorithms include regression analysis, decision trees, k-nearest neighbors, neural networks, and support vector machines.
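As an illustration of supervised learning, here is a minimal k-nearest neighbors classifier in plain Python: every training example carries a known label (Y), and a new input is labeled by majority vote among the k closest training examples. The fruit measurements, labels, and k = 3 are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (weight in grams, diameter in cm) -> label
training = [
    ((150, 7.0), "apple"),
    ((160, 7.5), "apple"),
    ((170, 7.2), "apple"),
    ((10, 2.0), "grape"),
    ((12, 2.2), "grape"),
    ((8, 1.8), "grape"),
]

def knn_predict(point, examples, k=3):
    """Label a new point by majority vote among its k nearest labeled examples."""
    # Straight-line (Euclidean) distance; features are not rescaled in this sketch.
    nearest = sorted(examples, key=lambda ex: math.dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((155, 7.1), training))  # → apple
print(knn_predict((11, 2.1), training))   # → grape
```

The key supervised ingredient is the label column: without the known "apple" and "grape" answers, the algorithm would have nothing to vote with.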
Unsupervised learning: Works with input data (X variables) that has no known outputs. It analyzes the relationships between inputs to uncover hidden patterns, such as groups of similar items, which can then be used to create new labels (Y variables).
- Common algorithms include k-means clustering, social network analysis, and descending dimension algorithms (also called dimensionality reduction), such as principal component analysis.
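To show unsupervised learning, the sketch below runs k-means clustering on unlabeled one-dimensional data, for example hypothetical minutes per session for six users. No Y values are given; the algorithm finds two groups on its own, which a team could later name (say, "casual" and "heavy" users). The data, k = 2, and the naive initialization (the first k values) are assumptions for illustration.

```python
# Hypothetical unlabeled data: minutes per session for six users (X only, no Y).
minutes = [1, 2, 3, 20, 21, 22]

def k_means_1d(values, k=2, max_iterations=100):
    """Group values into k clusters; returns (centroids, cluster index per value)."""
    centroids = [float(v) for v in values[:k]]  # naive initialization: first k values
    for _ in range(max_iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        assignments = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its assigned values.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments

centroids, groups = k_means_1d(minutes)
print(centroids)  # → [2.0, 21.0]
print(groups)     # → [0, 0, 0, 1, 1, 1]
```

The output is a new label (the cluster index) for every input, which is exactly the "create new labels" step described above.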
Reinforcement learning: An agent learns through trial and error: it takes actions, receives rewards or penalties, and uses the feedback from previous iterations to improve its next attempts. It is often considered the most advanced of the three categories.
- Common algorithms include Q-learning, deep Q-networks (DQN), policy gradient methods, actor-critic methods, proximal policy optimization (PPO), advantage actor-critic (A2C/A3C), Monte Carlo tree search (MCTS), temporal difference (TD) learning, and SARSA.
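To give a feel for reinforcement learning, here is a minimal tabular Q-learning sketch, one of the algorithms listed above. A hypothetical agent lives in a five-cell corridor and earns a reward only when it reaches the rightmost cell. Through trial and error (random exploration part of the time) it builds a table of action values. The environment, rewards, learning rate, discount factor, exploration rate, and episode count are all illustrative assumptions.

```python
import random

N_STATES = 5                          # cells 0..4; reaching cell 4 ends an episode
ACTIONS = [-1, +1]                    # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical environment: move within the corridor; reward 1 at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: explore randomly sometimes, otherwise use the best known action.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                best_value = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best_value])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward reward + discounted best future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # → [1, 1, 1, 1]  (move right in every non-goal cell)
```

Nobody tells the agent that "right" is correct; it discovers this from rewards collected over many attempts, which is the core idea of reinforcement learning.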
Algorithm: An algorithm is a step-by-step procedure or formula for solving a problem. In ML, you apply an algorithm to your data to build a data model.
Designer Perspective
This is a lot to take in. Take the time to familiarize yourself with these terms. AI/ML is technical; you want to be able to talk with the data science team and understand how they work.
A common scenario in the future will be you and the product manager discussing with a data scientist the insights they've uncovered through ML. You can only work effectively with those insights if you understand how the team arrived at them.
Essentially, you are learning to speak a new language. Embrace it, and it will be no problem.
References
- Machine Learning For Absolute Beginners (Third Edition)
- Machine Learning for Dummies (2nd Edition)
- ChatGPT
Written by Leo Vroegindewey, B2B CX Consultant