AI Fundamentals

AI, Generative AI, LLMs, RAG, and Neural Networks

Generative AI

Generative artificial intelligence (AI) is a type of deep learning model that can produce text, images, computer code, and audiovisual content in response to prompts. Generative AI models are trained on vast quantities of raw data — generally, the same kinds of data they are built to produce. From that data, they learn to form responses, when given arbitrary inputs, that are statistically likely to be relevant for those inputs.

For example, some generative AI models are trained on large amounts of text, in order to be able to respond to written prompts in a seemingly organic and original manner. In simpler terms, generative AI can react to requests much like human artists or authors, but more quickly. Whether the content these models generate can be considered "new" or "original" is up for debate, but in many cases they can match or exceed certain human creative abilities.


Vector Database

A vector database is a collection of data stored as mathematical representations. Vector databases make it easier for machine learning models to remember previous inputs, allowing machine learning to power search, recommendations, and text-generation use cases. Data can be identified based on similarity metrics instead of exact matches, making it possible for a computer model to understand data contextually.

When one visits a shoe store, a salesperson may suggest shoes that are similar to the pair one prefers. Likewise, when shopping in an ecommerce store, the store may suggest similar items under a header like "Customers also bought..." Vector databases enable machine learning models to identify similar objects, just as the salesperson can find comparable shoes and the ecommerce store can suggest related products.
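To make "similarity instead of exact match" concrete, here is a minimal sketch using cosine similarity, one of the most common similarity metrics in vector databases. The item vectors below are invented purely for illustration:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Return the cosine of the angle between two vectors (1.0 = same direction)."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy vectors standing in for two pairs of shoes and one unrelated product.
    running_shoe = np.array([0.9, 0.1, 0.8])
    trail_shoe = np.array([0.8, 0.2, 0.9])
    umbrella = np.array([0.1, 0.9, 0.2])

    print(cosine_similarity(running_shoe, trail_shoe))  # high score: similar items
    print(cosine_similarity(running_shoe, umbrella))    # low score: unrelated items

The higher the score, the more similar the items — which is exactly how a store can suggest "Customers also bought..." products without any exact keyword match.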

To summarize, vector databases make it possible for computer programs to draw comparisons, identify relationships, and understand context. This enables the creation of advanced artificial intelligence (AI) programs like large language models (LLMs).

What is a Vector?

A vector is an array of numerical values (typically floating-point numbers) that expresses a location along several dimensions. In more everyday language, a vector is a list of numbers, like: {12, 13, 19, 8, 9}. These numbers indicate a location within a space, just as a row and column number indicates a certain cell in a spreadsheet (e.g. "B7").

Each vector in a vector database corresponds to an object or item, whether that is a word, an image, a video, a movie, a document, or any other piece of data. These vectors are likely to be lengthy and complex, expressing the location of each object along dozens or even hundreds of dimensions. This capability underpins several applications:
  • Machine learning and deep learning: The ability to connect relevant items of information makes it possible to construct machine learning (and deep learning) models that can do complex cognitive tasks.
  • Large language models (LLMs) and generative AI: LLMs, like those on which ChatGPT and Bard are built, rely on the contextual analysis of text made possible by vector databases. By associating words, sentences, and ideas with each other, LLMs can understand natural human language and even generate text.
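To make the idea of a location along several dimensions concrete, the small sketch below reuses the example vector from above and measures how close two such points are using Euclidean distance (the coordinates of the second vector are invented):

    import math

    # Each vector is a list of coordinates, one per dimension.
    doc_a = [12, 13, 19, 8, 9]
    doc_b = [11, 14, 20, 7, 9]

    # Euclidean distance: the straight-line distance between the two points.
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(doc_a, doc_b)))
    print(distance)  # a small distance => the two items are close in the space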

LLM - Large Language Model

A large language model (LLM) is a type of artificial intelligence (AI) program that can recognize and generate text, among other tasks. LLMs are trained on huge sets of data — hence the name "large." LLMs are built on machine learning: specifically, a type of neural network called a transformer model. In simpler terms, an LLM is a computer program that has been fed enough examples to be able to recognize and interpret human language or other types of complex data. Many LLMs are trained on data that has been gathered from the Internet — thousands or millions of gigabytes' worth of text. But the quality of the samples impacts how well LLMs will learn natural language, so an LLM's programmers may use a more curated data set.

LLMs use a type of machine learning called deep learning in order to understand how characters, words, and sentences function together. Deep learning involves the probabilistic analysis of unstructured data, which eventually enables the deep learning model to recognize distinctions between pieces of content without human intervention. LLMs are then further trained via tuning: they are fine-tuned or prompt-tuned to the particular task that the programmer wants them to do, such as interpreting questions and generating responses, or translating text from one language to another.
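As a rough illustration only (real LLM products are served very differently), the Hugging Face transformers library can load a small pretrained transformer and continue a prompt; gpt2 is simply one small, publicly available model:

    from transformers import pipeline  # pip install transformers

    # Load a small, publicly available transformer model for text generation.
    generator = pipeline("text-generation", model="gpt2")

    # The model predicts likely next tokens, continuing the prompt probabilistically.
    result = generator("A vector database is", max_new_tokens=20, num_return_sequences=1)
    print(result[0]["generated_text"])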

How LLMs Work

At a basic level, LLMs are built on machine learning. Machine learning is a subset of AI, and it refers to the practice of feeding a program large amounts of data in order to train the program how to identify features of that data without human intervention. LLMs use a type of machine learning called deep learning. Deep learning models can essentially train themselves to recognize distinctions without human intervention, although some human fine-tuning is typically necessary. Deep learning uses probability in order to "learn."
  • For instance, in the sentence "The quick brown fox jumped over the lazy dog," the letters "e" and "o" are the most common, appearing four times each.
  • From this, a deep learning model could conclude (correctly) that these characters are among the most likely to appear in English-language text.
Realistically, a deep learning model cannot actually conclude anything from a single sentence. But after analyzing trillions of sentences, it could learn enough to predict how to logically finish an incomplete sentence, or even generate its own sentences.
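The letter-counting idea above fits in a few lines of Python. This frequency count is, of course, a drastically simplified stand-in for what a deep learning model does across trillions of sentences:

    from collections import Counter

    text = "The quick brown fox jumped over the lazy dog"
    letters = [c for c in text.lower() if c.isalpha()]

    counts = Counter(letters)
    total = len(letters)

    # Probability of each character, estimated from its observed frequency.
    for char, n in counts.most_common(3):
        print(f"{char!r}: {n} occurrences, p = {n / total:.3f}")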


Deep Learning

Deep learning is a type of machine learning that can recognize complex patterns and make associations in a similar way to humans. Its abilities can range from identifying items in a photo or recognizing a voice to driving a car or creating an illustration. Essentially, a deep learning model is a computer program that can exhibit intelligence, thanks to its complex and sophisticated approach to processing data.

Deep learning is one kind of artificial intelligence (AI), and it is core to how many AI services and models function. Large language models (LLMs) such as ChatGPT, Bard, and Bing Chat, and image generators such as Midjourney and DALL-E, rely on deep learning to learn language and context, and to produce realistic responses. Predictive AI models use deep learning to draw conclusions from sprawling collections of historical data.


Embeddings

Embeddings are vectors generated by neural networks. A typical vector database for a deep learning model is composed of embeddings. Once a neural network is properly fine-tuned, it can generate embeddings on its own so that they do not have to be created manually. These embeddings can then be used for similarity searches, contextual analysis, generative AI, and so on, as described above.
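For example, the sentence-transformers library (one popular option; the model name below is just one publicly available choice) turns text into embedding vectors ready to be stored in a vector database:

    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

    # A small, publicly available embedding model (one example among many).
    model = SentenceTransformer("all-MiniLM-L6-v2")

    sentences = ["Vector databases store embeddings.", "Shoes similar to these are on sale."]
    embeddings = model.encode(sentences)  # one vector per sentence

    print(embeddings.shape)  # e.g. (2, 384): two sentences, 384 dimensions each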

Advantages

Querying a machine learning model on its own, without a vector database, is neither fast nor cost-effective. Machine learning models cannot remember anything beyond what they were trained on; they have to be given the context every single time (which is how many simple chatbots work). Passing the context of a query to the model every time is slow, since the context is likely to be a lot of data, and expensive, since that data has to move around and computing power has to be spent repeatedly parsing the same data. And in practice, most machine learning APIs are constrained in how much data they can accept at once anyway.
This is where a vector database comes in handy: a dataset goes through the model only once (or periodically as it changes), and the model's embeddings of that data are stored in a vector database.
  • This saves a tremendous amount of processing time. It makes building user-facing applications around semantic search, classification, and anomaly detection possible, because results come back within tens of milliseconds, without waiting for the model to crunch through the whole data set.
  • For queries, developers ask the machine learning model for a representation (embedding) of just that query. Then the embedding can be passed to the vector database, and it can return similar embeddings — which have already been run through the model.
  • Those embeddings can then be mapped back to their original content: whether that is a URL for a page, a link to an image, or product SKUs.
To summarize: Vector databases work at scale, work quickly, and are more cost-effective than querying machine learning models without them.
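Here is a minimal sketch of that flow, with an in-memory dictionary standing in for a real vector database and a hypothetical embed() function standing in for the machine learning model:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        """Hypothetical stand-in for a real embedding model's API."""
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.random(8)

    # 1. Run the dataset through the model ONCE and store the embeddings.
    catalog = {"sku-001": "trail running shoe", "sku-002": "leather boot", "sku-003": "umbrella"}
    index = {sku: embed(desc) for sku, desc in catalog.items()}

    # 2. At query time, embed only the query...
    query_vec = embed("waterproof hiking shoe")

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # 3. ...and return the stored item whose embedding is most similar.
    best_sku = max(index, key=lambda sku: cosine(index[sku], query_vec))
    print(best_sku, catalog[best_sku])

In a real system, the brute-force comparison would be replaced by an approximate nearest-neighbor index so that lookups stay fast across millions of vectors.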


Neural Network

A neural network, or artificial neural network, is a type of computing architecture that is based on a model of how a human brain functions — hence the name "neural." Neural networks are made up of a collection of processing units called "nodes."

These nodes pass data to each other, just as neurons in a brain pass electrical impulses to each other. Neural networks are used in machine learning, which refers to a category of computer programs that learn without explicit instructions. Specifically, neural networks are used in deep learning — an advanced type of machine learning that can draw conclusions from unlabeled data without human intervention. For instance, a deep learning model built on a neural network and fed sufficient training data can identify items in a photo it has never seen before.
Neural networks make many types of artificial intelligence (AI) possible. Large language models (LLMs) such as ChatGPT, AI image generators like DALL-E, and predictive AI models all rely to some extent on neural networks.

There is no limit on how many nodes and layers a neural network can have, and these nodes can interact in almost any way. Because of this, the list of types of neural networks is ever-expanding. But they can be roughly sorted into these categories:
  • Shallow neural networks usually have only one hidden layer
  • Deep neural networks have multiple hidden layers
Shallow neural networks are fast and require less processing power than deep neural networks, but they cannot perform as many complex tasks as deep neural networks.
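A minimal sketch of the difference, assuming random weights and NumPy: a shallow forward pass is one matrix multiply plus a nonlinearity, while a deep network simply stacks more of the same:

    import numpy as np

    rng = np.random.default_rng(0)

    def relu(x):
        # A common nonlinearity: negative values become zero.
        return np.maximum(0, x)

    x = rng.random(4)  # an input with 4 features

    # Shallow network: a single hidden layer, then the output layer.
    W1, W2 = rng.random((8, 4)), rng.random((1, 8))
    shallow_out = W2 @ relu(W1 @ x)

    # Deep network: the same idea with multiple hidden layers stacked.
    layers = [rng.random((8, 4)), rng.random((8, 8)), rng.random((8, 8)), rng.random((1, 8))]
    h = x
    for W in layers:
        h = relu(W @ h)
    deep_out = h

    print(shallow_out, deep_out)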

Different Types of Commonly Used Neural Networks
  • Perceptron neural networks are simple, shallow networks with an input layer and an output layer.
  • Multilayer perceptron neural networks add complexity to perceptron networks, and include a hidden layer.
  • Feed-forward neural networks only allow their nodes to pass information to a forward node.
  • Recurrent neural networks allow information to flow backwards, so that the output from some nodes can affect the input of preceding nodes.
  • Modular neural networks combine two or more neural networks in order to arrive at the output.
  • Radial basis function neural network nodes use a specific kind of mathematical function called a radial basis function.
  • Liquid state machine neural networks feature nodes that are randomly connected to each other.
  • Residual neural networks allow data to skip ahead via a process called identity mapping, combining the output from early layers with the output of later layers.
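The residual idea in particular is easy to express in code. Here is a hedged sketch of identity mapping, where a layer's input skips ahead and is added back to its output:

    import numpy as np

    def layer(x, W):
        """One ordinary network layer: weights followed by a nonlinearity."""
        return np.maximum(0, W @ x)

    def residual_block(x, W):
        """Residual connection: the input 'skips ahead' and is added to the output."""
        return layer(x, W) + x  # identity mapping

    x = np.random.default_rng(1).random(8)
    W = np.random.default_rng(2).random((8, 8))
    print(residual_block(x, W))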

RAG - Retrieval Augmented Generation

Retrieval-Augmented Generation (RAG) is an approach in natural language processing that integrates retrieval mechanisms with generative models to enhance text generation. By incorporating external knowledge from pre-existing sources, RAG grounds the generated text in factual information, addressing the challenge of generating contextually relevant and informative responses. RAG also mitigates information overload by retrieving and incorporating only the most relevant information into the generated text, improving coherence and accuracy. Overall, RAG represents a significant advancement in NLP, offering a more robust and contextually aware approach to text generation than a generative model alone.

Examples of applications of this technique include customer service chatbots that use a knowledge base to answer support requests. In the context of Retrieval-Augmented Generation (RAG), knowledge seeding involves incorporating external information from pre-existing sources into the generative process, while querying refers to the mechanism of retrieving relevant knowledge from these sources to inform the generation of coherent and contextually accurate text.
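A skeletal sketch of such a chatbot's pipeline, where retrieve() uses naive word overlap in place of a real vector search and generate() is a hypothetical stand-in for an LLM API call:

    # Hedged sketch of a RAG pipeline; retrieve() and generate() are stand-ins.

    KNOWLEDGE_BASE = [
        "Refunds are processed within 5 business days.",
        "Support is available Monday through Friday.",
        "Orders can be cancelled within 24 hours of purchase.",
    ]

    def retrieve(question: str, k: int = 2) -> list[str]:
        """Querying: rank knowledge-base entries by naive word overlap
        (a real system would use embeddings and a vector database)."""
        words = set(question.lower().split())
        scored = sorted(KNOWLEDGE_BASE, key=lambda doc: -len(words & set(doc.lower().split())))
        return scored[:k]

    def generate(prompt: str) -> str:
        """Hypothetical stand-in for a call to a hosted LLM."""
        return f"[LLM response to: {prompt!r}]"

    question = "How long do refunds take?"
    context = "\n".join(retrieve(question))  # knowledge seeding: ground the prompt in facts
    answer = generate(f"Context:\n{context}\n\nQuestion: {question}")
    print(answer)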

RAG Architecture / Cheatsheets

(Architecture diagram and cheatsheets omitted. Credits: Internet.)


AI and DevOps / MLOps

Even the most advanced deep learning models require access to massive data sets to obtain accurate results. One fundamental challenge to solve in MLOps is data egress. Cloud storage is ideal for storing these big data sets, since cloud computing is almost infinitely scalable. However, accessing that data often incurs egress fees: charges from cloud providers for transferring data out of storage.
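As a back-of-the-envelope illustration (the rate and sizes below are assumptions, not any provider's actual prices), egress cost scales linearly with the amount of data moved:

    # Hypothetical numbers for illustration only.
    dataset_gb = 5_000               # size of the training data set, in GB
    egress_rate_per_gb = 0.09        # assumed cloud egress fee, USD per GB
    training_runs_per_month = 4      # how often the data leaves storage

    monthly_egress_cost = dataset_gb * egress_rate_per_gb * training_runs_per_month
    print(f"${monthly_egress_cost:,.2f} per month")  # $1,800.00 per month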

Another essential responsibility for an MLOps engineer is configuring and allocating the necessary and relevant compute power and infrastructure. Machine learning, and especially deep learning, requires a great deal of computational power. Machine learning models require specialized, and expensive, hardware or cloud services — for instance, multiple fast, GPU-powered servers. (A GPU, or graphics processing unit, can perform the highly parallel computations used in machine learning much faster than a traditional CPU.)
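A small example of the kind of check an MLOps setup script might run, using PyTorch's device API to prefer a GPU when one is available (PyTorch is just one common choice):

    import torch  # pip install torch

    # Select the fastest available device: GPU if present, otherwise fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f"Training on: {device}")

    if device.type == "cuda":
        print(f"GPU model: {torch.cuda.get_device_name(0)}")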

Generative AI in DevOps


Written April 14, 2024