Rise of the AI PC & local LLMs

June 04, 2024

Practical AI: Machine Learning, Data Science

Podcast Summary

Local offline AI and AI PCs: Local offline AI and AI PCs allow users to access AI capabilities on their devices without an internet connection, providing benefits for privacy, security, and efficiency. Companies like Microsoft, Google, and Ollama are investing in this area, and it has the potential to democratize AI access.
Interest in local offline AI and AI PCs, meaning artificial intelligence models that run on personal computers without an internet connection, has grown with recent announcements and developments. Chris and Daniel discussed the topic on the Practical AI podcast, after first clearing up confusion from their previous episode about GPT-4o's voice interface and acknowledging mistakes they made in their excitement to discuss the latest advancements. Running models on the device itself can benefit privacy, security, and efficiency. Relevant tools and models include TensorFlow Lite, Hugging Face's Transformers, and compact open models such as Microsoft's Phi family. Companies like Microsoft, Google, and Ollama are also investing in this area, and understanding the differences between these offerings and their use cases can be challenging for newcomers. As the technology advances, AI PCs have the potential to democratize AI access, enabling more people to use it for applications such as language translation, image recognition, and even scientific research. The future of AI is not limited to centralized servers; local offline AI and AI PCs are an essential part of the evolving landscape.
AI at the edge: AI development is shifting towards edge computing due to privacy, security, inconsistent networks, and real-time processing needs. Hardware advancements are also driving this trend. Both local and cloud models will continue to coexist and serve different use cases.
Running AI at the edge, or locally, is becoming increasingly popular as software development evolves and new hardware capabilities arrive. Software development has historically swung back and forth between local and cloud capabilities, and this trend is the latest such shift. The reasons include privacy and security concerns, inconsistent networks, and the need for real-time processing in offline environments. Hardware is also undergoing a revolution, with many new capabilities being developed specifically for low-power, disconnected environments. Local AI models and cloud-hosted models will therefore continue to coexist and serve different use cases. For those new to AI models, there are easy ways to get started, such as using an application like LM Studio to run models locally on a laptop. Overall, the future of AI development will involve a balance between local and cloud capabilities, and the ability to switch seamlessly between the two as needed.
Local LLM considerations: Local LLMs require user-friendly systems, optimization libraries, and infrastructure for seamless data integration and automation to run multiple models efficiently.
As running large language models (LLMs) locally becomes more common, several considerations stand out. One is the availability of user-friendly systems like Ollama, which can be used as a local server or as a Python library. Another is the use of optimization and compilation libraries to run larger models locally. These libraries are hardware-specific and optimize models for a particular local environment, so they may not be as general-purpose as local model systems. A third is the need for infrastructure and middleware that enables inferencing across multiple models, both in the cloud and locally, without human intervention. Running a large number of models simultaneously on a local machine is not practical because of memory constraints, which is the main current difference from cloud-hosted models. However, the market is showing demand for using multiple models for specific purposes, which can be met by combining cloud and local models and calls for solutions for seamless data integration, automation, and pipelining. Companies that have integrated automation, data pipelining, and data integration into their generative models are leading the way in the AI space.
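As a rough illustration of the "local server" usage mentioned above, the sketch below builds and sends a prompt to an Ollama server over its HTTP API using only the Python standard library. It assumes Ollama is running on its default port (11434) and that a model tagged llama3 has already been pulled; the model tag and the helper names build_generate_request and generate are illustrative and do not come from the episode.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumed setup).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3"):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

With a server running, generate("In one sentence, what is an AI PC?") would return the model's reply as a string. Nothing leaves the machine, which is the privacy argument made for local models.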
AI systems integration: The true value of AI comes from the systems built around it, including data integration, automation, and potentially routing between local and cloud models. The future involves finding a standardized, open framework for these interactions.
While AI models offer significant value, the real advantage comes from the systems built around them. This involves data integration, automation, and potentially routing between local and cloud models. The future of local AI interactions is an area of interest, with a need for a standard approach to structuring these interactions. For those exploring local AI models and their systems, there are increasingly more choices, such as Intel's AI PCs and NVIDIA's GeForce RTX AI PCs. In summary, the true value lies in the interconnectedness of AI models and their systems, and the future lies in finding a standardized, open framework for these interactions.
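To make the idea of routing between local and cloud models concrete, here is a minimal sketch of one possible policy. The InferenceRequest fields and the rules in choose_backend are hypothetical, chosen only to reflect the motivations named in this summary (privacy, inconsistent networks, and model size); the episode does not prescribe a specific routing scheme.

```python
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    """A hypothetical request description used only for this sketch."""
    prompt: str
    contains_sensitive_data: bool = False
    needs_large_model: bool = False


def choose_backend(request, network_available):
    """Return "local" or "cloud" for a request under an illustrative policy."""
    # Privacy first: sensitive data stays on the device.
    if request.contains_sensitive_data:
        return "local"
    # Without a network, the local model is the only option.
    if not network_available:
        return "local"
    # Otherwise send demanding work to the cloud and keep the rest local.
    return "cloud" if request.needs_large_model else "local"
```

A real system would likely add fallbacks, cost limits, and per-model capability checks, but the shape stays the same: a small decision layer sits in front of several interchangeable model backends.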
AI-ready laptops: Laptops with AI-integrated processors, like Apple's M1 and M2 and Intel's Core Ultra, have built-in optimization for executing machine learning models, making them suitable for AI workloads. MLCommons aims to provide benchmarks for comparing different AI PCs, helping users make informed decisions.
With the rise of AI-integrated processors in laptops, such as Apple's M1 and M2 and Intel's Core Ultra, the line between traditional laptops and AI PCs is becoming increasingly blurred. These new processors have built-in optimization for executing machine learning models, making them "AI ready." MLCommons, an organization focused on benchmarking AI workloads, has recently announced the MLPerf Client working group, which aims to provide benchmarks for desktop, laptop, and workstation machines running various operating systems. This will help users compare different AI PCs and make informed decisions. As more laptops integrate AI capabilities, the distinction between an AI laptop and a regular laptop may become redundant. However, the high cost of these advanced laptops may create a divide between those who can afford them and those who cannot. Additionally, there will be continued development in optimizing models for local processing and in creating more efficient models for cloud-based processing. It's essential to understand the differences between base models and fine-tuned models when working with AI applications.
Optimized LLMs for better performance: Fine-tuned models and community-built optimized versions offer better performance for instruction following and chat compared to base models like Meta's Llama 3. Each optimized version serves a specific purpose, such as making the model smaller and more efficient for CPU environments or calibrating the quantization for GPUs.
While base models like Meta's Llama 3 are a good starting point for using large language models (LLMs), they are not the most effective for instruction following or chat. Fine-tuned models, such as Llama 3 Instruct, and community-built optimized versions offer better performance. These optimized versions can be run in various environments, including CPUs and GPUs, and come in different forms like GGML, GGUF, GPTQ, QAT, and AWQ. Each serves a specific purpose, such as making the model smaller and more efficient for CPU environments or calibrating the quantization for GPUs. The CPU-oriented versions, while capable, do not match the throughput of their GPU counterparts. For those interested in exploring the use of LLMs with knowledge graphs and vector search, the Neo4j team has shared insights on their podcast, episode 23 at graphstuff.fm.
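The size reduction behind these optimized versions can be illustrated with a small sketch. It shows plain symmetric round-to-nearest quantization with a single scale for the whole list of weights, plus a back-of-the-envelope estimate of weight storage. This is a deliberate simplification: formats and methods such as GGUF, GPTQ, and AWQ use finer-grained scales and, in some cases, calibration data. The function names are illustrative.

```python
def quantize(weights, bits=4):
    """Symmetric round-to-nearest quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # largest representable magnitude, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to the range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floating-point weights from the integers."""
    return [v * scale for v in q]


def weight_memory_gb(num_params, bits):
    """Approximate weight storage in GB (10**9 bytes), ignoring overhead."""
    return num_params * bits / 8 / 1e9
```

By this estimate, an 8-billion-parameter model needs about 16 GB for its weights at 16 bits per weight and about 4 GB at 4 bits, which is the difference between not fitting and fitting on many laptops. The price is the rounding error introduced by quantize, which is at most half a step (scale / 2) per weight.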
Local AI processing: Despite the common use of cloud-based solutions for AI training and inference, local processing is still relevant for certain use cases. Techniques like RAG, quantization, and federated learning can help optimize local AI processing.
While AI models can be run locally on laptops or even in disconnected environments, the size and complexity of models usually require cloud-based solutions for training and large-scale inference. However, there are use cases, such as disaster relief scenarios or mobile platforms with limited connectivity, where local AI processing can be sufficient. The future may bring more advanced local training capabilities, but for now, most local AI workloads consist of inference, with data integrated using techniques like RAG (retrieval-augmented generation). There are also ongoing efforts to explore federated learning and parameter-efficient updates to train models across multiple client devices. While some believe that fine-tuning models locally will become less necessary as models improve, it's essential to stay informed about the latest developments in this rapidly evolving field. If you're interested in experimenting with local AI processing, consider exploring quantization methods and systems like Ollama and LM Studio, and get hands-on experience to understand how these models perform.
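The retrieval step of RAG can be sketched in a few lines. The example below ranks documents by cosine similarity over simple word counts and places the best matches into the prompt. Word counts are a stand-in assumed here for brevity; production systems typically use learned embeddings and a vector database. All function names and the prompt wording are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Augment the question with retrieved context before calling a model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The resulting prompt can be passed to any local model. Because both the documents and the model stay on the device, this pattern suits the disconnected scenarios described above.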
Transfer Learning: Use pre-trained models as starting points for new models to save time and resources, but fine-tune on smaller task-specific datasets for high accuracy. Large, diverse datasets are essential for pre-trained model training.
In today's episode of Practical AI, Daniel discussed the importance of using transfer learning for building accurate machine learning models in less time. Transfer learning is a type of machine learning where a pre-trained model is used as the starting point for a new model. This approach saves time and resources by using the knowledge learned from the pre-trained model to improve the performance of the new model. Daniel also emphasized the importance of using a large and diverse dataset for training the pre-trained model, as this will help the model learn more generalizable features. Furthermore, he mentioned that fine-tuning the pre-trained model on a smaller dataset specific to the task at hand is crucial for achieving high accuracy. Overall, transfer learning is a powerful technique for building accurate machine learning models in a more efficient way, and it's a must-have tool for any machine learning engineer or data scientist. Don't forget to subscribe to Practical AI for more insights on machine learning and AI, and join our free Slack community at practicalai.fm/community to connect with other like-minded individuals. Thanks for listening!
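One common form of transfer learning keeps the pre-trained model frozen and trains only a small new "head" on the task-specific data. The toy sketch below follows that pattern: pretrained_features is a fixed stand-in for a frozen pre-trained model (an assumption made so the example runs without any libraries), and train_head fits a logistic-regression head on a handful of labeled examples.

```python
import math


def pretrained_features(x):
    """Stand-in for a frozen pre-trained model: fixed, never updated here."""
    return [x, x * x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, epochs=1000, lr=0.1):
    """Fit only a new logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in data:
            feats = pretrained_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * f for w, f in zip(weights, feats)]
            bias -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Classify x with the frozen features and the trained head."""
    feats = pretrained_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias) >= 0.5


# Small task-specific dataset: label 1 when x is far from zero.
task_data = [(-2.0, 1), (-1.5, 1), (-0.5, 0), (0.0, 0), (0.5, 0), (1.5, 1), (2.0, 1)]
head_weights, head_bias = train_head(task_data)
```

Only three numbers are learned here, which is why a small dataset is enough: the frozen features already contain the useful signal (the squared term), and the head only has to learn how to use it.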

Recent Episodes from Practical AI: Machine Learning, Data Science

Vectoring in on Pinecone

Daniel & Chris explore the advantages of vector databases with Roie Schwaber-Cohen of Pinecone. Roie starts with a very lucid explanation of why you need a vector database in your machine learning pipeline, and then goes on to discuss Pinecone’s vector database, designed to facilitate efficient storage, retrieval, and management of vector data.

Practical AI: Machine Learning, Data Science

July 10, 2024

artificial intelligence

Stanford's AI Index Report 2024

We’ve had representatives from Stanford’s Institute for Human-Centered Artificial Intelligence (HAI) on the show in the past, but we were super excited to talk through their 2024 AI Index Report after such a crazy year in AI! Nestor from HAI joins us in this episode to talk about some of the main takeaways including how AI makes workers more productive, the US is increasing regulations sharply, and industry continues to dominate frontier AI research.

Practical AI: Machine Learning, Data Science

July 02, 2024

artificial intelligence

Apple Intelligence & Advanced RAG

Daniel & Chris engage in an impromptu discussion of the state of AI in the enterprise. Then they dive into the recent Apple Intelligence announcement to explore its implications. Finally, Daniel leads a deep dive into a new topic - Advanced RAG - covering everything you need to know to be practical & productive.

Practical AI: Machine Learning, Data Science

June 25, 2024

artificial intelligence

The perplexities of information retrieval

Daniel & Chris sit down with Denis Yarats, Co-founder & CTO at Perplexity, to discuss Perplexity’s sophisticated AI-driven answer engine. Denis outlines some of the deficiencies in search engines, and how Perplexity’s approach to information retrieval improves on traditional search engine systems, with a focus on accuracy and validation of the information provided.

Practical AI: Machine Learning, Data Science

June 19, 2024

artificial intelligence

Using edge models to find sensitive data

We’ve all heard about breaches of privacy and leaks of private health information (PHI). For healthcare providers and those storing this data, knowing where all the sensitive data is stored is non-trivial. Ramin, from Tausight, joins us to discuss how they have deployed edge AI models to help companies search through billions of records for PHI.

Practical AI: Machine Learning, Data Science

June 13, 2024

artificial intelligence

Rise of the AI PC & local LLMs

We’ve seen a rise in interest recently and a number of major announcements related to local LLMs and AI PCs. NVIDIA, Apple, and Intel are getting into this along with models like the Phi family from Microsoft. In this episode, we dig into local AI tooling, frameworks, and optimizations to help you navigate this AI niche, and we talk about how this might impact AI adoption in the longer term.

Practical AI: Machine Learning, Data Science

June 04, 2024

artificial intelligence

AI in the U.S. Congress

At the age of 72, U.S. Representative Don Beyer of Virginia enrolled at GMU to pursue a Master’s degree in C.S. with a concentration in Machine Learning. Rep. Beyer is Vice Chair of the bipartisan Artificial Intelligence Caucus & Vice Chair of the NDC’s AI Working Group. He is the author of the AI Foundation Model Transparency Act & a lead cosponsor of the CREATE AI Act, the Federal Artificial Intelligence Risk Management Act & the Artificial Intelligence Environmental Impacts Act. We hope you tune into this inspiring, nonpartisan conversation with Rep. Beyer about his decision to dive into the deep end of the AI pool & his leadership in bringing that expertise to Capitol Hill.

Practical AI: Machine Learning, Data Science

May 29, 2024

artificial intelligence

First impressions of GPT-4o

Daniel & Chris share their first impressions of OpenAI’s newest LLM: GPT-4o and Daniel tries to bring the model into the conversation with humorously mixed results. Together, they explore the implications of Omni’s new feature set - the speed, the voice interface, and the new multimodal capabilities.

Practical AI: Machine Learning, Data Science

May 22, 2024

artificial intelligence

Full-stack approach for effective AI agents

There’s a lot of hype about AI agents right now, but developing robust agents isn’t yet a reality in general. Imbue is leading the way towards more robust agents by taking a full-stack approach, from hardware innovations through to user interface. In this episode, Josh, Imbue’s CTO, tells us more about their approach and some of what they have learned along the way.

Practical AI: Machine Learning, Data Science

May 15, 2024

artificial intelligence

Autonomous fighter jets?!

Yep, you heard that right. Autonomous fighter jets are in the news. Chris and Daniel discuss a modified F-16 known as the X-62A VISTA and autonomous vehicles/systems more generally. They also comment on the Linux Foundation’s new Open Platform for Enterprise AI.

Practical AI: Machine Learning, Data Science

May 08, 2024

artificial intelligence