Transforming AI: The Future of Hardware from Positron’s Cameron McCaskill

Introduction:

The rapid advancement of hardware tailored for artificial intelligence (AI) marks a significant milestone in technological evolution, and many believe AI will transform every business. NVIDIA stands at the forefront of this transformation, revolutionizing the field with robust GPUs that accelerate deep learning workloads. This hardware progression has significantly expanded the capabilities of AI applications, from natural language processing to image recognition, by delivering the computational power required to efficiently run intricate algorithms over voluminous datasets. Although NVIDIA is the dominant market leader, this is a growing market, and a few established competitors and several upstarts are seeking to take share. Vocap is lucky to have a front-row view, as one of our long-term advisors, Cameron McCaskill, is in the middle of this evolution.


Cameron’s Bio: 

Cameron McCaskill has served as a key advisor to Vocap Investment Partners for several years, leveraging his expertise in business scaling, strategic sales, and deep domain knowledge in artificial intelligence. Cameron and Vocap’s Managing Director, Vinny Olmstead, first crossed paths during their MBA studies, where Vinny was impressed by Cameron’s entrepreneurial drive. Cameron has just joined his seventh startup and has an IPO and three acquisitions under his belt, including a notable sale to Qualcomm. He remained pivotal at Qualcomm for over a decade, focusing particularly on intelligent edge technologies in his later years there.

Following his tenure at Qualcomm, Cameron joined Groq, a leading player in AI hardware, where he made significant contributions until the company pivoted away from enterprise system sales. That transition proved beneficial for Positron.ai, an emerging competitor in the AI space known for innovative AI/ML (machine learning) solutions that offer superior performance per dollar and per watt for Large Language Model (LLM) inference tasks. Today, Cameron spearheads Positron.ai’s go-to-market strategy, driving the company’s commercial success and expanding its market presence.


Interview: 

Vinny sat down with Cameron to get his view on AI hardware and the overall market, along with his role at AI startup Positron.

Vinny: Obviously, AI is not only in every tech sector conversation but also ubiquitous in most facets of life. Although Vocap has been investing in AI technology for many years, the space has exploded with the introduction and rapid acceleration of Large Language Models (LLMs).

The software side of AI requires substantial computing power, which translates directly into demand on the hardware side; that dynamic is easy to grasp by watching NVIDIA’s stock price alone. Wall Street seems focused on NVIDIA, but the landscape is broader. Can you describe the current hardware landscape for AI?

Cameron: Today, NVIDIA operates as a near-monopoly, holding approximately 85% of the data center AI server market. Their dominance stems from demonstrating vastly superior performance in training AI models on GPUs (Graphics Processing Units) compared to CPUs (Central Processing Units). Training represents the initial phase in AI model development, and it is where NVIDIA’s GPUs stand out as best-in-class.

Moreover, NVIDIA has established a robust CUDA software ecosystem tailored for AI model development. CUDA (Compute Unified Device Architecture) serves as their strategic advantage, often referred to as their "moat." This ecosystem has fostered a global community of millions of developers proficient in CUDA, all dedicated to developing and optimizing AI models specifically for NVIDIA hardware.

Beyond NVIDIA, there are essentially three categories of AI hardware competitors.  

  1. The incumbents: AMD with their GPUs (e.g. the MI300) and Intel with their Gaudi AI accelerators (e.g. Gaudi 3).

  2. The “hyperscalers” developing AI server chips for their own data centers: Amazon’s Trainium and Inferentia ICs, Google’s TPU, Meta’s MTIA family of AI SoCs, and Microsoft’s MAIA 100 IC.

  3. The startups: A large number of well-funded AI startups, including Cerebras ($723M raised), Groq (made famous by the “All-In” podcast), Tenstorrent, Untether AI, and of course, Positron.

All of these companies may eat into NVIDIA’s dominant share over time, but there are a couple of challenges worth noting:

  • Everyone keeps trying to reinvent the wheel on software. Most of the well-funded startups spent man-years developing their own compilers, yet none of them have made Jensen Huang (NVIDIA’s CEO) lose even a wink of sleep, given their slow market traction. AMD has been touting their MI300 GPU AI server chips, but they decided to build their own ROCm software ecosystem rather than leverage the CUDA ecosystem. We have spoken with developers who spent many months porting AI workloads created using CUDA to run on ROCm. Most AI developers are unwilling to do all this extra software porting just to stand up a legitimate second source to NVIDIA.

  • The hyperscalers’ AI server chips continue to trail the NVIDIA runaway train. This is akin to Samsung continuing to design and build in-house mobile phone ICs while purchasing Qualcomm chips for their flagship phones.

So, in short: NVIDIA is the clear leader, the other incumbents’ strategies don’t seem to be working, and some of the startups show promise.

Vinny: Can you describe the broader landscape and the connection between hardware, cloud service providers (CSPs), and Tokens-as-a-Service (TaaS) companies?

Cameron: Here is an LLM (Large Language Model) market map we have developed in-house at Positron.

So here is how it works today…

  • NVIDIA sells hardware, primarily ~$350K DGX H100 servers, to the hyperscalers and CSPs

  • The hyperscalers and CSPs rent access to hosted GPUs to TaaS (Tokens-as-a-Service) providers like Together AI, Fireworks AI, etc. (~$16/hour/server for long-term contracts)

  • The TaaS players create a “one-stop shop” for developers and enterprises who want to leverage LLMs in their applications. These companies keep up to date with the latest open-source models and offer solutions including training-as-a-service, fine-tuning-as-a-service, and inference-as-a-service. Inference prices vary by LLM model, but pretty much all of them are less than $1 per million tokens. A “token” in LLMs is roughly three-quarters of a word (a word averages about 1.3 tokens), so a 1,000-word response to an LLM query equates to ~1,300 tokens (see the quick arithmetic sketch after this list).
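
As a quick sanity check on that arithmetic (the ~1.3 tokens-per-word ratio and the $1-per-million-token ceiling come from the bullet above; everything else is illustrative), here is a minimal Python sketch:

```python
# Rough token/cost arithmetic for Tokens-as-a-Service pricing.
# The tokens-per-word ratio and price ceiling are the approximate
# figures cited above, not quotes for any specific provider.

WORDS_PER_RESPONSE = 1_000
TOKENS_PER_WORD = 1.3            # i.e., one token is roughly 0.75 words
PRICE_PER_MILLION_TOKENS = 1.00  # USD; the "less than $1" upper bound

tokens = WORDS_PER_RESPONSE * TOKENS_PER_WORD           # ~1,300 tokens
cost = (tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS  # ~$0.0013
print(f"{tokens:,.0f} tokens -> ${cost:.4f} per 1,000-word response")
```

At these prices a single response costs a fraction of a cent, which is why TaaS economics hinge on serving enormous token volumes efficiently.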

Positron AI has set out to improve CAPEX (Capital Expenditure) and OPEX (Operating Expenditure) for AI inference infrastructure. We deliver 3X the performance per watt at less than half the up-front CAPEX compared with NVIDIA. These product attributes have a dramatic impact on Positron’s customers’ profit margins.

Vinny: Tell us about your new company, Positron.ai, and how it fits into the ecosystem. Could you include a high-level value proposition and the potential for disruption?

Cameron: Positron has seen a dramatic shift in the memory/compute ratio over time as AI models evolve. “Transformer” AI models have been around for several years; however, since the launch of OpenAI’s ChatGPT in November 2022, transformers have quickly become the dominant class of AI models. Just a few years ago, AI models had tens of millions to hundreds of millions of parameters. Transformer models now have tens or even hundreds of billions of parameters, demanding a significantly higher memory/compute ratio than today’s GPU-based servers provide.
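
To put that parameter growth in memory terms, here is a small illustrative sketch (the model sizes and the FP16 precision are our assumptions for illustration, not Positron data):

```python
# Weight footprint alone, assuming 16-bit (2 bytes/parameter) weights.
# Model sizes are illustrative examples of each era, not specific products.
BYTES_PER_PARAM = 2  # FP16/BF16

for name, params in [("100M-parameter model (pre-ChatGPT era)", 100e6),
                     ("7B-parameter open-source LLM", 7e9),
                     ("70B-parameter open-source LLM", 70e9)]:
    gb = params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{gb:,.1f} GB of weights")
```

At FP16, a 70B-parameter model needs roughly 140 GB for weights alone, more than the 80 GB of HBM on a single H100-class GPU, before even counting the KV cache. That is the memory/compute shift in a nutshell.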

Positron took a unique approach with our roadmap from the time we founded the company at the end of March 2023. Rather than starting with a multi-year ASIC (Application-Specific Integrated Circuit) plan that costs tens of millions of dollars, we opted for a “tick-tock” product strategy. We “tick” with an FPGA-based solution to get to market much faster and cheaper, engage with customers, and let our revenue help fund the ASIC. An FPGA (“Field-Programmable Gate Array”) is a type of integrated circuit (IC) that can be reprogrammed to suit a particular use case. As a follow-on to our FPGA-based solution, Positron will “tock” with an ASIC that is purpose-built to meet customer needs. This tock phase is based on first-hand knowledge of customers’ biggest challenges and of how AI use cases and models are evolving.

Another key aspect of our strategy is our emphasis on focus and speed to market. By leveraging an advanced, off-the-shelf FPGA from Altera, we accelerate our market entry, programming it specifically for AI inference optimization. We deliberately avoid overextending by not competing across all facets of AI acceleration; instead, we concentrate solely on inference for transformer models, excluding training and older AI model types like CNNs, LSTMs, and RNNs (convolutional, long short-term memory, and recurrent neural networks). Additionally, we prioritize the most widely used open-source transformer models hosted on HuggingFace, the largest repository of LLMs. This focused approach significantly narrows our scope and expedites our time to market.

Finally, we addressed the significant challenge of memory utilization, a known issue in the industry. NVIDIA’s AI inference systems typically achieve only 15-35% of their potential memory bandwidth utilization, and the percentage drops as the AI model batch size increases. Rather than address this inefficiency, NVIDIA often encourages customers to purchase additional hardware instead. In contrast, Positron designed the Atlas server to maximize memory bandwidth utilization, achieving rates as high as 95% even with larger batch sizes. The chart below illustrates these performance metrics.
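
To see why utilization matters so much: during autoregressive decoding, each generated token must stream roughly the full set of model weights from memory, so throughput scales with effective bandwidth. Here is a rough, hypothetical sketch; the 3.35 TB/s peak bandwidth is an H100-class figure, the utilization values echo the ranges above, and batching and KV-cache traffic are ignored:

```python
# Back-of-the-envelope single-stream decode throughput for a
# memory-bandwidth-bound LLM. Illustrative assumptions only.

def tokens_per_second(peak_bw_tb_s: float, utilization: float,
                      params_billions: float,
                      bytes_per_param: float = 2.0) -> float:
    """Throughput ~= effective bandwidth / bytes of weights streamed per token."""
    model_bytes = params_billions * 1e9 * bytes_per_param  # FP16 weights
    effective_bw = peak_bw_tb_s * 1e12 * utilization       # bytes/second
    return effective_bw / model_bytes

# 70B-parameter model in FP16 on ~3.35 TB/s of peak memory bandwidth:
print(f"{tokens_per_second(3.35, 0.25, 70):.0f} tok/s at 25% utilization")
print(f"{tokens_per_second(3.35, 0.95, 70):.0f} tok/s at 95% utilization")
```

Same silicon, nearly four times the tokens per second; that is the lever higher utilization pulls.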

Vinny: Let’s segue over to Vocap’s portfolio companies, which include Series A through Series D stage software companies. Can you provide some advice as they leverage their large data sets and move beyond the LLM stage of AI?


Cameron: First of all, I am not sure we are ready to “move beyond the LLM stage of AI.” We are honestly just getting started with LLMs, and they are likely to be around for a long time!

With LLMs, I would encourage your portfolio companies to think of every way LLMs could help them do more with less:

  1. Software coding - you still need seasoned software architects, but a lot of the actual coding, testing, bug fixing, and optimization can be done quite effectively by LLMs (e.g. GitHub Copilot, Codestral, etc.)

  2. Documentation - LLMs can dramatically speed up the process of generating software documentation, product data sheets, pitch decks, blog posts, outbound marketing emails and press releases (e.g. Nuclino, Tabnine, etc.)

  3. Customer service - LLM-based chatbots are improving all the time. While you will still need human beings to help customers with the more complex challenges, the blocking and tackling of online and even phone support can be handled quite effectively by LLMs (e.g. ChatGPT, Microsoft Copilot, etc.)

Note that prompt “engineering” is becoming a critical function. Poorly written prompts generate suboptimal responses; LLM prompts need to be specific and directive. Consider paying to have your employees trained to create the most effective prompts, or even hiring dedicated prompt engineers.
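
As a purely hypothetical illustration of “specific and directive” (the prompt wording, model name, and choice of the OpenAI Python client are ours, not a recommendation from the interview), compare a vague prompt with a structured one:

```python
# A minimal sketch of directive prompting using the OpenAI Python client.
# Any LLM API would work; the model name and wording are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

vague_prompt = "Write something about our product."

directive_prompt = (
    "Write a 150-word launch email for an AI inference server. "
    "Audience: data-center operators. Must mention: 3x performance per watt "
    "and lower up-front cost. Tone: professional. End with a call to action."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise technical marketing writer."},
        {"role": "user", "content": directive_prompt},  # swap in vague_prompt to compare
    ],
)
print(response.choices[0].message.content)
```

The directive version pins down length, audience, required facts, tone, and structure, which is exactly what separates a usable first draft from a generic one.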

Beyond LLMs, several new multimodal tools are coming out. These tools can not only generate text from a text prompt, but can also accept and/or generate pictures, videos, audio clips, etc. Recent multimodal releases include OpenAI’s GPT-4o, Google’s Gemini, and Anthropic’s Claude 3.5 Sonnet.

From a business perspective, these multimodal tools can save significant time and money for startups. 

Two ways multimodal AI can positively impact startups include:

  1. Creating product training videos without needing actors, videographers, or editors.

  2. Consuming AI news in various formats (YouTube videos, newspapers, podcasts) and summarizing the most impactful information for your business.

These only scratch the surface of how multimodal AI can have a positive impact on startups.

Vinny: In closing, use your crystal ball and make three bold predictions for the next two to three years.

Cameron:

  1. Smaller local models running on devices (e.g. smartphones) will actually drive the need for more compute in the cloud to run larger models in the background. Even Apple’s recent “Apple Intelligence” announcements included a new AI data center they are establishing to work in tandem with the AI-based features they are adding to iPhones, iPads, etc.

  2. Companies using humans for the bulk of their customer support needs will be at a disadvantage to those using AI chatbots within 2 years.  AI chatbots and phone support agents will have an extensive knowledge database to pull from, significantly shorter wait times, and the ability to change language and/or dialect on the fly.

  3. NVIDIA will lose significant AI data center market share over the next few years (perhaps as much as 15-20%), but will still gain trillions of dollars in market cap.
