SmarterX leans on Google Cloud to deliver custom LLMs for its customers
Editor’s note: Today we hear about SmarterX, which gives retailers, manufacturers, and logistics companies AI-driven tools to sell, ship, store, and dispose of their products safely and compliantly, helping them minimize regulatory risk, maximize sales, and protect consumers and the environment. SmarterX uses BigQuery, Gemini, and Vertex AI to collect, process, and analyze vast amounts of unstructured regulatory and product data from across the web, then uses that data to train custom, highly accurate large language models (LLMs) for large consumer packaged goods brands and retailers. Read on to learn how Google Cloud’s integrated, easy-to-use toolset is helping SmarterX accelerate product development.
EVP for product and technology at SmarterX, Russell Foltz-Smith views the world of retail through search-colored glasses.
“If universal product codes were really universal, looking for a product and all the information directly related to it would be a one-step process,” he proposes. “But in the real world, the ideal of universality just doesn’t exist.”
It’s a reality we all deal with dozens of times a day when we type something into a browser’s search bar: Very few queries are guaranteed to return a single answer. Hence the need for what data scientists call “probabilistic search backed by algorithmic indexing and ranking strategies” (what most of us call “googling”).
“In many ways, all data science and LLM-building boils down to accurate information retrieval,” adds Foltz-Smith. And he’s well-positioned to understand why.
SmarterX customers — consumer packaged goods brands, third-party retailers, distributors, and logistics companies — rely on SmarterX to make sense of the overwhelming volume of regulatory product data online. The platform helps ensure the way products are sold, shipped, stored, and disposed of complies with all applicable laws and regulations.
“SmarterX collects and indexes data, triangulates for missing data points, and provides a queryable interface that helps our customers minimize regulatory risk while maximizing sales,” Foltz-Smith explains. To do so, SmarterX hunts down regulatory information, using crawlers enabled by machine learning and natural language processing to locate, scrape, and parse data from websites, research papers, safety data sheets, and other nooks and crannies of the web where it may be tucked away.
“Google Cloud technologies are a perfect fit for our needs,” Foltz-Smith states. “At their core is the ability to surface the right search results from an inconceivably vast expanse of data where the inputs and outputs are not predetermined and the data itself is unstructured.”
Real-time data processing and fast, accurate model-building
To collect and store all that data, SmarterX employs BigQuery and Cloud Storage. “Our data sources are disparate and the formats unpredictable,” he continues. “BigQuery accommodates unstructured and semi-structured data, then functions as a job engine, recursively cleansing, normalizing, schematizing, and classifying that data at runtime.”
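As a rough illustration of the cleanse, normalize, and schematize step described above, the sketch below maps records with inconsistent field names and units onto one fixed schema. It is plain Python rather than BigQuery SQL, and the alias table, field names, and unit conversion are all hypothetical; none of it reflects SmarterX’s actual pipeline.

```python
from typing import Any, Optional

# Hypothetical aliases: different sources label the same field differently.
FIELD_ALIASES = {
    "upc": "upc", "UPC": "upc", "gtin": "upc",
    "name": "product_name", "product": "product_name", "title": "product_name",
    "flash_point_c": "flash_point_c",
    "flash_point_f": "flash_point_f",
}


def fahrenheit_to_celsius(value_f: float) -> float:
    return (value_f - 32.0) * 5.0 / 9.0


def normalize_record(raw: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Map one raw record onto a fixed three-field schema (illustrative only)."""
    renamed = {}
    for key, value in raw.items():
        target = FIELD_ALIASES.get(key.strip())
        # Drop unknown fields and empty values during cleansing.
        if target is not None and value not in (None, ""):
            renamed[target] = value

    # Normalize units: prefer Celsius, convert from Fahrenheit if needed.
    flash_c = renamed.get("flash_point_c")
    if flash_c is None and "flash_point_f" in renamed:
        flash_c = fahrenheit_to_celsius(float(renamed["flash_point_f"]))

    upc = renamed.get("upc")
    name = renamed.get("product_name")
    return {
        "upc": str(upc).strip() if upc is not None else None,
        "product_name": " ".join(str(name).split()) if name is not None else None,
        "flash_point_c": round(float(flash_c), 1) if flash_c is not None else None,
    }
```

Every output record has the same three keys, with `None` marking a gap that later steps could try to fill from other sources.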
Google Cloud’s scalable compute resources and storage also enable real-time data processing. “We never have to worry about whether we have enough servers in a data center or adequate bandwidth,” Foltz-Smith adds. “Google Cloud hides all that complexity, so it’s handled automatically and cost-effectively.”
Further accelerating data processing is BigQuery’s integration with Gemini, which manages data-processing job queues and also forms the basis of many of the large language models, or LLMs, SmarterX builds for its clients. “Gemini is in part a collection of everything Google has already crawled, so we don’t need to re-crawl it ourselves,” Foltz-Smith notes. That makes model-building faster.
Built-in grounding, the ability to connect model output to verifiable information sources, makes Gemini a safer, more conscientious way to assemble data for SmarterX customers. And retrieval-augmented generation, or RAG, allows SmarterX to connect Gemini with customers’ proprietary databases, enhancing the LLMs’ accuracy and relevance while helping keep that data secure.
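To make the RAG idea concrete, here is a minimal sketch of its retrieval half: rank a small set of passages against a question, then place the best matches in the prompt so the model can answer from them and cite them. It uses simple word-count cosine similarity in place of a real embedding index and stops before any model call; the function names and prompt wording are assumptions for illustration, not SmarterX’s or Vertex AI’s implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words representation of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that restricts the model to the retrieved sources."""
    passages = retrieve(query, documents, k)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered sources below and cite them.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )
```

In a production system the resulting prompt would be sent to the model, and the customer’s documents would stay in their own store, with only the retrieved passages passed along per request.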
Keeping up with ecommerce and regulatory compliance
For each of its clients, SmarterX builds several discrete LLMs on Vertex AI, many of which are updated as a customer’s business requirements change.
“Vertex AI not only enables us to access Gemini directly but also provides links to smaller, publicly available AI models specific to narrowly defined topics like chemical formulas,” he says. SmarterX’s Gemini-based models can even perform complex computations such as chemistry calculations to determine flashpoints, boiling points, and pH levels. These results are then used to automatically triangulate missing data, augment existing data, or update out-of-date information.
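pH is the simplest of those calculations to show: it is the negative base-10 logarithm of the hydrogen ion concentration in mol/L. The sketch below applies that formula to fill in a missing pH field. The record layout and field names (`h_ion_mol_per_l`, `ph_source`) are hypothetical, and a model-driven system would presumably arrive at such values by other routes.

```python
import math


def ph_from_concentration(h_ion_mol_per_l: float) -> float:
    """pH = -log10([H+]), with [H+] in mol/L."""
    if h_ion_mol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(h_ion_mol_per_l)


def fill_missing_ph(record: dict) -> dict:
    """Return a copy of the record with 'ph' computed when absent and derivable."""
    filled = dict(record)
    if filled.get("ph") is None and filled.get("h_ion_mol_per_l") is not None:
        filled["ph"] = round(ph_from_concentration(filled["h_ion_mol_per_l"]), 2)
        # Mark the value as derived so it is not mistaken for a measured one.
        filled["ph_source"] = "computed"
    return filled
```

Tagging derived values separately from measured ones keeps the augmented data auditable, which matters when the output feeds compliance decisions.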
Vertex AI also operates at scale, a necessity for a company whose clients include eight major retailers, each of which has thousands of suppliers of regulated consumer packaged goods. SmarterX’s customers include those same suppliers, each of which sells its products on third-party marketplaces like Amazon and TikTok.
“Gone are the days when a brand sold its merchandise exclusively in brick-and-mortar stores they owned,” Foltz-Smith explains. “The proliferation of retail websites and marketplace-specific product variations adds tremendous complexity to our work.” On any given day, SmarterX is processing millions of SKUs and must update each customer-specific LLM with any new compliance data, which can affect its customers’ entire supply chain — from product formulation to sales and marketing to product disposal.
Foltz-Smith credits the integration of SQL into BigQuery, along with the interoperability of the entire Google Cloud technology constellation, with allowing SmarterX to keep pace with that volume.
“We no longer have to maintain separate workflows, learn multiple tools, and constantly jump between them,” he notes. “We can crawl the web, land the data in BigQuery, process it, write code programmatically or in SQL statements, massage training data, build new LLMs, and evaluate, deploy, and update them all within one coherent, well-orchestrated system with the same familiar interfaces throughout. Google Cloud workflows were built for high-volume data science.”
Empowering subject matter experts
Google Cloud workflows were built for democratized data science as well, with features that enable subject matter experts who are not trained data scientists to work with data directly, and even to deploy models on their own.
Among those features, according to Foltz-Smith, are the ability to easily swap in and out new sets of training data, an assistive decision-making feature for parameterization, easy-to-understand out-of-the-box visualizations for model evaluation, and templates for formatting evaluation frameworks.
“In the past, you’d need to know how to use a modeling tool, a database tool, and an API deployment tool, as well as understand the math underlying a particular model and how to write code in order to build and deploy a model,” he says. “Having it all in a single environment with familiar user interfaces enables people without a data science background to be much more productive. It’s incredibly freeing and empowering for them.”
That freedom translates into accelerated product development.
SmarterX team members with industry-specific knowledge of regulatory requirements can now evaluate, correct, and deploy the models that provide that knowledge to SmarterX customers; previously, they had to wait for a data scientist to help translate that know-how into a model for them.
“Google’s mission to organize all the information in the world and make it universally available is apparent in the tools it offers today, and that mission dovetails precisely with the way SmarterX employs data science in service to our customers,” Foltz-Smith concludes. “I’ve been a data scientist for over two decades, and the tools in Google Cloud continually exceed my expectations.”