Applying Generative AI to product design with BigQuery DataFrames
For any company, naming a product or service is complex and time-consuming. This process is particularly challenging in the pharmaceutical industry. Typically, companies start by brainstorming and researching thousands of names. They must ensure that the names are unique, compliant with regulations, and easy to pronounce and remember. With so many factors to consider, multiplied across an entire product catalog, the process must be designed to scale.
In this blog post, we will show how data analytics and generative AI can support the creative process and accelerate testing. We will provide a step-by-step guide to generating potential drug names using BigQuery DataFrames. Please note that this blog post simply illustrates the concepts and does not address any regulatory requirements.
Background
Our goal in this demonstration is to generate a set of 10 brand names that can be reviewed by a panel of experts for an imaginary generic drug called “Entropofloxacin”. Drugs with the suffix -floxacin belong to the fluoroquinolones class of antibiotics.
We’ll use the text-bison model, a large language model that has been trained on a massive dataset of text and code. It can generate text, translate languages, write different kinds of creative content, and answer all kinds of questions.
We will also provide these indications & usage to the model: “Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial infections, including: pneumonia, streptococcus infections, salmonella infections, escherichia coli infections, and pseudomonas aeruginosa infections. It is taken by mouth or by injection. The dosage and frequency of administration will vary depending on the type of infection being treated. It should be taken for the full course of treatment, even if symptoms improve after a few days. Stopping the medication early may increase the risk of the infection coming back.”
Getting started
If you want to follow along, the code in this blog post comes from the Drug Name Generation notebook. We will highlight the key steps here and leave some details to the notebook.
We will be using BigQuery DataFrames to perform generative AI operations. It’s a brand new way to access BigQuery, providing a DataFrame interface that Python developers and data scientists are familiar with. It brings compute capabilities directly to your data in the Cloud, enabling you to process massive datasets. BigQuery DataFrames directly supports a wide variety of ML use cases, which we will showcase here.
Zero-shot learning
Let’s start with a base case, where we simply ask the model a question, through a prompt. No examples, no chains, just a simple request and response scenario.
First, we will need to create a prompt template. You will notice that the prompt guides the model toward the precise outcomes we’re looking for. Also, it is parameterized, so that we can easily update the parameters to try out different scenarios and settings.
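As a minimal sketch of what such a parameterized template could look like in plain Python: the wording is adapted from the instructions quoted later in this post, and the names `NUM_NAMES`, `GENERIC_NAME`, `USAGE`, and `build_zero_shot_prompt` are illustrative assumptions rather than the notebook's exact identifiers.

```python
# Hypothetical parameter names; change them to try other scenarios.
NUM_NAMES = 10
GENERIC_NAME = "Entropofloxacin"
USAGE = (
    "Entropofloxacin is a fluoroquinolone antibiotic that is used to treat "
    "a variety of bacterial infections."  # shortened here for readability
)


def build_zero_shot_prompt(num_names: int, generic_name: str, usage: str) -> str:
    """Fill the template with the drug-specific parameters."""
    return (
        f"Provide {num_names} unique and modern brand names in Markdown "
        "bullet point format.\n\n"
        "Be creative with the brand names. Don't use English words directly; "
        "use variants or invented words.\n\n"
        f"The generic name is: {generic_name}\n\n"
        f"The indications and usage are: {usage}"
    )


zero_shot_prompt = build_zero_shot_prompt(NUM_NAMES, GENERIC_NAME, USAGE)
print(zero_shot_prompt)
```

Keeping the parameters outside the template text means a different drug or a different number of names only requires changing the arguments.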
We can submit our prompt to the model using the `model.predict()` function, which takes a dataframe as input. Because our simple scenario has one string input and one string output, we’ve created a helper function that builds a dataframe from the input string and extracts the string value from the returned dataframe. The helper includes an optional temperature parameter to control the degree of randomness, which can be helpful in a creative context.
To get a response, we first need to create a model reference using a BigQuery connection. Then we can pass the prompt to our helper method.
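The model reference and the helper could be sketched roughly as follows. This assumes the `PaLM2TextGenerator` class from `bigframes.ml.llm` and the `ml_generate_text_llm_result` output column, as exposed by BigQuery DataFrames releases from around the time of this post; newer releases may have replaced or removed them. The default temperature and the `frame_cls` argument are additions in this sketch (the latter only so the helper can be exercised without a Cloud project).

```python
DEFAULT_TEMPERATURE = 0.5  # illustrative default, not a recommended value


def build_model(connection_name: str):
    """Create a text-bison model reference over a BigQuery connection.

    Assumes bigframes.ml.llm.PaLM2TextGenerator is available in the
    installed BigQuery DataFrames release.
    """
    from bigframes.ml.llm import PaLM2TextGenerator  # lazy: needs a Cloud project

    return PaLM2TextGenerator(connection_name=connection_name)


def predict(model, prompt: str, temperature: float = DEFAULT_TEMPERATURE,
            frame_cls=None) -> str:
    """Send one prompt string to the model and return one response string."""
    if frame_cls is None:
        import bigframes.pandas as bpd  # lazy import, same reason as above

        frame_cls = bpd.DataFrame
    # model.predict() expects a dataframe, so wrap the string in a one-row frame.
    input_df = frame_cls({"prompt": [prompt]})
    result = model.predict(input_df, temperature=temperature)
    # Assumed name of the output column that holds the generated text.
    return result["ml_generate_text_llm_result"].iloc[0]
```

With a connection in place, usage might look like `model = build_model("<project>.<location>.<connection>")` followed by `predict(model, "your prompt", temperature=0.9)`, where the connection name is a placeholder.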
And now, the exciting part. Here are several responses we get:
These names might work! You might notice that the names are very similar. Well, that might not actually be a problem. According to “The art and science of naming drugs”: “The letters ‘X,’ ‘Y’ and ‘Z’ often appear in brand names because they give a drug a high-tech, sciency sounding name (Xanax, Xyrem, Zosyn). Conversely, ‘H,’ ‘J’ and ‘W’ are sometimes avoided because they are difficult to pronounce in some languages.”
Few-shot learning
Next, let’s try expanding on this base case by providing a few examples. This is referred to as few-shot learning, in which the examples provide a little more context to help shape the answer. It’s like providing some training data without retraining the whole model.
Fortunately, there is a public BigQuery FDA dataset available at bigquery-public-data.fda_drug that can help us with this task!
We can easily extract a few useful columns from the dataset into a dataframe using BigFrames:
And it’s straightforward to sample the dataset for a few useful examples. Let’s run this code and peek at what we want to include in our prompt.
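A rough sketch of these two steps is shown below. The table name `drug_label` and the three column names are assumptions about the public dataset's schema, and `load_drug_labels` and `sample_examples` are hypothetical helper names; `read_gbq`, `sample`, and `to_pandas` are the BigQuery DataFrames calls this sketch relies on.

```python
FDA_TABLE = "bigquery-public-data.fda_drug.drug_label"  # assumed table name
COLUMNS = [  # assumed column names
    "openfda_generic_name",
    "openfda_brand_name",
    "indications_and_usage",
]
NUM_EXAMPLES = 3


def load_drug_labels():
    """Reference the label table and keep only the columns of interest."""
    import bigframes.pandas as bpd  # lazy: needs a Cloud project

    return bpd.read_gbq(FDA_TABLE)[COLUMNS]


def sample_examples(df, num_examples: int = NUM_EXAMPLES, seed: int = 3):
    """Sample a few rows, then pull only those rows into local memory."""
    return df.sample(num_examples, random_state=seed).to_pandas()
```

Sampling before calling `to_pandas()` keeps the download small: only the handful of example rows leaves BigQuery.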
We can create a more sophisticated prompt with 3 components:
General instructions (e.g. generate 𝑛 brand names)
Multiple examples generated above
Information about the drug we’d like to generate a name for (entropofloxacin)
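The three components above could be assembled along these lines. This is a plain-Python sketch: `build_few_shot_prompt` and the dictionary keys are hypothetical names, and the two example rows reuse values that appear in this post, with the truncation left in place.

```python
from typing import Dict, List

# Two example rows, as they might come back from the sampling step.
examples: List[Dict[str, str]] = [
    {
        "generic": "BUPRENORPHINE HYDROCHLORIDE",
        "usage": "1 INDICATIONS AND USAGE BELBUCA is indicated for the management of pain…",
        "brand": "Belbuca",
    },
    {
        "generic": "DROSPIRENONE/ETHINYL ESTRADIOL/LEVOMEFOLATE CALCIUM AND LEVOMEFOLATE CALCIUM",
        "usage": "1 INDICATIONS AND USAGE Safyral is an estrogen/progestin COC containing a folate…",
        "brand": "Safyral",
    },
]


def build_few_shot_prompt(num_names: int, examples: List[Dict[str, str]],
                          generic_name: str, usage: str) -> str:
    """Concatenate instructions, worked examples, and the target drug."""
    instructions = (
        f"Provide {num_names} unique and modern brand names in Markdown bullet "
        "point format, related to the drug at the bottom of this prompt.\n\n"
        "Be creative with the brand names. Don't use English words directly; "
        "use variants or invented words.\n\n"
        f"First, we will provide {len(examples)} examples to help with your "
        "thought process.\n\n"
        "Then, we will provide the generic name and usage for the drug we'd "
        "like you to generate brand names for.\n\n"
    )
    shots = "".join(
        f"Generic name: {ex['generic']}\nUsage: {ex['usage']}\n"
        f"Brand name: {ex['brand']}\n\n"
        for ex in examples
    )
    # The prompt ends on an open "Brand names:" line for the model to complete.
    target = f"Generic name: {generic_name}\nUsage: {usage}\nBrand names:"
    return instructions + shots + target


few_shot_prompt = build_few_shot_prompt(
    10,
    examples,
    "Entropofloxacin",
    "Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial…",
)
print(few_shot_prompt)
```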
Our prompt will now look like this, truncating some sections for readability:
Provide 10 unique and modern brand names in Markdown bullet point format, related to the drug at the bottom of this prompt.
Be creative with the brand names. Don’t use English words directly; use variants or invented words.
First, we will provide 3 examples to help with your thought process.
Then, we will provide the generic name and usage for the drug we’d like you to generate brand names for.
Generic name: BUPRENORPHINE HYDROCHLORIDE
Usage: 1 INDICATIONS AND USAGE BELBUCA is indicated for the management of pain…
Brand name: Belbuca
Generic name: DROSPIRENONE/ETHINYL ESTRADIOL/LEVOMEFOLATE CALCIUM AND LEVOMEFOLATE CALCIUM
Usage: 1 INDICATIONS AND USAGE Safyral is an estrogen/progestin COC containing a folate…
Brand name: Safyral
Generic name: FLUOCINOLONE ACETONIDE
Usage: INDICATIONS AND USAGE SYNALAR® Solution is indicated for the relief of the inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses.
Brand name: Synalar
Generic name: Entropofloxacin
Usage: Entropofloxacin is a fluoroquinolone antibiotic that is used to treat a variety of bacterial…
Brand names:
With this prompt, we see a much different set of brand names generated. With the examples included, we see that the model is anchored on the generic name.
Bulk generation
Now that we’ve learned the fundamentals of prompts & responses with BigQuery DataFrames, let’s explore generating names at scale. How can you generate candidate names when you have thousands of products? We can perform multiple operations in the Cloud without bringing the data into local memory within the notebook.
Let’s start with querying for drugs that don’t have a brand name in the FDA dataset. Technically, we are querying for drugs where the brand name and generic name match.
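One way to express that filter with DataFrame operations is sketched below. The column names are assumptions about the dataset's schema, and `select_unbranded` is a hypothetical helper name; it expects a dataframe that already holds the generic-name and brand-name columns.

```python
GENERIC_COL = "openfda_generic_name"  # assumed column names
BRAND_COL = "openfda_brand_name"


def select_unbranded(df):
    """Keep rows whose brand name is identical to the generic name."""
    complete = df.dropna()  # discard rows with missing fields first
    return complete[complete[GENERIC_COL] == complete[BRAND_COL]]
```

Because the comparison is expressed on whole columns, BigQuery DataFrames can evaluate it in BigQuery rather than in the notebook's memory.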
We’ll pass a whole dataframe column of prompts to BigFrames instead of a single string prompt. Let’s look at how we could construct this column.
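A sketch of building that column with ordinary string concatenation follows. The prompt wording is adapted from the instructions used earlier in this post, the column names are assumptions about the dataset's schema, and `add_prompt_column` is a hypothetical helper name.

```python
def add_prompt_column(df):
    """Return a copy of df with one prompt per row in a new 'prompt' column."""
    prompt = (
        "Provide a unique and modern brand name related to this "
        "pharmaceutical drug. Don't use English words directly; use "
        "variants or invented words. The generic name is: "
        + df["openfda_generic_name"]  # column-wise concatenation, row by row
        + ". The indications and usage are: "
        + df["indications_and_usage"]
        + "."
    )
    return df.assign(prompt=prompt)
```

Adding a string literal to a column applies the concatenation to every row, so no explicit loop over products is needed.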
Next, let’s create a new helper function for batch prediction. We’ll use the column as-is without any transformation from/to strings.
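The batch helper can be much thinner than the single-prompt one, as this sketch suggests. It assumes the model's `predict()` accepts a column of prompts directly and returns its text in an `ml_generate_text_llm_result` column; the default temperature is an illustrative value.

```python
def batch_predict(model, prompts, temperature: float = 0.5):
    """Run the model over a whole column of prompts.

    Returns the column of generated text, still as a DataFrame column,
    so nothing is converted to or from individual strings.
    """
    result = model.predict(prompts, temperature=temperature)
    return result["ml_generate_text_llm_result"]  # assumed output column name
```

Usage might look like `responses = batch_predict(model, df["prompt"])`, where `model` is the text-bison model reference and `df["prompt"]` is the prompt column.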
After the operation completes, let’s take a look at one of the generated brand names for “alcohol free hand sanitizer”:
**Sani-Tize**
This is a modern and unique brand name for an alcohol-free hand sanitizer. It is derived from the words “sanitize” and “tize”, which give it a scientific and technical feel. The name is also easy to spell and pronounce, making it memorable and easy to market.
In this scenario, we saw that Generative AI is a powerful tool for accelerating the branding process. While we walked through a pharmaceutical drug name scenario, these concepts could be applied to any industry. We also saw that BigQuery puts all of the tools in one place for multiple prompting styles, all with an intuitive DataFrame interface.
Enjoy applying these creative tools to your next project! For more information, feel free to check out the quickstart documentation.