IDC estimates that by 2025, there will be 175 zettabytes of data in the world, and 80% of that data will be unstructured. However, 90% of unstructured data is never analyzed. That’s because extracting and transforming unstructured data can be cumbersome, expensive, and risky, and typically requires multiple tools. As a result, unstructured data is rarely used in organizations’ data pipelines.
Google Cloud’s recent innovations in generative AI, including foundation models for text and vision, open up various avenues for data teams to harness this untapped unstructured data. Object tables, a new table type in BigQuery, provide a structured record interface for unstructured data stored in Cloud Storage, unlocking additional possibilities.
Today, we are taking this one step further by integrating BigQuery with Vertex AI foundation models, so that you can analyze unstructured data from right inside BigQuery. The integration brings generative AI directly to where your data resides. This approach has numerous benefits:
Eliminates the need to build and manage data pipelines between BigQuery and generative AI model APIs
Streamlines governance and helps reduce the risk of data loss by avoiding data movement
Reduces the need to write and manage custom Python code to call AI models
Enables you to analyze data at petabyte-scale without compromising on performance
Can lower your total cost of ownership with a simplified architecture
All this is made possible by the BigQuery ML inference engine, which offers machine learning capabilities right inside BigQuery and recently became generally available. For each of the last two years, BigQuery ML has seen over 250% YoY query growth. This year, customers have run over 300 million prediction and training queries in BigQuery ML.
Starting with the first supported foundation model, PaLM 2 for text (text-bison), you can now write just a few lines of SQL in BigQuery ML to run advanced text processing tasks such as summarization or sentiment analysis on unstructured data, retrieve the results in a structured format, and combine them with other data for further analysis.
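To make "a few lines of SQL" concrete, here is a minimal sketch of a sentiment-analysis query. It assumes a remote model over text-bison has already been registered in BigQuery ML. The model name `mydataset.llm_model`, the table `mydataset.reviews`, the column `review_text`, and the parameter values are all hypothetical placeholders, not names from this announcement.

```sql
-- Hypothetical names: mydataset.llm_model is assumed to be a remote model
-- already registered over text-bison; mydataset.reviews is assumed to be a
-- table with a STRING column named review_text.
SELECT
  review_text,
  ml_generate_text_result  -- model response, returned as JSON
FROM
  ML.GENERATE_TEXT(
    MODEL `mydataset.llm_model`,
    (
      -- The input query must expose the text to send as a column named prompt.
      SELECT
        review_text,
        CONCAT(
          'Classify the sentiment of this review as positive, negative, or neutral: ',
          review_text
        ) AS prompt
      FROM `mydataset.reviews`
    ),
    -- Illustrative generation settings; tune them for your own workload.
    STRUCT(0.2 AS temperature, 50 AS max_output_tokens)
  );
```

Because the response comes back as a regular column next to the input columns, it can be joined or aggregated with other BigQuery data like any other query result.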
How does it work?
Under the hood, BigQuery ML’s inference engine uses the ML.GENERATE_TEXT function to call Vertex AI text-bison models from the Model Garden. Here are two simple steps to use this feature:
1. Register the model as a remote model