GCP – Mercari leverages Google’s vector search technology to create a new marketplace
Mercari is one of the most successful marketplace services in recent years, with 5.3 million active users in the US and 20 million active users in Japan. In October 2021, the company launched a new service in Japan, Mercari Shops, which allows small business owners and individuals to open their own e-commerce storefront in 3 minutes. At the core of the new service, Mercari introduced Google’s vector search technology to solve the crucial problem: creating a new marketplace for small shops using “similarity”.
Mercari has 5.3M active users in the US
The Challenge: a collection of shops doesn’t make a marketplace
At the time of the launch, Mercari Shops was just a collection of small e-commerce sites where shoppers could only see the items sold by each shop, one shop at a time. For shoppers, it was a somewhat painful experience to go back to the top page and choose a shop each time. This loses the most important value of the service: an enjoyable shopping experience for the shoppers.
The challenge of Mercari Shops:
shoppers were only able to browse the items from the selected shop
Shoppers would love something like “a real marketplace on smartphones” where they can easily browse hundreds of items from a wide variety of shops with a single finger gesture. But how do you manage the relationships across all the items to realize that experience? You would need to carefully define millions of item categories and SKUs shared across thousands of sellers, and keep maintaining them all through the manual work of support staff. It would also require sellers to search for and choose the exact category for each item they sell. This is the way traditional marketplace services are built, and it involves a large operational cost while losing another key value of Mercari Shops: that anyone can build an e-commerce site within 3 minutes.
How about using a recommendation system? Popular recommendation algorithms such as collaborative filtering usually require large purchase or click histories to recommend other items, and don’t work well for recommending new items or long-tail items that don’t have any relationship with existing items. Also, collaborative filtering only memorizes relationships between items, such as “many customers also purchase/view these other items”. It doesn’t actually make recommendations with any insight into the item descriptions, names, images, or many other side features.
So Mercari decided to introduce a new way: using “similarity” to create a marketplace.
A new marketplace created by similarity
What do we mean by similarity? For example, you can define a vector (a list of numbers) with three elements (0.1, 0.02, 0.3) to represent an item that has 10% affinity to the concept of “fresh”, 2% to “vegetable”, and 30% to “tomato”. This vector represents the meaning or semantics of “a fresh tomato” as an item. If you search for nearby vectors around it, the items they represent will also have a similar meaning or semantics: you will find other fresh tomatoes (note: this is a simplified explanation of the concept, and the actual vectors live in a much more complex vector space).
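To make the idea concrete, here is a tiny Python sketch that measures similarity between vectors with cosine similarity. The three-element vectors are just the simplified example above, not real embeddings, and the item names are made up:

```python
import numpy as np

# Toy item vectors in the simplified 3-dimensional space from the example:
# (affinity to "fresh", "vegetable", "tomato"). Real embeddings have
# hundreds of dimensions and are produced by an ML model, not written by hand.
items = {
    "fresh tomato A": np.array([0.10, 0.02, 0.30]),
    "fresh tomato B": np.array([0.12, 0.03, 0.28]),
    "canned corn":    np.array([0.01, 0.40, 0.02]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two vectors: close to 1.0 means nearly the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = items["fresh tomato A"]
for name, vec in items.items():
    print(f"{name}: {cosine_similarity(query, vec):.3f}")
# The two tomatoes score close to 1.0 while the corn scores much lower,
# so "searching near vectors" returns items with a similar meaning.
```

A vector search service does this kind of nearest-neighbor lookup at scale, over millions of high-dimensional vectors, instead of the brute-force loop above.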
This similarity between vectors is what creates the marketplace in Mercari Shops, allowing shoppers to browse all the similar items collected on a single page. You don’t need to define and update item categories and SKUs manually to connect the millions of items from thousands of sellers. Instead, machine learning (ML) algorithms extract the vectors from each item automatically, every time a seller adds or updates an item. This is exactly the same approach Google uses for finding relevant content on Search, YouTube, Play, and other services; it is called vector search.
Enabled by this technology, shoppers on Mercari Shops can now easily browse relevant items sold by different shops on the same page.
The marketplace created with the similarity:
shoppers can easily browse the relevant items
Vector search made easy with Matching Engine
Let’s take a look at how Mercari built the marketplace using vector search technology. Through analytics and experiments, they found that the item descriptions written by sellers represent the value of each item well compared to other features such as the item images. So they decided to use the item description texts to extract the feature vector of each item. Thus, the marketplace of Mercari Shops is organized by “how similar items are to each other in their text descriptions”.
Extracting feature vectors from the item description texts
For extracting the text feature vector, they used a word2vec model combined with TF-IDF. Mercari also tried other models such as BERT, but decided to use word2vec because it is simple and lightweight, making it suitable for production use with a lower GPU cost for prediction.
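The blog doesn’t include the model code itself. One common way to combine word2vec with TF-IDF is to average a description’s word vectors, weighting each word by its IDF score so that rare, informative words dominate. Here is a minimal sketch of that idea with gensim and scikit-learn; the descriptions, tokenization, and model parameters are all illustrative, not Mercari’s actual setup:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical item descriptions (in practice these would come from BigQuery).
descriptions = [
    "fresh organic tomato grown in shizuoka",
    "hand made leather wallet brown",
    "vintage denim jacket size m",
]
tokenized = [d.split() for d in descriptions]

# Train (or load) a word2vec model over the description corpus.
w2v = Word2Vec(sentences=tokenized, vector_size=100, min_count=1, epochs=10)

# Fit TF-IDF over the same corpus to get per-word importance weights.
tfidf = TfidfVectorizer()
tfidf.fit(descriptions)
idf = dict(zip(tfidf.get_feature_names_out(), tfidf.idf_))

def description_to_vector(text: str) -> np.ndarray:
    """IDF-weighted average of word2vec vectors for one item description."""
    words = [w for w in text.split() if w in w2v.wv]
    if not words:
        return np.zeros(w2v.vector_size)
    weights = np.array([idf.get(w, 1.0) for w in words])
    vectors = np.array([w2v.wv[w] for w in words])
    return (weights[:, None] * vectors).sum(axis=0) / weights.sum()

item_vector = description_to_vector("fresh tomato from shizuoka")
```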
There was another challenge. Building production vector search infrastructure is not an easy task. In the past, Mercari built their own vector search from scratch for an image search service. It required assigning a dedicated DevOps engineer to build the Kubernetes servers and to design and maintain the service. They also had to build and operate a data pipeline for continuous index updates: to keep the search results fresh, the vector search index needs to be updated every hour with newly added items. This pipeline had caused some incidents in the past and consumed the DevOps engineers’ resources. Considering these factors, it was almost impossible for Mercari Shops to add a new vector search with their limited resources.
Instead of building it from scratch, they introduced Vertex AI Matching Engine. It is a fully managed service that shares the same vector search backend with major Google services such as Google Search, YouTube, and Play. There is no need to implement the infrastructure from scratch, maintain it, or design and run the index update pipeline yourself. Yet you can quickly take advantage of the responsiveness, accuracy, scalability, and availability of Google’s latest vector search technology.
The feature extraction pipeline
Mercari Shops’ search service has two components: 1) a feature extraction pipeline and 2) a vector search service. Let’s see how each component works.
The feature extraction pipeline is defined with Vertex AI Pipelines, and is invoked periodically by Cloud Scheduler and Cloud Functions to initiate the following process:
Get item data: The pipeline queries BigQuery to fetch the updated item data
Extract feature vector: The pipeline runs predictions on the data with the word2vec model to extract feature vectors
Update index: The pipeline calls Matching Engine APIs to add the feature vectors to the vector index. The vectors are also saved to Cloud Bigtable
The following is the actual definition of the feature extraction pipeline on Vertex AI Pipelines:
The feature extraction pipeline definition on Vertex AI Pipelines
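For a more concrete picture, here is a hypothetical sketch of such a pipeline using the Kubeflow Pipelines (KFP) SDK, which is what Vertex AI Pipelines executes. The component bodies are stubbed, and every project, table, and index name is a placeholder rather than Mercari’s actual code:

```python
from kfp import compiler, dsl


@dsl.component(packages_to_install=["google-cloud-bigquery", "pandas", "db-dtypes"])
def get_item_data(project: str, items: dsl.Output[dsl.Dataset]):
    """Query BigQuery for items added or updated since the last run."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    df = client.query(
        "SELECT id, description FROM `shops.items` "  # placeholder table name
        "WHERE updated_at > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)"
    ).to_dataframe()
    df.to_csv(items.path, index=False)


@dsl.component
def extract_feature_vectors(items: dsl.Input[dsl.Dataset],
                            vectors: dsl.Output[dsl.Dataset]):
    """Run the word2vec + TF-IDF model over each description (stubbed)."""
    # Real code would load the model and write one "id,vector" row per item.
    open(vectors.path, "w").close()


@dsl.component
def update_index(vectors: dsl.Input[dsl.Dataset], index_name: str):
    """Add the new vectors to the Matching Engine index and save them to Bigtable."""
    # Real code would call the Matching Engine index update API and write each
    # vector to Cloud Bigtable so the serving path can look it up by item id.


@dsl.pipeline(name="feature-extraction-pipeline")
def feature_extraction_pipeline(project: str, index_name: str):
    items_task = get_item_data(project=project)
    vectors_task = extract_feature_vectors(items=items_task.outputs["items"])
    update_index(vectors=vectors_task.outputs["vectors"], index_name=index_name)


# Compile into a job spec that can be submitted to Vertex AI Pipelines.
compiler.Compiler().compile(feature_extraction_pipeline, "feature_extraction_pipeline.json")
```

The compiled job spec is what the Cloud Scheduler and Cloud Functions combination would submit to Vertex AI Pipelines on each periodic run.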
Vector search service
The second component is the vector search service that works in the following manner:
Client makes a query: a client makes a query to the Cloud Run frontend specifying an item id
Get the feature vector: get a feature vector of the item from Bigtable
Find similar items: using Matching Engine API, find similar items with the feature vector
Return the similar items: return the item ids of the similar items to the client
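Here is a hedged sketch of that request flow in Python, assuming the item vectors are stored in Bigtable as packed float32 bytes and the index is served on a Matching Engine index endpoint. Every resource name below is a placeholder, not Mercari’s actual configuration:

```python
import struct

from google.cloud import aiplatform, bigtable

# Placeholder Bigtable table holding one row per item with its feature vector.
bt_table = (bigtable.Client(project="my-project")
            .instance("items-instance").table("item-vectors"))

# Placeholder Matching Engine index endpoint (full resource name).
index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
    index_endpoint_name="projects/my-project/locations/asia-northeast1/"
                        "indexEndpoints/1234567890")


def find_similar_items(item_id: str, num_neighbors: int = 20) -> list[str]:
    # 1) Get the precomputed feature vector of the query item from Bigtable.
    row = bt_table.read_row(item_id.encode())
    raw = row.cells["features"][b"vector"][0].value  # assumed column layout
    vector = list(struct.unpack(f"{len(raw) // 4}f", raw))

    # 2) Ask Matching Engine for the nearest neighbors of that vector.
    response = index_endpoint.match(
        deployed_index_id="items_deployed_index",
        queries=[vector],
        num_neighbors=num_neighbors,
    )

    # 3) Return the ids of the similar items, dropping the query item itself.
    return [n.id for n in response[0] if n.id != item_id]
```

In practice, the Cloud Run frontend would wrap a function like find_similar_items() in an HTTP handler and return the ids as JSON.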
By introducing Matching Engine, Mercari Shops was able to build the production vector search service within a couple of months. In the month since launching the service, they have not seen any incidents. From development to production, only a single ML engineer (the author) implements and operates the whole service.
Looking ahead
With this successful introduction, Mercari Shops is now working on adding more functionality and extending the service to future shop projects. For example, Matching Engine has a filter vector match function that applies simple filters to the search results. With this function, they could show only “on sale” items or exclude items from specific shops, as sketched below. Matching Engine will also soon support streaming index updates, which would let shoppers find items as soon as sellers add them. Vertex AI Feature Store also looks attractive as a replacement for Cloud Bigtable as the repository of feature vectors, with additional functionality such as feature monitoring for better observability of service quality.
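Continuing the earlier sketch, a filtered query could look roughly like the following, assuming each vector was indexed with a restrict tag such as a “status” namespace (the tag and index names here are hypothetical):

```python
from google.cloud.aiplatform.matching_engine.matching_engine_index_endpoint import Namespace

# Ask for neighbors, but only among vectors indexed with status=on_sale.
response = index_endpoint.match(
    deployed_index_id="items_deployed_index",
    queries=[vector],
    num_neighbors=20,
    filter=[Namespace(name="status", allow_tokens=["on_sale"])],
)
```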
With those Google Cloud technologies and products, Mercari can turn their new ideas into reality with less time and resources, adding significant value to their business.
Read more for the details.