This post was originally published on the AWS Startups Blog.
Guest post by Ammar Syatbi, Data Scientist, Fares Hasan, Data Scientist, Piyush Palkar, Chief Data Officer, Carsome
Carsome is Southeast Asia’s largest integrated car ecommerce platform. With operations across Malaysia, Indonesia, Thailand, and Singapore, we aim to digitize the region’s used car industry by reshaping and elevating the car buying and selling experience. We provide end-to-end solutions to consumers and used car dealers, from car inspection to ownership transfer to financing, promising a service that is trusted, convenient and efficient. Carsome currently transacts around 100,000 cars annually and has more than 2,000 employees across all our offices.
Carsome aims to create value through innovation in this sector in two ways. First, our customers should have an excellent experience when they use our service, so any innovation we introduce must preserve that experience and keep improving it. Second, our operations must become more efficient and evolve to support our growth plans.
Streamlining Inspections with Machine Learning
At Carsome, we understand the importance of thorough checks to offer the best prices and build buyer confidence. Typically, an inspection takes up to 30 minutes per car, during which our professional inspector assesses 175 points and takes pictures, annotating notes about the car’s condition and appearance. As our business has grown rapidly, it’s critical to streamline this process while maintaining high quality.
Inspection is a strategic step in the car journey, and we believe it deserves significant innovation if we are to pioneer and transform the industry. For instance, the images taken include front and back views of the car, where the inspector has to manually blur the car plate numbers to preserve the privacy of our customers. This task takes only a few minutes per car, but it adds up when you have 100 cars, and it is carried out on the small screen of a phone, which is less than ideal.
Several efforts are under way to transform this segment of the journey using artificial intelligence (AI) and machine learning (ML). The team proposed automating the process with deep learning, freeing the inspector from the arduous work of manually blurring car plate numbers and saving the time spent on it. This increases inspection capacity and efficiency, measured by the number of cars an inspector can inspect. There have been several use cases on this front, and our small prototype propelled the plan to bring artificial intelligence into the center of the inspection process.
However, because Carsome is growing rapidly, it was challenging to predict the compute resources needed to deploy this project. We needed something that is easy to scale up and down depending on market demand and operation plans. The conventional method of building everything directly on a plain Amazon EC2 instance cannot adequately address these needs. Although we could manage the resources efficiently using Amazon EKS, we needed a solution that is quick and easy to set up and maintain so we can focus on data science and machine learning instead of managing infrastructure.
We ran our experiment on Amazon SageMaker, a fully managed service that covers the whole workflow from training a deep neural network model to inference. It provides us with just what we need. SageMaker helps data scientists and developers prepare, build, train, and deploy ML models quickly by bringing together a broad set of capabilities purpose-built for ML. We can build, train, tune, and deploy our model without having to think much about managing the infrastructure. The diagram below shows the high-level architecture.
Figure 1: High level model training architecture
The results can be evaluated in multiple ways depending on the use case. For this particular case, we used SageMaker Batch Transform to evaluate model performance. The evaluation looked at both the accuracy and the actual image masking quality. Since the model is stored in S3, we can always come back to it when we need to, for example to compare improvements.
For our modeling approach, we initially tried YOLOv3 to quickly streamline the process and build a successful test run using the TensorFlow framework. YOLOv3 is a real-time object detection system. It takes the matrix representation of the image as input and generates a list of detected objects in the image, along with the coordinates of their bounding boxes. In our context, the images are car images and the objects are the car plates. The image below illustrates some cases where YOLOv3 falls short of producing aesthetically pleasing results, which is why we explored another algorithm.
Figure 2: Result samples for YOLOv3 model
You can look at the plate area and recognize the following limitations:
- Inability to scale to car plate number size without masking neighboring regions that are not part of the plate.
- Inability to work with diagonally aligned objects, which is a problem because that’s how most people naturally take pictures.
The limitations of the YOLOv3 model were obvious from the aesthetics of the results we obtained, which were always axis-aligned bounding boxes rather than polygons with unconstrained orientation. This meant that we had to use a model that provides better results. We decided to adopt Mask R-CNN (Mask Region-based Convolutional Neural Network), as it is proven to excel at such problems. In our context, Mask R-CNN goes a step beyond bounding boxes to segmentation-based output, which resolved the limitations we observed with YOLOv3.
Since Mask R-CNN is not yet among the built-in algorithms in SageMaker, we needed to build a custom container for it. In our case, we prefer using our own container image, since we find it easier to adapt existing code with many dependencies than to use the SageMaker prebuilt container images. The custom container feature is valuable to us because some models might be new or invented by teams working on various problems. The ability to build customized containers for them and orchestrate them in SageMaker is a real advantage. The following image compares the results from both models for the same car, and you can see how Mask R-CNN scores high in all aspects of the comparison. You would almost think the images were edited with software.
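The difference between box-based and segmentation-based blurring can be illustrated with a small NumPy sketch. This is not our production code: the synthetic image, the tilted "plate" band, and the helper names (`pixelate`, `blur_box`, `blur_mask`, `tilted_plate_mask`, `bounding_box`) are all assumptions made for illustration, and a simple tile average stands in for a real blur filter.

```python
import numpy as np


def pixelate(image: np.ndarray, block: int = 4) -> np.ndarray:
    """Replace each block x block tile of a 2-D grayscale image by its mean
    (a simple stand-in for a real blur filter)."""
    h, w = image.shape
    out = image.astype(float).copy()
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y + block, x:x + block] = image[y:y + block, x:x + block].mean()
    return out


def blur_mask(image: np.ndarray, mask: np.ndarray, block: int = 4) -> np.ndarray:
    """Segmentation-style output: blur only the pixels where mask is True."""
    return np.where(mask, pixelate(image, block), image.astype(float))


def blur_box(image: np.ndarray, box, block: int = 4) -> np.ndarray:
    """Detector-style output: blur everything inside an axis-aligned box."""
    x0, y0, x1, y1 = box
    region = np.zeros(image.shape, dtype=bool)
    region[y0:y1, x0:x1] = True
    return blur_mask(image, region, block)


def tilted_plate_mask(shape, slope: float = 0.5, thickness: int = 4) -> np.ndarray:
    """Hypothetical tilted plate: a band around the line y = slope * x + 4."""
    ys, xs = np.indices(shape)
    centre = slope * xs + 4
    return (np.abs(ys - centre) <= thickness / 2) & (xs >= 4) & (xs < shape[1] - 4)


def bounding_box(mask: np.ndarray):
    """Smallest axis-aligned box (x0, y0, x1, y1) containing the mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max() + 1, ys.max() + 1


# Synthetic 32 x 32 grayscale image with a diagonal "plate".
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32))
mask = tilted_plate_mask(image.shape)
box = bounding_box(mask)

boxed = blur_box(image, box)
masked = blur_mask(image, mask)

changed_by_box = int((boxed != image).sum())
changed_by_mask = int((masked != image).sum())
print(f"pixels altered with a bounding box: {changed_by_box}")
print(f"pixels altered with a mask:         {changed_by_mask}")
```

For a diagonal plate, the bounding box covers far more pixels than the plate itself, so the box-based blur spills onto the neighboring regions, while the mask-based blur leaves every pixel outside the plate untouched.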
Figure 3: Car plate blurring comparison between YOLOv3 and Mask R-CNN
The image on the left is masked by YOLOv3, while the image on the right is masked by Mask R-CNN. You can see the difference in the size of the blurred region. The precision and the ability to follow the fine details of the plate come from the segmentation capabilities of Mask R-CNN.
SageMaker provides various options for deploying the model for inference at scale, including SageMaker Hosting Services, SageMaker Asynchronous Inference, and SageMaker Batch Transform. With SageMaker Hosting Services, a persistent endpoint receives masking requests at any time and fulfills them. For our use case, we opted for SageMaker Batch Transform because we don’t need the car plates to be processed in real time. Furthermore, we save costs through the pay-as-you-go pricing of SageMaker Batch Transform, which is based on how long the resources are used. Batch processing runs in step with our operations, so we can respond to requests and support the business in good time.
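As for the custom container mentioned earlier, SageMaker expects an inference container to run a web server on port 8080 that answers GET /ping (health check) and POST /invocations (inference). The following is a minimal sketch of that contract using only the Python standard library. The `mask_plate` function is a hypothetical placeholder that echoes its input; it is not our Mask R-CNN code, and a real container would typically use a production-grade web server.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def mask_plate(image_bytes: bytes) -> bytes:
    """Hypothetical placeholder: a real container would run the segmentation
    model here and return the blurred image. This sketch echoes the input."""
    return image_bytes


class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check: SageMaker polls GET /ping and expects HTTP 200.
        self.send_response(200 if self.path == "/ping" else 404)
        self.end_headers()

    def do_POST(self):
        # Inference: each request body is delivered to POST /invocations.
        if self.path != "/invocations":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        result = mask_plate(payload)
        self.send_response(200)
        self.send_header(
            "Content-Type",
            self.headers.get("Content-Type", "application/octet-stream"),
        )
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def serve(port: int = 8080) -> None:
    """Start the server on the port SageMaker expects (blocks forever)."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Calling `serve()` from the container's entry point would start the server. Because the same two routes are used for hosted endpoints and batch jobs, a container written this way can be reused across deployment options.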
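To make the Batch Transform option more concrete, here is a sketch of how such a job could be requested through the boto3 `create_transform_job` call. The job name, model name, S3 locations, and instance type below are illustrative assumptions, not our actual configuration. The request is built as a plain dictionary so it can be inspected without AWS credentials.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri,
                            instance_type="ml.m5.xlarge", instance_count=1):
    """Build the keyword arguments for SageMaker's create_transform_job.

    All values passed in are hypothetical examples; the instance type
    default is an assumption, not a recommendation.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",  # process every object under the prefix
                    "S3Uri": input_s3_uri,
                }
            },
            "ContentType": "image/jpeg",
            "SplitType": "None",  # send each image as one whole request
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri},
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def submit(request):
    """Submit the job. Requires boto3 and AWS credentials, so it is not
    called in this sketch."""
    import boto3  # imported lazily so the sketch runs without boto3 installed

    boto3.client("sagemaker").create_transform_job(**request)


request = build_transform_request(
    job_name="plate-masking-example",                 # hypothetical
    model_name="mask-rcnn-plates",                    # hypothetical
    input_s3_uri="s3://example-bucket/cars/incoming/",  # hypothetical
    output_s3_uri="s3://example-bucket/cars/masked/",   # hypothetical
)
print(request["TransformResources"])
```

The instances listed under `TransformResources` exist only for the duration of the job, which is why this option suits workloads that do not need a persistent endpoint.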
Workflow & Performance
Using our automation infrastructure and Apache Airflow orchestration, we built the workflow as a Directed Acyclic Graph (DAG). The first task reads all car images that are due in the next window and prepares them for the model. Since the workflow is scheduled, we trigger it at times agreed upon beforehand with our business and product stakeholders. The batch transformation is then invoked, and all qualifying images are masked and stored so they can be retrieved whenever needed. The diagram below shows the inference pipeline architecture.
Figure 4: The inference pipeline architecture
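The scheduled workflow described above could be sketched as follows, assuming Airflow 2.4 or later. The DAG id, task ids, daily schedule, start date, record fields (`s3_key`, `due_at`), and the `images_due` helper are all hypothetical names chosen for illustration; the task bodies are left as placeholders.

```python
from datetime import datetime, timedelta


def images_due(records, window_start, window_hours=24):
    """Select the image keys whose due time falls inside the next window.

    `records` is assumed to be a list of dicts with hypothetical
    's3_key' and 'due_at' fields.
    """
    window_end = window_start + timedelta(hours=window_hours)
    return [r["s3_key"] for r in records if window_start <= r["due_at"] < window_end]


def prepare_images():
    """Placeholder: read the images due in the next window and stage them."""


def run_batch_transform():
    """Placeholder: start the SageMaker Batch Transform job and wait for it."""


def store_masked_images():
    """Placeholder: move the masked images to where they can be retrieved."""


def build_dag():
    """Wire the three steps into a DAG. Airflow is imported lazily so the
    rest of the sketch runs without Airflow installed."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="plate_masking_batch",     # hypothetical name
        start_date=datetime(2022, 1, 1),  # hypothetical start date
        schedule="@daily",                # assumed; the real times are agreed with stakeholders
        catchup=False,
    ) as dag:
        prepare = PythonOperator(task_id="prepare_images", python_callable=prepare_images)
        transform = PythonOperator(task_id="run_batch_transform", python_callable=run_batch_transform)
        store = PythonOperator(task_id="store_masked_images", python_callable=store_masked_images)
        prepare >> transform >> store
    return dag


# Usage example for the window selection with made-up records.
start = datetime(2022, 1, 1, 0, 0)
records = [
    {"s3_key": "cars/a/front.jpg", "due_at": datetime(2022, 1, 1, 9, 0)},
    {"s3_key": "cars/b/front.jpg", "due_at": datetime(2022, 1, 2, 9, 0)},
]
due = images_due(records, start)
print(due)
```

Keeping the window selection in a plain function makes it easy to test on its own, while the DAG only expresses the order of the steps.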
We tested the model on Malaysian car plates. Out of 100 car images, 99 were correctly blurred. Only one image showed a slight issue, caused by sun glare in the background that made the car plate region dim. However, such a case is rare: in our research, we found only one image taken under this condition. Overall performance is impressive, and we are confident in the results obtained.
The time saved by this solution is substantial. We have allowed inspectors to do more by trimming time off such tasks. By our moderate estimate, an inspector spends at least 1 minute masking the plate numbers of a car (timed in the field, based on 2 images per car, with an experienced inspector). Our automated system, on the other hand, takes 2.4 seconds to mask the images of one car. To put this in perspective, based on these figures we have reduced processing time by at least 25x (60 seconds versus 2.4 seconds).
Growth in Carsome is multiplying and therefore, we are building systems that multiply human productivity. We have a wide range of problems to solve via deep learning and artificial intelligence to improve our products and services. Our dealer auction recommendation engine went live earlier this year and with it, we set the tone of innovation with data. The vision is to embed intelligence in every stage of the business operations. This is our first round of optimizing the inspection process and we aim to build diagnostics that aid the human inspector.
Carsome’s operations focus on 3 countries: Malaysia, Indonesia, and Thailand. Any solution we build must therefore be able to perform at scale in all 3 countries. Operating in these regions, with the potential to grow beyond them, brings a locality challenge: the same solution needs to be tweaked further before it can work in a different country or region. We always build solutions locally with the intention of deploying them regionally, so that they reach every business branch and deliver their positive impact across all our operations.
The clock is ticking for us to bring this solution to our regional operations, and that is our priority now. This solution gave us a long view of the opportunities we can bring into our process and of how AWS services like SageMaker help reduce the complexity often faced when building machine learning and deep learning solutions.
Ammar Syatbi is a Data Scientist at Carsome. He started as a Software Engineer and became a full-stack Data Scientist who leverages deep neural networks to provide business solutions. He always has Python for breakfast and loves solving problems with machine learning.
Fares Hasan is an impact-driven lead data scientist with an affinity for building recommendation engines and customer-facing data products. Enjoys working with startups and building pioneering data teams.
Piyush Palkar is a data thought leader with experience across a range of technical and non-technical areas and responsibilities, creating and executing data and analytics programs that drive business value and embed data-driven innovation in products. He is the Chief Data Officer at Carsome and heads four departments: Business Intelligence, Data Science & Advanced Analytics, Insights, and Data Engineering. He is also the Country CDO Ambassador for Malaysia as part of the global partnership representing the Massachusetts Institute of Technology CDOIQ, the International Society of Chief Data Officers, the Institute for Chief Data Officers, and CDO Magazine.
A self-proclaimed foodie and gamer whose passion for food and games has recently extended to cars.