From Access Patterns to Architecture: Optimizing the Advertisement Image Service


Access Patterns of Advertisement Images in Farsi Directory

Data access patterns describe how users store, update, and retrieve data (in this case, images). They are shaped by how often, and in what way, users request images. In a social network, for example, how often a user's profile pictures are retrieved depends directly on how many followers they have and how active they are. Similarly, in a listing service like Farsi Directory, access patterns depend on how recent an ad is: images of newer ads are viewed more often than those of older ones.

In Farsi Directory, recency plays a pivotal role in how users interact with ads. Newer ads typically receive more views because they appear at the top of the listings, where they are most visible. As a result, a small subset of images, those belonging to newer ads, is viewed repeatedly, while most stored images, which belong to older ads, are rarely viewed. Understanding this pattern lets us optimize our storage strategy: images of newer ads can be kept in faster storage for quicker access and a better user experience, while older, rarely viewed images can be kept in slower, more cost-effective storage, balancing performance against cost.
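As a rough illustration of this idea, the sketch below assigns an ad's images to a storage tier based on the ad's age alone. It is a minimal sketch under stated assumptions: the seven-day `HOT_WINDOW`, the `hot`/`cold` tier names, and both helper functions are hypothetical, and the article does not describe how such a cutoff would be chosen.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: images of ads younger than this are treated as "hot".
HOT_WINDOW = timedelta(days=7)


def storage_tier(ad_created_at: datetime, now: datetime) -> str:
    """Pick a storage tier for an ad's images based only on the ad's age."""
    return "hot" if now - ad_created_at <= HOT_WINDOW else "cold"


def images_to_demote(ads: dict[str, datetime], now: datetime) -> list[str]:
    """Return the IDs of ads whose images have aged out of the hot tier."""
    return sorted(
        ad_id for ad_id, created in ads.items()
        if storage_tier(created, now) == "cold"
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    ads = {
        "ad-new": now - timedelta(days=1),
        "ad-old": now - timedelta(days=30),
    }
    print(images_to_demote(ads, now))  # ['ad-old']
```

In practice the cutoff would be tuned from observed view counts rather than fixed up front.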


Redesigning the Image Service

In our previous design, uploaded images were stored in a temporary bucket named temp until the ad was submitted. After submission, we generated three versions of each image, deleted the original from the temp bucket, and stored the versions in the main ad images bucket.

The three versions were:

  1. Thumbnail Images: Sized approximately 200x200 pixels, displayed on the homepage of Farsi Directory.
  2. Post Images: Displayed on the ad page, resized proportionally up to 1500 pixels, watermarked, and compressed by reducing quality by 10% to decrease file size.
  3. Manage Images: Visible only to the ad poster on their ad management page, resized proportionally up to 3000 pixels, stored without watermark and in original quality.
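The sizing rules above can be captured in a few lines. This minimal sketch encodes the three versions as data and computes proportional target dimensions. The `VERSIONS` table and the `fit_within` helper are hypothetical, and treating the thumbnail as "fit inside a 200x200 box" (rather than a crop) is an assumption.

```python
# Hypothetical encoding of the three legacy versions described above.
VERSIONS = {
    # Thumbnail quality is not specified in the article, so it is left as None.
    "thumbnail": {"max_side": 200, "watermark": False, "quality_drop_pct": None},
    "post": {"max_side": 1500, "watermark": True, "quality_drop_pct": 10},
    "manage": {"max_side": 3000, "watermark": False, "quality_drop_pct": 0},
}


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) proportionally so the longer side is at most max_side.

    Images that already fit are returned unchanged (never enlarged).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for name, spec in VERSIONS.items():
        print(name, fit_within(4000, 3000, spec["max_side"]))
    # thumbnail (200, 150)
    # post (1500, 1125)
    # manage (3000, 2250)
```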

However, this design presented several issues:

  • Dependency of Ad Submission on Image Processing: Ad submission depended directly on the image service, so if the image service was slow or unresponsive, ad submission suffered as well. Ads with many images also took longer to submit, since every image had to be processed first.
  • High Volume Due to Multiple Image Qualities: Storing three versions of each image consumed a significant amount of storage space.
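To make both problems concrete, here is a minimal sketch of the old synchronous flow: every image is rendered into three versions inside the submission request, and three objects per image end up in the main bucket. `Bucket`, `render`, and `submit_ad` are hypothetical stand-ins rather than the service's actual code, and the actual image work is reduced to a placeholder.

```python
from collections import Counter

VERSIONS = ("thumbnail", "post", "manage")
calls = Counter()  # counts the work performed inside the submission request


class Bucket:
    """Hypothetical in-memory stand-in for an object storage bucket."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        del self.objects[key]


def render(original: bytes, version: str) -> bytes:
    """Placeholder for the resize/watermark/compress work of the image service."""
    calls["render"] += 1
    return original + b":" + version.encode()


def submit_ad(ad_id: str, image_keys: list[str], temp: Bucket, main: Bucket) -> None:
    # Old flow: the ad is not accepted until every image has been rendered into
    # all three versions, so submission time grows with the number of images.
    for key in image_keys:
        original = temp.get(key)
        rendered = {v: render(original, v) for v in VERSIONS}
        temp.delete(key)
        for version, data in rendered.items():
            main.put(f"{ad_id}/{key}/{version}", data)


if __name__ == "__main__":
    temp, main = Bucket(), Bucket()
    for k in ("img1", "img2"):
        temp.put(k, b"raw-" + k.encode())
    calls.clear()
    submit_ad("ad-42", ["img1", "img2"], temp, main)
    print(calls["render"], len(main.objects), len(temp.objects))  # 6 6 0
```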

After researching the available tools and running benchmarks, we chose Image Proxy. Image Proxy processes images on the fly: when a request arrives, it reads the original image, applies the requested manipulations (such as reducing quality, resizing, or adding a watermark) in real time, and returns the result without storing the processed image.

With Image Proxy, we no longer needed to store three versions of each image. We kept only a single version and defined three modes in Image Proxy (Thumbnail, Post, and Manage), so each variant is generated and displayed on request rather than ahead of time.

With this change, two significant improvements occurred:

  • No Negative Impact on User Experience During Ad Submission: Image processing no longer affected the ad submission experience.
  • Elimination of Pre-Processing and Storing Multiple Versions: We now stored only the original version of the image and processed it as needed upon each request.

This raised a new question. With Image Proxy we had cut our storage footprint significantly, but we had shifted the load onto our processors, since every image request now required processing. How could we relieve that pressure?


Caching the Output of Image Proxy

The answer was to use a Content Delivery Network (CDN) together with a Cache Proxy. We did not stop storing processed images on disk altogether; rather, by relying on the CDN and the Cache Proxy, we avoided keeping every processed version of every image in object storage. One reason for this decision is the way images are consumed in Farsi Directory.

Images of newer ads are always viewed more than those of older ones. With a CDN, recent images stay cached, while older images are naturally purged from the CDN nodes as they stop being requested. This lets us serve images without processing them on every request, using a self-managing cache that needs less space and is easier to manage than storing three versions of every image indefinitely.
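The "self-managing" behavior of an edge cache can be modeled with a least-recently-used (LRU) cache: entries that keep being requested stay, and entries nobody asks for are evicted when space runs out. This is a simplified model, not how any particular CDN is implemented; real CDNs use their own eviction policies, and the `EdgeCache` class is hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Least-recently-used cache used as a simplified model of one CDN edge."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str, fetch: Callable[[str], bytes]) -> bytes:
        if key in self.items:
            self.items.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = fetch(key)                  # go to the origin on a miss
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry
        return value


def origin(key: str) -> bytes:
    """Hypothetical origin fetch."""
    return f"<{key}>".encode()


if __name__ == "__main__":
    edge = EdgeCache(capacity=3)
    # Older ads are viewed once; the newest ads (ad-5, ad-6) keep being requested.
    for key in ["ad-1", "ad-2", "ad-3", "ad-4", "ad-5", "ad-6", "ad-5", "ad-6", "ad-5"]:
        edge.get(key, origin)
    print(list(edge.items), edge.hits, edge.misses)  # ['ad-4', 'ad-6', 'ad-5'] 3 6
```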

Another data access pattern in Farsi Directory is geographical. For instance, if someone posts an ad in Mashhad, most views of that ad come from users in Mashhad. Because the CDN has edge servers in various geographical locations, images of ads from a given city are mostly requested from the edge server in that city (or the nearest one), which makes caching ad images more effective.
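A toy simulation shows why this locality helps: if viewers of a Mashhad ad are routed to the Mashhad edge, that edge fetches the image once and serves every later view from its cache. The edge list, the `NEAREST_EDGE` fallback table, and the city-based routing are hypothetical simplifications; real CDNs route by network proximity (for example through DNS or anycast), not by a lookup table.

```python
from collections import Counter

# Hypothetical edge locations and fallback mapping.
EDGES = {"Tehran", "Mashhad", "Shiraz"}
NEAREST_EDGE = {"Neyshabur": "Mashhad", "Marvdasht": "Shiraz"}
DEFAULT_EDGE = "Tehran"


def edge_for(city: str) -> str:
    """Route a viewer to the edge in their city, or the nearest configured one."""
    if city in EDGES:
        return city
    return NEAREST_EDGE.get(city, DEFAULT_EDGE)


def origin_fetches(requests: list[tuple[str, str]]) -> Counter:
    """Count how many times each edge must fetch an image it has not cached yet."""
    cached: dict[str, set[str]] = {edge: set() for edge in EDGES}
    fetches: Counter = Counter()
    for viewer_city, image in requests:
        edge = edge_for(viewer_city)
        if image not in cached[edge]:
            fetches[edge] += 1
            cached[edge].add(image)
    return fetches


if __name__ == "__main__":
    views = (
        [("Mashhad", "mashhad-ad.jpg")] * 50
        + [("Neyshabur", "mashhad-ad.jpg")] * 5
        + [("Tehran", "mashhad-ad.jpg")]
    )
    # One fetch at the Mashhad edge covers all 55 nearby views; Tehran fetches once.
    print(origin_fetches(views))
```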

Each CDN edge may request an image from Image Proxy independently. With three versions per image and many CDN edges, the number of requests reaching Image Proxy could multiply quickly. To keep this load off our processors, we placed a Cache Proxy between the CDN and Image Proxy: only the first request from the CDN for a given image version reaches Image Proxy, and subsequent requests are served by the Cache Proxy. We used Nginx as the Cache Proxy and configured its cache to evict old images according to a retention policy.
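The Cache Proxy's key property is that concurrent misses for the same image version collapse into one upstream request. The sketch below models that with a per-key lock in Python; `CacheProxy` and the simulated `image_proxy` function are hypothetical. In Nginx itself, comparable behavior comes from the `proxy_cache_lock` directive, and the `inactive` and `max_size` parameters of `proxy_cache_path` control eviction, though the article does not say which settings were used.

```python
import threading
import time
from collections import defaultdict
from typing import Callable


class CacheProxy:
    """Shared cache between the CDN edges and Image Proxy.

    A per-key lock makes concurrent misses for the same image wait for the
    first one, so only a single request per key reaches the upstream.
    """

    def __init__(self, upstream: Callable[[str], bytes]) -> None:
        self.upstream = upstream
        self.cache: dict[str, bytes] = {}
        self._locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of per-key locks

    def get(self, key: str) -> bytes:
        cached = self.cache.get(key)
        if cached is not None:
            return cached
        with self._guard:
            lock = self._locks[key]
        with lock:
            if key not in self.cache:  # re-check: another thread may have filled it
                self.cache[key] = self.upstream(key)
        return self.cache[key]


if __name__ == "__main__":
    calls: list[str] = []

    def image_proxy(key: str) -> bytes:
        calls.append(key)
        time.sleep(0.05)  # simulate processing time so requests overlap
        return b"processed " + key.encode()

    proxy = CacheProxy(image_proxy)
    # Eight CDN edges ask for the same image version at roughly the same time.
    edges = [threading.Thread(target=proxy.get, args=("ad-42/img1/post",)) for _ in range(8)]
    for t in edges:
        t.start()
    for t in edges:
        t.join()
    print(len(calls))  # 1
```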

The diagram below illustrates the new architecture of the image service:

This design ensures that, as long as it remains cached, each image version triggers only one request to Image Proxy and, through it, one read from object storage. That is precisely the Cache Proxy's purpose: instead of every CDN edge sending its own request to Image Proxy, only the initial request goes through, and subsequent ones are answered by the Cache Proxy. Recently popular images therefore remain in the caches and are served from either a CDN edge or the Cache Proxy, while object storage holds only one version of each image.
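Putting the layers together, a small read-through simulation counts where each request is answered. With four edges, three modes, and ten rounds of views, Image Proxy renders each version exactly once. The edge names, the `request`, `from_cache_proxy`, and `image_proxy` functions, and the unbounded in-memory dictionaries are hypothetical simplifications; a real cache would also evict entries.

```python
from collections import Counter

MODES = ("thumbnail", "post", "manage")
hits = Counter()  # which layer ultimately answered each request


def image_proxy(key: str) -> bytes:
    """Stand-in for Image Proxy: fetches the original and renders one version."""
    hits["image_proxy"] += 1
    return f"rendered {key}".encode()


cache_proxy: dict[str, bytes] = {}


def from_cache_proxy(key: str) -> bytes:
    """Shared Cache Proxy: only a miss here reaches Image Proxy."""
    if key in cache_proxy:
        hits["cache_proxy"] += 1
    else:
        cache_proxy[key] = image_proxy(key)
    return cache_proxy[key]


edge_caches: dict[str, dict[str, bytes]] = {}


def request(edge: str, key: str) -> bytes:
    """A viewer request arriving at one CDN edge."""
    cache = edge_caches.setdefault(edge, {})
    if key in cache:
        hits["edge"] += 1
    else:
        cache[key] = from_cache_proxy(key)
    return cache[key]


if __name__ == "__main__":
    for _ in range(10):  # repeated views
        for edge in ("tehran", "mashhad", "shiraz", "tabriz"):
            for mode in MODES:
                request(edge, f"ad-42/img1/{mode}")
    print(dict(hits))  # {'image_proxy': 3, 'cache_proxy': 9, 'edge': 108}
```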


What Were the Results of This Design?

By rethinking the system, we achieved better response times and higher service quality with fewer resources. We reduced our object storage usage by over 58%, and the new service used 46% less CPU and 34% less memory. Together, these optimizations reduced our monthly infrastructure costs by about 6% at the time.

With the new architecture, image processing no longer took place during ad submission and no longer degraded the user experience. This change led to more ads on Farsi Directory including images, which ultimately improved the quality of ads.

Special thanks to the technical team of Farsi Directory, both current and former engineers, who undertook the responsibility of this redesign and shared their experiences for this article.
