Data access patterns are the ways in which users store, update, or retrieve data (in this case, images). They are shaped by how often, and in what way, users request images. For example, in a social network, how often a user's profile pictures are retrieved depends on their followers and activity level. Similarly, in a listing service like Farsi Directory, access patterns follow the recency of ads: newer ad images are viewed more frequently than older ones.
In Farsi Directory, the concept of recency plays a pivotal role in how users interact with ads. Typically, newer ads receive more views because they appear at the top of the list and are more visible. This means a small subset of images related to newer ads is viewed repeatedly, while the majority of stored images associated with older ads are viewed less frequently. Understanding this pattern allows us to optimize our storage strategy: storing images of newer posts in faster data storage for quicker access and better user experience, while older, less-used images can be stored in slower, more cost-effective storage solutions to balance performance and cost.
In our previous design, uploaded images were held in a temporary bucket named temp until the ad was submitted. On submission, we created three versions of the image, stored them in the main ad images bucket, and deleted the original from the temp bucket.
The three versions were:
However, this design presented several issues:
After researching the available tools and comparing benchmarks, we decided to use Image Proxy, a tool that processes images on the fly rather than storing processed copies. It reads the source image on request and applies manipulations such as reducing quality, resizing, or adding watermarks in real time.
By utilizing Image Proxy, we no longer needed to store three versions of each image. Instead, we kept a single version and defined three processing modes (Thumbnail, Post, and Manage). Image Proxy generated the requested variant from that single stored image whenever it was requested.
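To make the idea of three modes over one stored image concrete, here is a minimal sketch. The mode names come from the text above; the dimensions, quality values, URL layout, and the names `Preset`, `PRESETS`, and `build_image_url` are all assumptions made for illustration and are not taken from the actual service.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Preset:
    """Processing parameters for one display mode (all values are assumed)."""
    width: int
    height: int
    quality: int


# Hypothetical presets for the three modes named in the article.
PRESETS = {
    "thumbnail": Preset(width=160, height=160, quality=60),
    "post": Preset(width=1024, height=768, quality=80),
    "manage": Preset(width=480, height=360, quality=70),
}


def build_image_url(base_url: str, mode: str, object_key: str) -> str:
    """Build a URL that asks the processing service for one mode of one image.

    The path layout is invented for this sketch; a real processing
    service defines its own URL format.
    """
    if mode not in PRESETS:
        raise ValueError(f"unknown mode: {mode!r}")
    preset = PRESETS[mode]
    options = f"w:{preset.width}/h:{preset.height}/q:{preset.quality}"
    # quote() keeps "/" intact and percent-encodes characters such as spaces.
    return f"{base_url.rstrip('/')}/{mode}/{options}/{quote(object_key)}"
```

Under these assumptions, the same object key yields three different URLs, one per mode, while only one original is kept in object storage.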
With this change, two significant improvements occurred:
This raised a question. By adopting Image Proxy we had significantly reduced our storage space, but we had also shifted the load onto our processors. How could we alleviate the pressure on the processors?
The answer lies in using a Content Delivery Network (CDN) and a Cache Proxy. We did not stop storing processed images on disk altogether; rather, we stopped keeping every processed version in object storage and let the CDN and the Cache Proxy hold the versions currently in demand. One reason for this decision is the way images are used in Farsi Directory as a product.
Newer images or ads are always viewed more than older ones. Therefore, by using a CDN, we can cache recent images while older images naturally get purged from CDN nodes. This approach allows us to display images without processing them each time, using a self-managing cache system that requires less space and offers easier management, without needing to store three versions of every image for an extended period.
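The self-managing behavior described here can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models a cache node as a least-recently-used (LRU) cache, which is one common eviction policy and not necessarily the one the CDN uses, and it assumes viewers only look at the few newest ads. The names `LRUCache` and `simulate_recent_views` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value, calling load(key) only on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = load(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used
        return value

    def __contains__(self, key):
        return key in self._items


def simulate_recent_views(total_ads, window, capacity):
    """Post ads one by one; after each post, the newest `window` ads are viewed."""
    cache = LRUCache(capacity)

    def render(key):
        # Stand-in for the expensive processing step.
        return f"processed:{key}".encode()

    for newest in range(1, total_ads + 1):
        for ad in range(max(1, newest - window + 1), newest + 1):
            cache.get(f"ad-{ad}", render)
    return cache
```

Calling `simulate_recent_views(total_ads=10, window=3, capacity=3)` leaves only the three newest ads in the cache, and each image is processed once even though it is viewed several times. Older entries fall out without any explicit cleanup job.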
Another data access pattern in Farsi Directory is based on the geographical location of ads. For instance, if someone posts an ad in Mashhad, most views of that ad will come from users in Mashhad. Using a CDN with edge servers distributed across various geographical locations means that images of ads from each city are mostly requested from the edge server in that city (or the nearest one). This optimizes the caching of ad images.
Each CDN edge might send a request to Image Proxy to load images. So, for three versions of an image and multiple CDN edges, numerous requests could flood Image Proxy. To prevent excessive load on processors, we implemented a Cache Proxy. This means that only the first request from the CDN reaches Image Proxy; subsequent requests are handled by the Cache Proxy. We used Nginx as our Cache Proxy and configured the cache to remove old images based on policy.
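The effect of putting a shared cache between the edges and the processing service can be sketched as follows. This is a simplified, single-threaded model and not the Nginx configuration the team used: eviction policy and concurrent requests are left out, and the class names, the edge names other than Mashhad, and the variant keys are assumptions for illustration.

```python
class ImageProxyStub:
    """Stands in for the processing service and counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def process(self, variant_key):
        self.calls += 1
        return f"processed:{variant_key}".encode()


class CacheProxy:
    """Shared cache that sits between all CDN edges and the processing service."""

    def __init__(self, origin):
        self._origin = origin
        self._store = {}

    def fetch(self, variant_key):
        # Only the first request for a variant reaches the origin.
        if variant_key not in self._store:
            self._store[variant_key] = self._origin.process(variant_key)
        return self._store[variant_key]


class CdnEdge:
    """One edge node with its own local cache; misses go to the upstream."""

    def __init__(self, name, upstream):
        self.name = name
        self._upstream = upstream
        self._local = {}

    def get(self, variant_key):
        if variant_key not in self._local:
            self._local[variant_key] = self._upstream.fetch(variant_key)
        return self._local[variant_key]


def count_origin_calls(edge_names, variant_keys):
    """Request every variant from every edge and report processing calls."""
    origin = ImageProxyStub()
    proxy = CacheProxy(origin)
    edges = [CdnEdge(name, proxy) for name in edge_names]
    for edge in edges:
        for key in variant_keys:
            edge.get(key)
    return origin.calls
```

With three edges (say `"mashhad"`, `"tehran"`, `"shiraz"`) and three variants of one image, `count_origin_calls` reports three processing calls, one per variant. Without the shared cache, each edge would trigger its own processing call for each variant, nine in total.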
The diagram below illustrates the new architecture of the image service:
This solution ensures that, while an image version stays cached, it triggers only one request to Image Proxy and, in turn, one request to the object storage. This is precisely the purpose of the Cache Proxy: rather than every CDN edge sending its own request to Image Proxy, only the initial request goes through, and subsequent ones are answered by the Cache Proxy. As a result, recently popular images remain in the caches, and requests are served either from the CDN edge or from the Cache Proxy. Ultimately, we store only one version of each image.
By redesigning the system, we achieved better response times and higher service quality with fewer resources. We reduced our object storage space by over 58%, and the new service uses 46% less processing capacity (CPU) and 34% less memory. Together, these optimizations reduced our monthly infrastructure costs by about 6% at that time.
With the new architecture, image processing no longer took place during ad submission, so it no longer degraded the user experience. This change led to an increase in the number of ads with images on Farsi Directory, which in turn improved the quality of the ads.
Special thanks to the technical team of Farsi Directory, both current and former engineers, who undertook the responsibility of this redesign and shared their experiences for this article.