A high cost in both latency and price
In this blog post, we discuss the high cost of using DynamoDB as a feature store, in terms of both price and latency.
DynamoDB Feature Store Benchmark
A recent benchmark by Tecton showed that DynamoDB delivers high performance as a feature store - but only if you measure on the axis of throughput. As all good engineers know, performance is both throughput and latency. But more on that later.
The benchmark was built on DynamoDB on-demand tables that support auto-scaling - you pay for what you use, which is essential. It covered 6,500 features for three models, with a total of 20-40KB per feature vector, all spread over 65 DynamoDB tables (one table per feature view). The benchmark showed that the system could scale to 100,000 feature vector reads per second. But at what cost?
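To make that layout concrete, here is a minimal sketch of what a single feature vector lookup looks like when each feature view lives in its own on-demand table. The table names, key schema, and client code are hypothetical illustrations, not taken from the benchmark:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

# Sketch of the benchmarked layout: one DynamoDB table per feature view, so a
# single feature vector lookup fans out into one read per table. The table
# names and key schema here are hypothetical, not taken from the benchmark.
dynamodb = boto3.resource("dynamodb")
FEATURE_VIEW_TABLES = [f"feature_view_{i:02d}" for i in range(65)]  # 65 tables


def read_feature_view(table_name: str, entity_id: str) -> dict:
    response = dynamodb.Table(table_name).get_item(Key={"entity_id": entity_id})
    return response.get("Item", {})


def get_feature_vector(entity_id: str) -> dict:
    # One logical lookup becomes 65 parallel DynamoDB reads; the caller has to
    # wait for the slowest of them (see the latency discussion below).
    features: dict = {}
    with ThreadPoolExecutor(max_workers=len(FEATURE_VIEW_TABLES)) as pool:
        results = pool.map(lambda t: read_feature_view(t, entity_id),
                           FEATURE_VIEW_TABLES)
        for item in results:
            features.update(item)
    return features
```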
At What Cost - Latency?
Due to the fanout - when you have a DynamoDB table for each feature view and your features are spread over many tables - you are hit by the tail at scale: you are only as fast as the slowest DynamoDB worker to respond. So even if DynamoDB quotes 9-14ms as typical latencies (here, the p50 was 14ms), the p99 latency over many tables is much higher. In this benchmark, the p99 latency was ~60ms.
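The arithmetic behind that gap is worth spelling out. If each individual table read is under some latency threshold 99% of the time, then with a fanout of 65 all of them are under it only 0.99^65 ≈ 52% of the time - so the single-table p99 becomes roughly the median of the fanned-out request. Here is a small simulation sketch of the effect; the per-table latency distribution is assumed for illustration, not measured DynamoDB behaviour:

```python
import numpy as np

# Illustration of "the tail at scale": a feature vector read that fans out to
# N tables is only as fast as the slowest response. The per-table latency
# distribution below is assumed (lognormal tuned to a ~14ms median), not
# measured DynamoDB behaviour.
rng = np.random.default_rng(42)
N_TABLES = 65          # tables read per feature vector, as in the benchmark
N_REQUESTS = 100_000   # simulated feature vector lookups

# Per-table latencies in ms: median ~14ms with a heavy right tail (assumed).
per_table = rng.lognormal(mean=np.log(14), sigma=0.45, size=(N_REQUESTS, N_TABLES))

# The fanned-out request completes only when the slowest table responds.
fanout = per_table.max(axis=1)

print(f"single table   p50={np.percentile(per_table, 50):5.1f}ms  "
      f"p99={np.percentile(per_table, 99):5.1f}ms")
print(f"fanout of {N_TABLES}   p50={np.percentile(fanout, 50):5.1f}ms  "
      f"p99={np.percentile(fanout, 99):5.1f}ms")

# Analytically: if each table is under its own p99 with probability 0.99,
# all 65 are under it with probability 0.99**65 ~= 0.52.
print(f"P(all {N_TABLES} tables under their individual p99) = {0.99**N_TABLES:.2f}")
```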
However, many high-end use cases, such as recommendation systems and online fraud detection, can tolerate at most single-digit millisecond latencies for feature vector lookups. For this reason, Lyft put a Redis cache in front of their DynamoDB-backed online feature store to reduce latency, massively increasing its operational complexity.
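A read-through cache of that kind typically looks something like the sketch below. The table name, key schema, and TTL are hypothetical - this is the generic pattern, not Lyft's implementation:

```python
import json

import boto3
import redis

# Hypothetical read-through cache: serve feature vectors from Redis when
# possible, fall back to DynamoDB on a miss, and populate the cache with a TTL.
# Table name, key schema, and TTL are illustrative.
cache = redis.Redis(host="localhost", port=6379)
table = boto3.resource("dynamodb").Table("user_features")  # hypothetical table

CACHE_TTL_SECONDS = 60  # staleness bound; fresher features => more DynamoDB reads


def get_feature_vector(entity_id: str) -> dict | None:
    cached = cache.get(f"fv:{entity_id}")
    if cached is not None:
        return json.loads(cached)  # cache hit: sub-millisecond path

    # Cache miss: pay the DynamoDB latency (and read cost) once per TTL window.
    item = table.get_item(Key={"entity_id": entity_id}).get("Item")
    if item is not None:
        cache.set(f"fv:{entity_id}", json.dumps(item, default=str),
                  ex=CACHE_TTL_SECONDS)
    return item
```

The cache gives you sub-millisecond hits, but you now operate two data stores and have to reason about staleness: every cached value can be up to one TTL out of date.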
At What Cost - Price?
In total, the benchmark produced 3.3m ops/sec on DynamoDB (this is the key figure, as it is what incurs cost) to serve 100k feature vector lookups per second - that is, only 100,000 feature vectors per second were served from 3.3m reads/sec on DynamoDB. The cost of running the benchmark works out to roughly $2.2m per month for reads alone ($0.25 per million read request units, each covering up to 1KB, over a 31-day month). You could potentially bring down your DynamoDB costs by paying for provisioned capacity, but then you no longer pay per use. You could also drastically reduce the benchmark's price by pre-computing aggregations in a streaming application, such as Spark Streaming or Flink, instead of computing aggregations on-the-fly from events stored in DynamoDB. Of course, DynamoDB's costs for writes are even higher than for reads, so if you are writing lots of pre-aggregated events per second, you will have a very high write cost as well.
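As a sanity check on that figure, here is the back-of-the-envelope arithmetic, under the simplifying assumption that each of the 3.3m ops/sec consumes exactly one on-demand read request unit (i.e. items at or under 1KB):

```python
# Back-of-the-envelope check of the monthly read bill, assuming each of the
# 3.3m ops/sec consumes exactly one on-demand read request unit (item <= 1KB).
READS_PER_SECOND = 3_300_000
PRICE_PER_MILLION_RRU = 0.25           # USD, on-demand read request units
SECONDS_PER_MONTH = 31 * 24 * 60 * 60  # 31-day month, as in the estimate

monthly_reads = READS_PER_SECOND * SECONDS_PER_MONTH
monthly_cost = monthly_reads / 1_000_000 * PRICE_PER_MILLION_RRU

print(f"{monthly_reads / 1e12:.1f} trillion reads/month "
      f"=> ${monthly_cost:,.0f}/month")   # ~8.8 trillion reads => ~$2.2m/month
```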
Horror Story with the Sagemaker Feature Store
At the recent Feature Store Summit, we learnt that Sagemaker Feature Store uses DynamoDB as its online store. We heard this story from a reader whose company decided to try out the Sagemaker Feature Store and proceeded to write 350GB of data to it. The bill for this one-off data dump came to $35,000, and they were shocked! They hadn't read the fine print: if you use the ingestion API to backfill historical features, as it is designed to be used, you are charged $1.25 per million write request units for DynamoDB. The company was inefficient in their use of DynamoDB - ideally, they would have written 1KB per write request, but instead they wrote individual feature values as the API suggested, resulting in lots of small key-value pairs. So the lesson here is: don't just use the Sagemaker Feature Store API because the documentation tells you to. In fact, the DynamoDB-backed ingestion API is so expensive that AWS's own solution engineers explain how to keep costs down by not using the Sagemaker Feature Store API as it was designed to be used, and instead backfilling data by writing files to S3 and then reverse-engineering the creation of feature groups.
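To see how a 350GB backfill can reach $35,000, here is a rough reconstruction. Only the bill, the data volume, and the $1.25 per million write request units price come from the story above; the per-write payload size is inferred from them:

```python
# Rough reconstruction of the $35,000 backfill bill. Only the bill, the 350GB
# volume, and the $1.25 per million WRU price come from the story; the
# per-write payload size is inferred from them.
BILL_USD = 35_000
PRICE_PER_MILLION_WRU = 1.25
DATA_BYTES = 350 * 10**9          # 350GB backfill
WRU_PAYLOAD_BYTES = 1_024         # one write request unit covers up to 1KB

write_units_billed = BILL_USD / PRICE_PER_MILLION_WRU * 1_000_000
bytes_per_write = DATA_BYTES / write_units_billed
print(f"billed write units: {write_units_billed / 1e9:.0f} billion "
      f"(~{bytes_per_write:.0f} bytes per write)")

# Had each write carried a full 1KB item, the same 350GB would have cost:
efficient_units = DATA_BYTES / WRU_PAYLOAD_BYTES
print(f"1KB-per-write cost: ${efficient_units / 1e6 * PRICE_PER_MILLION_WRU:,.0f}")
```

Under these assumptions, that is roughly an 80x difference between writing many tiny key-value pairs and packing values into full 1KB write request units.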