It looks like stormy weather in the clouds of Google.
Their Developer Blog announced yesterday they no longer are charging for Datastore CPU costs because of performance problems.
As many of you know, App Engine’s Datastore performance has been seriously degraded over the last few weeks. In addition to May 25th’s 45 minute Datastore outage, applications have seen an increased latency and thus errors as a result of timeouts. As a rough estimate, we have seen Datastore latency increases of around 2.5x.
They explain this problem as a byproduct of their own success.
There are a lot of different reasons for the problems over the last few weeks, but at the root of all of them is ultimately growing pains. Our service has grown 25% every two months for the past six months.
Stock image from PhotoBucket:
Congratulations might be in order until you realize they also are announcing that they watched a problem coming for six months yet kept adding accounts…now that services are failing and systems are down something will have to be done about it. A cost change is an interesting way of trying to compensate for the mistake until things improve. However, it seems a lack of foresight is what really needs to change.
The site says fees will return when performance is at a level they “consider acceptable” or when they “are proud” of the service. In the meantime, they “appreciate your patience”. These phrases ring hollow to me, especially when compared with the more precise language and data offered at the start of their announcement. It sounded better when they said problems are expected for the next two weeks but not longer. It also would sound better if they said fixing the problem is just a start; they next will work on how to address issues more proactively.
Security management requires system availability and recovery to be measured in order to be proactive. A time objective (in this case two weeks) and a point objective (what is acceptable?) have to be documented and tested at least annually. These tests can help find problems and create solutions before a real outage occurs. This is a known internal IT requirement and so nothing less should be expected from a cloud.
One thought on “Google App Engine Pain”