App Engine supports Python 2.5. The Python interpreter runs in a secured "sandbox" environment to isolate your application for service and security. The interpreter can run any Python code, including Python modules you include with your application, as well as the Python standard library. The interpreter cannot load Python modules with C code; it is a "pure" Python environment.
At the top of the FIRST page of documentation: http://code.google.com/appengine/docs/python/overview.html
Google Apps domains do not currently support HTTPS. HTTPS support is limited to apps accessed via .appspot.com domains. Accessing an HTTPS URL on a Google Apps domain will return a "host not found" error, and accessing a URL whose handler only accepts HTTPS (see below) using HTTP will return an HTTP 403 "Forbidden" error. You can link to an HTTPS URL with the .appspot.com domain for secure features, and use the Apps domain and HTTP for the rest of the site.
HIGHLIGHTED on http://code.google.com/appengine/docs/python/config/appconfi...
A request handler has a limited amount of time to generate and return a response to a request, typically around 30 seconds. Once the deadline has been reached, the request handler is interrupted.
and
While a request can take as long as 30 seconds to respond, App Engine is optimized for applications with short-lived requests, typically those that take a few hundred milliseconds. An efficient app responds quickly for the majority of requests. An app that doesn't will not scale well with App Engine's infrastructure.
http://code.google.com/appengine/docs/python/runtime.html#Th...
I could go on and on. Reading this, I see "I wasted 15000€ by not reading the documentation".
I usually think the title technical architect is a bit stupid (my business card says I'm one, so I can say that), but this guy needs a good technical architect to make platform decisions before wasting that much money.
We made the trade of zero platform maintenance and near-free scaling and accepted the limits placed on what we can do. Many of those limits are being lifted each month as well. I can say for sure that there are a number of items mentioned in this post that have fixes in the works right now.
I will say that choosing GAE let us ship a project much faster than we could have without it and with much less manpower. The time we spend working around limits is easily made up by not having to do IT work.
You can use all kinds of techniques to protect your application from things like variable datastore performance. For example, nowadays I usually use a pattern where user-facing servlets only ever read from the datastore. All updates go via a taskqueue, which means that even if the datastore is in maintenance mode the updates will eventually be applied.
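A rough sketch of that read-only-handlers pattern, in plain Python with no App Engine calls at all. The class name, the `available` flag and the in-memory dict and deque are all made up for illustration; a real app would use the datastore and the task queue service instead.

```python
import collections


class QueuedWriteStore(object):
    """Toy model: handlers read directly, writes are queued and applied later."""

    def __init__(self):
        self.data = {}                      # stand-in for the datastore
        self.pending = collections.deque()  # stand-in for the task queue
        self.available = True               # False simulates maintenance mode

    def read(self, key):
        # User-facing path: only ever reads.
        return self.data.get(key)

    def enqueue_update(self, key, value):
        # User-facing path: never writes directly, just queues the change.
        self.pending.append((key, value))

    def run_worker(self):
        # Background path: apply queued writes while the store accepts them.
        applied = 0
        while self.pending and self.available:
            key, value = self.pending.popleft()
            self.data[key] = value
            applied += 1
        return applied
```

If `available` is False the updates simply stay queued, and the next `run_worker()` call after maintenance applies them.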
GAE isn't for everyone, but the first step to getting the most out of it is accepting it isn't anything like a typical frontend plus database architecture.
This is the core of all their problems. It is a mindset incompatibility between these app designers and GAE.
The GAE APIs and rules are actually pretty simple and well defined. It works really well, but only if you work WITH those rules. You have to adopt the GAE application design philosophy.
If you don't, and if you work AGAINST the rules and best practices set by GAE then you are in trouble. Big trouble. This is what happened here.
I understand this is easy to say afterwards. And you can't really blame them for finding out the hard way.
Note that the same applies to, for example, all the great services that Amazon Web Services provides; they only work if you build your apps with the Amazon specific design approach in mind. Things like eventual consistency, expecting things to fail, not doing large amounts of work in single jobs. Etc. Etc.
These approaches suck more or less if you come from a 'total control over a bunch of machines' background. But they are so needed to scale.
Yes, there are limitations. And most of them might get you upset first, but if you "work with the rules" it is a fantastic platform.
Metric Mail has been running absolutely fine since its launch in August, it has never failed us. Even under severe traffic it scaled without any problems.
We knew from the beginning what the limitations were, so there were no big surprises during development.
I always think that if your site can't be hosted by GAE, then it's probably not very scalable in the first place. I agree it might be better to host the data-processing end on EC2 or elsewhere if it's intense. Otherwise, for the "View" part of your project, I wouldn't think twice about using GAE again.
http://www.creativeapplications.net/games/3-degrees-of-wikip...
Well, your site is basically one static page. Do you really feel qualified discussing GAE?
First off, you have a full featured framework which was designed for SQL relational databases. Many of Django's features either have to be given up, or are monkey-patched beyond belief to get partial functionality. Not to mention quite a few Django apps use database features which are simply not supported by BigTable.
Secondly, Django is not exactly the smallest framework, so loading time can be quite expensive and will be tacked on to every cold start.
All that being said, I've had good success with the tornado framework. It's fast, well written, and thoughtfully designed. Check out my profile if you want to see some examples of apps written with tornado + GAE.
There's a port called django-nonrel, which is specifically designed to work on nonrelational databases. So yes, you can't do fancy joins and stuff, but hey, most of the time you can work around that; but it's still great to be using the ORM just as you are used to it in 90% of the cases.
Although I do agree with the OP that all the rules GAE imposes can be a pain in the arse a lot of the time, I really really love them. They push me to design my application well and to make it able to handle the load I'm expecting in the foreseeable future. If I didn't, scaling even up to a few thousand users could get dicey.
1. "You're supposed to handle exceptions from the datastore? Really?" — Are you saying you don't handle exceptions when your regular relational database throws them? I've seen Sybase and MySQL blow up with depressing regularity, and have had to write exception-recovery code for them.
2. "No https with a domain?" — What stops you from making https://<your-app>.appspot.com the handler for forms which read sensitive information? You can submit the form with XHR, have the server send back a structured response (perhaps as JSON), and deal with it with JavaScript on the client. I admit that it's annoying (you need different form submit targets in development and production), but it shouldn't be a deal-breaker unless you need your entire site to use SSL.
(If you do need your entire site to use SSL, then the appspot domain limitation obviously sucks. If that's the case though, then you're probably doing something with finance, and the datastore's transaction semantics probably cannot work for you at all.)
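On point 1, the exception-recovery code doesn't have to be much. A minimal sketch, assuming the failure is transient and safe to retry; `TransientDatastoreError` and `with_retries` are hypothetical names defined here, not anything from the App Engine SDK.

```python
import time


class TransientDatastoreError(Exception):
    """Hypothetical stand-in for whatever timeout/busy error your database raises."""


def with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on a transient error wait and retry, doubling the delay."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDatastoreError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is only there so the waiting can be swapped out in tests.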
Compare that, in terms of functionality and revenue generation, to Yahoo! Store (Viaweb), and it seems like Google really has its priorities with App Engine askew.
Obviously, the scope of Viaweb's functionality is much, much narrower than that of GAE, but what is the purpose of GAE from Google's perspective?
Surely, it is not to create bitchfests for upset developers on tech blogs. It doesn't seem like revenue generation, nor developer satisfaction is a priority, either.
That said, it's a beta product.
It always is with Google. When exactly do you expect the "rc"?
I was aware of most of the limitations of AppEngine that the author of the article mentions after just a few hours of experimenting with AppEngine. AppEngine no longer gives me many problems.
I think the lesson is to do a lot of experiments before committing to technologies.
I don't use the Python SDK. Most of what I have done has been using Java (but with small Clojure and JRuby experiments). One thing that helped was to start using Objectify instead of JDO (as an example).
At this point, I think using JDO on App Engine only makes sense if you're porting an existing JDO application to App Engine.
If you're starting from scratch, Objectify is the way to go.
This can be a good thing, if you know scalability is going to be a killer feature in the near future. It can also be a real pain in the ass if it's more important to simply get something off the ground quickly and see if it has market traction, and you don't want to abandon the convenient but difficult to scale practices like long running processes and JOINs. In my experience, most startups fall squarely into this latter camp. Scalability is a nice problem to have for most of us.
AppEngine for Business now has a hosted SQL mode, which presumably uses a less scalable but ACID compliant alternative to the standard GAE data store (disclaimer - I haven't used it). Since he's already throwing down some serious coin on his app on GAE it might be worth investigating that before abandoning the platform completely.
This is a pretty common pattern in software, so it might be more interesting to write an article about why they chose the wrong technology and how they stuck with it even when it was clear it wasn't built to do what they needed.
With AppEngine, I've never had to migrate a database schema, build a load-balancer, hire a fulltime sysadmin, or even pay for servers that aren't receiving traffic. I don't have to set up a large-scale deployment system, nor spin up a new database server when traffic gets too heavy. AppEngine so far has been remarkably cheap (we're starting to bring in more customers however, so we'll see how long this lasts).
Many of the challenges he mentions come down to thinking about writing a webapp with a longer-term vision in mind. Datastore limitations crop up when you outgrow your first datastore in a standard system; in AppEngine they're properly enumerated and dealt with from day 1. Likewise long-running connections become very tricky to deal with with lots of traffic... this point is a little harder to argue with the recent popularity of asynchronous-io servers, but I think Google is working hard on these limitations. SSL is just annoying; we've had to deal with this by adding an SSL proxy until Google adds SSL support -- but it sounds like Google is pretty close to solving this one (it's been promised by end of year).
Also, AppEngine is written in a very high-level way; should you reach a point where AppEngine no longer makes sense, it is amazingly easy to transition over to another system (as the OP apparently found out; I would give more credit to the design patterns inherent in the AppEngine APIs than 'TDD driven development'). Tornado, webpy, etc have virtually the same interface as AppEngine's webapp framework.
There are definitely tradeoffs when choosing AppEngine as a production backend right now, and it's certainly not the right solution for every problem... but for many people, us included, it's been a pretty large net benefit for our startup. Google is actively improving the system, and I expect many of these problems will go away in the next 6 months or so.
1) If you want "SQL and Joins", use SQL. This is like complaining that you can't play Halo on Linux.
1A) There isn't full text search. If you need full text search, use a system with full text search as a feature.
2) Some of the points are out of date (or will be out of date soon). The 30 second limit for cron jobs will be 10 minutes after the next release. As noted, the 1000 results per query limit is gone already.
3) Anything can fail. If you assume your own system won't fail, you're going to be in worse shape later.
4) What objects would you cache that are >1MB anyhow? In almost any case, you'd be better off caching it as multiple, smaller objects.
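For point 4, splitting a big value across several cache keys could look roughly like this. It's a sketch against a plain dict rather than the real memcache client; the key naming scheme, the helper names and the chunk size are assumptions picked for the example.

```python
CHUNK_SIZE = 1000 * 1000  # assumed: comfortably under the ~1MB per-value limit


def set_chunked(cache, key, data, chunk_size=CHUNK_SIZE):
    """Store a byte string as several smaller values plus a chunk count."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for n, chunk in enumerate(chunks):
        cache["%s:%d" % (key, n)] = chunk
    cache[key + ":count"] = len(chunks)


def get_chunked(cache, key):
    """Reassemble the value, or return None if any piece is missing."""
    count = cache.get(key + ":count")
    if count is None:
        return None
    parts = []
    for n in range(count):
        part = cache.get("%s:%d" % (key, n))
        if part is None:
            return None  # one chunk was evicted, so treat the whole thing as a miss
        parts.append(part)
    return b"".join(parts)
```

The catch is that any single chunk can be evicted on its own, which is why a missing piece has to count as a full cache miss.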
http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-A...
The problem we ran into with the merge join functionality was the following:
Let's say you're searching for "lcd monitor", your code could do a search for lcd and monitor then merge the result (select * from ngrams where ngram in ['lcd', 'monitor']). There are many lcd monitors so the merge join will find 1000 results very quickly.
Let's say you search for "dell monitor". Unlike the previous search, there aren't many dell monitors but there are lots of dell products and lots of monitors. Your merge join will timeout because there isn't enough time to perform a query for dell and another for monitor then merge the results because of the internal merge-join limitations.
Also, it was VERY expensive to index every document (our data is in a constant flux) so we decided to use a different solution.
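To make the "lcd monitor" vs "dell monitor" difference concrete, here is a toy merge join over two sorted lists of document ids with a step budget standing in for the request deadline. It is only a linear-scan illustration; the real datastore merge join works differently internally, and the function name and the budget are invented for the example.

```python
def merge_join(list_a, list_b, limit, max_steps):
    """Intersect two sorted id lists; give up after max_steps comparisons.

    Returns (matches, steps_used, timed_out).
    """
    matches, i, j, steps = [], 0, 0, 0
    while i < len(list_a) and j < len(list_b) and len(matches) < limit:
        if steps >= max_steps:
            return matches, steps, True  # budget exhausted: the "timeout" case
        steps += 1
        if list_a[i] == list_b[j]:
            matches.append(list_a[i])
            i += 1
            j += 1
        elif list_a[i] < list_b[j]:
            i += 1
        else:
            j += 1
    return matches, steps, False


# Heavy overlap ("lcd" and "monitor"): the first results show up immediately.
lcd = list(range(1000))
monitor = list(range(1000))

# Sparse overlap ("dell" and "monitor"): lots of scanning before the first hit.
dell = list(range(10000))
monitors = list(range(9990, 20000))
```

With these lists, `merge_join(lcd, monitor, 10, 1000)` fills its 10 results in 10 steps, while `merge_join(dell, monitors, 10, 1000)` burns the whole budget without a single match.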
#1 Has never been an issue for us
#3 Is incorrect with the new task queue upgrades
#6 We have a full-text system working just fine
#7 Is a benefit when working with a distributed datastore
#8 DB performance after the recent updates has been stunning
#10 So they badly designed their queries and blame app engine?
#11 Is flat out incorrect
#12 What database is immune to failure? Would love to know
App Engine doesn't do everything, and no one is claiming it does. We have a secondary VPS we offload certain image processing tasks to, for example. But what it does do is extremely powerful from a developer's perspective, and the application-centric model, like heroku or engine yard, is where things are headed. I would much rather leave the server and scaling issues to the experts so I can spend time improving my application.
I'm actually quite surprised that all those same limitations are still in place after all this time. I guess if I took a minute I could come up with an issue or two I had back then that has been fixed since, but his list of show stoppers are all things that people were complaining about, and that Google gave the impression of being on top of.
I developed an app with GAE about 2 years ago, and ran into many of the same problems (although some of the limits were probably lower then). Fortunately I could work around them, and the app wasn't used by tons of users anyway. I can see how it would be a serious problem otherwise, though.
Granted, we don't have a need for SSL, and not being able to use C libraries in Python has caused us many hours of "pain", but compared to the alternative for a small company like ours, it's well worth it.
My biggest issue with AppEngine is that there's no full-text index functionality, and there's no way to create your own. We've tried everything, and nothing works if you have millions of documents like we have. Our search is still external to AppEngine but we're hoping that Google will do something about it sooner or later.
Certainly ironic for a Google platform to be so bad at search.
I'm almost sure that I can run the same amount of traffic from a $100/month dedicated server.
If you're small, GAE is free, but then, you could host anywhere for peanuts - just buy a linode or a small EC2 instance, it doesn't really matter.
Once your site becomes big, cost is going to matter and GAE is as expensive as anyone else, last time I looked, quite a bit more expensive.
So there is really not much advantage, it is only free when it doesn't matter, and when it does you're going to start looking for the exit pretty quickly, not just because of cost but because GAE imposes all kinds of limits which may make dealing with your problems much much harder.
Yes, existing techniques for full text search or things like geolocation queries won't work, but there are other[1] techniques[2] that work just as well; it's just not the sql way. Basically, support for multiple set membership queries against a list of tags stored with entities is extremely powerful and if you index properly, you can do a lot of cool things [3]. Plus, you can do datastore queries in parallel [4], which means you don't have to denormalize as much as you think; just parallelize and memcache results; e.g. for a complicated front page, you can fetch different types of content in parallel.
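The "parallelize and memcache results" idea for a front page could be sketched like this. It uses standard-library threads and a plain dict as the cache, not asynctools or the real memcache API; `fetch_front_page` and the shape of `queries` are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_front_page(queries, cache):
    """queries maps a section name to a zero-argument callable that runs one query.

    Cached sections are reused; the rest run in parallel and are then cached.
    """
    results = {}
    missing = {}
    for name, run_query in queries.items():
        if name in cache:
            results[name] = cache[name]
        else:
            missing[name] = run_query
    if missing:
        with ThreadPoolExecutor(max_workers=len(missing)) as pool:
            futures = dict((name, pool.submit(fn)) for name, fn in missing.items())
            for name, future in futures.items():
                cache[name] = results[name] = future.result()
    return results
```

The total wait is then roughly the slowest single query rather than the sum of all of them, and a second request for the same page skips the queries entirely.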
The local server behaves remarkably like the deployed server; it's quite rare that I find a situation where something behaves differently in production. The entire datastore can be tested locally, including complicated schemas / indices / queries in fast running unit tests. This means when I do need to do something fancy with the datastore, I can fully test it with unit tests and be confident it will work when deployed.
Long running tasks can always be broken up using the task queue. The limit will soon be 10 minutes for individual tasks and cron jobs [5].
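Breaking a long job into task-sized pieces usually means each task handles one chunk and then enqueues a follow-up with a cursor. A minimal simulation in plain Python, where the deque stands in for the task queue and the function name and integer-offset cursor are just assumptions for the sketch:

```python
import collections


def process_in_chunks(items, handle, chunk_size):
    """Process items as a chain of short tasks; returns how many tasks ran."""
    queue = collections.deque([0])  # each entry is a start offset (the "cursor")
    tasks_run = 0
    while queue:
        start = queue.popleft()
        # One "task": handle a single chunk, well within the deadline.
        for item in items[start:start + chunk_size]:
            handle(item)
        tasks_run += 1
        # If work remains, enqueue the follow-up task instead of looping on.
        if start + chunk_size < len(items):
            queue.append(start + chunk_size)
    return tasks_run
```

Ten items with a chunk size of four end up as three short tasks instead of one long one.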
I agree that cold start is a huge issue, but it looks like it is being addressed in the 1.4 release [5], where you can pay for 3 reserve instances at all times. Lack of support for https on your domain definitely sucks too, but I don't see how he wouldn't have been aware of that before going with GAE.
Finally, there are a number of things that are huge time / money savers:
- really easy deployment process including support for multiple versions. This lets you have staging instances and quickly roll back to a previous version if there are any problems
- a nice admin console with a number of tools, including comprehensive access to logs that are coherent across all instances
- some really nice libraries for examining performance of datastore queries and other api calls [6] and getting daily email reports of any exceptions [7]. these are built using hooks available to you in case you want to build something similar (for instance I used hooks to have regression tests on the number of datastore queries each page requires).
- the services and apis made available are really nice. for instance, the image hosting infrastructure that provides fast access to different sizes for a stored picture based on a url is pretty slick; they basically opened up the same infrastructure that is used by picasaweb to app engine users
- virtually no hosting costs until you get a lot of traffic. thousands of daily visitors is still in the free range
That said, my biggest outstanding gripes:
- cold start problem (until 1.4 is out)
- datastore latency spikes sometimes. this has gotten a lot better in the past few weeks, but I'll still have this gripe until I see it more consistent for a couple months
- no support for incoming emails with attachments > 1mb (makes incoming photos from smart phones impossible since they are usually > 3Mp these days)
- no support for long polling (upcoming channel API seems to be more for chat rooms than for general purpose server push) [8]
[1] http://www.billkatz.com/2009/6/Simple-Full-Text-Search-for-A...
[2] http://fluffybunnysoftware.com/node/8
[3] http://code.google.com/events/io/sessions/BuildingScalableCo...
[4] http://code.google.com/p/asynctools
[5] http://groups.google.com/group/google-appengine/browse_threa...
[6] http://googleappengine.blogspot.com/2010/03/easy-performance...
[7] http://code.google.com/appengine/articles/python/recording_e...
[8] http://bitshaq.com/2010/09/01/sneak-peak-gae-channel-api
http://code.google.com/appengine/docs/python/mail/overview.h... http://code.google.com/appengine/docs/java/mail/overview.htm...
Any links? I am saving both pics and thumbs and that info would help me a lot.
http://code.google.com/appengine/docs/python/images/function...
google is also adding sql database capabilities to the platform soon.
and google apps for business will eventually let you talk https on your own domain (at a price yet to be determined).
i suspect they're about 6 or 8 months from becoming a solid solution to many problems.
See under the section "Enterprise Features".
Note that this is a page about GAE for business, so I wouldn't be surprised if SQL databases require a billable account similar to the Blobstore - don't think anything has been explicitly stated one way or another as yet though.
The reason I'm sure is that I just tried it:
entities = data.MyModel.all().fetch(1010)
print len(entities) # Prints 1010
Then I thought that perhaps you meant the limitation still exists in the low-level datastore API (and it's worked around by the Model interface making multiple calls to the low-level API), but that's not true either:
from google.appengine.api import datastore
entities = datastore.Query("MyModel").Get(1010)
print len(entities) # Still prints 1010
So: what do you mean?
http://code.google.com/appengine/docs/python/datastore/query...
Where does it say that 1000 is the limit?
:-)
Any plans to do a RailsTutorial-esque Py/Dj/Git/Djangy tutorial (as opposed to Rb/Rls/Git/Heroku)?
It's now been ages since I last looked at Amazon's offerings; does anyone have any links to best practises / development strategies for either AWS or GAE?
Personally, I recently had my fair share of problems with the DownloadError, but I knew it would come up (because I read the documentation - and I concur with the limit for scaling reasons). So I built myself a failover decorator relaying the failing requests to a VPS.
Otherwise, I love GAE!
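For anyone curious, a failover decorator along those lines might look roughly like this. It is not the actual code from the comment above; `DownloadError` is defined locally as a stand-in rather than imported from the SDK, and `via_vps` and `fetch` are hypothetical functions.

```python
import functools


class DownloadError(Exception):
    """Local stand-in for the fetch error mentioned above."""


def failover(fallback):
    """Decorator: if the wrapped call raises DownloadError, relay it to fallback."""
    def decorator(primary):
        @functools.wraps(primary)
        def wrapper(*args, **kwargs):
            try:
                return primary(*args, **kwargs)
            except DownloadError:
                return fallback(*args, **kwargs)
        return wrapper
    return decorator


def via_vps(url):
    # Hypothetical relay: a real one would ask the VPS to make the request.
    return "vps:" + url


@failover(via_vps)
def fetch(url):
    # Hypothetical primary fetch that fails for "slow" URLs.
    if "slow" in url:
        raise DownloadError(url)
    return "direct:" + url
```

Only `DownloadError` triggers the relay; any other exception still propagates, so real bugs aren't hidden behind the fallback.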
1. For example the top commenter nl with 83 points: http://news.ycombinator.com/item?id=1928148
Also at first glance there is no indication of how the author got to a value of 15k€. My best guess, and a guess at that, is that they put the value of a line of code at 1€ and had to migrate 15k lines, but I hope there is something more scientific than that.
After using AWS for some time now, with things like CloudFront, S3, Elastic Map Reduce and managed MySQL instances and AutoScale (and hey, they've got servers in Europe), App Engine really feels like a half baked toy.
I mean, if I want to eat a cake, and I know how to make a cake, I'm not going to pull up a recipe for brownies and try to make a cake out of them. This guy made some really bewildering choices; the list of things he clearly wants to do is both silly and frustrating.
"I want to write a site using Django that uses full-text search and multiple-table JOINs so that it takes longer than 30 seconds to load a single page, and Google App Engine won't let me!"
Brilliant.
Shouldn't there be a checklist to show "What applications can be moved to cloud (appengine here)? including the myths, expectations, assertions?"
i originally also considered gae for my project, but decided against it, my impetus was that i wanted to use a homegrown best of breed stack: tornado, mysql, nginx, memcached, python2.7 and have more control over the environment.
As someone who's considering whether going w/GAE or custom, the points he makes are totally valid and applicable.
http://code.google.com/p/memcached/wiki/FAQ#What_is_the_maxi...
"This is a preview release of Google App Engine."
My biggest gripe about GAE (and Google in general) is that when a change is made on Google's infrastructure that causes large problems, no acknowledgement (or answer) is made until enough people complain.
my buddies and i are working on a time manager for college students and we are having a hard time deciding between RoR and GAE. any insight would be helpful.
we are looking for:
- login manager
- database capable of up to 50 fields for each user
- sorting and search capability
- must cost less than $10/year/user.
ps. if you are someone interested in solving this problem, please contact me directly at hn@sahajsingh.com
(btw: non western myself)