Sunday, October 9, 2016

Compute Engine 101: Auto-updating instance

We're using Google Compute Engine to run one-off tasks which require powerful machines (32 cores) that shut down automatically once the task completes. Previously I started those machines manually, connected to them, ran the task and waited for it to complete so that I could stop the instance afterwards. It took some time until it annoyed me enough to automate all the things, but here's how it works now.

After following the instructions you'll also be able to view your instance's logs via Google Cloud Logs Viewer, which is a huge advantage! You could say we're creating a more flexible version of Heroku (for one-off tasks) here. :)

Compute Engine

Source Repository

Mirror your existing git repository using Google Cloud Repositories so you can easily access it from within Compute Engine without fiddling around with SSH keys. The code in this example assumes you call the repository "worker", but you can call it whatever you want and adapt the code accordingly.

Instance Setup

Create your instance as you would normally do, but be careful when configuring "Identity and API access". Default access does not include access to Google Cloud Source Repositories, so you have two options:
  1. choose "Allow full access to all Cloud APIs". This might be dangerous because it, well, gives the instance access to all APIs and therefore allows an attacker to wreak havoc if they get access to that instance.
  2. create a new service account and give it access to "Compute Instance Admin", "Logs Writer" and "Source Repositories Reader".

After creating the instance connect to it and install your dependencies. Here's my script installing Java, Maven and Google Cloud Logging:

Afterwards shut down your instance and configure a startup-script. Here's mine, which fetches the latest code from git and executes the program via Maven:

That's it! Your program now runs at its latest version every time you start the instance.


Implement a servlet on AppEngine, which starts the Compute Engine instance using google-cloud-java:

You could, for example, trigger that servlet via a cronjob every day if need be. Definitely make sure to properly authenticate requests to this servlet as you don't want strangers / hackers to start your instance without you knowing.
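To make the authentication advice a bit more concrete, here is a minimal sketch of a shared-secret check in front of the start call. Everything in it is an assumption for illustration: `InstanceStarter` is a hypothetical stand-in for the google-cloud-java call, `StartTrigger` and the token scheme are made up, and a real servlet would read the token from a request header and map the returned number to an HTTP status.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

/** Hypothetical abstraction over "start the Compute Engine instance". */
interface InstanceStarter {
    void start();
}

/** Starts the instance only when the caller presents the expected secret. */
class StartTrigger {
    private final byte[] expectedToken;
    private final InstanceStarter starter;

    StartTrigger(String expectedToken, InstanceStarter starter) {
        this.expectedToken = expectedToken.getBytes(StandardCharsets.UTF_8);
        this.starter = starter;
    }

    /** Returns the HTTP status code the servlet would send back. */
    int handle(String presentedToken) {
        if (presentedToken == null) {
            return 401; // no credentials at all
        }
        byte[] presented = presentedToken.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual is meant to be timing-safe, so a caller
        // cannot learn how much of a guessed token was correct.
        if (!MessageDigest.isEqual(expectedToken, presented)) {
            return 403; // wrong credentials
        }
        starter.start();
        return 200;
    }
}
```

The point of the design is simply that `starter.start()` is unreachable without the secret; how you store and rotate that secret is up to you.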

Wednesday, August 31, 2016

Bugs From Hell: Web Developer Edition

Every now and then I have the pleasure of encountering bugs which leave me absolutely clueless even after thinking about all sane possibilities. Here's one of those:

We have a landing page where users have to enter their zip code (which is stored in cookies) before being forwarded to the webshop. In the webshop we read the zip code from the cookies at page load and fetch data accordingly (some products are not available in some areas). Every now and then no data was fetched. Why? Because the zip code was not set in cookies. Wait what? I made sure the cookie was set correctly after entering the zip code, but after page load it was not there anymore. As soon as I set a breakpoint before the cookie was accessed the problem disappeared, so I knew it was some kind of timing problem. The cookie was always accessible after the page was fully loaded. Was it the browser persisting the cookie "too slowly"? No way, it's all synchronous.
Suddenly the solution struck me (it was one of those Matrix-moments where I was not in control of my mind)! On the landing page we're using prerender. This had to cause the problem (I still had no clue how exactly at this point)! Some googling revealed: yes it does. The browser tries to resolve cookie conflicts for you, but in this case it seems it only made things worse.

Here are some thoughts from the Chrome developers on that topic.

And here's someone else experiencing a similar (the same?) issue.

While debugging this insanity I came across a really cool library which allowed me to cross out some possibilities quite fast. It allows you to halt your code whenever a variable is accessed or modified (without changing your code!). This way I ruled out that some third-party code was modifying my cookies.

Generally speaking, the most important thing when debugging is to reproduce the problem yourself. You might not be able to reproduce it consistently (I couldn't either), but you get a good idea of what could possibly be wrong. Afterwards you have to come up with possible causes for that error. Make sure not to rule something out because you think "it's impossible". Everything is possible, it's an unforeseen bug after all! Next you work through the possible causes one after another. Using a lot of knowledge about the parts involved (libraries, browser, etc) you will eventually come up with a solution. :)

PS: In case you have a similar problem but don't use prerender, it's probably some code reading the whole cookie string (and potentially modifying it) and then writing it back all at once, instead of adding / modifying just its own cookie. Again, the above library quickly ruled that possibility out for me.

Friday, June 3, 2016

AppEngine 101: Datastore Consistency

The beauty of cloud solutions like AppEngine and its database, called Datastore: it just scales. It does indeed scale very well, but it does so by applying a few restrictions. In the case of Datastore that is "eventual consistency", something unfamiliar if you're used to conventional databases like MySQL.

What does eventual consistency mean?

Here's a really simple example to describe it: You have a table called Messages where you store messages sent by the users of your website (a chatroom, or guestbook, etc). When the page is reloaded you query all data from the Messages-table and display it. Someone enters a message and it's stored in the database. The page is reloaded moments later and all Messages are looked up using a query. The recently stored message does not show up though. Because eventual consistency!

After changing (creating, updating, deleting) data in your database, queries executed moments later might (!) not return the latest data in some cases. Eventually, though, it is going to return those changes. It might be nanoseconds, seconds, ... later.
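A toy model may help to picture this. The sketch below is not how Datastore works internally; it only assumes, for illustration, that writes are stored at once while the data used by queries catches up later. All class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: writes are stored at once, the query index catches up later. */
class EventuallyConsistentMessages {
    private final List<String> stored = new ArrayList<>();
    private final List<String> queryIndex = new ArrayList<>();

    /** The write itself succeeds immediately. */
    void put(String message) {
        stored.add(message);
    }

    /** Queries only see what has already reached the index. */
    List<String> queryAll() {
        return new ArrayList<>(queryIndex);
    }

    /** Stands in for the background work that happens "eventually". */
    void catchUp() {
        queryIndex.clear();
        queryIndex.addAll(stored);
    }

    public static void main(String[] args) {
        EventuallyConsistentMessages messages = new EventuallyConsistentMessages();
        messages.put("Hello!");
        System.out.println(messages.queryAll()); // [] (the write is not visible yet)
        messages.catchUp();
        System.out.println(messages.queryAll()); // [Hello!]
    }
}
```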

Your first thought might be that this is awful, but it isn't. It is what allows us to scale virtually infinitely. Eventual consistency is completely fine for lots of use cases: Facebook News Feed (who cares if those status updates show up a few moments earlier or later?), or even static data (a shop which changes its product assortment only once a week during a maintenance timeframe).
Of course there are times where consistent data is crucial: everything involving real money flowing, mission-critical data used for real time status monitoring, etc. This is why Datastore has "Transactions". Every database action executed within a transaction is consistent. However, if consistency can't be assured because someone else is changing that data at the same time, the transaction fails and it is up to you to handle that, for example by retrying it.
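Retrying a failed transaction could look roughly like the following sketch. It is deliberately library-free: `runWithRetries` is a made-up helper, and using `ConcurrentModificationException` as the "someone else changed the data" signal is an assumption of this example, so check which exception your Datastore client actually throws.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

class TransactionRetry {
    /**
     * Runs the given work and retries it when it fails due to contention.
     * ConcurrentModificationException stands in for "the transaction
     * collided with another writer" (an assumption of this sketch).
     */
    static <T> T runWithRetries(Supplier<T> work, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentModificationException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return work.get(); // success: hand back the result
            } catch (ConcurrentModificationException e) {
                last = e; // contention: remember the failure and try again
            }
        }
        throw last; // every attempt collided, so give up
    }
}
```

In real code you would usually wait a little between attempts and keep the work inside the supplier idempotent, since it may run more than once.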

For a much more detailed explanation check out this article: Balancing Strong and Eventual Consistency with Google Cloud Datastore

Tuesday, May 31, 2016

Java 101: Collections

One thing you really have to understand when learning Java is its data structures, commonly known as Collections in Java. As the name suggests, Collections describe classes which allow you to collect a certain amount of data. If you're coming from JavaScript for example, it's what you refer to as arrays and objects. There are tons of different Collections available in the standard libraries of Java - even more available from third-party sources - but there are only a few types and implementations of them which you really have to know by heart for day-to-day development. Let's discuss them briefly one by one:


A list has an order and does allow duplicate values*. The two most commonly used implementations are LinkedList and ArrayList: the former for when you don't know upfront how much data will be added to the list later, the latter for when you do. (ArrayList also grows automatically, so it is a reasonable default either way.)


A set has no order and does not allow duplicate values*. You usually use it in the form of a HashSet. Plain JavaScript arrays and objects have nothing comparable (ES2015 did add a built-in Set, though).


A map has no order and does not allow duplicate keys* (the same value may be stored under several keys). HashMap is how you use it in most cases. Each value is assigned a key and can be accessed efficiently using that. In JavaScript this behaves very similarly to an object.

*when I talk about order and duplicate values, I'm talking about the most common implementations. Theoretically you could have each type of collection with any characteristics you want, based on its implementation.
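The characteristics described above can be checked with a few lines of Java. This is only a small sketch with made-up names, using the common implementations ArrayList, HashSet and HashMap:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CollectionBasics {
    /** A list keeps insertion order and keeps duplicates. */
    static List<String> listDemo() {
        List<String> list = new ArrayList<>();
        list.add("b");
        list.add("a");
        list.add("b");
        return list;
    }

    /** A set built from the same data silently drops the second "b". */
    static Set<String> setDemo() {
        return new HashSet<>(listDemo());
    }

    /** A map has unique keys; putting an existing key replaces its value. */
    static Map<String, Integer> mapDemo() {
        Map<String, Integer> stock = new HashMap<>();
        stock.put("apple", 1);
        stock.put("pear", 1);  // the same value under another key is fine
        stock.put("apple", 2); // same key: replaces the 1
        return stock;
    }

    public static void main(String[] args) {
        System.out.println(listDemo());             // [b, a, b]
        System.out.println(setDemo().size());       // 2
        System.out.println(mapDemo().get("apple")); // 2
        System.out.println(mapDemo().size());       // 2
    }
}
```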

How do you decide which one to use?

Given enough experience in hands-on programming you instinctively know which data structure to use for a certain task. Here are some rules of thumb for quick reference:
  1. for a certain amount of data without duplicates, which you want to access efficiently (e.g. use the contains-method), use a Set
  2. for a certain amount of data without any further requirements (duplicates allowed, no specific performance requirements) use a List. As mentioned before, use ArrayList if you know the final size upfront, LinkedList otherwise.
  3. if you have to look up data repeatedly (not iterate over it one by one, but get a specific item) use a Map where you assign a key which is easy to create and look up your data using that.
Here's a short code snippet which I hope makes it clear what I'm talking about:
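As a minimal sketch of the three rules applied to one small task (the class, method and visitor names are made up for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChoosingCollections {
    /** Rule 1: a Set for duplicate-free data with a fast contains(). */
    static Set<String> uniqueVisitors(List<String> visits) {
        return new HashSet<>(visits);
    }

    /** Rule 3: a Map when specific items are looked up by key repeatedly. */
    static Map<String, Integer> visitsPerVisitor(List<String> visits) {
        Map<String, Integer> counts = new HashMap<>();
        for (String visitor : visits) {
            counts.merge(visitor, 1, Integer::sum); // start at 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Rule 2: a List for plain data where duplicates are fine.
        List<String> visits = Arrays.asList("anna", "ben", "anna", "cleo", "anna");

        Set<String> unique = uniqueVisitors(visits);
        System.out.println(unique.size());          // 3
        System.out.println(unique.contains("ben")); // true

        Map<String, Integer> counts = visitsPerVisitor(visits);
        System.out.println(counts.get("anna"));     // 3
    }
}
```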

I found that has a few nice examples for each of the aforementioned implementations. Check them out for more details on each of them.

Monday, May 30, 2016

Mac OSX Virtual Machine guest lags

If you have a Mac VM running as a guest on a non-Mac host (Windows in my case) you might experience some serious lag and graphic glitches like I did. Fixing it was pretty easy by disabling "BeamSync" - the Mac-equivalent to VSync if I understood that correctly. Anyway, you don't need it for normal Mac usage (i.e. no videos, games, etc I guess), so download "BeamSyncDropper" and keep on using your Hackintosh efficiently :)

You can download the tool + read instructions on how to use it here:

PS: Mac VMs are great for developers who only develop for iOS if they really have to! ;)