Frogatto & Friends

Frogatto & Friends is an action-adventure platformer game, starring a certain quixotic frog.
We're an open-source community project, and welcome contributions!
We also have a very flexible editor and engine you can use to make your own creations.

Frogatto Ported to the BlackBerry PlayBook

March 5th, 2012 by Sirp

With some wonderful help and support from Research in Motion, we’ve ported Frogatto to the BlackBerry PlayBook. The PlayBook is the perfect size for playing Frogatto, fitting snugly into one’s hands with the controls within perfect reach.

Frogatto is going to be demonstrated by Research in Motion this week at the Game Developers Conference.

Porting to the PlayBook has been remarkably easy. It should be available on the PlayBook’s app store soon!


Multiplayer Development Begins!

October 14th, 2010 by Sirp

I’ve started work on implementing multiplayer support for Frogatto. We think there is great potential for multiplayer in Frogatto:

  • Races: levels where players must race each other to the end
  • Deathmatches: levels where players must defeat/kill each other. These will probably be done with some kind of twist, such as only being able to hurt the other player by picking up objects and spitting them at them.
  • Co-op play: levels where players must play together, helping each other to win.

Of course, none of this can be done until we have the foundation of a solid multiplayer experience in place, so that’s what I’m working on. The big goal is to provide an experience where the game continues to run at 50 frames per second, with little or no perception of lag.

To achieve this, the connection between two machines has to have as little latency as possible. We want to use a peer-to-peer system where each machine in a game has a direct connection to other machines. To achieve this I’ve implemented UDP Hole Punching so that even machines behind firewalls can establish peer-to-peer connections.

I think that a reasonable goal is that two machines which can get a round trip ping of 100ms or less should be able to play a smooth game of Frogatto. This means that two people who are both in North America, or two people in Europe, can probably play a game, as long as they have decent connections. A trans-Atlantic ping under 100ms is unlikely though, so, players on different continents will probably have some lag. Because of this, a secondary goal of the system will be to degrade as gracefully as possible.

So what does the architecture of multiplayer look like? There are a number of ways we could implement multiplayer: one would be to send position and state information about many of the dynamic objects in the level across the network. This could be made to work quite well, but uses a lot of bandwidth, and would be somewhat laggy. It would also be significant overhead to implement: every time we add new state changes for objects, we’d have to worry about making sure they get synchronized across the network properly.

Instead, we choose a different approach: in Frogatto we’ve worked carefully to make sure that all actions taken are deterministic based on their input. That is to say, if you play a Frogatto level multiple times and press the same buttons at exactly the same times, your play-through will be exactly the same.

This means that all we have to do is send the other machines the button presses we’re making on our machine and the games will remain in-sync. Of course, sending a message will take some time, so in multiplayer we can’t quite expect a key press to be instant. Fortunately, we’ve done some testing and found that if we introduce a 60ms delay before a key stroke is recognized in Frogatto, it’s barely noticeable, and if we introduce a 40ms delay, it’s not noticeable at all.

So, in multiplayer, when a keystroke is made, the game will delay it by a small amount of time before putting it into effect. When the game is started, the peers will do some tests to estimate their latency, and based on that latency the keystroke delay will be determined. Hopefully the delay chosen will be as small as possible while still allowing packets to arrive in time.
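
To make this concrete, here is a minimal sketch in C++ of how such an input delay might be implemented. This is not Frogatto’s actual networking code; the class and its names are assumptions, purely for illustration:

#include <deque>

// A fixed keystroke delay: local buttons take effect delay_ frames after
// they are pressed, giving the packet carrying them time to reach the
// peer. At 50 frames per second, 2 frames = 40ms and 3 frames = 60ms.
class delayed_input_queue {
public:
    explicit delayed_input_queue(int delay_frames) : delay_(delay_frames) {}

    // Called once per frame with the raw local buttons. Returns the
    // buttons that should take effect this frame: those pressed
    // delay_ frames ago.
    unsigned char push_and_pop(unsigned char raw_buttons) {
        pending_.push_back(raw_buttons);
        if(int(pending_.size()) <= delay_) {
            return 0; // still warming up; no input takes effect yet
        }
        const unsigned char effective = pending_.front();
        pending_.pop_front();
        return effective;
    }

private:
    int delay_;
    std::deque<unsigned char> pending_;
};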

It’s also very important that games start in synchronization — at the same time. If one system is ahead of the other, then the system that is behind will be less likely to get its data to the other system in time. So, when the game is started, the handshaking process does its best to co-ordinate the systems to start the game at exactly the same time.

Now, it’s always possible that packets will get lost or arrive too late. Every frame we send a packet with the keystrokes for the current frame, but we also include the keystrokes for previous frames. Additionally, we tell the other systems the most recent frame of theirs that we have confirmed receiving, so they know they don’t have to send data for frames earlier than that. This takes care of redundancy.
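
Here is a rough C++ sketch of what such a redundancy scheme can look like. Again, the types and names are my own illustration, not Frogatto’s actual wire format:

#include <map>
#include <vector>

struct input_packet {
    int first_frame;                    // frame number of inputs[0]
    std::vector<unsigned char> inputs;  // button bitmasks, one per frame
    int ack_frame;                      // latest peer frame we have received
};

class input_history {
public:
    void record_local(int frame, unsigned char buttons) {
        unacked_[frame] = buttons;
    }

    // The peer has confirmed our data up to 'frame'; stop resending
    // anything at or before it.
    void on_ack(int frame) {
        unacked_.erase(unacked_.begin(), unacked_.upper_bound(frame));
    }

    // Build the outgoing packet: the current frame plus every earlier
    // unacknowledged frame, so a single lost datagram doesn't stall the game.
    input_packet build_packet(int latest_peer_frame) const {
        input_packet p;
        p.first_frame = unacked_.empty() ? 0 : unacked_.begin()->first;
        for(std::map<int, unsigned char>::const_iterator i = unacked_.begin();
            i != unacked_.end(); ++i) {
            p.inputs.push_back(i->second);
        }
        p.ack_frame = latest_peer_frame;
        return p;
    }

private:
    std::map<int, unsigned char> unacked_;
};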

Now, there is the problem of a machine getting to frame N while only having keystroke data from its peer for frames up to frame N-1. When this happens, the machine simply waits to calculate and render the frame until it has the data. Hopefully the data will arrive within a few milliseconds, and we can catch up with little or no problem. But sometimes it doesn’t arrive in time. When this happens, the machine will lag, perhaps significantly. Then of course, it will be behind the other machine, and it’s likely the other machine will have the same problem.

Because both machines will be forced to lag, if it’s due to an intermittent network problem, hopefully the problem will correct and we’ll quickly get back to an equilibrium. Of course, sometimes there might be too many problems — late and lost packets — to continue at all, in which case we timeout and the game is terminated. Most of the time though, the game may be continued in some form.

Code to do all this has been checked into SVN, and Jetrel and I played our first multiplayer game last night. Hopefully in not too many more releases, it’ll be ready for prime time! 🙂


Randomized Arcade Levels

August 14th, 2010 by Sirp

One thing we’re working on for Frogatto is “arcade style” levels. Arcade style levels will be never-ending levels that are randomly generated, and which make achieving a high score the objective. We’re hoping this will allow for a fun alternative way to play the game.

As part of this, I’ve added some nice support for making randomized levels to the Frogatto editor. The editor allows you to split levels into segments, just like this:

[Screenshot: the segment editor]

The yellow segment in this screenshot is the currently selected segment, and the red segments are all segments which can go directly after the selected segment. Frogatto Formula Language (FFL) commands are then available which allow the level to order the segments randomly, making sure that one segment only follows another where appropriate.

Of course, in some cases we might need to augment this basic logic with a little more sophistication: making the game get harder as the player goes further, for instance. But this can always be done by adding FFL to the level.

Hopefully with this system in place we’ll be seeing some fun arcade style levels in future versions of Frogatto.


Frogatto Available on the iPhone App Store

July 22nd, 2010 by Sirp

Frogatto has been released on the iPhone App Store!! Please show your support for our project by purchasing it.

We’re having an introductory sale with 60% off the planned regular price, so get in fast!


How Frogatto Formula Language (FFL) Works

July 18th, 2010 by Sirp

In Frogatto, we use Frogatto Markup Language (FML) to describe most game data — how levels are laid out, the rules for laying out tiles, the definitions of animations for objects, and so forth. For those of you familiar with Wesnoth and WML — or those familiar with XML — FML will seem very familiar. It’s a simple data language, like XML. Here is a small snippet of FML which defines one of Frogatto’s animations, to show you what I mean:


[animation]
id=lookup
rect=37,233,68,265
frames=1
duration=3
reverse=yes
[/animation]

FML, like WML, works very nicely for describing data. However, in Frogatto we really wanted objects to have very flexible behavior — for instance, consider all the complexities of an object as simple as an ant in Frogatto. It has to walk back and forth, falling off cliffs if it’s a red ant but turning back at the edge if it’s a black ant. Then, when it is spat out, it has to fly through the air, bounce off surfaces, finally end up on its back, and then spring back onto its front.

Describing behavior like this in FML would be far too awkward, at least without hard wiring a lot of core behavior into the engine, and we really wanted Frogatto’s engine to be a very general engine.

So, in addition to FML, Frogatto has an embedded language called Frogatto Formula Language (FFL). FFL is designed to work like a mathematical formula. That is, a formula has inputs, makes calculations, and then results in outputs. A formula itself can’t actually do anything — that is, it can’t modify the environment around it — it can only return a result. Just like a mathematical formula.

Let’s jump into an example of how this works in Frogatto. You know those big metal balls that are thrown by the psychotic bunnies? Here is a (slightly simplified) code snippet which controls what happens when they collide with a wall:


on_collide="set(velocity_x, -velocity_x/2)"

This is taken from the metal_ball.cfg file which defines the metal balls. Now, the first thing I want to note is that the on_collide="…" part that surrounds this is FML. ONLY the part that is inside the quotation marks is a formula; a formula written in FFL. This defines the event handler that is run whenever the object receives a “collide” event.

What the formula is designed to do is cause the ball to be reflected off the wall, with its velocity dampened by half. One important thing to understand is that the formula itself does not modify the object’s velocity: a formula itself cannot modify its environment. Rather, the formula is evaluated and returns a command object. The game engine then executes the command that the formula returns. In technical terms, this makes FFL a pure functional language.
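
For the curious, here is a tiny C++ analogy of this command-object pattern. It is a hedged sketch of the idea only, not the actual engine source:

struct object { int velocity_x; };

// A command is a deferred mutation: evaluating a formula never touches
// the object, it only builds one of these for the engine to execute.
struct command {
    virtual ~command() {}
    virtual void execute(object& obj) const = 0;
};

// Roughly what "set(velocity_x, -velocity_x/2)" reduces to: the new value
// is computed during evaluation, the assignment happens during execution.
struct set_velocity_x_command : public command {
    explicit set_velocity_x_command(int value) : value_(value) {}
    void execute(object& obj) const { obj.velocity_x = value_; }
    int value_;
};

int main() {
    object ball = { 100 };
    // 1. Evaluate: read state and build a command; nothing is modified yet.
    set_velocity_x_command cmd(-ball.velocity_x / 2);
    // 2. Execute: the engine applies the command.
    cmd.execute(ball); // ball.velocity_x is now -50
}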

There is an entire API of functions exposed to the formula system that create different commands to return to the engine. The set() function, which creates a command to set the value of an object property, is probably the most useful, but there is a whole host of others, such as spawn(), which spawns a new object (useful, for instance, for an object which shoots projectiles), and teleport(), which allows teleporting the player to a different level, and so forth. The game engine actually comes with a utility that outputs all of the available functions.

Of course, in addition to functions that can create commands, to do useful things we need to be able to inspect the state of the game. For instance, in the above formula we had to read the object’s “velocity_x” property to find its horizontal velocity. When an event handling formula is executed, it gains access to all of the properties of the object receiving the event. Objects are pretty complex entities, and they have properties ranging from their velocity and location, to their hitpoints, current animation, a store of user-defined variables, the level they are located in — from where you can access all the other objects in the level — and so forth. Fortunately we have documented all the available object properties.

Now that we’ve seen the basics of formulas, let’s look at a slightly more complex example. This is how the Frogatto object handles collision with a wall:


on_collide="[set(velocity_x, 0),
if((animation in ['jump', 'fall']) and can_slide(self) and abs(velocity_x) > 200,
[animation('slide'), set(velocity_y, 0)])]"

First things first: this formula returns a list of commands to execute rather than a single command. Lists in FFL are delimited by [ and ]. The first thing this formula does is simply set Frogatto’s horizontal velocity to 0 when he collides with a wall — if you run into a wall with Frogatto he simply stops rather than bouncing off.

The second part of the formula is to handle Frogatto’s ability to grab onto walls if you jump into them. It makes use of the if() function which is a built-in FFL function. if(cond, x, y) will evaluate to x if cond is true, and y otherwise. You can also simply use if(cond, x), which evaluates to null if cond is false.

We can see that Frogatto will only cling onto a wall under certain conditions — if he’s in a jumping or falling animation, if a custom function defined as “can_slide” is true, and if his horizontal velocity has a magnitude greater than 200. If the criteria to cling onto the wall are met, we return a command for him to enter his slide animation, and a command to set his vertical velocity to 0.

As we can see, FFL is fairly powerful, but hopefully not too difficult to understand. Using it, we can define objects which have all kinds of sophisticated behavior. Of course, another key element is understanding what events are triggered on an object. We have events that are triggered automatically, periodically, events that are triggered upon collisions of different types, events triggered when the level is loaded, when the player first approaches an object, and so forth. All the events are documented here.

I’ll give one more example of the kinds of things FFL can do:


timer_frequency=50
on_timer="[set(brightness, brightness*4), schedule(5, set(brightness, brightness))]"

This makes it so a ‘timer’ event is received by the object every second (Frogatto runs at 50FPS). On this timer event, the object is made to be four times as bright as normal for a tenth of a second (five cycles). Note the use of the schedule() function. What schedule() does is take commands and schedule them to be run some number of cycles in the future, rather than right now. So, we are setting brightness to 4x what it is now, and then scheduling it to be set back to its current value 5 cycles in the future.

Note how the entire formula is evaluated, and THEN the commands are executed.
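
A small C++ analogy (again an illustration, not engine source) shows why this ordering matters for the example above: both commands are built before either one runs, so both capture the original brightness.

struct object { int brightness; };

int main() {
    object obj = { 255 };

    // Evaluation: both reads of 'brightness' happen now, and both see 255.
    const int bright_now   = obj.brightness * 4; // set(brightness, brightness*4)
    const int bright_later = obj.brightness;     // schedule(5, set(brightness, brightness))

    // Execution: only now do the commands take effect.
    obj.brightness = bright_now;   // 1020 for the next five cycles
    // ...five cycles later the scheduled command runs:
    obj.brightness = bright_later; // restored to the original 255
}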

FFL is a general formula language, and though its main use in Frogatto is for object events, that’s not its only use. It is also used, for instance, to draw the HUD which shows Frogatto’s status, health, coins, score, and so on. In that context, FFL is still used but it has access to a different set of properties and a different API — an API designed for drawing elements of the screen rather than controlling objects. It’s also used to allow custom scripts that can be used in the Frogatto editor. One can write a script in FFL that applies some effect or another on a level — such as rounding out a cave level, or adding trees to a grassy level. One could even write FFL to try to make an automatic level generator!

I’m hoping this gives a good overall view of FFL, and what it can do. The Frogatto engine is very moddable, and we’re hoping people will try to develop their own concepts — from fixing bugs in Frogatto, to adding new features, such as new objects or better behavior of existing objects. But one could also implement a completely new and different game. We very much welcome people trying to develop their own game concepts with FFL — if you do, please come and share them with us on the forum. Good exercises would be to implement a game in the vein of Pong, Space Invaders, or Pac-Man.


Frogatto 1.0 Released!

July 13th, 2010 by Sirp

Frogatto 1.0 has been released! Frogatto is now a fully playable, fun, classic adventure platformer with over thirty levels; version 1.0 represents the first stable version of the game. It is available for Windows and Mac, and in source form, on our download page. Frogatto is also soon to be released on the iPhone App Store.

Frogatto also contains a fully functional editor that allows you to create your own fun levels or edit existing levels. The engine is heavily moddable, and as an Open Source game we welcome new talented contributors.

After trying Frogatto, consider visiting our forum to discuss the game.


How C++’s vector works: the gritty details

November 17th, 2009 by Sirp

The C++ Standard Template Library (STL) contains a powerful set of data structures and algorithms. However, they are often misunderstood. I think to be an effective C++ programmer, one has to understand how the STL works, and so today I’m blogging about how its most commonly used component, vector, works. My explanation is more oriented toward the practical, and typical implementations, rather than the weird corner cases that the C++ Standard might allow.

What is a vector? It is a dynamically allocated array of elements. Whenever you want an array in C++, a sequence of elements, vector is probably what you want to use. However, this post isn’t meant to be too remedial. I’m going to assume you’ve used a vector once or twice before, and step in to show you how it works, under the covers. Once you understand vector thoroughly, it is much easier to understand the operation of other C++ components.

Let us take a simple example:


vector<int> v;
v.push_back(4);
v.push_back(3);
v.push_back(8);
v.push_back(5);
v.push_back(2);

There, we started off with an empty vector of integers, and then added some numbers to the vector. Now let’s look at how the vector represents this in memory:

[Diagram: vector's memory layout]

On the left, we have the actual members of the vector object itself: three pointers, begin, end, and capacity. On the right we have a dynamically allocated buffer that the vector stores its data in. It’s important to understand what these three pointers are, so I’ll explain them:

  • begin points to the start of the buffer. It is fairly easy to understand. You can get it using the begin() member of vector.
  • end points to one past the end of the valid, initialized elements of the vector. Note that it points one past the end, making it so that end - begin = size. You can retrieve the end pointer using the end() function, and the size() function is the same as evaluating end() - begin().
  • capacity points to one past the end of the buffer. The dotted square in the diagram above represents this position: one past the end of the buffer, and not a legal part of the buffer. It tracks how much room for growth the vector has in its current buffer. The capacity() function gives you the size of the buffer, i.e. capacity - begin.

Note carefully the pointers that point one past the end. This is a common idiom in C++ and the STL in particular.

Note that a vector always has exactly one buffer that it stores all data in. (Exception: an empty vector won’t have any buffer at all). Note how we have put five integers in our vector, but the buffer has room for up to eight. Each time we call push_back(), adding a new element to the vector, the end pointer will move one step forward.

What happens when our buffer runs out of space to store new elements? A new buffer will be allocated, twice the size [1] of the old buffer, all of our elements will be copied into the new buffer, and then the old buffer will be destroyed. Suppose on our vector from above we now made the following calls:


v.push_back(9);
v.push_back(10);
v.push_back(12);

Here is what our vector looks like now:

[Diagram: the vector at full capacity]

The vector is now at full capacity, end == capacity. The vector cannot perform another push_back in the current buffer. But suppose we push_back() nevertheless:


v.push_back(1);

The vector will allocate a buffer of size 16, and copy all elements to the new buffer, releasing the old buffer.

[Diagram: the vector after reallocation]

This process can, of course, continue indefinitely. A vector can grow to any size, constrained only by available memory and addressability.
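
You can watch this growth happen yourself by printing size() and capacity() as you push elements. (The exact growth pattern varies between implementations; see the footnote at the end of this post.)

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> v;
    for(int i = 0; i != 9; ++i) {
        v.push_back(i);
        // With a doubling implementation this prints capacities
        // 1, 2, 4, 4, 8, 8, 8, 8, 16.
        std::printf("size=%u capacity=%u\n",
                    unsigned(v.size()), unsigned(v.capacity()));
    }
}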

Now, let’s talk about performance for a moment. Intuitively, to many people, vector is not very efficient. Every time its buffer runs out of space, it has to copy all of its elements over, and this means if you’re adding a lot of items to your vector you’re going to do a lot of copying of buffers, and this is going to be terrible for performance, etc etc.

In actual fact, it’s not nearly as bad as most people intuit. To begin with, let’s think about how many elements the vector will be unnecessarily copying. Suppose you grew a vector to a size of 1024 elements, so that your vector’s buffer was full. Now add another element, and the vector has to copy its buffer of 1024 elements over. Additionally, on the way to growing to size 1024, you had to copy as many as 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 = 1023 elements. So, to grow to 1025 elements, the vector has to make as many as 1023 + 1024 = 2047 element copies in overhead. More generally, if you grow your vector to size N, the vector might be making as many as 2*N element copies when it reallocates buffers, and on average will make 1.5*N such element copies. Note that the key to all of this is how vector grows exponentially. If you implemented vector by simply adding, say, 128 elements to its buffer every time it has to resize, instead of doubling its buffer, then the number of copies would grow quadratically and performance would be terrible.

Is this really very bad? Actually, almost certainly not. Unless you are storing a bulky object that is expensive to copy in your vector, the copying overhead is likely trivial in comparison to the performance benefits of vector.

Formally, calling push_back() on a vector runs in O(1) amortized time. What does the amortized bit mean? Usually a push_back() on a vector is a trivial operation that clearly takes O(1) time. But occasionally the vector has to resize its buffer, and that takes O(N) time. However, if you amortize the cost of the buffer resize over subsequent operations, the average is still O(1) time. This generally means you can consider a vector’s push_back() to take O(1) time; however, there are occasional cases where it could hurt you. Suppose you were storing a huge vector — to build up a record of a player’s stats for instance — and suppose this vector was appended to every game cycle. Once it grew sufficiently large, the push_back() might actually take a significant amount of time. This might cause occasional delays in gameplay. In such a case, it would be a good idea to use another data structure, such as a deque. This problem is very rare in practice, though.

This does bring us to an important topic though: What should you store in a vector? Built-ins like integers can be stored easily, of course. But what about objects of class type? Suppose you are programming a particle system and have a particle class, and want to store a long list of particles, what is the best way to do it? One way is to store a vector<particle>, another is to store a vector<particle*> — that is, don’t store the particles themselves directly in the vector, but store pointers to particles.

How do you choose which? Generally, the larger and bulkier the object is, the more likely it is you want to store pointers to it, rather than the object itself. Storing a vector<int*> would be very inefficient, since the pointers would be as large or larger than the integers and you’d have to have the overhead of the memory allocations too. But for a large object, like Frogatto’s custom_object class, a vector<custom_object*> is probably what we want. Note that to store an object directly, it must be copyable, i.e. have accessible copy constructors and assignment operators.

Note also that if you store a vector of pointers, the vector will not manage the memory pointed to by the pointers. If you want the object’s memory to be managed for you, you could use a vector<boost::shared_ptr<particle> > to have a vector of ‘smart pointers’ that manage the memory they point to.
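
For instance, a minimal sketch of the smart-pointer approach (using boost::shared_ptr, since this post predates std::shared_ptr):

#include <boost/shared_ptr.hpp>
#include <vector>

struct particle { int x, y; };
typedef boost::shared_ptr<particle> particle_ptr;

int main() {
    std::vector<particle_ptr> particles;
    particles.push_back(particle_ptr(new particle()));
    // When an element is erased, or the vector itself is destroyed, the
    // shared_ptr frees the particle automatically.
}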

Now, let’s move on to some more things you can do with a vector. Let’s look at how we would iterate over all the elements of our vector and sum them up:

int sum = 0;
for(vector<int>::const_iterator i = v.begin(); i != v.end(); ++i) {
  sum += *i;
}

Now what is this ‘vector<int>::const_iterator’ thing? Well, an iterator is a concept the STL introduces, intended to generalize the concept of a pointer. A pointer works well for moving over elements, inspecting them, modifying them, etc, but only if your underlying storage is an array — a flat buffer of data. If, in C, you wanted to iterate over a linked list, for instance, you’d likely have to write a loop that looks something like this:


for(node* ptr = list->begin; ptr != NULL; ptr = ptr->next) { ... }

…and then differently again for a data structure like a deque, and so forth. An iterator is a type that looks and behaves like a pointer, providing either all of a pointer’s operations, or at least a defined subset of them. The idea of an iterator is to allow access to the members of a data structure using a uniform syntax, regardless of the data structure’s underlying implementation.

Thus, just think of a vector<int>::const_iterator as behaving exactly like a const int* does.

Another important concept to understand regarding vector is known as iterator invalidation. Remember how when we push_back() on a vector, and it runs out of space, it’ll reallocate the buffer? Think about if you had a pointer to one of the elements within the vector. That pointer would now point to the old buffer, the one that has been destroyed.

In C++ terms, calling push_back() on a vector invalidates all iterators into that vector. Once you call push_back() all iterators you have into the vector are unusable, and the only thing you can legally do with them is reassign them to a new value. Reading or writing to the iterators may cause the program to crash, or a host of other nasty behavior.

This effect can be rather subtle. For instance, we had code something like this in Frogatto to detect collisions between objects:

for(vector<object_ptr>::iterator i = objects_.begin(); i != objects_.end(); ++i) {
  for(vector<object_ptr>::iterator j = objects_.begin(); j != objects_.end(); ++j) {
    if(i != j && objects_collide(*i, *j)) {
      handle_collide_event(*i, *j);
    }
  }
}

This code iterates over every object pair to see if there are collisions. Simple enough, right? Now what if inside an object’s collide event it spawns some new objects, and that spawning adds the new objects to the objects_ vector? Then the objects_ vector’s iterators are all invalidated, including i and j. We are still using them in our loops though! To make matters worse, the problem only occurs if adding the new objects happens to trigger a reallocation of the buffer.

Note that you can iterate over a vector using indexes, which a lot of people find easier/simpler than using iterators:


for(int i = 0; i < v.size(); ++i) {
...use v[i] instead of *i...
}

This has similar performance, but note it’s less general. You can’t use this approach to iterate over most of the other STL containers. In Frogatto, we have a foreach macro we use for most simple iterations.

So far we’ve covered growing a vector using push_back, and iterating over it. Let’s look quickly at the other operations vector supports:

  • pop_back(): This is the inverse to push_back, taking the last element off the vector
  • resize(): This lets you change the size of the vector to any size you want.
  • reserve() : This changes the capacity of the vector. Note that this doesn’t change the vector’s size, it just changes the size of the underlying buffer, to give more room for expansion of the buffer before the buffer has to be resized. Unlike calling resize(), this doesn’t change the behavior of the program, just the performance.
  • insert()/erase(): These operations allow you to add new elements anywhere in the vector, not just at the back. Note however that these operations take O(N) time.
  • front()/back(): Convenience operations to look at the first and last element of the vector

Remember that vector represents a contiguous buffer. If you erase an element in the middle of the vector, all the elements after it have to be shuffled back one position to fill the gap.

As an example, suppose you had a vector of particles, and wanted to remove all of the ones that have expired. Do not do this:

vector<particle>::iterator i = v.begin();
while(i != v.end()) {
  if(particle_expired(*i)) {
    i = v.erase(i);
  } else {
    ++i;
  }
}

Note that this code does do the correct thing. erase() will invalidate the iterator being erased, but it will return a valid iterator to the next element. The loop carefully moves over all particles erasing the expired ones. But the performance of this is potentially terrible.  Suppose we had a million particles in our vector, and they have all expired. We’ll be doing around half a trillion particle copies, just to empty out our vector!

So how should we do this? There is an algorithm supplied in the STL designed to do exactly that, called remove_if. This is how you do it:

vector<particle>::iterator end_valid = remove_if(v.begin(), v.end(), particle_expired);
v.erase(end_valid, v.end());

How does this work? Firstly, all of the STL algorithms operate on iterators not on containers. If they operated on containers, you’d have to have a version for each type of container. So, the remove_if algorithm takes an iterator to the beginning, and then to the end of the sequence we want to operate on. It also takes the function to call on each element to see if it is to be removed.

Remove_if efficiently shuffles all elements that should not be removed forward, overwriting the elements that should be removed. At the end of the call to remove_if, all the remaining elements are at the front of the sequence. Let us illustrate this with an example: suppose our vector contains particles with the following IDs:


[4, 8, 2, 12, 19, 3, 7, 18]

Now suppose all particles with IDs lower than 6 have expired. After the call to remove_if, our vector will now contain this:


[8, 12, 19, 7, 18, ??, ??, ??]

See how everything less than 6 has been removed. Everything over 6 is now at the front of the vector, with the order maintained. However, the size of our vector hasn’t changed, because remove_if only has access to iterators, and iterators can’t change the size of the vector they point into; the end of the vector now holds some ‘garbage’ undefined values.

Fortunately, remove_if provides a convenient way to resize the vector and remove the garbage: it returns an iterator to the end of the valid values. So we use this iterator to remove all the invalid values at the end, with our erase() call.

One final operation vector has which I want to talk about is swap(). You can call swap() on two vectors and they will be efficiently swapped. That is, they will simply swap the pointer values they have. This is useful in a variety of situations, for instance to ‘move’ a vector into a new location without the expense of a copy.

Also, we have discussed the way a vector grows its buffer. Yet if you call shrinking operations such as resize() with a smaller value, or pop_back(), or even clear(), a vector never shrinks its buffer. So if you call push_back() a million times on a vector, then call clear(), the vector will still hold a huge buffer of memory. This is probably a reasonable decision, since any shrinking would risk pathological situations where a vector keeps growing and shrinking its buffer. However, it is useful to be able to shrink a vector by hand. Here’s a way you can manually clear a vector, so it has no buffer at all:


{
    vector<int> tmp;
    tmp.swap(v);
} // tmp is destroyed here, taking the swapped buffer with it.

There’s a lot more to understand about vectors, and all their implications. Hopefully this gives a good overview. Unless you have a very good reason not to, vector is generally the best container to use in C++. It is efficient and compact, and generally gives the best real-world performance of any container.

[1] Actually, the C++ Standard only requires that a vector grow exponentially, it doesn’t specify the exponent. So a vector could make its new buffer four times the size of its old buffer, for instance, or one-and-a-half times. But all implementations I know of double it each time.


Unit Tests in Frogatto

November 5th, 2009 by Sirp

People who are familiar with any of my projects probably know that I am not always fond of using existing tools and libraries, often preferring to implement things myself, for one reason or another. This is particularly so if I feel a certain library imposes a significant intellectual burden in understanding its workings, while some hand-written code tailored to the purposes of the project could have much less intellectual burden.

I’ve used a few popular unit testing frameworks, such as cppunit, and found them generally cumbersome. I hate having to write an entire class to write a unit test, with a bunch of boilerplate code for each test. I must admit to not being entirely aware of all the benefits of using a unit test framework. I know that some frameworks give support for mocking and stubbing objects out and so forth, but, well, that all sounds like far more burden than it’s worth.

So, in Frogatto I have implemented a super-simple unit testing framework. If you want to write a unit test, then in any Frogatto source file, just include unit_test.hpp and add some code like this:

UNIT_TEST(simple_arithmetic) {
    int x = 4;
    x += 8;
    EXPECT_EQ(x, 12);
}

That’s all a test takes! Just write UNIT_TEST(name_of_test) and then a code block with your testing code. When you’re ready to see if your test ran as expected, you can use the provided macros, EXPECT_EQ/NE/GT/GE, etc., to see if values came out as you expected them to.

Then, when you start Frogatto, all tests are run automatically, every time! The program will die with an error message if any tests fail. This forces and guarantees that people will not check in code with broken tests, because if they do, it’ll break the game horribly.
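
I won’t reproduce Frogatto’s actual unit_test.hpp here, but the trick behind a macro like this is simple: a static object registers each test function in a global registry before main() runs, and the engine then walks the registry on startup. A self-contained sketch of the idea (an assumed implementation, not the real header):

#include <cstdio>
#include <cstdlib>
#include <map>
#include <string>

typedef void (*test_fn)();

inline std::map<std::string, test_fn>& test_registry() {
    static std::map<std::string, test_fn> tests;
    return tests;
}

struct test_registrar {
    test_registrar(const char* name, test_fn fn) { test_registry()[name] = fn; }
};

#define UNIT_TEST(name) \
    void TEST_##name(); \
    static test_registrar register_##name(#name, TEST_##name); \
    void TEST_##name()

#define EXPECT_EQ(a, b) \
    do { if((a) != (b)) { \
        std::fprintf(stderr, "TEST FAILED: %s\n", #a " == " #b); \
        std::abort(); \
    } } while(0)

UNIT_TEST(simple_arithmetic) {
    int x = 4;
    x += 8;
    EXPECT_EQ(x, 12);
}

int main() {
    std::map<std::string, test_fn>& tests = test_registry();
    for(std::map<std::string, test_fn>::iterator i = tests.begin();
        i != tests.end(); ++i) {
        i->second();
    }
    std::printf("all tests passed\n");
}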

This is what unit tests should be like, in my view: very simple, easy to write, and easy to run. I’ve worked at too many places where some Agile fanatic tells everyone that they should write unit tests to test every single bit of code, is willing to discuss the joys of testing for hours on end, with all the nuances of how many different code paths you should test, and then wants you to learn some framework that they understand very easily because they are interested in unit testing, but which everyone else, who just wants to write code, finds a burden to learn.

Sure, unit testing is a good idea, but to be workable it has to have minimum overhead. That is the aim of Frogatto’s framework.

The second part of Frogatto’s framework involves performance testing. If you’re writing a piece of code and you think its performance is going to be important, or if you have a piece of code and you know it’s taking up a significant amount of time, it’s nice to have an isolated test to verify that. So, the unit testing framework provides a benchmarking system.

Let’s see an example of how to add a benchmark. I wanted to benchmark how long it takes to query and convert a WML attribute into an integer, so here’s my benchmark:

BENCHMARK(wml_get_int) {
    wml::node_ptr node(new wml::node("a"));
    node->set_attr("abc", "47");
    BENCHMARK_LOOP {
        wml::get_int(node, "abc");
    }
}

This declares the benchmark with its name, and sets up some test data. Then the code inside BENCHMARK_LOOP is the part we actually want to benchmark. The benchmarking framework will run this part an appropriate number of times. So if the code takes a second to run, it’ll only be run a few times, and that averaged. If it takes a few nanoseconds to run it’ll be run billions of times, and that averaged.
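
I’ll sketch how such a loop can work rather than reproduce the real implementation: keep doubling the iteration count until the total run time is long enough for the average to be trustworthy. (The timing call here is Linux-specific, and all the names are assumed.)

#include <cstdio>
#include <time.h>

long long now_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

// Run fn repeatedly, doubling the iteration count until the whole run
// takes at least ~100ms, then report the per-iteration average.
void run_benchmark(const char* name, void (*fn)()) {
    for(long long iterations = 1; ; iterations *= 2) {
        const long long start = now_ns();
        for(long long i = 0; i != iterations; ++i) {
            fn();
        }
        const long long elapsed = now_ns() - start;
        if(elapsed >= 100000000LL) {
            std::printf("BENCH %s: %lld iterations, %lldns/iteration; total, %lldns\n",
                        name, iterations, elapsed/iterations, elapsed);
            return;
        }
    }
}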

How does a benchmark get run? Unlike unit tests, we don’t want to run benchmarks every time we run Frogatto, because that’d slow down startup and most people would just ignore the results. Instead, you run Frogatto with the --benchmarks argument. Frogatto will run all the benchmarks, print a report, and exit. You can also provide --benchmarks with a comma-separated list of the names of the benchmarks you want run, e.g. --benchmarks=formula_load to only run the ‘formula_load’ benchmark.

Output of the benchmarks framework looks like this:

BENCH wml_get_int: 1000000 iterations, 341ns/iteration; total, 341000000ns

This shows us that our wml_get_int benchmark ran for a million iterations, and each iteration took 341ns. So, it takes 341ns to extract and parse an integer from a WML node. Being able to work with and quickly intuit time periods like this is important in performance analysis. How significant is 341ns? For one thing, we could find the wml::get_int() function and put a counter in to see how many times it is executed in a typical run of Frogatto. However, my guess is that even in a long game we’d parse less than 100,000 integers. This would make this function take up only around 34ms per run, which really isn’t very much.

On the other hand, 341ns seems like a lot to me to parse an integer. Of course, this code also extracts the integer, which requires a map lookup, and string construction, and so forth. If it turned out to be important to optimize this code, we probably could.


What every programmer should know about memory management

October 30th, 2009 by Sirp

The way that memory is allocated and used is an oft-misunderstood topic. I’ve never found a satisfactory yet simple reference of everything a programmer should know about it, so here is my feeble attempt to write one.

Modern Operating Systems use a memory management design known as virtual memory. This is a very oft-misunderstood term, even amongst experienced programmers. If you think that virtual memory is “when hard drive is used as memory” then you need to read very closely, because that definition is completely wrong, and it will remain wrong regardless of how certain Operating System vendors choose to label their interfaces. 🙂

Virtual memory is a memory architecture where processes are presented a view of memory as a single flat address space, when in fact the data may be stored in one or more physical devices, or in some cases, not stored at all. Using hard drive in lieu of memory is known as “swapping”, and virtual memory is nice because it enables swapping easily in a transparent manner to a programmer. There are plenty of benefits to virtual memory that have nothing to do with swapping, though.

One of the big benefits of virtual memory is that each process gets its own address space to play with. It is not possible for one process to access another process’s address space, because the OS maintains a table for each process which shows which addresses map to what physical storage for each process. So the address 0x00FABE0F20 will likely refer to one piece of physical memory in one process, and a completely different piece of physical memory in another process.

Memory is divided up into pages. A page is typically 4KB, and rarely less than 4KB, though some systems use (much) larger page sizes. When a process wants memory, it must use a system call to allocate one or more pages. This is typically the mmap() system call on Linux. The Operating System will set up a mapping, allocating an address range in the process’s address space.

The Operating System is responsible for choosing how to physically store the data represented, and will do so in the most efficient way possible. Each page will be stored in one of three possible ways:

(1) unmapped: if the program has not written to the memory region since requesting its allocation, then it is by definition filled with all-zeroes. The Operating System does not have to store it at all, since it knows it’s just filled with zero bytes. Thus the OS will just mark the page as ‘unmapped’ until the program actually writes to it. Thus, on most Operating Systems, when you allocate “memory”, the OS will give you an address range but won’t actually map it to physical storage (yet).

(2) resident: the page corresponds to a page in RAM.

(3) swapped: the page corresponds to a page that has been swapped to disk.

It is quite important to understand the implications of (1). For instance, in C++ it is common to use the std::vector class template as a dynamic array. std::vector over-allocates memory so that it doesn’t have to expand its buffer too often. This may seem expensive, but for a decent size std::vector, you don’t actually pay in terms of physical memory until you start using the memory.

For most typical programs these days, it’s actually common for between 10% and 50% of memory to be in state (1). State (2) is the next most common, and most people hate it when things start getting swapped, so (3) is relatively rare.
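
A simple way to convince yourself of this is a little experiment. (This is a Linux-flavored sketch; exactly when malloc() gets fresh pages from the OS is implementation-dependent, but a 256MB request will certainly be freshly mapped.)

#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

int main() {
    const std::size_t size = 256u * 1024u * 1024u; // 256MB of address space
    char* buf = static_cast<char*>(std::malloc(size));
    std::printf("allocated: VIRT grows by 256MB, RES barely moves\n");
    sleep(20);                 // check top in another terminal
    std::memset(buf, 1, size); // write to every page
    std::printf("touched: RES has now grown by ~256MB as well\n");
    sleep(20);
    std::free(buf);
    return 0;
}

Watching this process in top, VIRT jumps as soon as the allocation is made, but RES only grows once the memset() actually touches the pages.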

Let’s look at what top has to say about Frogatto’s memory usage:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 1427 david     20   0  119m  76m  17m S   22  2.5   0:02.70 frogatto

VIRT refers to the amount of address space that Frogatto is using, i.e. the amount of memory that is in states (1), (2), or (3) above. RES refers to the physical memory that Frogatto is using, i.e. the amount of memory that is in state (2) above. As we can see, Frogatto has allocated 119MB of memory, but only 76MB is resident. Unless we have a good reason to think swapping has occurred, our best guess is that the rest is in state (1) above. So Frogatto has allocated 119MB, but only actually started using 76MB, and so the Operating System has only needed to actually allocate 76MB worth of physical memory for it.

It should be noted that on a 64 bit system, VIRT is not a very constrained resource. The address space is huge, and so VIRT can grow almost without bound. Most OSes will reject allocations that they consider ridiculously above the amount of physical storage space, though Linux at least can be configured to allow any allocation.

On a 32 bit system, each process can generally only allocate 2GB of virtual memory, due to lack of address space. This makes VIRT still a rather constrained resource for many applications — though for desktop applications, 2GB is still considered a very large amount of memory, and most servers are moving toward 64 bit architectures.

Now, what is this “SHR” thing? It is how much memory is “shared”. Another feature that virtual memory facilitates nicely is shared memory: a virtual address range in one process can map to the same physical memory that a virtual address range in a completely different process maps to. Some programs do this explicitly, as a communication mechanism — a very fast communication mechanism for that matter — however the most common use is by the OS itself, as an efficient way of storing shared libraries in memory. It is common for a shared library to be used by many programs that are running at once, and rather than storing a copy of the shared library for each program, the OS will store it just once, and map each process to it.

Since we know that Frogatto doesn’t do any explicit sharing, this tells us that at least 17MB of Frogatto’s memory usage is due just to libraries. Being shared memory, it means that even if Frogatto wasn’t using this memory, another process would be, so Frogatto is actually only adding 59MB, not 76MB, to the system’s usage of RAM.

You can actually break down all the memory usage of a program using a cool little command called pmap. Running pmap on Frogatto gives output that looks like this:

08048000   2132K r-x--  /home/david/frogatto/frogatto
0825d000      4K r----  /home/david/frogatto/frogatto
0825e000      4K rw---  /home/david/frogatto/frogatto
0825f000      8K rw---    [ anon ]
09218000  31400K rw---    [ anon ]
b288f000   2048K rw-s-  /dev/dri/card0
b2a8f000   1024K rw-s-  /dev/dri/card0
b2c90000    772K rw---    [ anon ]
b2d52000   2004K rw---    [ anon ]
b2f48000   3500K rw---    [ anon ]
b32b4000   1936K rw---    [ anon ]
b3499000   2476K rw---    [ anon ]
b3705000   1032K rw---    [ anon ]
b3807000      4K -----    [ anon ]
b3808000   8192K rwx--    [ anon ]
b4008000     12K r-x--  /usr/lib/alsa-lib/libasound_module_rate_speexrate.so
b400b000      4K r----  /usr/lib/alsa-lib/libasound_module_rate_speexrate.so
b400c000      4K rw---  /usr/lib/alsa-lib/libasound_module_rate_speexrate.so
b400d000     64K rw-s-    [ shmid=0x5c8016 ]
b401d000     84K r-x--  /lib/tls/i686/cmov/libnsl-2.9.so
[snip]

This shows us the starting address of the mapping, the size of the mapping — which is always a multiple of 4KB on this system with 4KB pages — the permissions of the memory, and then the underlying device.

On Linux, files may be “memory mapped”. That is, a file is loaded into a segment of memory, with modifications of that memory modifying the underlying file. This is very different to simply reading a file into memory; the memory becomes an actual representation of the file that the kernel maintains. We can see that the Frogatto binary itself occupies several megabytes, as do various libraries. The most common use of memory mapped files is to access executables and shared libraries, though it is possible to map any file into memory.
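
As an illustration, mapping a file yourself on Linux looks something like this (a minimal sketch, with error handling abbreviated):

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("data.bin", O_RDWR);
    if(fd < 0) { perror("open"); return 1; }

    struct stat st;
    if(fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    void* addr = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if(addr == MAP_FAILED) { perror("mmap"); return 1; }

    static_cast<char*>(addr)[0] = 'X'; // with MAP_SHARED, this modifies the file
    munmap(addr, st.st_size);
    close(fd);
    return 0;
}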

The [ anon ] mappings are requests the application has made for “just plain memory”, not mapped to any file. This is the most common form of memory an application obtains.

For most applications, getting memory in multiples of 4KB isn’t very convenient. Most modern applications tend to allocate very many objects, of varying sizes, most of which are far smaller than 4KB. So, most programming frameworks have an additional memory management layer which will allocate memory from the Operating System, and then carve it up into smaller and different sizes for the application to use. In C, this is usually malloc() and free(), and in Java it is the memory management/garbage collection system provided by the VM.

Because memory is allocated in chunks of at least 4KB and then carved up, it is very difficult for a program to release memory that it has allocated back to the Operating System. A page might be carved up into 40 or 50 or 100 different small allocations, and every one of these would have to be released for it to be possible to release the memory back to the OS. For this reason, most programs stay at or near their high water mark in terms of memory usage.

Most memory allocators tend to satisfy large allocations by calling the OS directly. For instance, it is typical in dlmalloc() — one of the oldest allocators that many modern ones are derived from — to implement malloc() calls of 64KB or more as a direct call to mmap() on Linux, and then call munmap() when free() is called.

This has advantages, though it also has disadvantages. Generally, if one accesses memory — or at least writes to it — it must be resident, i.e. in state (2) above. If the memory is actually in state (3), swapped, then it must be copied from disk to RAM. This is called a major page fault, and is terrible for performance. However, even if it’s in state (1), unmapped, the kernel must still spend time allocating it to physical RAM. This is called a minor page fault, which isn’t nearly so bad as a major page fault — indeed, typically several thousand times faster. Nevertheless, if your application regularly allocates and uses large buffers of memory, it might be better to cache the buffers than to continuously allocate and deallocate them, since every new allocation of a buffer will trigger a new round of minor page faults when you start using the memory.

The way in which the kernel allocates memory also has a somewhat unfortunate effect on the behavior when memory exhaustion occurs. The C memory management system was designed so that malloc() returns NULL if the memory couldn’t be allocated. However, unless you try to allocate something truly ridiculous, or run out of address space on a 32 bit machine, malloc() will almost certainly succeed on a modern OS. Instead what will happen is that when you actually start using the memory, the minor page fault will be unable to be satisfied, and the OS will kill your application. That is, unless the user becomes sufficiently frustrated with all the swapping and kills it first.

Another feature of the way Operating Systems use memory is the filesystem cache. If you’re developing an OS, you know it’ll improve performance to use some memory to cache filesystem accesses, but how much do you use? Simple: use all the memory that applications aren’t using. This is how Linux and most other modern OSes do it. Let us look at the output of the free -m command:

              total       used       free     shared    buffers     cached
Mem:          3030       2948         82          0        222       1332
-/+ buffers/cache:       1392       1637
Swap:         4110        547       3563

At first glance it may appear that this system has 3030MB, is using 2948MB, and only has 82MB available. However, 1332MB are ‘cached’ — that is, being used for filesystem cache. 222MB more is being used in kernel buffers. If applications start using more memory, the OS will generally rather happily allow their requests to be satisfied by shrinking the size of the filesystem cache.

Thus for all intents and purposes, it is recommended to use the -/+ buffers/cache line and say that this machine is using 1392MB, and has 1637MB free. The OS uses the filesystem cache to fill up memory that isn’t otherwise being used, but usually prioritizes application storage over it.

I say “usually”, because Linux and other OSes sometimes decide that certain memory allocated by applications is being used very infrequently, while the filesystem cache is being put to good use, and actually swap out some portion of applications in favor of keeping more space for filesystem cache. Whether this is a good idea is sometimes debatable, but filesystem cache can indeed speed up the performance of many systems.

Speaking of swap, it is important to understand that swapping out generally isn’t very expensive. It can be done in the background. What is expensive, and kills performance, is when memory that has been swapped out is accessed and needs to be swapped back in. This generates major page faults — accesses to memory which require reading the page that the memory resides in back in from swap. This is usually especially terrible because if you were, say, traversing a linked list, each node in the list might reside in a different page, meaning you will have repeated page faults, and each time you have to wait for the page to be loaded before you can work out where the next node is.

So we’ve talked about various kinds of faults. Another popularly known fault is a segmentation fault. What is that exactly? Generally it occurs when a program tries to access a memory address that is not in the kernel’s table of mappings for that process, or if it accesses a page that is in the table of mappings, but does so in a way that it doesn’t have permission for. For instance, in the pmap output above, we can see that executables and libraries do not have write permission in their mappings.

Understanding how memory works gives us a better understanding of how buggy C programs will behave. Suppose you malloc() a buffer and then write to that buffer, overrunning it. It is possible that your overrun will end up straying into a page that you don’t have access to, though this is relatively unlikely. More likely is that you will simply overwrite some part of your own program’s memory that you didn’t intend to. However, it is quite likely that you will overwrite the value of a pointer, and the new value you write in the pointer will not correspond to a valid address in your application. Next time that pointer is dereferenced, you will have a segmentation fault.

Hopefully this article gives a good overview of how memory works from an OS and performance perspective. I welcome any comments or suggestions.
