?

Log in

No account? Create an account

Moving on!

Been forever since I posted here, so I doubt anyone will notice. But I'm moving off LJ as well: too many invasive ads, too many alternatives. Find me on http://lertulo.blogspot.com from now on.

Investing

Okay, I'm now officially betting on myself finishing this one. I've invested a whopping $50 registering a domain name for my new game; you can drop by and see a screenshot of the Windows client's menu, since that's about the only thing that's working so far. But it's a start--and now if I don't finish, I'm out the $50 for no reason.

www.praetoronline.com


Incentive, here I come!

After Scripting

Six weeks ago I said I wouldn't post again until I'd finished off the scripting engine. Did that: it's wrapped up, streaming and all. A fun sub-project complete with a compact command-line tool for running scripts and a nice robust runtime interface for linking the interpreter into other applications.

For the curious, the interpreter is downloadable here, and I've put the main runtime interface headers behind an lj-cut just to give you the flavor of the thing.

To be truthful, I finished all that over a week ago. Since then I've been working on the next step: a map editor for my next game, the game that will also use that scripting engine. I'm using the working title Praetor, so you'll probably catch me referring to that. I'll post more about that thing later, but the quick summary is that it's a collectible card game played out using armies deployed on a hexagonal playing board. Cells on the playing board have various terrains, and as you progress through the game you'll face opponents on different boards--so a map editor to lay out that terrain was essential. Here's a sample of what it lets you create:



The editor has a small amount left to do, but it's not a big project and is almost wrapped up already. Importantly, a lot of its code is content that will be shared with the game itself: one third is raw game-engine code, one third is common Banshee-based map-rendering code, and one third is specific to the editor itself. Should help me get the game itself running a little faster than if I had to rewrite all that from scratch.

Oh, and here's that header, for the truly curious.Collapse )

Interpreter Progress

Things are going well on the scripting front: the parser has been enriched to handle array definitions properly, the compiler and generate about a dozen different bytecodes, and the interpreter is actually implemented fully enough to run scripts using those codes.

The result is that you can actually run simple scripts now: global variables and local variables all work, mathematical expressions are properly evaluated, and even things like += and -- are compiled and interpreted properly. Yay!

There's still a lot to go: in particular classes, methods, namespaces and array object references are still pending. Oh, and things that involve jumping around at runtime: if/else, for loops, trinary operator, logical-or/-and etc. But things are looking good.

For the die-hard curious out there, I'm posting my current bytecode descriptionCollapse ) I'm filling in the docs as I go, so this describes accurately what the script does today--but it lacks several bytecodes that will be necessary to finish this thing off.

Rough Interpreter Design

Been doing too much posting and not enough coding. I'm off to make some more progress on the compiler, which means I need to lay out the bytecode design, which in turn means I needed to summarize how the interpreter was going to run. So here are the notes on the interpreter; back when it's starting to work.

Interpreter NotesCollapse )

Recursive Descent Parsing

If you look over the formal grammar, you'll see that it's defined as a hierarchy. For example, type A is defined as type B followed by a semicolon or an optional series of type C followed by a type D:
   type_a ::=   type_b ";"   |   [type_c]* type_d

A simplistic parser that recognizes such a definition would look roughly like this:
   bool parse_type_a (sourcedata &in)
   {
      pos = in.tellPosition();

      // See if we have a type_b followed by a semicolon
      if (parse_type_b (in)) {
         if (parse_token (in, SEMICOLON)) {
            return true;
         }
         in.setPosition (pos);  // didn't pan out; rewind try the other form
      }

      // Didn't have that; see if we have a series of type_c's followed by a type_d
      while (parse_type_c (in)) {
         ;
      }
      if (parse_type_d (in)) {
         return true;
      }

      in.setPosition (pos);
      return false;
   }

Notice that, on failure, we didn't scream and complain: if parse_type_b doesn't see a type_b syntax coming up next on the stream, then it's not a big deal. All it does is return false, indicating that we didn't see a type_b coming up--and importantly, we didn't advance the stream at all. (Notice that we might've successfully pulled a type_c or two off the stream, then failed to find a type_d following on. Gotta put those type_c's back.)

With an adequately complex grammar (like the one we're using for parsing C-like expressions), you can imagine how quickly this turns into a deep recursive stack. In practice, this is exactly how the recursive descent parser works: having defined the grammar very precisely, we write methods that exactly parse each basic type, and allow them to recurse pretty deeply. It's quite simple to implement, but can be unpleasant to debug; helps to add a tracing facility to show where you are in the parsing stack.

Starting the Compiler

Once you've nailed down the grammar--and yes, I made several changes since my last post--it's time to start the compiler itself. The compiler takes two stages: parsing and generation.

In the parsing stage, we'll just write enough code to properly walk the script all the way through--doing nothing at all, just scanning through. By the time the parsing stage is done, the compiler will be able to detect and report all manner of syntax errors while silently accepting all properly-formatted scripts. The compiler will not, however, do anything else useful at all. This is the last stage we can really accomplish without giving some serious thought to how the interpreter is going to work.

The generation stage will be much harder: that's where we insert code into the existing parser to make the compiler actually generate some kind of machine-usable bytecode output. And that's when things get really fun.

I've just about finished my parser; it handles everything except flow control (for/do/while/if/else etc) and array initialization. When I've got that stabilized a little bit, I'll post in more detail about how that parser works, then move on to the interpreter and bytecode generator. Good geek stuff coming up!

Defining the Grammar

Having designed the scripting-language-to-be, and having already broken an inbound script into tokens, the next step is to compile the thing. To do that, we'll need a formal grammar for the language: something more precise than the human-readable examples whipped up so far.

The traditional way to lay this out is through BNF format, and to make use of YACC ("yet another compiler compiler"). Again I'm going to skip over that thing and just roll the code myself, but writing out some BNF-like form for the language is essential--especially when you plan to use a recursive descent compiler, since the methods you write will almost exactly follow the BNF assignment statements.

A first cut at the grammar for Dualscript is below. I've smoothed out a few things in the language since I posted earlier: added a keyword here or removed one there, eliminated class-scope methods and data variables, and so forth. The result is a pretty simplistic scripting language that has a certain degree of agreeable symmetry. There are probably some significant errors below--in particular I've omitted the array definition syntax and I'm not happy with postfix_expr through postfix_operator, so I may need to rework those. But it's time for bed now, so this is going to have to suffice.


BNF hides behind cutCollapse )

Tokenizing

A short article for an easy step. The first part of compiling or interpreting a script is breaking the script's text up into tokens. If you're using lex (which you probably should do), that's pretty easy--but since "easy" is no fun, I'll be doing it the old-fashioned hand-coded way.

A clever tokenizer won't run too far ahead of the actual compiler: that's called a streaming tokenizer, and it's marginally more complex than a tokenize-everything-and-remember-it-all-at-once tokenizer. I'm using the latter, since I've used up all my "easy is no fun"-ness in the first paragraph.

The basic tokenization loop looks more or less like this:

   as long as there's content left in the input script {

      skip whitespace (including EOLs)
      if we hit "//", skip to the end of the line and restart this loop
      if we hit "/*", keep reading until we hit "*/" then restart this loop
      if we ran into the end of the script, quit now

      if the next char is a digit, look for number sequences
         don't forget to look for hex and octal radixes ("0x5E13", "0777")
         don't forget to look for decimals and exponents ("15.37", "27e+5")
         remember to look for special cases ("0", "0.3" which look octal-ish)

      see if the next 3 characters match a 3-character token (like ">>=");
         if so, record that token into our output and restart this loop

      likewise, see if the next 2 characters match a 2-character token ("+=", "<<")

      likewise, see if the next character matches a 1-character token (":", ";" etc)
 
      if the next character is an apostrophe, crack character sequences like 'x'
         remember to handle encodings like '\t' for tab, '\n' for newline etc
         and of course '\x7f', '\127', '\035' should be supported too

      if the next character is a quotation mark, try to pull a whole string
         this is pretty easy--just skip the ", then keep reading until hit another one
         again, look for \ prefixes, and don't be fooled by \"

      okay, the next word must be plain text--either a keyword or something like a variable.
      scan forward until we run out of legal characters for either, and accumulate the text.
      then match against known keywords ("for", "return" etc)
   }


And that's it. No magic involved--just some simple text cracking. The result is that we can stop worrying about the text file that the user supplied; instead, we have a much more programmatically accessible array of tokens. The compiler will start pawing through those tokens to get its work done--in the next post.

Compiling and Interpreting Scripts

Is this Game Programming 101 or not? Haven't done anything but whine about cell phones and talk about TKD for a year now. Time to get on with the programming!

  "In the beginning, the Matrix was designed as a utopia for humans, but some say we lack the programming code to create a perfect world." -Agent Smith, Matrix  


Yes, it's time to design a scripting language, complete with an interpreter. I'm doing this as part of another project--will get to that later--but it's a fun little self-contained project all on its own.

The first step in the project is to be sure you really need it. Adding a scripting language to a system can make it extremely flexible and powerful, but even the blunt approach I'm going to cover here is a good amount of work. If you can get by without it, by all means use a less general-purpose solution.

Okay, you passed that hurdle: you want a script after all--and by the same process, you've decided that downloading and incorporating a public Javascript engine or calling out to a Perl interpreter isn't doing it for you either. You need something new--some tricky new syntactical feature, integration with your home-rolled application or something you can sell without licensing problems. So a new scripting language and interpreter it will be.

Next step is to figure out what kind of syntax your language will provide. I've just finished a write-up of the major features of my next project, Dualscript. And I'll even put it behind a cut, because it's about 500 lines. Next up: posts about the bytecode, tokenizing, the recursive-descent parser and putting together a stack-based interpreter.

Dualscript Language SummaryCollapse )