WebVTT release 0.4

The Fall 2012 semester is ending, which means OSD600 is coming to a close. For release 0.4, I’ve contributed a number of unit tests, specifically for testing the general file structure of .vtt files to be parsed. It does seem that a few of these tests might be redundant, such as simply testing for the WEBVTT signature, but these provide a good initial base to work with if major changes or refactoring occur down the road and stuff needs to be built from the ground up for some reason.
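To give an idea of what even the "redundant" signature tests are checking, here is a minimal sketch in Python. It is not our actual test code, and the function name is made up for illustration. It assumes my reading of the spec: an optional UTF-8 byte order mark, then the string WEBVTT, followed by end of file, a space, a tab, or a line terminator.

```python
# Hypothetical helper, not part of the WebVTT parser or its test suite.
BOM = b"\xef\xbb\xbf"  # optional UTF-8 byte order mark


def has_webvtt_signature(data: bytes) -> bool:
    """Return True if the data starts with a plausible WEBVTT signature."""
    if data.startswith(BOM):
        data = data[len(BOM):]
    if not data.startswith(b"WEBVTT"):
        return False
    # The byte after the signature must be nothing at all (end of file),
    # a space, a tab, or the start of a line terminator.
    following = data[6:7]
    return following in (b"", b" ", b"\t", b"\n", b"\r")
```

A file that starts with `WEBVTTX` or a lowercase `webvtt` would be rejected by this check, which is roughly the kind of case the structure tests cover.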

These tests have landed into the seneca repo, although not all of them are enabled for one reason or another. Currently, there are segmentation faults when trying to create the parser, and this issue is not isolated to my unit tests; other unit tests had to be disabled in order to prevent this error. I think this was the first time where we had a problem that blocked other people’s work, and it couldn’t have come at a worse time for students when there are exams to study for. This situation may crop up again, especially as the code base eventually becomes fully functional, but as the project grows, there will (hopefully) be more people who have taken interest and are willing to work on it when it reaches that point.

Breaking it down with Markdown

After spending some time looking for a documentation syntax and an automatic API documentation generation tool, it looks like Markdown is a solid choice for our needs. Markdown has a really simple syntax and can be easily converted into HTML using automated tools. The original Markdown application is actually a Perl script that takes text written in the Markdown syntax and generates HTML. It is released under a BSD-style license. Other tools can generate HTML pages as well; mdoc is a Node.js module (installed through npm) that generates HTML from Markdown with a basic table of contents and search function. This module is released under the MIT license. I’ve found mdoc to be suitable for our needs with its open license and simple format.

Because the original Markdown hasn’t been updated since 2004, people have created extensions to add more functionality, for example, Markdown Extra and MultiMarkdown. Both of these add features such as tables, footnotes, and definition lists. MultiMarkdown in particular adds different export formats. There are also Markdown implementations in multiple languages, including PHP and JavaScript. Even GitHub has its own “flavor” of Markdown. Two notable features I’ve found in GitHub Flavored Markdown are syntax highlighting for multiple languages and the removal of emphasis for underscores inside words, which is handy for things such as underscores in function names.

Let’s look at some example Markdown. Let’s say we want to document functions for an application:

# MyProgram #
A bunch of functions I use in my lame program.

## testMe (int num) : int ##
Calculates a ridiculously complex equation using a specified integer, `num`.
Return value is based on the alignment of the planets.
Returns an integer.

`int n = testMe(5); //generates 12...probably`

Here, the # character is used for headers. One # is usually translated into <h1>. More # characters can be used, up to 6 to represent <h6>. Backticks are used for code spans; in the original Markdown, code blocks are created by indenting lines with 4 spaces or a tab. Markdown automatically surrounds text in <p> tags, so using Enter/Return won’t create a line break. To insert manual line breaks (<br /> tags), simply end a line with 2 or more spaces. The Markdown text above would be converted into the following HTML:

<h1>MyProgram</h1>

<p>A bunch of functions I use in my lame program.</p>

<h2>testMe (int num) : int</h2>

<p>Calculates a ridiculously complex equation using a specified integer, <code>num</code>.<br />
Return value is based on the alignment of the planets.<br />
Returns an integer.</p>
<p><code>int n = testMe(5); //generates 12...probably</code></p>

You can use the Markdown Dingus to test your Markdown syntax to see the HTML that will be generated, and a quick reference guide for the Markdown syntax is included.
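To make the header rule a bit more concrete, here is a rough sketch in Python of how a line like `## testMe (int num) : int ##` could be turned into an HTML heading. This is only an illustration with a made-up function name; it is not how Markdown.pl or mdoc actually implement it, and it assumes a space after the leading # characters.

```python
import html
import re

# One to six leading '#' characters, the header text, optional closing '#'s.
HEADER = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")


def convert_header(line: str) -> str:
    """Convert an ATX-style Markdown header line to <h1>..<h6>.

    Lines that are not headers are returned unchanged.
    """
    match = HEADER.match(line)
    if match is None:
        return line
    level = len(match.group(1))  # number of '#' characters
    text = html.escape(match.group(2), quote=False)
    return f"<h{level}>{text}</h{level}>"
```

The number of leading # characters picks the heading level, and any closing # characters are purely cosmetic, which matches the two headers in the example above.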

At the moment, I’ve been using Markdown and mdoc to generate documentation for the WebVTT parser library. My Markdown documentation can be found in my docs branch at https://github.com/mafidchao/webvtt/tree/doc/doc and HTML generation with mdoc can be found at http://mafidchao.github.com/webvtt . A style guide will definitely have to be established in the future as the project matures.

The need for documentation

As we continue our work on creating unit tests based on the WebVTT specification, it is becoming increasingly apparent that documentation of the API will help with creating these tests, especially for people who were not involved in the development of the parser from the very beginning. Taking note of Kyle Barnhart’s initial work with adding WebVTT information to the Mozilla Developer Network (https://developer.mozilla.org/en-US/docs/HTML/WebVTT), it’s probably a good time to start API documentation.

Eventually, the API would have to be documented in order for other developers to make use of the parser, but I think now would be a beneficial time to start. The downside of doing this now is that the parser is still under development, so any changes made to the parser will have to be reflected in the documentation, possibly causing inefficiencies and unnecessary work. In addition, there was no functional design document or anything of the sort created beforehand.

As I have to develop unit tests, I obviously need to understand how to use the parser, but I don’t have to understand much of the parser’s inner workings in order to write these tests. But in order to document the API, I’ll need a little more than working knowledge. Luckily, I don’t need to know a lot as I’ll have the parser developers to aid me in this technical writing exercise.

With no functional design document or anything of the sort to reference, I’ll have to rely on the developers of the parser to guide me. They’ll be the main source of information, although I may not have all the right questions in mind in order to create the API docs. I think the best route at this point in time is to create a first draft and then have it reviewed and revised.

Regarding the actual writing process, I think that using the GitHub flavour of Markdown with the README.md file will be sufficient for now, as our GitHub repo is pretty much the central point of our operations. I haven’t decided on the structure of how each function will be documented; I will have to use other API documentation as reference.

Unit Testing: Moving Forward and Cutting Losses

Our 0.3 release encountered some hardships and we were unable to include everything we wanted, specifically the unit test suite using node-ffi. I suppose missing milestones does happen, but we’ll have to make sure that it’s a minor setback that doesn’t snowball into a major problem. But it is an eye opener as to the amount of time we have to complete 0.4, which encompasses a fully functioning parser, a unit test suite, and partial implementation into the <track> element in Mozilla Firefox. Communication is even more key as we run into problems, so we’ll have to rely on each other and know how everything is progressing on each side.

With this, we’ll have to continue to drive some sort of testing harness for the WebVTT parser. Previously, we focused on using node-ffi but ran into problems, so we’ve decided that some will attempt to continue node-ffi work while others branch off into using ctypes in Python and Google Test. I’ve selected Google Test to get working with our parser.
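For anyone curious about what the ctypes route looks like, here is a minimal sketch. Since the parser's own functions aren't shown in this post, it calls `strlen` from the C standard library as a stand-in; loading libwebvtt would follow the same pattern with a different library path and function declarations. The helper names are made up, and the fallback comment applies to POSIX systems only.

```python
import ctypes
import ctypes.util


def load_c_library() -> ctypes.CDLL:
    """Load the C standard library (stand-in for a library like libwebvtt)."""
    # find_library may return None; on POSIX, CDLL(None) then exposes the
    # symbols already loaded into the current process, which include libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


def c_strlen(text: bytes) -> int:
    """Call the C function strlen through ctypes."""
    libc = load_c_library()
    # Declaring argument and return types avoids silent conversions.
    libc.strlen.argtypes = [ctypes.c_char_p]
    libc.strlen.restype = ctypes.c_size_t
    return libc.strlen(text)
```

The important part is declaring `argtypes` and `restype` for each C function, much like the struct and function definitions node-ffi needs on the JavaScript side.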

Because Google Test is a C++ framework and our parser is written in C, there are a few issues that could cause complications down the road. With our good friend Google, I managed to find a blog post that goes into these issues in detail: http://meekrosoft.wordpress.com/2009/11/09/unit-testing-c-code-with-the-googletest-framework/. I don’t know if these issues would be enough to steer us away from Google Test, and I don’t understand enough of our parser yet to have an opinion, let alone make a decision. At any rate, I’m moving in parallel with the other test frameworks.

A Google Test library file must first be created in order to use its functions with our parser code, so that makes 2 libraries that our main application will use at the moment: the WebVTT parser functions that we’re putting into a library, and Google Test. I built the Google Test library using the included MSVC solution and copied the generated .lib file into the library directory that our MSVC solution will generate the libwebvtt.lib file into. I then edited our WebVTT solution to use the Google Test gtestd.lib (parsevtt project Properties -> Configuration Properties -> Linker -> Input -> Additional Dependencies).

I then created a simple test file that tested the success of creating a webvtt_string object and parsed a sample .vtt file. It looks like everything worked out all right, and Google Test has access to our WebVTT C parser functions:
[Screenshot: WebVTT parser and Google Test]

If Google Test is to be used in our build, we will have to figure out a way to integrate it into autotools, as well as determine whether we want a precompiled library sitting in our repo, or to add the Google Test code to our repo and build it in tandem with our parser.

R0.3 current Travis CI progress and issues

Creating an automated build system is becoming more and more important as the WebVTT parser continues to grow in size, along with the number of outside modules that we will be using to test our code. For our unit tests, we’ve decided to use node-ffi, which can call functions from dynamically linked libraries. In order to use this, we will create a shared library file of the WebVTT functions that have been written so far.

Seeing the need for automation, caitp (Caitlyn Potter) has already gone ahead and set up autoconf and libtool for our repository. As far as my understanding goes, autoconf automatically generates a configure script in order to aid in portability when building with different systems. Libtool helps in generating portable libraries. In our case, we will need to create a shared library for use with node-ffi.

To begin, I would have to install the modules of node-ffi, ref, and ref-struct. It’s a pretty simple one-liner to add to the .travis.yml file: npm install ref ffi ref-struct. And with the addition of autoconf and libtool, building the C parser and libraries is now simplified to running the standard ./configure && make commands. Once everything is built, the next step would be to run our unit tests with node-ffi.

At the moment, we only have code that defines the WebVTT structs we are using and a sample unit test to see if the shared library works. However, this is enough to test and see whether the soon-to-be-finished test suite can actually run on the Travis CI box. I’ve also included the example factorial.c and factorial.js files to ensure that at least node-ffi works if the libwebvtt.js file doesn’t. And sure enough, I can get the factorial.js example to work, but libwebvtt.js fails to find the libwebvtt.so file after libtool has completed successfully. The error generated mentions that it couldn’t find the .so file, so my guess was that I simply had to copy the .so file from wherever it was generated into the libwebvtt.js working directory. Unfortunately, it didn’t remedy the situation, and now I have a funny error:

$ cd ..
$ cd ./test/unit
$ gcc -shared -fpic factorial.c -o libfactorial.so
$ ls -lRa
.:
total 336
drwxrwxr-x 2 travis travis  4096 Nov 10 22:25 .
drwxrwxr-x 4 travis travis  4096 Nov 10 22:24 ..
-rw-rw-r-- 1 travis travis   151 Nov 10 22:24 factorial.c
-rw-rw-r-- 1 travis travis   354 Nov 10 22:24 factorial.js
-rwxrwxr-x 1 travis travis  6662 Nov 10 22:25 libfactorial.so
-rw-rw-r-- 1 travis travis 78272 Nov 10 22:25 libwebvtt.a
-rwxrwxr-x 1 travis travis 43880 Nov 10 22:24 libwebvtt.dylib
-rw-rw-r-- 1 travis travis  2714 Nov 10 22:24 libwebvtt.js
-rwxrwxr-x 1 travis travis 57654 Nov 10 22:25 libwebvtt.so
-rwxrwxr-x 1 travis travis 57654 Nov 10 22:25 libwebvtt.so.0
-rwxrwxr-x 1 travis travis 57654 Nov 10 22:25 libwebvtt.so.0.0.0
-rw-rw-r-- 1 travis travis   365 Nov 10 22:24 package.json
$ node factorial.js 8
Your output: 40320
$ node libwebvtt.js
/home/travis/builds/mafidchao/webvtt/node_modules/ffi/lib/dynamic_library.js:74
    throw new Error('Dynamic Linking Error: ' + err)
          ^
Error: Dynamic Linking Error: libwebvtt.so: cannot open shared object file: No such file or directory
    at new DynamicLibrary (/home/travis/builds/mafidchao/webvtt/node_modules/ffi/lib/dynamic_library.js:74:11)
    at Object.Library (/home/travis/builds/mafidchao/webvtt/node_modules/ffi/lib/library.js:43:12)
    at Object.<anonymous> (/home/travis/builds/mafidchao/webvtt/test/unit/libwebvtt.js:90:21)
    at Module._compile (module.js:446:26)
    at Object..js (module.js:464:10)
    at Module.load (module.js:353:31)
    at Function._load (module.js:311:12)
    at Array.0 (module.js:484:10)
    at EventEmitter._tickCallback (node.js:190:38)
after_script: 'node libwebvtt.js' returned false.
Done. Build script exited with: 1

Interesting that I get a “no such file or directory” error for the libwebvtt.so file when libwebvtt.so is RIGHT THERE! And I don’t think it’s supposed to be in any other directory either, since libfactorial.so was right there as well and the factorial.js code worked successfully. My guess is that I need to actually specify the location of the libwebvtt.so file in the libwebvtt.js file instead of copying it to the current directory; perhaps it depends on being in that location. I will have to ask someone with a Linux machine to help me along with this process. I can’t test it on my own machine, as I use Windows and node-ffi expects a .dll file.

That brings me to another issue: in order to be compatible with the OS X and Windows environments, we will need different types of dynamic link libraries. OS X uses .dylib, and Windows uses .dll. I will need to do more research on generating these file types, but some preliminary reading suggests that libtool can create these files with a bit of work.
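One thing worth trying is to pass the loader a full path instead of a bare file name, since a bare name with no slash in it is looked up on the system's library search path rather than in the current directory. Whether that is really the cause here is only a guess on my part. The sketch below, in Python rather than the JavaScript we'd use with node-ffi, shows the idea along with the per-platform file names; the function names are hypothetical, and the Windows name without a `lib` prefix is an assumption about what our build would produce.

```python
import os
import sys


def shared_library_name(base: str, platform: str = sys.platform) -> str:
    """Return the conventional shared library file name for a platform."""
    if platform.startswith("win"):
        return f"{base}.dll"        # assumed naming; may differ per build
    if platform == "darwin":
        return f"lib{base}.dylib"   # OS X
    return f"lib{base}.so"          # Linux and most other Unix systems


def shared_library_path(script_file: str, base: str,
                        platform: str = sys.platform) -> str:
    """Build an absolute path to a library sitting next to a script."""
    directory = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(directory, shared_library_name(base, platform))
```

With something like this, the test script would ask for the library by absolute path on every platform instead of relying on where the loader happens to search.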

On the way to Release 0.3

With Release 0.3 set for November 15, there are a number of things that need to come together. One major concern for 0.3 is to integrate all of the WebVTT C parsers that a number of people were developing separately. Even with the mentality of planning to throw one away, thankfully the development of the C parsers went different ways. But even though most bases were covered, there’s still a lot of work to do before we have a fully functioning parser to use.

And while the parser moves to 100% functionality, unit tests must be ready to test the parser for proper usage and bugs. At the moment, we created tests that validate the structure of a .vtt file, but now we have to check that the parser produces expected output when given certain files/data to parse. We discussed the number of ways that we could approach the development of unit testing. The first idea we played with was to see how the WebKit developers achieved this goal; apparently, what they did was output a text file from the parser and compare it to the expected output. It’s not exactly an “innovative” or complicated method to implement, as our teacher mentioned, and goes to show that even though we aren’t professionals ourselves, we could have accomplished the same thing.
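The compare-against-expected-output approach really is simple. Here is a minimal sketch in Python of what such a check could look like; it is my own illustration using the standard difflib module, not WebKit's actual test harness, and the function name is made up.

```python
import difflib
from typing import List


def diff_against_expected(actual: str, expected: str) -> List[str]:
    """Return unified diff lines; an empty list means the output matched."""
    return list(difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))
```

A test passes when the returned list is empty; otherwise the diff lines show exactly where the parser's output departed from the expected text.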

Leaving that idea behind for now, we discussed the rest of our options: a C language testing framework called Check, JavaScript tests, Python ctypes, and JUnit tests. Because our goal is to implement a parser for the Mozilla Firefox browser, using JavaScript seemed to make the most sense for developing unit tests. Specifically, we’ll be using QUnit to create our unit test suite, along with node-ffi, which is a Node.js addon that lets JavaScript call functions in dynamic libraries.

After some preliminary research, using QUnit and node-ffi on Travis CI appears to be possible. It will be my task to ensure that building the parser and running the unit tests can be automated with Travis CI. It looks like I will also have to drive the unit tests a little, as there are a number of tests to write, and it will be much easier for everyone if the tests can be automated every time there is a change with the parser.

More Travis CI testing – installing modules and running shell scripts

With most of the WebVTT Release 0.2 work resting on the creation of a C parser, I’ve been doing some more testing with Travis CI to prepare for an automatic build system once everything is complete. In the meantime, we have a toy C parser that Ralph Giles (rillian) created and a bunch of .vtt sample files that can be validated using the webvtt module and a python script. As a start, I decided to see if I could get the python script to validate the sample files on the Travis CI machine.

Because the webvtt module is not installed, I have to run npm install webvtt before running the python validation script. Travis CI does its magic and runs it for me, and a couple of lines later, the make check-js command runs the python validation script without a hitch. The script displays the validation results of the test files, and Travis CI returns a “Build script exited with: 0” to signify success.

This seems promising, but the python script had reported failing tests. I wondered if the return code for the script was reversed, so I quickly fixed the failing tests and re-pushed to my GitHub repo. The script reports 0 failed tests, but Travis CI still returns a successful 0. It looks like the python script returns the same thing regardless of whether everything passes or something failed. Guess that’ll have to be fixed so Travis CI can report a failure.
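The fix should be small: the script needs to exit with a non-zero status whenever a test fails. Here is a minimal sketch of the idea in Python. It is not the actual validation script, and the shape of the `results` dictionary (file name mapped to pass/fail) is an assumption made for illustration.

```python
def exit_code(results: dict) -> int:
    """Map validation results (file name -> passed?) to a process exit code."""
    failed = [name for name, passed in results.items() if not passed]
    for name in failed:
        print("FAIL:", name)
    # 0 tells Travis CI the step succeeded; anything else marks a failure.
    return 1 if failed else 0

# In a real script, the last line would be something like:
#     sys.exit(exit_code(results))
```

With that in place, a single failing .vtt file would turn the Travis CI build red instead of quietly reporting success.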

The next thing I could play with was fuzz testing; Vince Lee (Lynart) had done some work with zzuf and wrote a simple shell script. In his GitHub repo, he had even added a directory that contained the zzuf package to build. Without hesitation, I attempted to get Travis CI to build zzuf from here…but then I realized, why not just install it instead? Sure enough, Travis CI’s docs mentioned the use of sudo apt-get, so I added sudo apt-get update and sudo apt-get install zzuf to my .travis.yml file. I was now on my way to running that shell script to fuzz the test files!

I had to modify the shell script a little so that it would accept command line arguments instead of waiting for input from the user. But a perplexing problem occurred, and I ran into a permission denied message: -bash: ./fuzzv2.sh: Permission denied. That’s strange, I can’t run bash scripts? I immediately ran to Google and funnily enough, discovered a Japanese blog post where the developer ran a chmod 777 statement to change the permissions on his script before executing it. Pretty strange that one has to do that, but I followed along and got the fuzz script working and generated fuzzed test files. (The message just means the file isn’t marked as executable, so a narrower chmod +x should be enough.) I guess the next step is to throw these fuzzed files into our soon-to-be-developed C parsers and see what breaks.
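For a feel of what fuzzing does to a test file, here is a tiny bit-flipping sketch in Python. It is only in the spirit of zzuf, not its actual algorithm, and the function name, the `ratio` parameter, and the seed handling are my own assumptions for illustration.

```python
import random


def fuzz_bytes(data: bytes, ratio: float, seed: int = 0) -> bytes:
    """Flip each bit of data independently with probability `ratio`."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    fuzzed = bytearray(data)
    for index in range(len(fuzzed)):
        for bit in range(8):
            if rng.random() < ratio:
                fuzzed[index] ^= 1 << bit
    return bytes(fuzzed)
```

Keeping the seed alongside each fuzzed file matters: if a fuzzed file crashes the parser, the same seed and ratio regenerate the exact same input.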

Another little neat thing I found was that you can have GitHub report the Travis CI build status of your repo by embedding an image in the README file. I couldn’t get this to work with the standard README format with the command documented in Travis CI’s guide, but a README.md markdown file worked just fine with their example.

Once those C parsers are completed, we can combine the sample test files and fuzzed files to automate the entire build and test process. So every time there’s a change to anything, whether it’s the parser or a test file, there’s no need to manually build the parser, create fuzzed files of the test files, and then run the fuzzed and test files through the parser. And if anything has failed or something is broken, we’ll know as soon as possible.

My current .travis.yml file looks like this (with some random pwd/which/cat statements thrown in to see what happens):

language: c
compiler:
  - gcc
  - clang
# Change this to your needs
# NOTE: before_script and after_script are assumed section names
before_script:
  - npm install webvtt
  - pwd
  - sudo apt-get update
  - sudo apt-get install zzuf
  - which zzuf
  - which bash
  - which sh
script: make check-js
after_script:
  - cd ./fuzz
  - chmod -R 777 ./fuzzv2.sh
  - ./fuzzv2.sh 0.1
  - cat ./fuzzedFiles/good.tc_1009_missing_line_between_cues.vtt.fuzzed.vtt

What it should look like in the future is the installation of webvtt and zzuf, building the C parser, creating fuzzed files, and then running the test and fuzzed files against the built parser.

EDIT: you can see my Travis CI page here: https://travis-ci.org/#!/mafidchao/webvtt. My GitHub repo with the most recent branch can be found here: https://github.com/mafidchao/webvtt/tree/tci-lynart_fuzzer.

Initial usage of Travis CI

I’ve read a little more documentation on using Travis CI and all the wonderful things it can do. Inevitably though, I’ll have to get some actual practice with it. So I decided to create a branch of whatever was in my latest Seneca branch and began the process of getting it onto Travis CI. Setting it up is rather simple: you sign in with your GitHub account, select the repository/repositories to hook into Travis CI, add a .travis.yml configuration file, and then push to your repo. As soon as you complete your first push, you’ll get your repo integrated with Travis CI. My repository can be found here at https://travis-ci.org/#!/mafidchao/webvtt .

[Screenshot: Travis CI - successful build]

All of the magic lies in the .travis.yml file. This file manages how everything is built, and you can specify a number of options, including environment variables, the compilers to test against, and any packages that may be required for installation in order to build. More commands can also be executed before and after the build command and any package installs.

The actual build command doesn’t have to be the GNU make command. In fact, any executable can be used as the build command, which could run our current python script to validate our WebVTT test files. As long as the executable returns a 0 on successful execution, it will count as a successful build to Travis CI; any other return value is considered a build failure.

My next step will be to work with the people developing the build system so that everything runs as desired. We should be able to commit and have Travis CI automatically build the commit and report if the build is valid. Speaking of reporting, Travis CI can also be set up to notify people of the results. Email or even IRC notifications can be sent after a build finishes, and the .travis.yml file can be modified to determine who is notified and the events the person/people are notified for.

A first look at CI – continuous integration

In OSD600, we’ve completed Release 0.1, consisting of creating a test suite for the upcoming WebVTT parser. With all of our test files landed in the main repository, the next release of 0.2 is, unsurprisingly, to create the actual WebVTT parser in C. At the moment, there are a few parsers available, but they’re either not perfect or not portable. The JavaScript parser we tested fails with a number of tests that should pass, perhaps due to being significantly outdated from the current WebVTT specification. The parser written using the WebKit rendering engine only functions for browsers that use WebKit, specifically Google Chrome and Safari. Our main aim is to create a C parser that is functionally complete, adheres to the specification, and is portable. However, there are numerous other tasks to be completed for Release 0.2, including fuzz tests and test suite maintenance.

The task I’m working on is to get a sort of quality control applied to our parser using something called CI – continuous integration. CI will help us automate the build and test process for the parser, which will save time and effort. Once someone commits code to a server, the code will trigger an immediate build to be completed after the commit.

The normal process for a developer would be to check out a piece of code from the main repository. Once it’s time to merge and push it in, the developer must first pull the latest change which would have inevitably occurred during development time, merge and test the build to see if anything is broken, and then finally push his or her code in with the latest change. The more time that passes between the developer’s initial work and the latest update, the more potential there is for an inefficient integration due to more and more changes being done on the code along the way. The integration may even become so bad that the developer may be wasting time trying to add his or her code compared to just pulling the latest code and starting fresh. CI aims to remedy this problem with automated integration, including automatic integration tests run on the server.

Wikipedia defines the main principles of continuous integration: http://en.wikipedia.org/wiki/Continuous_integration#Principles_of_continuous_integration

For our project, we’ll be using Travis CI, an open-source distributed build system that was designed to be used with GitHub projects. Adding our project to use Travis CI will automate building and testing our code after every commit, and it’s a good time to do so as we’ve just created our test suite and we are beginning to develop the parser that will make use of this test suite. The Travis CI website will allow us to view the process of current builds, the build history, and the status of other branches.

Baby’s first code review

We’re currently wrapping up Release 0.1, which consists of our initial test suite. Most/all of our test files have landed in the main Seneca repository. However, before each of our test files was accepted into the main repo, we had to issue pull requests to the main repo holder (in this case, our instructor David Humphrey) and have our code reviewed. When you’re doing the introductory programming courses, group work is usually a no-no as each individual needs to build up a certain amount of skill. But once you get into large scale projects where group work is a necessity, I guess it’s imperative that no one makes a mistake or bogs down the main program with inefficient code.

GitHub provides a lot of features to aid the code review process. Comments can be added to a pull request, and you can even comment on a specific line of a file. The pull request is also updated with commits of the repository to be pulled, which allows easy discussion of the code and a history of how the code was influenced. Not only that, but code reviews look like a great way to discuss ideas with others and possibly create new approaches to major problems. For example, there was some discussion on a few pull requests regarding a standardized format for comments/metadata. We had agreed to include at least 4 key points of information, but I believe it was up to the test creator to determine how those pieces of information were worded.

Although not documented in the pull request discussion, a main issue that was brought up in my test files was the existence of carriage return CR and CRLF newline characters. The WebVTT specification allows CR, LF, and CRLF line endings in .vtt files. However, with our Git repositories, our line endings would be based on the test creator’s machine and configuration. For our purposes, we were to use Unix-style LF line feed characters; Windows users would have to pull with CRLF and make sure to push to their repository with LF. This blog post details this issue more in-depth: http://timclem.wordpress.com/2012/03/01/mind-the-end-of-your-line/

But there are other issues to address with this newline thing. Because the WebVTT specification allows 3 newline character types, we would have to test for this. I had decided to create files that test for the CR, LF, and CRLF line endings (referred to as line terminators in the spec). But it seemed that I was alone in doing this, so it was on me to determine a way to preserve these line endings and prevent them from being altered by automatic line ending normalization on commit. We resolved this by creating a .gitattributes file that would prevent line ending normalization if the test file name ended in a certain way:

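test/spec/good/*cr.test -text
test/spec/good/*crlf.test -text
test/spec/bad/*cr.test -text
test/spec/bad/*crlf.test -text
test/spec/known-good/*cr.test -text
test/spec/known-good/*crlf.test -text
test/spec/known-bad/*cr.test -text
test/spec/known-bad/*crlf.test -text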

Well, that would have to do. Any configurations from machine to machine will be ignored, and the files that contain CR/CRLF characters will be preserved.
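A quick way to confirm that a test file still has the line endings it is supposed to have after a checkout is to look at the raw bytes. Here is a small Python sketch of that check; the function name is made up, and it is not part of our test suite.

```python
import re

# Match CRLF first so a CRLF pair is not counted as a CR followed by an LF.
TERMINATOR = re.compile(rb"\r\n|\r|\n")
NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def line_terminators(data: bytes) -> set:
    """Return the set of line terminator kinds found in raw file bytes."""
    return {NAMES[match.group()] for match in TERMINATOR.finditer(data)}
```

For a file meant to test CR line endings, the result should be exactly `{"CR"}`; if `"LF"` or `"CRLF"` shows up, normalization has touched the file somewhere along the way. The file has to be read in binary mode for this to work, since text mode may translate line endings on its own.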

But there seems to be another issue arching over the entire thing: is it worth testing these line endings? I would say that if it’s in the spec and the spec is to be implemented, it should be tested. My line of thinking doesn’t seem to be shared by anyone else though, as I seem to have been the only one bothering with them. As I mentioned before, a code review seems to be a great way to catch these issues. This particular one hasn’t been closed just yet, though. We’re going to continue with attempting to test CR/CRLF and hope we don’t run into complications…