This article is about capturing microphone audio in Chrome and encoding it in the Ogg format. It is based on work done during a 4Talent internship.

Repository:

https://github.com/astroza/chrome_ogg_encoder

The problem

The new HTML5 getUserMedia and MediaStream APIs provide access to audio and video input devices. For audio, you get raw PCM data [1], which is too large to send over the network as is.
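
To give a sense of scale, the raw data rate can be estimated from the sample rate, the channel count, and the sample width. The helper below is only an illustration: the name pcmBytesPerSecond is made up for this article and is not part of the repository, and the figures assume uncompressed linear PCM.

```javascript
// Hypothetical helper: estimates the data rate of uncompressed linear PCM audio.
function pcmBytesPerSecond(sampleRate, channels, bytesPerSample) {
  return sampleRate * channels * bytesPerSample;
}

// CD-style audio: 44.1 kHz, stereo, 16-bit (2-byte) samples.
const cdRate = pcmBytesPerSecond(44100, 2, 2);

// The Web Audio API hands out 32-bit float samples, which doubles that figure.
const floatRate = pcmBytesPerSecond(44100, 2, 4);

// One minute of the 16-bit stream, in megabytes (10^6 bytes).
const megabytesPerMinute = (cdRate * 60) / 1e6;
```

At these rates a one-minute recording is already in the tens of megabytes, which is why compressing before upload matters.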

Solution

Extend the browser's functionality through JavaScript to compress the captured microphone audio.

Codec libraries

There are two popular open source projects in the wild that can achieve this: Vorbis and FLAC. I chose Vorbis because it is a lossy codec that lets me set the output quality, so I can shrink the output even further if needed. So, I had to get:

  • libogg: Handles Ogg files. Ogg is a multimedia container format that is often used with Vorbis.
  • libvorbis: Vorbis encoder/decoder. Transforms a sequence of audio samples into a much smaller compressed stream.

Those libraries are written in C, so how can I bring them to the web?

Emscripten

Emscripten is an LLVM IR to JavaScript compiler. It emits code in asm.js [4], a subset of JavaScript. Working hand in hand with Clang, it can compile C/C++ to JavaScript. I recommend installing it by following [5].
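
To illustrate what the asm.js subset looks like, here is a tiny hand-written module. It is not Emscripten output, and the names AsmAdder and add are made up for this illustration. The "| 0" coercions act as type annotations that tell the engine the values are 32-bit integers.

```javascript
// A minimal hand-written asm.js-style module (illustrative only).
function AsmAdder(stdlib, foreign, heap) {
  "use asm";

  function add(a, b) {
    a = a | 0; // parameter type annotation: a is a 32-bit integer
    b = b | 0; // parameter type annotation: b is a 32-bit integer
    return (a + b) | 0; // the result is coerced back to a 32-bit integer
  }

  return { add: add };
}

// asm.js modules take the global object, foreign imports, and a heap buffer.
const adder = AsmAdder(globalThis, {}, new ArrayBuffer(0x10000));
const sum = adder.add(2, 3);
```

Because asm.js is plain JavaScript, the module still runs in engines that do not optimize it specially.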

The wrapper

(Source ogg_encoder.c)

Exporting all the symbols to JS is not a good idea. Exposing a few primitives instead is a more elegant way to do this. I'm going to define three C functions as a wrapper around Ogg and Vorbis:

  • ogg_encoder_init(sample_rate, vbr_quality)
  • ogg_encoder_write(encoder, buffer_left, buffer_right, samples_count)
  • ogg_encoder_finish(encoder)

Typically, the flow is (pseudo code):

encoder = ogg_encoder_init(96000, 0.4) # 96 kHz, VBR quality 0.4 (40%)
WHILE got_stereo_PCM_data:
    ogg_encoder_write(encoder, PCM_samples_left, PCM_samples_right, samples_count)
END
ogg_encoder_finish(encoder)
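
From the JavaScript side, the same flow could be driven through Emscripten's cwrap helper. The following is a sketch under assumptions, not the repository's code: it assumes the three functions were exported at build time, that the sample buffers are 32-bit floats, and that write and finish return nothing. The names bindEncoder and toHeap are hypothetical; check ogg_encoder.c for the real signatures.

```javascript
// Sketch only: `Module` is the object created by Emscripten's generated script.
// Assumed: the three wrapper functions are exported, samples are 32-bit floats,
// and ogg_encoder_write/ogg_encoder_finish have no return value.
function bindEncoder(Module) {
  const init = Module.cwrap('ogg_encoder_init', 'number', ['number', 'number']);
  const write = Module.cwrap('ogg_encoder_write', null,
                             ['number', 'number', 'number', 'number']);
  const finish = Module.cwrap('ogg_encoder_finish', null, ['number']);

  // Copies a Float32Array into the Emscripten heap and returns its address.
  function toHeap(samples) {
    const ptr = Module._malloc(samples.length * samples.BYTES_PER_ELEMENT);
    Module.HEAPF32.set(samples, ptr >> 2); // HEAPF32 is indexed in 4-byte units
    return ptr;
  }

  return {
    init,
    finish,
    // Passes one block of stereo samples to the C encoder.
    write(encoder, left, right) {
      const leftPtr = toHeap(left);
      const rightPtr = toHeap(right);
      try {
        write(encoder, leftPtr, rightPtr, left.length);
      } finally {
        // Always release the temporary heap copies, even if encoding throws.
        Module._free(leftPtr);
        Module._free(rightPtr);
      }
    },
  };
}
```

Copying each block into the heap and freeing it right away keeps the heap footprint small, at the cost of one extra copy per block.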

write_data() is a JavaScript function that appends the written data to a Uint8Array, which grows as needed. The primitives above call write_data() whenever they have data to write. It was designed this way to escape Emscripten's memory limitations: in early implementations I had to fight against the short recording length imposed by the size of the Emscripten heap.
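
A growing byte array of this kind can be sketched as follows. This is a minimal illustration of the idea, not the repository's write_data(): the class name ByteSink, its methods, and the doubling strategy are assumptions made for this example.

```javascript
// Hypothetical growable byte sink; the repository's write_data() may differ.
class ByteSink {
  constructor(initialCapacity = 4096) {
    this.bytes = new Uint8Array(initialCapacity);
    this.length = 0; // number of bytes actually written
  }

  // Appends a chunk, doubling the capacity until the chunk fits.
  append(chunk) {
    const needed = this.length + chunk.length;
    if (needed > this.bytes.length) {
      let capacity = Math.max(this.bytes.length, 1);
      while (capacity < needed) capacity *= 2;
      const grown = new Uint8Array(capacity);
      grown.set(this.bytes.subarray(0, this.length));
      this.bytes = grown;
    }
    this.bytes.set(chunk, this.length);
    this.length = needed;
  }

  // Returns a copy trimmed to the bytes written so far.
  toUint8Array() {
    return this.bytes.slice(0, this.length);
  }
}
```

Since the sink lives in ordinary JavaScript memory, the encoded output is not bounded by the fixed Emscripten heap; only the current block of samples has to fit there.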

NOTE: Yes, it is possible to call JavaScript functions from C code. See [3] for how to do it.

Web worker

(Source ogg_encoder_worker.js)

Audio compression is heavy work and should be moved off the main thread. Otherwise, event handlers and animations can be starved by the compression work, which hurts the fluidity of the user experience. Communication with web workers happens through messages (JS objects), and I defined three kinds of them:

  • {cmd: 'init', sampleRate: num}: Allocate the needed memory
  • {cmd: 'write', leftData: buffer, rightData: buffer, samplesCount: num}: Feed the encoder with PCM data
  • {cmd: 'finish'}: Finish the Ogg buffer and send it back to the main thread
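
On the worker side, the three messages can be handled with a small dispatcher. The sketch below is an assumption about how this could look, not a copy of ogg_encoder_worker.js: the encoder object, the createMessageHandler function, and the shape of the reply message ({cmd: 'finished', buffer}) are all hypothetical.

```javascript
// Sketch of a worker-side dispatcher. `encoder` is a hypothetical object that
// wraps the three C primitives; `post` stands in for the worker's postMessage.
function createMessageHandler(encoder, post) {
  let handle = null; // encoder state returned by init

  return function onMessage(msg) {
    switch (msg.cmd) {
      case 'init':
        handle = encoder.init(msg.sampleRate);
        break;
      case 'write':
        encoder.write(handle, msg.leftData, msg.rightData, msg.samplesCount);
        break;
      case 'finish':
        // The reply shape is an assumption made for this sketch.
        post({ cmd: 'finished', buffer: encoder.finish(handle) });
        handle = null;
        break;
      default:
        throw new Error('Unknown command: ' + msg.cmd);
    }
  };
}
```

Inside a real worker, the handler would be attached with something like self.onmessage = (event) => handler(event.data), passing the worker's own postMessage as post. Keeping the dispatcher free of worker globals also makes it easy to exercise outside a browser.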

Take a look at this example.

Build

In this git repository you can find a compiled, ready-to-use ogg_encoder (ogg_encoder.js & ogg_encoder.js.mem) as well as a working example that takes advantage of ogg_encoder_worker.js.

Alternatively, you can build everything from scratch. This requires the Emscripten SDK [5]:

git clone https://github.com/astroza/chrome_ogg_encoder.git
cd chrome_ogg_encoder
make

Do you want to test chrome ogg encoder right now?
click here

References