jmspeex
07 May 2009 @ 08:16 pm
After trying to publish the technical details of CELT, things are finally working out. First the journal paper was accepted and now the paper describing the low-complexity version of CELT was accepted for the EUSIPCO 2009 conference. That one is based on a more recent version of CELT (0.5.1) and has comparisons with the ULD codec, which is pretty much the only high-quality codec that supports delays as small as CELT does (don't worry, CELT still comes out on top for quality!). Here are the paper details:

J.-M. Valin, T. B. Terriberry, G. Maxwell, A Full-Bandwidth Audio Codec with Low Complexity and Very Low Delay, Accepted for EUSIPCO 2009.
 
 
jmspeex
13 April 2009 @ 11:03 pm
At last, the manuscript on CELT that we first submitted in May 2008 has just been accepted for publication in IEEE Trans. on Audio, Speech and Language Processing. As is almost always the case with journal papers, it was a painful process including a rejection, a "resubmit after major corrections" and finally an "accept as is". The good news is that there will be a peer-reviewed description of CELT -- at least a description of how it was at version 0.3.2. There have been many changes since 0.3.2 (with more to come), but at least the core ideas described in the paper haven't changed. So if you ever need a peer-reviewed citation for CELT (e.g. when writing a paper yourself), there's finally something better than the website to list:

J.-M. Valin, T. B. Terriberry, C. Montgomery, G. Maxwell, A High-Quality Speech and Audio Codec With Less Than 10 ms delay, To appear in IEEE Transactions on Audio, Speech and Language Processing, 2009.
 
 
jmspeex
11 November 2008 @ 08:04 pm
It's been a while since the last time I discussed CELT, so at last, here's an update. A while ago, I was working on a low-complexity "profile" of CELT. The idea is to disable the use of the pitch predictor, which is quite costly in terms of complexity. To help speed things up, I also changed the allocator to do the conversion from bits to pulses one band at a time instead of doing it jointly for all bands at once. This decreases the complexity, while making the allocation a bit less optimal -- in theory. In practice, it means that for higher rates where bands require a large number of bits, the encoding can actually be more efficient because no bits are wasted. Because of that, I was able to replace all 64-bit arithmetic in CELT with 32-bit arithmetic. On top of that, Timothy (derf) managed to -- again -- save some computation in the pulse encoding. The result is that in low-complexity mode, it takes about 1% CPU to encode and decode a 44.1 kHz mono stream at 64 kbit/s (on my 2 GHz box).

Here's what lies ahead now. I'd like to slowly work towards freezing the bit-stream. But there's a few things I want to do before even thinking about a freeze:

- Dynamic bit allocation
Right now, the bit allocation in each band remains about the same for every frame. I'd like to change that and allow more bits in the regions of the spectrum that are hard to encode at any given time. It's not as easy as it looks because: 1) you need to figure out the best allocation based on psychoacoustics and 2) you need to *encode* the allocation information compactly enough that it doesn't waste everything you saved from the dynamic allocation. So far, my attempts at 1) haven't been very successful.

- Folding decision
To prevent "birdie" artefacts, we use a certain amount of spectral folding that acts as a noise floor. In most cases, this improves quality, but for very tonal signals (e.g. glockenspiel), it transforms a pure tone into noise, which is annoying. So I'd like to be able to turn that feature on or off based on the data, but again, it's not simple.

- Stereo coupling
CELT already does stereo. It does it by encoding the energy independently for each channel and doing (sort of) M-S encoding of the "residual". This works, but probably doesn't save much compared to using two mono streams. So I want to see how it can be improved. There's already some (disabled) code to do intensity stereo, but maybe there's more that can be done.

Of course, I only have a vague idea of how to do the three things I listed above, so suggestions are welcome.
 
 
jmspeex
14 May 2008 @ 08:32 pm
I've been conducting a listening test for a paper on the CELT codec. I've been comparing it to AAC-LD, G.722.1C (aka Siren14) and MP3. Here are the results for the 48 kbit/s MUSHRA test (95% confidence intervals):


And here are the results for the 64 kbit/s MUSHRA test (95% confidence intervals):


Considering that I was just hoping CELT wouldn't be too much worse than these codecs, it's a pleasant surprise. That's because the version of CELT I tested had a latency of 8.7 ms, while the latency of AAC-LD was 34.8 ms (I know it's possible to get down to 20 ms, but the Apple implementation doesn't do it), G.722.1C was 40 ms and MP3 (LAME) was probably way above 100 ms.

In the graphs above, the error bars don't consider the fact that the MUSHRA test is paired, so there are more statistically significant results than what is apparent. Basically, CELT and AAC-LD come out ahead of both G.722.1C and MP3 in both tests. CELT comes out ahead of AAC-LD at 48 kbit/s and the two are tied (i.e. no statistically significant difference could be observed) at 64 kbit/s.

Despite those results, I still think CELT can do better. Among the things I'd like to try once I'm done with the paper:
  • Add a psycho-acoustic mode and start changing the bit allocation based on the frequency content
  • Do lots of tuning
  • Do something to prevent time smearing of impulses (not TNS)
  • Encoding (or guessing) the spectral tilt in each band
  • Better stereo support
 
 
jmspeex
24 December 2007 @ 08:40 pm
Before reading this, I recommend reading part 1 and part 2. As I explained in part 1, CELT achieves really low latency by using very short MDCT windows. In the current setup, we have two 256-sample overlapping (input) MDCT windows per frame. The reason for not using a single 512-sample MDCT instead is latency (the look-ahead of the MDCT is shorter). With that setup, we get 256 output samples per frame to encode (128 per MDCT window). Now, at 44.1 kHz, it means a resolution of 172 Hz, not to mention the leakage. That's far from enough to separate female pitch harmonics, much less male ones. To the MDCT, a periodic voice signal thus looks pretty much like noise, with no clear structure that can be used to our advantage.

To work around the poor MDCT resolution, we introduce a pitch predictor. Instead of trying to extract the structure from a single (small) frame, the pitch predictor looks outside the current frame (in the past of course) for similar patterns. Pitch prediction itself is not new. Most speech codecs (and all CELP codecs, including Speex) use a pitch predictor. It usually works in the excitation domain, where we find a time offset in the past (we use the decoded signal because the original isn't available to the decoder) that looks similar to the current frame. The time offset (pitch period) is encoded, along with a gain (the prediction gain). When the signal is highly periodic (as is often the case with voice), the gain is close to 1 and the error after the prediction is small.

Unlike CELP, CELT doesn't operate in the time domain, so doing pitch prediction is a bit trickier. What we need to do is find the offset in the time domain, and then apply the MDCTs (remember we have two MDCT windows per frame) and do the rest in the frequency domain. Another complication is the fact that periodicity is generally only present at lower frequencies. For speech, the pitch harmonics tend to go down (compared to the noisy part) after about 3 kHz, with very little present past 8 kHz. Most CELP codecs only have a single gain that is applied throughout the entire frame (across all frequencies). While Speex has a 3-tap predictor that allows a small amount of control on the amount of gain as a function of frequency, it's still very basic. Working in the frequency domain, on the other hand, allows a great deal of flexibility. What we do is apply the pitch prediction only up to a certain frequency (e.g. 6 kHz) and divide that range into several (e.g. 5) bands, each with its own gain. For the example from part 2 (corresponding to mode1 of the 0.0.1 release), we use the following bands for the pitch (different from the bands on which we normalise energy):

{0, 4, 8, 12, 20, 36}

Another particularity of the pitch predictor in CELT (unlike any other algorithm I know of) is that the pitch prediction is computed on the normalised bands. That is, we apply the energy normalisation on both the current signal (X) and the delayed (pitch prediction from the past) signal (P). Because of that, the pitch gain can never exceed unity, which is a nice property when it comes to making things stable despite transmission losses. Despite a maximum value of one in the normalised domain, the "effective value" (not normalised) can be greater than one when the energy is increasing, which is the desired effect. The pitch gain for band i is computed simply as g_i = <X_i, P_i>, where <,> is the inner product and X_i is the sub-vector of X that corresponds to band i (same for P_i).
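To make the gain formula concrete, here's a minimal C sketch of g_i = <X_i, P_i> over the pitch bands listed above. It assumes X and P have already been normalised to unit energy inside each band (that step isn't shown), and the names pitch_bands and pitch_gains are just made up for the illustration, not taken from the CELT source.

```c
#include <assert.h>
#include <stddef.h>

/* Pitch band boundaries from the post (MDCT bin indices). */
#define NB_PITCH_BANDS 5
static const int pitch_bands[NB_PITCH_BANDS + 1] = {0, 4, 8, 12, 20, 36};

/* g_i = <X_i, P_i>. X and P are ASSUMED to be already normalised to unit
 * energy within each band, so each gain lies in [-1, 1]. */
static void pitch_gains(const float *X, const float *P,
                        float gains[NB_PITCH_BANDS])
{
    assert(X != NULL && P != NULL && gains != NULL);
    for (int i = 0; i < NB_PITCH_BANDS; i++) {
        float g = 0.0f;
        for (int j = pitch_bands[i]; j < pitch_bands[i + 1]; j++)
            g += X[j] * P[j];   /* inner product restricted to band i */
        gains[i] = g;
    }
}
```

With unit-norm sub-vectors, the bound on the gain is just the Cauchy-Schwarz inequality: the inner product of two unit vectors can't exceed 1 in magnitude.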

Here's what the distribution of the gains looks like for each band:



It's clear from the figure above that the lower bands (lower frequencies) tend to have a much higher pitch gain. Because of that, a single gain for all the bands wouldn't work very well. Once the gains are computed, they need to be encoded efficiently. Again, using naive scalar quantisation and encoding each gain separately (using 3 or 4 bits each) would be a bit wasteful. So far, I've been using a trained (non-algebraic) vector quantiser (VQ) with 32 entries, which means a total of 5 bits for all gains. The advantage of VQ for that kind of data is that it eliminates all redundancy so it tends to be more efficient. There are a few disadvantages as well. Trained VQ codebooks are not as flexible and can end up taking too much space when there are many entries (I don't think 32 entries is enough for 5 gains).

The last point to address about the pitch predictor is calculating the pitch period. We could try all delays, apply the MDCTs and compute the gains for each and at the end decide which is best. Unfortunately, the computational cost would be huge. Instead, it's easier to do it in "open loop" just like in Speex (and many other CELP codecs). We compute the generalised cross-correlation (GCC) in the frequency domain (cheaper than computing in the time domain). The cross-spectrum (before computing the IFFT) is weighted by an approximation of the psychoacoustic masking curve just so each band contributes to the result (instead of having the lower frequencies dominate everything else).
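For readers who haven't seen an open-loop search before, here's a rough C sketch of the idea. Note that it is deliberately NOT the weighted frequency-domain GCC described above: it's the plain (unweighted) time-domain cross-correlation, which is easier to read but slower and lets the low frequencies dominate. The function name and its arguments are hypothetical.

```c
#include <assert.h>

/* Open-loop pitch search by plain time-domain cross-correlation
 * (a simplified stand-in for the weighted GCC described in the post).
 * x points at the first sample of the current frame, and
 * x[-max_lag] .. x[-1] must hold past samples.
 * Returns the lag maximising xy^2 / yy, which avoids a square root. */
static int open_loop_pitch(const float *x, int len, int min_lag, int max_lag)
{
    assert(len > 0 && min_lag > 0 && min_lag <= max_lag);
    int best_lag = min_lag;
    float best_score = -1.0f;
    for (int lag = min_lag; lag <= max_lag; lag++) {
        float xy = 0.0f, yy = 0.0f;
        for (int n = 0; n < len; n++) {
            xy += x[n] * x[n - lag];        /* correlation with the past */
            yy += x[n - lag] * x[n - lag];  /* energy of the past segment */
        }
        /* Only positive correlations are useful; skip silent history. */
        if (xy > 0.0f && yy > 0.0f) {
            float score = xy * xy / yy;
            if (score > best_score) {       /* strict: keep shortest lag on ties */
                best_score = score;
                best_lag = lag;
            }
        }
    }
    return best_lag;
}
```

On a perfectly periodic signal, every multiple of the period scores the same, so the strict comparison is what makes the search return the shortest one.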

Now the results: how much benefit does pitch prediction give? Quite a bit actually, hear for yourself. Here's the same speech sample encoded with or without pitch prediction. Even on music, which is not always periodic, pitch prediction can help a bit, though not as much. I think there's potential to do better on music. There's a few leads I'd like to investigate (and again, I'm open to ideas):
  • Using two pitch periods
  • Frequency-domain prediction
Feel free to ask questions below in the (likely) case something's not clear.
 
 
jmspeex
20 December 2007 @ 08:35 pm
As mentioned in my previous post, one of the main ideas in CELT is to divide the signal in bands and directly encode the energy in each band. There are several reasons for that. First, the ear is generally more sensitive to the energy in a frequency band than to the exact details of where that energy is. This is especially true at higher frequencies, where we sometimes only need to get the rough shape of the spectrum right to get decent quality. A second reason is that it is convenient to separate the signal into energy and "details", just like CELP codecs (such as Speex) split the signal into a filter and an excitation, or Vorbis that uses a "floor". In CELT, we go one step further and actually divide the data in each band by the band's energy and then constrain each band to have unit magnitude (\sum_i x_i^2 = 1). Once a band has been normalised, its magnitude will always be equal to 1, no matter what happens to it. Any processing/encoding/mutilating we do to it needs to preserve that unit magnitude.

Ideally, the width of each band should be roughly one critical band. In practice, there isn't much point in having a single frequency bin per critical band, so although the ear has roughly 25 critical bands, we only use about 15-20 in CELT. Here's an example using 256-sample MDCTs (128 output samples) and 15 bands. The band boundaries are:

{0, 2, 4, 6, 8, 12, 16, 20, 24, 28, 36, 44, 52, 68, 84, 116, 128}

Using this, band number 0 includes samples 0 and 1, while band number 14 includes samples 84 to 115. The remaining samples (116-127) are just discarded because they are outside the ear's range (a 44.1 kHz sampling rate is assumed here).
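Here's a small C sketch of how that boundary table can be used: one helper maps an MDCT bin to its band, the other accumulates the energy (sum of squares) in each coded band. The table is the one from the post; the function names are made up for the example, and taking the square root of the energy to actually normalise a band is left out.

```c
#include <assert.h>

#define NB_BANDS 15
/* Boundaries from the post; bins 116..127 fall outside the coded bands. */
static const int band_edges[] =
    {0, 2, 4, 6, 8, 12, 16, 20, 24, 28, 36, 44, 52, 68, 84, 116, 128};

/* Index of the band containing MDCT bin `bin`, or -1 if it is discarded. */
static int band_of_bin(int bin)
{
    assert(bin >= 0 && bin < 128);
    for (int i = 0; i < NB_BANDS; i++)
        if (bin < band_edges[i + 1])
            return i;
    return -1;
}

/* Sum of squares of the MDCT coefficients in each coded band. */
static void band_energies(const float *mdct, float energy[NB_BANDS])
{
    for (int i = 0; i < NB_BANDS; i++) {
        energy[i] = 0.0f;
        for (int j = band_edges[i]; j < band_edges[i + 1]; j++)
            energy[i] += mdct[j] * mdct[j];
    }
}
```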

Now, the first thing we need to do is actually encode the energy of each band in an efficient way. The ear is more sensitive to lower frequencies, so these will need to be encoded with better resolution. Of course, we use the log (dB) domain and add a small value (equivalent to -10 dB) just to prevent overflows when taking the log. In this example, we use a quantisation interval of 0.75 dB for the lowest band, increasing linearly to 4.25 dB for the highest band. Doing naive quantisation/encoding over a fixed range would require a prohibitive number of bits (>100 bits per frame) and is thus not an option. Measuring the ideal entropy (assuming a perfect probability model for the data) for some speech and music samples gives us an average of 71 bits per frame. That's still expensive, considering we're going to encode around 200 frames per second.
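With 15 bands, a step going linearly from 0.75 dB to 4.25 dB works out to an extra 0.25 dB per band. The C sketch below shows that step table together with a plain uniform quantiser; it assumes the energies are already expressed in dB, and the function names are only illustrative.

```c
#include <assert.h>

#define NB_BANDS 15

/* Step in dB for a band: 0.75 dB at band 0, rising linearly to 4.25 dB
 * at band 14 (i.e. 0.25 dB more per band). */
static float quant_step_db(int band)
{
    assert(band >= 0 && band < NB_BANDS);
    return 0.75f + (4.25f - 0.75f) * (float)band / (float)(NB_BANDS - 1);
}

/* Uniform scalar quantiser: index of the nearest multiple of the step. */
static int quantise_db(float energy_db, int band)
{
    float q = energy_db / quant_step_db(band);
    return (int)(q >= 0.0f ? q + 0.5f : q - 0.5f);  /* round to nearest */
}

/* Value the decoder reconstructs from an index. */
static float dequantise_db(int index, int band)
{
    return (float)index * quant_step_db(band);
}
```

For instance, 10 dB in band 0 maps to index 13 and comes back as 9.75 dB, while the same value in band 14 would only be resolved to the nearest 4.25 dB.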

The only way to further reduce the number of bits used for energy quantisation is to eliminate redundancy. Energy usually doesn't vary that much from one frame to the next, so we can use a time-domain predictor of the form P(z) = 1 - alpha*z^-1. That means we remove from the current energy alpha times the previous energy (we're already in log domain). Here's what the entropy per frame looks like (as a function of alpha) if we use that predictor:



That's already much better. As we increase the prediction coefficient (alpha) from 0 to 1, we can reduce the entropy from 71 bits down to around 45 bits, a 26-bit improvement. Unfortunately, using alpha=1 for prediction isn't practical because it would mean that any transmission error (e.g. lost packet) would propagate through time with no attenuation (even 0.95 would take too long). A value of alpha around 0.7 would be a nice tradeoff between redundancy reduction and limited error propagation. That's 52 bits per frame. However, we're not done yet eliminating redundancy. There's still a correlation across the bands in the same frame. This time, we can use any predictor we like because a frame either arrives completely or it doesn't. So we use a second predictor Q(z) = (1 - z^-1)/(1 - beta*z^-1), applied across the bands rather than across time. With that second predictor, the entropy goes down again:



With alpha=0.7 and beta=0.5, we have just under 44 bits of entropy. Much better than the 71 bits we started from and even better than only the first predictor with alpha=1. Of course, that entropy value is optimistic because it assumes a perfect probability model and because it assumes that prediction isn't degraded by quantisation.
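The two predictors can be sketched in a few lines of C. This is only an illustration under a big assumption: it works on unquantised log energies, whereas a real codec has to predict from the quantised values so that encoder and decoder stay in sync (which is exactly the degradation mentioned above). The function names are made up.

```c
#include <assert.h>

#define NB_BANDS 15

/* Encoder side: remove inter-frame, then inter-band redundancy from the
 * log energies (quantisation is ignored in this sketch). */
static void predict_residual(const float *cur_db, const float *prev_db,
                             float alpha, float beta, float *res)
{
    assert(cur_db && prev_db && res);
    float prev_in = 0.0f, prev_out = 0.0f;
    for (int i = 0; i < NB_BANDS; i++) {
        /* P(z) = 1 - alpha*z^-1, applied along time. */
        float t = cur_db[i] - alpha * prev_db[i];
        /* Q(z) = (1 - z^-1)/(1 - beta*z^-1), applied along the band index. */
        res[i] = t - prev_in + beta * prev_out;
        prev_in = t;
        prev_out = res[i];
    }
}

/* Decoder side: invert both predictors to get the log energies back. */
static void reconstruct(const float *res, const float *prev_db,
                        float alpha, float beta, float *cur_db)
{
    assert(res && prev_db && cur_db);
    float prev_t = 0.0f, prev_res = 0.0f;
    for (int i = 0; i < NB_BANDS; i++) {
        float t = res[i] - beta * prev_res + prev_t;  /* undo Q(z) */
        cur_db[i] = t + alpha * prev_db[i];           /* undo P(z) */
        prev_t = t;
        prev_res = res[i];
    }
}
```

Because Q(z) only looks at lower bands of the same frame, a lost packet affects the next frames through alpha only, which is why beta can be chosen freely.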

For encoding, it's not very practical to use the actual probability model because it would require storing the probability for each value of each band (and for each bit-rate if we change the resolution). However, it turns out that the distributions are somewhere between a Gaussian distribution and a Laplacian distribution. Although actually closer to being Gaussian, we use a Laplacian model because it reduces the spikes in bit-rate (a Gaussian would significantly underestimate the probability of extreme values). Despite the rough approximation, the average actual encoding rate for all 15 bands is 46 bits per frame. That's just 2 bits worse than the theoretical best case using that predictor. Not bad at all.

I've also played with the DCT (for intra-frame redundancy) without getting better results, mainly because it's harder to control the error in each band. Still, there may be better ways than what I've done so far to reduce the bit-rate for the energy. I'm open to ideas/suggestions on that.

Updated: Fixed the definition of the example bands.
 
 
jmspeex
Here's a bit more info on the CELT experimental codec I just released. First, the goals. For the past two years, Monty and I have been discussing what the next generation free audio codec would be. Monty's goal is to basically be better than Vorbis in terms of quality vs compression. My main goal, on the other hand is to have a high-quality codec with very low latency, even if it means being less efficient. So, we're trying to combine both into the Ghost codecs. Whether that'll succeed or we have to go with two separate codecs is still an open question. For now, I'm working on CELT, which I hope to be both a low-latency codec, and a noise encoder to be used in a lower bit-rate Ghost codec like Monty wants.

Below is an overview of how CELT works:


It may look a bit hairy, but it's actually a relatively simple idea. The four main ideas are:


  • We use a lapped transform (here an MDCT) on very short windows (128-256 samples)

  • The spectrum is divided in bands and the energy in each band is encoded and kept constant

  • We use a time-domain pitch predictor, with frequency-domain gains

  • The residual is encoded using a pulse codebook



I'll address each of these (and more) in later posts.
 
 
jmspeex
08 December 2007 @ 11:52 pm
Speex 1.2beta3 has been tagged and will be up on the website shortly. There should even be Windows builds this time thanks to Alexander Chemeris. I'm expecting the next release to be named 1.2rc1. There's still a few things to address before 1.2, but I'm hoping the libspeexdsp API will be complete for rc1. Stay tuned.

There's another release coming up: a new Code-Excited Lapped Transform (CELT) codec prototype. This codec is (for now at least) meant to compete with neither Vorbis, nor Speex. Instead, the primary idea is to reduce latency to a minimum -- currently around 8 ms (compared to ~25 ms for Speex and ~100 ms for Vorbis). Of course, this comes with a price in terms of efficiency, but I'm already surprised the price isn't bigger. I've been mainly focusing on speech, but unlike Speex, I'm hoping this one will handle music as well. For the curious, I've put up a 56 kbps CBR music file (original). This is still very experimental and everything is still likely to change, including the exact goals. I'm still trying to figure out how to put psychoacoustics into that. Stay tuned for the release of version 0.0.1 (or should I use a negative version number to make it clearer it's experimental?).

CELT is based on a paper I submitted to ICASSP and which I'm hoping will be accepted so I can make it available to everyone. The only difference is that the ICASSP paper was based on the FFT (non critically sampled), whereas this version is based on the MDCT. One part that is already published though is Timothy's explanation of the pulse codebook encoding along with some source code. Now, here's a challenge. Who can beat the algorithm on Timothy's page? Simply stated, the idea is to enumerate all combinations of M pulses in a vector of dimension N, knowing that pulses have unit magnitude and a sign, but all pulses at the same position need to have the same sign.
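To get a feel for the size of the problem, here's a C sketch that only *counts* the combinations (it does not enumerate them and it is not the algorithm from Timothy's page). It relies on the recurrence V(N, M) = V(N-1, M) + V(N, M-1) + V(N-1, M-1), which holds for the number of integer vectors of dimension N whose absolute values sum to M. The function name and the table limits are arbitrary choices for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_N 20   /* kept small so the counts fit in 64 bits */
#define MAX_M 20

/* Number of ways to place m signed unit pulses in a vector of dimension n,
 * where all pulses sharing a position have the same sign. */
static uint64_t pulse_count(int n, int m)
{
    assert(n >= 0 && n <= MAX_N && m >= 0 && m <= MAX_M);
    static uint64_t v[MAX_N + 1][MAX_M + 1];
    for (int i = 0; i <= n; i++) {
        for (int k = 0; k <= m; k++) {
            if (k == 0)
                v[i][k] = 1;          /* no pulses: only the zero vector */
            else if (i == 0)
                v[i][k] = 0;          /* pulses but nowhere to put them */
            else
                v[i][k] = v[i - 1][k] + v[i][k - 1] + v[i - 1][k - 1];
        }
    }
    return v[n][m];
}
```

As a sanity check, 2 pulses in 3 dimensions gives 18: six vectors with a single entry of +/-2, plus twelve with two entries of +/-1. The base-2 log of the count is the number of bits an ideal index would need.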

Updated: The full source for CELT is available at: http://downloads.us.xiph.org/releases/celt/celt-0.0.1.tar.gz or through git at http://git.xiph.org/celt.git

Updated again: Speex 1.2beta3 is out.
 
 
jmspeex
23 November 2007 @ 02:17 pm
I realised recently that Speex tends to have a lot of array copies done using for loops instead of memmove()/memcpy(). However, I wasn't quite happy with just replacing everything with memmove()/memcpy() because of the very poor type safety they provide. For example, it's all too easy to change the type of an array and have memmove() doing the wrong thing without any warning. So, after a bit of thinking, I came up with the following macro, which I think should work:

#define SPEEX_MOVE(dst, src, n) (((dst)-(src)), memmove((dst), (src), (n)*sizeof(*(dst))))

Compared to memmove(), this macro does two things. First, it removes the need for using sizeof() in the length argument, removing a source of error. Second, the discarded (dst)-(src) value before the comma basically ensures that the compiler will generate an error if src and dst point to different types. Expect to see that macro appearing in Speex soon. I'd be curious to hear if anyone knows of any unwanted side effects of this, except of course the usual "don't use arguments with side effects" limitation.

Update: Actually, the following expression also has the advantage of not producing a warning with gcc:

#define SPEEX_MOVE(dst, src, n) (memmove((dst), (src), (n)*sizeof(*(dst)) + 0*((dst)-(src)) ))
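Here's a small usage sketch for the second form of the macro. The shift_history() helper is hypothetical (it's not from the Speex source); it just shows the typical "slide the history buffer" case where the regions overlap and memmove() is required.

```c
#include <assert.h>
#include <string.h>

/* Second form of the macro from the post. */
#define SPEEX_MOVE(dst, src, n) \
    (memmove((dst), (src), (n)*sizeof(*(dst)) + 0*((dst)-(src)) ))

/* Shift a history buffer left by `shift` elements (regions overlap). */
static void shift_history(short *buf, int len, int shift)
{
    assert(buf != NULL && shift >= 0 && shift <= len);
    /* The element count is given in samples, not bytes. */
    SPEEX_MOVE(buf, buf + shift, (size_t)(len - shift));
}
```

If buf were later changed to, say, float * while the source pointer stayed a short *, the (dst)-(src) subtraction would stop compiling, since C only allows subtracting pointers to compatible types.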
 
 
jmspeex
07 October 2007 @ 09:01 pm
Slightly less technical than usual, here are a couple tips for friends moving electric devices from North America to a country that has 220-240V (i.e. most countries). Not every 120V device can work easily on 240V, but some will. To figure it out, the key is to look for a label on the device (present almost all the time) that describes the input voltage. For example, the label on my laptop's AC adapter says 100-240V, 50-60Hz, 65W. The key here is the voltage (in this case 100-240V) and the power (65W). It's very rare that the frequency (50 vs 60 Hz) will cause a problem. For my laptop adapter, 100-240V means it will work pretty much anywhere with no problem. All I need is an adapter to plug it into the wall socket where I'm going.

Almost all laptops and many electronic devices (e.g. MP3 players, cell phones) have AC adapters that work at all voltages. However, there are some that do not. For instance, some may have "110-120V, 10W". This means that simply plugging the device into a 240V outlet is very likely to fry it in a very short amount of time. For these devices, it may be possible to use a step-down transformer. Those can be purchased from places such as: here and there (note, I'm not endorsing any of those places). Transformers are rated in terms of power and are generally practical only for appliances that take under 1000W. In general, the higher the power, the more expensive and the noisier the transformer is. The last one is important. We have a 500W transformer that is on all the time and if it were noisier it would start becoming annoying. Indeed, the 750W transformer is annoying and we only use it for short periods of time. Most electronic devices that don't work directly on 240V are in this category. When it comes to blenders, vacuum cleaners and a few others, it'll depend on the actual power it requires and what you intend to buy in terms of transformer.

Now, what about devices that are rated above 1000W or so? Some of them can actually work using another kind of voltage converter that is sometimes rated for up to around 2000W. These converters are generally quite small (unlike the transformers above) and cheap. So what's the catch? These will only work for "resistive" (i.e. heating) devices, such as an iron or hair dryer. Still, if you're planning on going abroad for a long period of time and all you have is an iron and a hair dryer, I would actually suggest buying these in 240V version. Over 2 years, we've actually blown a hair dryer using such a converter and using it with the iron, we then ended up blowing the converter (so we had to buy both devices again).

So, what's left? Devices that just shouldn't be moved. I definitely don't recommend moving the stove, fridge, washer/dryer, microwave oven or any appliance like that. Things like (most) stereos can be moved. You may be able to move the TV/DVD/VCR as well, but be aware that most countries outside North America don't use NTSC, so they may be of limited use (e.g. Australia and most of Europe use PAL, so an NTSC TV just does not work there).

One last tip. Buy several adapters to plug your devices into the outlets wherever you're going. Make sure you get some adapters with and without the ground pin. One way to reduce the number of adaptors is to use power boards. Those can be used either on the 120V you get from a transformer (so you don't need one transformer per device) or directly on 240V to distribute power to your devices that work "natively" on 240V. In the latter case, make sure you get power boards without surge protection, otherwise the protection is likely to blow very quickly. Also make sure never to plug 120V-only devices into these 240V power boards.
 
 
jmspeex
20 March 2007 @ 12:36 am
So a while ago, I wasn't careful with type lengths and wrote some code in the speex encoder (speexenc, not libspeex) that wouldn't work very well on 64-bit machines. More precisely, it would make speexenc crash on startup 100% of the time, so you can't really miss it if you have a 64-bit machine. Fortunately, someone noticed and the bug was promptly fixed. This should normally have been the end of the story... except that Ubuntu was going to ship Dapper with an older version (current Debian unstable).

Turns out that the bug was reported against Dapper very early on. A patch was even posted more than a month before the release of Dapper. From there, it took 11 months for the 2-line fix to be applied and released. And if it wasn't for me harassing some of the developers (thanks crimsun, tritium for pushing the fix in), I don't think the fix would ever have made it.

Sometimes one wonders why it is that Ubuntu has a bug tracker. Another example is bug #52600. You can't see it because it's marked as a security bug, but considering I filed it more than 8 months ago, I don't think making it private makes sense anymore. That one comes down to the fact that any local user with no privilege can crash a Dapper machine very easily. You just compile the following program:

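#include <sched.h>

int main(void) {
    struct sched_param param;
    /* Ask for the highest real-time FIFO priority. */
    param.sched_priority = sched_get_priority_max(SCHED_FIFO);
    sched_setscheduler(0, SCHED_FIFO, &param);
    /* Spin forever; nothing at a lower priority gets to run. */
    while (1);
}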

and then execute it. What this does is simply ask for the maximum real-time priority and then spin doing nothing, starving every single other process on the machine and forcing a reboot. While allowing SCHED_FIFO to some users in some circumstances makes sense, I can't understand why it's enabled for everyone on the system. It's a bit like making the shutdown command setuid root. Yeah for the Ubuntu LTS (Long Time to get Support) process!
 
 
jmspeex
14 January 2007 @ 04:31 pm
I just attended FOMS last week and I have to say it was one of the most useful conferences/workshops I have attended. Not only did I get to meet three new Xiph.Org people for the first time (Ralph, Mike, Tim), but I also got to discuss with James (ALSA) and Lennart (PulseAudio) and many others. We had a total of 22 people, of which 12 came from overseas for FOMS/LCA. The result: a new audio API for PulseAudio (and other systems) and a new BSD resampling library aimed at desktop audio. Although I did help a little in the organisation, most of the credit for making FOMS rock so much goes to Silvia Pfeiffer. I just hope next year's edition (Melbourne?) will be just as good!
 
 
jmspeex
02 December 2006 @ 11:22 pm
After a couple days fighting with this annoying overflow bug, I think I've managed to solve the problem. As you can see, some of the fixes are not very nice. It basically comes down to
  • Adding explicit saturation (SATURATE) before 32-bit to 16-bit conversions.
  • Scaling the signal up/down for some operations to avoid having to add saturation all over the place, especially in critical loops
  • PSHR* is evil. Well, not quite, but can you spot the danger in the PSHR32 definition?

The moral of the story: saturating isn't great, but it still beats overflowing!
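For those who don't read fixed-point code every day, here's a C sketch of the two kinds of fixes. The names saturate16() and pshr32_safe() are made up (they are not the Speex macros), and the hazard shown for the rounding shift, namely that adding the rounding offset can itself overflow 32 bits for large inputs, is my reading of one plausible danger rather than a quote of the actual PSHR32 definition. The 64-bit intermediate is used here only for clarity; on a DSP you'd rather scale the signal down instead.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 32-bit value to the 16-bit range before narrowing it. */
static int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Rounding right shift. The rounding offset (half of 1 << shift) is added
 * in 64 bits, so the addition cannot overflow even when `a` is close to
 * INT32_MAX. Right-shifting a negative value is assumed to be arithmetic,
 * as it is on common compilers. */
static int32_t pshr32_safe(int32_t a, int shift)
{
    assert(shift >= 0 && shift < 32);
    int64_t rounded = (int64_t)a + (((int64_t)1 << shift) >> 1);
    return (int32_t)(rounded >> shift);
}
```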
 
 
jmspeex
25 November 2006 @ 02:56 pm

Ubuntu Edgy/Feisty amd64 on Dell Latitude D820



Here's a list of what works, and what doesn't with Edgy (and now Feisty) on my Dell D820 laptop. Also, some fixes for known bugs.

Summary



In short, the amd64 version of Edgy is simply unusable out of the box and even the installer crashes most of the time. The 32-bit version does not seem to be affected. The only way to get around that is passing the notsc and no_timer_check options to the kernel. After that it becomes usable.

Components | Status | Notes
Intel Core 2 Duo T7200 (2.0 GHz, 667 MHz FSB) | Works with fix | Pass options notsc and no_timer_check or use newer kernel. Updated: No problem on Feisty.
2.0GB, DDR2-667 SDRAM | Works |
15.4" WUXGA LCD display (1920x1200) | Works with fix | You need to install the 915resolution package to get 1920x1200, otherwise you're stuck with 1600x1200.
Intel 945GM graphics | Works with fix | X itself works out of the box, but you need to pass the -noacpi option to the X server, otherwise the display goes black when you close the lid (and never comes back). Updated: The "intel" driver now works on Feisty and is recommended. I use the "NoPM" server option and have had no problem since then.
CD-RW/DVD+RW Drive | Works | Haven't tested too much, but it seems to work fine
Intel® 3945 802.11a/g Dual-Band Mini Card | Works but | Requires binary-only daemon provided with Ubuntu (driver is not merged and is a pain to compile if you want to use a custom kernel)
Broadcom Corporation NetXtreme BCM5752 Gigabit Ethernet PCI Express (rev 02) | Works |
ICH7 Family Serial ATA Storage Controller IDE (rev 01) | Works |
Bluetooth | Untested |
Internal modem | Untested |
USB | Works | Tested with Extigy soundcard, mouse, printer, mass storage
Touchpad | Works but | I use proto=exps for the psmouse module, otherwise tapping doesn't work. Updated: Using the touchpad as synaptics now works with Feisty, but then docking/undocking causes tapping to no longer work. I'm stuck with using it as exps and reloading the module every time I dock/undock (anyone's got a fix for that?)
High Definition Audio Controller (rev 01), STAC9200 audio codec | Problems | Basic playback works, but problems in full-duplex mode and mixer is b0rked. Updated: One must use plug:front (instead of plughw:0,0) for the PCM mixer to work. Also, the Hg version (since May) of ALSA appears to work better with full-duplex, although it's not perfect.
Smartcard reader | Untested |
Express card | Untested |
Firewire | Untested |
Suspend to RAM | Problems | It works sometimes, but sometimes it never wakes up. Updated: wake up works now, but things stop working sometimes, e.g. keyboard
Suspend to disk (hibernate) | Problems | Tried once, didn't work. Haven't investigated
Docking/Undocking | Problems | Was completely broken with Edgy. Now with Feisty, the main problem I have is with the touchpad (see above).
IRDA | Untested |


Kernel problems



While the 32-bit Edgy kernel runs fine on this machine, the 64-bit one is totally broken by default. There are two problems:
1) X hangs every now and then, usually after clicking on a button. dmesg also reports clock problems. This is solved by passing the "notsc" kernel option.
2) Kernel crashes on any ACPI event. Whether I close the lid or plug/unplug the AC adaptor, the kernel crashes right away with nothing printed on the console. This problem is fixed by passing the "no_timer_check" kernel option.
It seems like kernels from both Debian Etch and Fedora Core 6 are also affected. Stock kernel 2.6.19rc6 (which I'm running now) isn't affected.

Soundcard problems (Intel HDA)



The D820 ships with an on-board Intel HDA card (Sigmatel STAC 9200 codec). ALSA support for that card is currently (as shipped with Edgy or ALSA 1.0.13) very buggy. The PCM control does not work. When I plug in the headphones, I can still hear the audio from both the internal speakers and the headphones. Also, trying to do full-duplex results in a lot of xruns. I don't have any fix for that yet. If anyone knows how to get it to work, please reply here.

X problems


Seems like the X server is now trying to handle some ACPI events. Unfortunately, the ACPI event generated when I open or close the lid just kills X. To fix the problem, I had to edit /etc/X11/gdm/gdm.conf and add the -noacpi to the X command line (there are three different command lines, so you need to add it to the one being used).
Just after installing Edgy, all my X server could do was 1600x1200. To get the full 1920x1200 resolution, I had to install the 915resolution package.

Software for AMD64


Overall, the software works pretty well. The only things that don't work ATM are the ones that involve closed-source software. No acroread, flash (the ad display plugin) and binary codecs for MPlayer. So far, I can live without those without too much problem.
 
 
jmspeex
20 November 2006 @ 12:08 am
After I bought myself a new Dell D820 laptop with a Core 2 Duo chip, I thought I'd finally be able to run Linux in 64-bit mode. Not quite. I tried Ubuntu Edgy, Fedora Core 6, Debian Etch (testing), all of which were totally unusable due to strange X freezes and random crashes. After searching for a while, I finally find out that adding "notsc" works around some kernel bug. Good, so I give Edgy another try and indeed the strange X freezes are gone. Unfortunately the random crashes are still there... and they're not quite random. Turns out that any ACPI event (closing the lid, unplugging the AC adapter) simply crashes the machine before it can even print a panic(). Not good! So in the mean time I'm stuck in 32-bit world waiting for a 64-bit distro that would finally support my machine. If anyone managed to run a 64-bit distro on a Dell laptop with a Core 2 CPU, please leave me a note.

Updated: Looks like adding the no_timer_check option solves the problem. Haven't dared trying suspend yet, but it's already much better than crashing every time I plug/unplug the ac adaptor!
Update #2: Not quite perfect though. Closing the lid makes the screen go blank (never turns back on) and docking/undocking crashes the X server (go figure).
 
 
jmspeex
16 November 2006 @ 01:14 am
OK, so I thought the fixed-point code in 1.2-beta1 was getting pretty good. But that was until a user (wouldn't things be simple without them!) was able to make it fail horribly by feeding it totally clipped speech. It turns out that the file manages to trigger at least a half-dozen overflows all around the code, some of them easily fixed, some not.

So here's the deal with fixed-point. Some CPUs/DSPs support saturating versions of add/sub/mul/... and some don't. Most G.72x codecs are usually implemented assuming that they exist, so they don't need to worry about overflows. For Speex, I decided to do it without assuming hardware saturation, so it can run on ARM and other chips (including x86) that don't support saturation. And that's how everything suddenly becomes more complicated. If once in a while 0.5 + 0.6= 1.0, you usually don't care too much. On the other hand, if 0.5 + 0.6 = -0.9, then suddenly you do care.
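The 0.5 + 0.6 example can be reproduced with 16-bit Q15 numbers (where 1.0 would be 32768, so 0.5 is 16384 and 0.6 is about 19661). The C sketch below compares an add with explicit saturation to what a plain 16-bit two's-complement add would leave in the register; the wrap-around is modelled arithmetically so the code doesn't depend on implementation-defined conversions. Both function names are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Q15 addition with explicit saturation: the result sticks at the limits. */
static int16_t add16_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* What a 16-bit two's-complement add without saturation would produce;
 * the wrap-around is computed by hand in 32 bits. */
static int16_t add16_wrap(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) sum -= 65536;
    if (sum < INT16_MIN) sum += 65536;
    return (int16_t)sum;
}
```

With a = 16384 and b = 19661, the saturating version returns 32767 (just under 1.0) while the wrapping one returns -29491, which is about -0.9 in Q15.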

So the fundamental question here is how much overflow on corrupted input can be tolerated (based on the "garbage in, garbage out" principle) and how much needs to be avoided regardless of the input? Answer when I get to the bottom of this. To be continued...
 
 
jmspeex
15 November 2006 @ 05:06 pm
Everybody's doing it, so I thought I might as well. Let's see how often I can manage to update this blog.
 
 
 
 
