Wednesday, May 07, 2008

FFT Window references

http://www.lds-group.com/docs/site_documents/AN014%20Understanding%20FFT%20Windows.pdf
http://www.bores.com/courses/advanced/windows/files/windows.pdf

The first link has a table with window characteristics:

Window / Best for these Signal Types / Frequency Resolution / Spectral Leakage / Amplitude Accuracy
Bartlett: Random / Good / Fair / Fair
Blackman: Random or mixed / Poor / Best / Good
Flat top: Sinusoids / Poor / Good / Best
Hanning: Random / Good / Good / Fair
Hamming: Random / Good / Fair / Fair
Kaiser-Bessel: Random / Fair / Good / Good
None (boxcar): Transient & Synchronous Sampling / Best / Poor / Poor
Tukey: Random / Good / Poor / Poor
Welch: Random / Good / Good / Fair
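The "Spectral Leakage" column can be spot-checked numerically: leakage is governed by a window's peak sidelobe level relative to its main lobe. A quick sketch with NumPy (the window length of 64 and the 32x zero-padding factor are arbitrary choices of mine):

```python
import numpy as np

def peak_sidelobe_db(window):
    """Peak sidelobe level of a window, in dB relative to the main lobe."""
    n = len(window)
    # Zero-pad heavily so the DTFT is finely sampled.
    spectrum = np.abs(np.fft.rfft(window, 32 * n))
    spectrum /= spectrum[0]              # normalize the main-lobe peak to 1
    # Walk down the main lobe to its first null, then take the max beyond it.
    i = 1
    while i < len(spectrum) - 1 and spectrum[i + 1] < spectrum[i]:
        i += 1
    return 20 * np.log10(spectrum[i:].max())

n = 64
print("boxcar  :", peak_sidelobe_db(np.ones(n)))      # about -13 dB
print("Hanning :", peak_sidelobe_db(np.hanning(n)))   # about -31 dB
print("Blackman:", peak_sidelobe_db(np.blackman(n)))  # about -58 dB
```

The ordering matches the table: boxcar worst, Blackman best, Hanning in between.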

A. O. Scott: Here Comes Everyboy, Again

"Mr. Sandler did not invent the archetype of the overgrown man-child, which has been around at least since the silent era. ... Nor has Mr. Sandler been alone, over the past 15 years or so, in turning male infantile aggression into the basis of a lucrative and long-running movie career. His rivals and confreres have included Jim Carrey, Jack Black and Will Ferrell — all of them different physical and temperamental types, but all of them committed to a brazen and unyielding refusal of maturity."

Add to this list the television comedies Seinfeld, Friends, Everybody Loves Raymond, and most recently, The King of Queens. I had wondered about the provenance of this archetype, and about exactly what it was that so repulsed me in those shows, until this essay put a name to the phenomenon.

Thursday, April 03, 2008

DoG is used in wavelets

Just noticed that the derivative of a Gaussian is also called a "DoG" when used as a wavelet basis function. But I think Daubechies wavelets are more popular.
Are you using those in your price models?
Maybe in the future. One of my coworkers is using Daubechies wavelets in his model; compared with ordinary moving-average smoothing, doing a wavelet transform, cutting off the spectrum, and then transforming back seems to offer better noise reduction with less loss of information. You still need a power-of-two data length, as for the fast Fourier transform.
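I haven't seen his code, but the transform / threshold / inverse-transform scheme is easy to sketch. Using the simpler Haar wavelet instead of Daubechies keeps the example self-contained; the test signal, noise level, and 10% keep-fraction below are made up:

```python
import numpy as np

def haar_forward(x):
    """Full Haar wavelet decomposition of a power-of-two-length signal."""
    x = np.asarray(x, dtype=float)
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (smooth part)
        diff = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
        coeffs.append(diff)
        x = avg
    coeffs.append(x)          # final approximation coefficient
    return coeffs[::-1]       # coarsest level first

def haar_inverse(coeffs):
    x = coeffs[0]
    for diff in coeffs[1:]:
        out = np.empty(2 * len(x))
        out[0::2] = (x + diff) / np.sqrt(2)
        out[1::2] = (x - diff) / np.sqrt(2)
        x = out
    return x

def denoise(signal, keep=0.1):
    """Zero all but the largest `keep` fraction of detail coefficients."""
    coeffs = haar_forward(signal)
    flat = np.concatenate(coeffs[1:])
    cutoff = np.quantile(np.abs(flat), 1 - keep)
    cleaned = [coeffs[0]] + [np.where(np.abs(c) >= cutoff, c, 0.0)
                             for c in coeffs[1:]]
    return haar_inverse(cleaned)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 256)        # power-of-two length, as for the FFT
noisy = np.sin(2 * np.pi * 3 * t) + 0.3 * rng.standard_normal(256)
smooth = denoise(noisy, keep=0.1)
```

A real model would swap in a Daubechies filter pair (e.g. via a wavelet library) for the two-tap Haar averages/differences; the thresholding step is unchanged.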

Recently, I have been using techniques from "robust statistics": medians, trimmed data, etc. This reminded me of our early experiences with median filtering. Now I think that, while the technique may be nonlinear, the point of experimental science is to make a reliable measurement. If the number itself is what we want to measure (or even an ordinary arithmetic transformation of it), then I say filter. The only case where I wouldn't automatically reach for a trimmed mean (where the top and bottom x% of observations are excluded before taking the mean) is if I were transforming to a different space, such as with an FFT. But for dynamic pulling experiments, I suspect the FFT is rarely used.
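The trimmed mean is a one-liner; a minimal sketch (the data and trim fraction are invented for illustration):

```python
import numpy as np

def trimmed_mean(x, frac=0.1):
    """Mean after discarding the top and bottom `frac` of sorted observations."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(frac * len(x))
    return x[k:len(x) - k].mean()

data = np.array([9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 55.0])  # one gross outlier
print(np.mean(data))               # dragged far above 10 by the outlier
print(trimmed_mean(data, 0.125))   # close to 10
```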

You probably know that reporting a standard deviation is only meaningful if the data are close to normal. Especially with the kind of data I have, I am plagued by fat-tailed distributions, and I find that ordinary means and variances are too sensitive to outliers. The robust alternative is the MAD (median absolute deviation).
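A sketch of the MAD, with the usual 1.4826 factor that makes it agree with the standard deviation on normal data (the sample sizes and outlier values below are made up):

```python
import numpy as np

def mad(x, scale=1.4826):
    """Median absolute deviation from the median; scale=1.4826 makes it
    consistent with the standard deviation for normally distributed data."""
    x = np.asarray(x, dtype=float)
    return scale * np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(1)
clean = rng.standard_normal(10_000)
dirty = np.concatenate([clean, [50.0] * 10])   # ten fat-tail outliers

print(np.std(clean), mad(clean))   # both near 1
print(np.std(dirty), mad(dirty))   # std blows up; the MAD barely moves
```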

Wednesday, April 02, 2008

Database Monte Carlo (DBMC): A New Strategy for Variance Reduction in Monte Carlo Simulation

Monday, March 31, 2008
10:00 am - 11:00 am

CNLS Conference Room (TA-3, Bldg 1690)

DataBase Monte Carlo (DBMC): A New Strategy for Variance Reduction in Monte Carlo Simulation

Professor Pirooz Vakili
Boston University

A well-known weakness of (ensemble) Monte Carlo is its slow rate of convergence. In general the rate of convergence cannot be improved upon, hence, since the inception of the MC method, a number of variance reduction (VR) techniques have been devised to reduce the variance of the MC estimator. All VR techniques bring some additional/external information to bear on the estimation problem and rely on the existence of specific problem features and the ability of the user of the method to discover and effectively exploit such features. This lack of generality has significantly limited the applicability of VR techniques.

We present a new strategy, called DataBase Monte Carlo (DBMC), which aims to address this shortcoming by devising VR techniques that can be generically applied. The core idea of the approach is to extract information at one or more nominal model parameters and use this information to gain estimation efficiency at neighboring parameters. We describe how this strategy can be implemented using two variance reduction techniques: Control Variates (CV) and Stratification (the DBMC approach can be used more broadly and is not limited to these two techniques). We show that, once an initial setup cost of generating a database is incurred, this approach can lead to dramatic gains in computational efficiency. DBMC is quite general and easy to implement -- it can wrap existing ensemble MC codes. As such it has potential applications in, among others, ensemble weather prediction, hydrological source location, climate and ocean, optimal control, and stochastic simulations of biological systems.

We discuss connections of the DBMC approach with the resampling technique of Bootstrap and the analysis approach of Information Based Complexity.

LANL Host: Frank Alexander, ADTSC
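As I understand the control-variate variant of the abstract: pay a large one-time "database" cost at a nominal parameter, then reuse those draws as a control variate for cheap estimates at neighboring parameters. A toy sketch with a model, parameters, and sample sizes I invented (Y = exp(theta*Z), whose mean exp(theta^2/2) is known, so we can check the answer):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: Y(theta) = exp(theta * Z), Z ~ N(0,1); E[Y] = exp(theta^2 / 2).
def y(theta, z):
    return np.exp(theta * z)

theta0, theta = 1.0, 1.1   # nominal parameter and a neighboring one

# "Database" phase: a very large sample at the nominal parameter pins down
# E[Y(theta0)] essentially exactly.
db = rng.standard_normal(2_000_000)
mu0 = y(theta0, db).mean()

# Estimation phase: a small sample at theta, with Y(theta0) evaluated on the
# SAME draws serving as the control variate.
z = rng.standard_normal(1_000)
plain = y(theta, z)
control = y(theta0, z)
beta = np.cov(plain, control)[0, 1] / np.var(control, ddof=1)
cv = plain - beta * (control - mu0)

print("plain MC:", plain.mean(), "+/-", plain.std(ddof=1) / np.sqrt(len(z)))
print("DBMC/CV :", cv.mean(), "+/-", cv.std(ddof=1) / np.sqrt(len(z)))
print("exact   :", np.exp(theta**2 / 2))
```

Because Y(theta) and Y(theta0) are highly correlated for nearby parameters, the control-variate estimator's standard error is far smaller for the same 1,000 draws, which is the efficiency gain the talk describes.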

Monday, February 11, 2008

Liar's Poker

After reading Michael Lewis's Liar's Poker, and Roger Lowenstein's When Genius Failed, I am struck with a thought: write a program to play Liar's Poker. Either decimal or dice systems. Bayesian estimation of probabilities. Python.
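A first cut at the probability engine for the decimal version: my own digits are known, everyone else's are modeled as uniform, so the posterior over the total count of any digit is my count plus a Binomial over the hidden digits. The player and digit counts below are made up:

```python
from math import comb

def count_probabilities(my_count, hidden_digits, p=0.1):
    """Distribution of the TOTAL count of a digit: my observed count plus a
    Binomial(hidden_digits, p) count among the digits I cannot see."""
    return {my_count + k: comb(hidden_digits, k) * p**k * (1 - p)**(hidden_digits - k)
            for k in range(hidden_digits + 1)}

def prob_at_least(total, my_count, hidden_digits, p=0.1):
    """Probability that a bid of 'at least `total` of this digit' is good."""
    dist = count_probabilities(my_count, hidden_digits, p)
    return sum(pr for count, pr in dist.items() if count >= total)

# Two players with 8-digit serial numbers: I hold two 3s, the opponent's
# 8 digits are hidden. Chance that 'three 3s' is a safe bid:
print(prob_at_least(total=3, my_count=2, hidden_digits=8))  # 1 - 0.9**8, about 0.57
```

A real player would go further and update p itself from the opponent's bidding history (the actual Bayesian part); this sketch only covers the uniform prior.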

Tuesday, November 20, 2007

Unemployment benefits

The on-line application at the New York State Department of Labor Unemployment Insurance web site is only available "between the hours of 7:30 am to 7:30 pm Monday through Thursday (Eastern Time), Friday, 7:30 am to 5:00 pm, all day Saturday, and Sunday until 7:00 pm."

Bank of America / 100 Corporate Place, Suite 403 / Peabody, MA 01960 // EIN 56-2058405 NY 562058405

212-583-8000

Lockheed Martin NV Tech, Inc. / PO Box 98521 / Las Vegas, NV 89193-8521 // EIN 88-0347976 NM 02-297731-007

Monday, November 19, 2007

Working with recruiters

What is the value of working with recruiters?
  • Some have inside contacts (hiring managers).
  • Some have relevant information (company X has a hiring freeze).
Coverage is spotty, though, and big firms tend to have established paths through human resources.

after-Thanksgiving sales

"Black Friday" advertisements can be found at http://bfads.net/ and http://www.fatwallet.com/. Mentioned in last Sunday's New York Times.

This year, there doesn't seem to be anything special.

Electronic Lab Notebook (ELN) thoughts

My ideal electronic lab notebook would:

  • store everything in real time: I would like to paste in observations, measurements, complete data-processing histories, paper reprints;
  • allow search by free text, creation date range, citation date range, and citation count;
  • allow by-owner stateful highlighting of important things and collapsing of unimportant things: this would facilitate future review without permanently losing observations considered irrelevant at the time. An outline structure would probably be fine, but I would want to be able to collapse or expand any paragraph at any level.
  • allow annotated hyperlinks to prior paragraph-level observations and dates in this notebook and others, for backwards-referencing;
  • automatically count citations (i.e., this "observation was confirmed/rejected" or "protocol was reused" or "data was commented-on" on such-and-such entry) (the two points above combined mean that comments on previous observations, where the comments are actually new ELN entries, remain in-context);
  • not provide an "edit" feature: I don't see that entries more than two hours old would ever need to be edited again;
  • treat protocols in a special way, somewhat like a source-code control or versioning system. When protocols are mature, it is easy for people to say: I used protocol ABC. However, over the course of years, there could be modifications to protocol ABC that could improve yield, save time, etc. Often, people will continue to say that they used protocol ABC. I would propose that every protocol's "permalink" should contain its version number, so that past links to protocol ABC would point to the historical ABC, not the new-and-improved ABC. Actually, this can be easily handled in the above scheme: just have the complete protocol as an entry; when the protocol is revised, then users of the newer protocol will link to the newer entries. This can be organized with a special "kprotocol" keyword, or a separate "Lab protocols" ELN where protocol developers have write access.
The GUI should facilitate easy linking, perhaps through endnotes for each entry.
When reading, the blog-like "Comment" button would become an "Add new entry to your own ELN" button, with the specified paragraph already referenced. 
When writing, typing a special link string (such as double open braces) should bring up an in-page pop-up allowing search or selection of recently-cited and most-popular ELN citations. While we're at it, this should also allow web search and pasting of URLs or page snapshots.

Discovered that this is an old idea: http://pubs.acs.org/hotartcl/ci/00/jan/inet.html.