nGram Dictionary

On a recent project, I had to search tens of thousands of product descriptions and find substring matches quickly.  The select: message in Smalltalk works like a SQL table scan – okay for small collections, but response times stretch to seconds or more with larger lists.

An effective solution to this is an nGram Dictionary.  An nGram is a run of n consecutive characters, so any string can be broken up into its set of tri-grams, quad-grams, quint-grams, and so on.

My approach to this is a Dictionary keyed by nGram length.  Each value is itself a Dictionary that maps an nGram string to the collection of string objects containing it.  Indexing 'banana' and 'band' produces entries such as:

3 -> ana -> ('banana')
     ban -> ('banana', 'band')
4 -> bana -> ('banana')
     band -> ('band')
     anan -> ('banana')
5 -> banan -> ('banana')
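
A minimal workspace sketch of how such an index could be built and queried, assuming Squeak or Pharo.  The variable names (index, candidates) and the 3-to-5 length range are my own illustrative choices, not the code from the project:

```smalltalk
"Build the index: nGram length -> nGram string -> Set of source strings.
 Names and the 3..5 length range are assumptions made for this sketch."
| index candidates |
index := Dictionary new.
#('banana' 'band') do: [:word |
    3 to: (word size min: 5) do: [:n |
        1 to: word size - n + 1 do: [:start |
            | gram |
            gram := word copyFrom: start to: start + n - 1.
            ((index at: n ifAbsentPut: [Dictionary new])
                at: gram ifAbsentPut: [Set new])
                    add: word]]].

"Look up a search term by its own length: two hashed lookups, no scan."
candidates := (index at: 3 ifAbsent: [Dictionary new])
    at: 'ban' ifAbsent: [Set new].
"candidates now holds 'banana' and 'band'"
```

A search term longer than the largest indexed length would need a second step, for instance looking up its first five characters and then filtering the much smaller candidate set with an ordinary substring test.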


Decorators as Guards

I’m exploring a new pattern – I’m sure it’s been done before, but it’s new to me, and a useful exercise to get to the next stage with an application I’m envisioning.  The pattern is using Seaside Decorators as security guards.

So, last night, I finally squeezed in enough time to put my decorator guards into action.  Happy to report they’re working fine.  You can review the demo app yourself at .  Login as, password bob, or, password alice.

Health Care – costs vs. price

It’s oft repeated that health care costs continue to rise at a crazy pace.  While the costs of most products and services have been decreasing in “real”, inflation-adjusted dollars, health care costs, like education costs, have been increasing at a record pace.  And, unlike the housing/real estate “bubble”, there doesn’t seem to be an end in sight.  What’s going on?

Most commentators talk about health care cost increases.  However, the evidence I see suggests something different.  Yes, we’re seeing health care price increases.  But cost increases? There’s a difference.


The Hottest Summer in Houston

1980 remains the hottest summer in Houston: 14 consecutive 100+ degree days, a high of 107, and 32 days altogether at 100 or above.

Yep, I remember that summer well: I was a lifeguard that year – and by the end of the summer, a coach, swim teacher, a pool cleaner, and a front-desk clerk as well.  I ended up with all the jobs at this local community pool, because workers were dropping like flies!  Seriously, by mid-summer every local kid had quit, and I kept picking up additional job duties; by the end of summer, I was working 12-14 hours a day.


Seaside – how to change a page’s title

I know I’ve seen the answer to this before, but had a hard time tracking it down, so thought it worthwhile to post.

To change the web page’s HTML title (or any other head information) for a web component, implement the method updateRoot: on the component.  This method will be called when the component is rendered on the page – remember to always call super too.

component>>updateRoot: anHtmlRoot
    super updateRoot: anHtmlRoot.
    anHtmlRoot title: 'fooTitle'.
    "do anything else you'd like to Root here too"

Updater examples with Scriptaculous & Seaside

The first of a series of coding vignettes in Seaside.

Context: a web app with a form containing multiple input fields.  Instead of waiting for the form to be submitted, I want the page to update another element every time an input is changed.  In this example, that element is a total field.  The solution comes from a simple MoneyCounter application:

MoneyCounter>>renderContentOn: html
    html form id: 'f'; with: [
        html table: [
            html tableRow: [
                html tableData: [html text: 'pennies'];
                     tableData: [
                        html textInput
                            id: 'pennies';
                            "on: #pennies of: self;"
                            callback: [:value | self pennies: value];
                            onChange: (html updater
                                id: 'total';
                                triggerFormElement: 'pennies';
                                callback: [:r | self renderTotalOn: r])]].
            "the updater above re-renders only the element with id 'total'"
            html tableRow: [
                html tableData: [html text: 'TOTAL'; space];
                     tableData: [html span id: 'total'; with: self total]]]]

MoneyCounter>>renderTotalOn: html
    html render: self total

MoneyCounter>>pennies: value
    pennies := value

MoneyCounter>>total
    ^ (pennies asInteger * 0.01) "+ etc."


Tax Savings for Software Companies in Texas

For the past 6 years, one of the major specialties of my company has been writing software applications dealing with corporate taxation.  These have usually been internal, custom apps for a corporate tax department, but recently we have entered a partnership with a local accounting firm to do some web service-based applications.

The first of these applications is now available for beta testing, so I’m reaching out to all Texas-based software development companies to help test this application, and potentially save money on your taxes along the way.  You may qualify!


Sqwitter – demonstrating a Seaside app

allSqweets
    "returns messages from self and friends"
    | allsqweets |
    allsqweets := SortedCollection sortBlock: [ :a :b | a timestamp < b timestamp ].
    allsqweets addAll: self messages.
    myFriends do: [ :each | allsqweets addAll: each messages ].
    ^ allsqweets


allSqweets is a simple 4 line method, but it delivers the core feature of Twitter: displaying a timeline of all of a user’s and his friends’ messages.  And thus started my exploration into Seaside, a Smalltalk framework for developing web applications.

I started playing around with Seaside last year, but time constraints prevented me from getting too far.  But, I’m at it again.  This latest burst of energy was in part generated by the “fail whale” of Twitter.  Prompted by friends, I started using Twitter a lot from the early part of this year.  Pretty cool, simple application, yet, why does it keep crashing?  Hey, though, it’s got a million users, so give it a break.

A more perplexing problem, to me at least, was the linking of replies.  Or the lack of said linking.  I click “reply” to someone’s twitter, post my reply.  There’s even a link on my posting “in reply to …”.  Click on it, and more times than not, it would link back to some other message, not the message I was replying to.  What’s up with that?  It got me thinking: conceptually, Twitter is a really simple application: what would it look like in Smalltalk?  Not trying to solve performance problems or scalability or anything real like that, but simply as a pure object exercise, just what would Twitter look like in Smalltalk?

I’ve been looking around for other web frameworks to develop in.  That’s why I’ve given Seaside a serious look.  On the prompting of other friends, I recently gave Ruby on Rails a look, and earlier had glanced over at Django.  Both, surprisingly, kind of gave me the shudders: pop in some code for your initial objects, and they auto-generate lots of code for you.  All code is broken down into a clean separation of concerns: model files here (class definitions), view files there (web templates), controllers over there.  Nice, but now I’ve got lots of files, and I’ve got to navigate my way around to figure out what to modify where.  I’m sure once you get used to it, it’s easy to remember what directory has what files, but I’m impatient you see, and don’t have much spare time (see above).

All these files reminded me of my C development days, with scores of header files (.h), code files (.c), and my own make files, etc.  I’m not knocking these frameworks on the Ruby or Python languages – long ago, I learned that language wars are useless.  But, maybe it’s just my aesthetic:  I’ve always found Smalltalk super clean and easy to read and navigate, and it remains my fastest development language.  So, yeah, maybe Ruby and Python are “Rapid-Enough” Application Development environments, compared to, say Java or Visual Basic, but I yearn for truly Rapid Application Development again.

Then came Barcamp Houston.  It kind of crept up on me.  It wasn’t until the morning of the conference that I suddenly got inspired:  I should present something here!  But a late start to the day and an already depleted laptop battery made it a challenge to prepare any kind of presentation, much less something that wouldn’t embarrass me.  I would need to spend a little time relearning the Seaside environment (which has undergone a few point releases since last year), and thinking through an application.

Then my thoughts went back to the Twitter application.  Wouldn’t that be a simple application to demonstrate within Seaside?  And sure enough, it was: in fact, too simple to demonstrate some of Seaside’s more powerful features (like tasks and workflow – the real meat of “continuations”).  Nevertheless, a good test case to begin.

So, I still don’t have much time in my busy schedule, what with my own projects, my business, family, assisting in the launching of my wife’s new restaurant, etc., but I have been able to squeeze in a few hours here and there to develop Sqwitter, an implementation of Twitter in Squeak and Seaside.  The next several blog postings will walk through my development of Sqwitter, and should serve as a useful tutorial for Seaside (there are already some excellent tutorials on the web, and a book too).  When I finish the code, I’ll also roll the application out to a demonstration website for readers to play.

Next: Sqwitter objects and Seaside components.

How to Cpk the SQL Way

As mentioned earlier, I’ve been involved in a client’s Production Reporting application project, where the subject of Cpk came up.  After a lot of inconsistent references to the statistic and lots of code that approximated but didn’t exactly calculate it, I finally discovered the proper formula for Cpk.  Here it is:
C_{pk} = \min\left( \frac{USL - \mu}{3\hat{\sigma}},\ \frac{\mu - LSL}{3\hat{\sigma}} \right)

Where USL is the Upper Specification Limit, LSL is the Lower Specification Limit,  μ is the arithmetic mean of the results and \hat{\sigma}, sigma hat, is the estimated standard deviation (sigma hat is going to turn out to be the kicker in this equation).

Let’s say you make a product that must fall within some specification, like the length of a sub sandwich.  It should always be 12″ long, but it’s acceptable if the final sandwich comes out between 11″ (LSL) and 13″ (USL).  Cpk is the smaller of two ratios: the distance from the mean up to the upper limit divided by 3 times the estimated standard deviation, or the distance from the lower limit up to the mean divided by 3 times the estimated standard deviation.
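
To make the arithmetic concrete, here is a worked example for the sandwich.  The mean of 12.2″ and the sigma hat of 0.2″ are made-up numbers chosen for illustration, not figures from the client’s data:

```latex
\begin{aligned}
\frac{USL - \mu}{3\hat{\sigma}} &= \frac{13 - 12.2}{3 \times 0.2} = \frac{0.8}{0.6} \approx 1.33 \\
\frac{\mu - LSL}{3\hat{\sigma}} &= \frac{12.2 - 11}{3 \times 0.2} = \frac{1.2}{0.6} = 2.00 \\
C_{pk} &= \min(1.33,\ 2.00) \approx 1.33
\end{aligned}
```

With these assumed numbers the upper side is the limiting one, because the mean sits closer to the upper limit than to the lower one.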

All this would be easy to calculate using SQL if just plain ole standard deviation were involved.  However, use standard deviation instead of estimated standard deviation (sigma vs. sigma hat), and you have the equation for Ppk, a different statistic.  Always one to be lazy, I asked our client: “would it be okay to report just the Ppk statistic?”  Of course, the answer was no!