A Review of Ben Forta’s “Learning Regular Expressions”

Being able to whip up a regular expression in the ordinary course of data wrangling is one of those skills that separates the computing neophyte from the skilled. Years ago, documentation on the subject was scanty at best, relegated to man(1) pages and the occasional *NIX book footnote or appendix. With the widespread adoption of regular expressions in programming languages and the rise of the web, there’s been an explosion of poorly written Web tutorials which purport to teach you regular expressions, but seldom do more than give your shift and number keys a workout.

Ben Forta’s Learning Regular Expressions, published by Addison-Wesley, aims to change this, and for the most part does an excellent job. Situated somewhere between the first page of Google search results for “regular expression” and Jeffrey Friedl’s Mastering Regular Expressions, it provides a step-by-step tutorial on using regular expressions to match text, illustrated with common problems including matching phone numbers, postal codes (from three different countries, a nice addition to the tried-and-true example), email addresses, URLs, and snippets of HTML.

The book’s chapters are structured as lessons, each adding on to what you’ve learned in previous chapters. A careful study of the material and working of the examples will bring you not just a basic understanding of word and pattern matching, but more advanced use of regular expressions including position matching, backreferences, look-ahead and look-behind, and even conditional embedding. I’ve been using regular expressions (poorly, I admit, for the most part) for thirty years, and I had several “Oh, that’s how you can do that” moments when reading, especially the last few chapters.
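For readers who haven’t met those later topics, here is a minimal sketch of backreferences, look-ahead, and look-behind. It is not taken from the book; the patterns, the sample strings, and the choice of Python’s re module are all assumptions made purely for illustration.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured,
# so this pattern finds accidentally doubled words.
doubled = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Look-ahead: match digits only when " USD" follows,
# without making the unit part of the match.
before_usd = re.compile(r"\d+(?= USD)")

# Look-behind: match digits only when a dollar sign precedes them,
# without including the sign in the match.
after_dollar = re.compile(r"(?<=\$)\d+")

print(doubled.findall("This is is a test of the the parser"))   # ['is', 'the']
print(before_usd.findall("30 USD, 25 EUR, 10 USD"))             # ['30', '10']
print(after_dollar.findall("cost $45, weight 45 kg, fee $7"))   # ['45', '7']
```

Syntax and support for look-behind in particular vary between regular expression engines, so treat this as a Python-specific sketch.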

Each chapter provides a set of increasingly sophisticated examples. I hesitate to call this a cookbook, because a competitor I will not name has largely co-opted that format, and truthfully, it’s more didactic than culinary. The presentation works, however; for each section, you’re presented with a problem, some sample text, a regular expression that may or may not solve the problem, and then a discussion of the regular expression and why it did or did not work. Including regular expressions that do not satisfy your goal permits Forta to link one section to another, building on your expectation of how things might work to how they do work. It turns out to be an effective way to present the material, and it’s easy to follow along using a tool such as grep.
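To give a flavor of that “try it, see why it fails, fix it” format, here is a hypothetical example in the same spirit. The sample text and both patterns are assumptions for illustration, not material from the book, and it uses Python’s re module rather than grep.

```python
import re

# Problem: pull each bold element out of a snippet of HTML.
sample = "<b>bold</b> and <b>brave</b>"

# First attempt: the greedy .* runs on to the LAST </b>,
# so both elements are swallowed into a single match.
greedy = re.findall(r"<b>.*</b>", sample)
print(greedy)  # ['<b>bold</b> and <b>brave</b>']

# Second attempt: the lazy .*? stops at the first </b> it can,
# giving one match per element.
lazy = re.findall(r"<b>.*?</b>", sample)
print(lazy)    # ['<b>bold</b>', '<b>brave</b>']
```

The first pattern is not wrong so much as over-eager, and seeing it misbehave is what motivates the lazy quantifier in the second.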

The book closes with an appendix on the differences between several popular environments’ implementations of regular expressions. This is helpful, because almost every reader will come to the book with a slightly different expectation of where they will be using what they learn.

I found Forta’s step-by-step presentation refreshing without being condescending. I would have preferred perhaps a few additional examples on some of the more advanced topics, like look-ahead and look-behind, and even subexpressions. However, he does the job and does it quickly; a motivated reader can go from knowing nothing about the subject to being proficient in just a few evenings, and I found it a quick read.

Winlink on Mac OS X with a TH-D74 over Bluetooth…

This took a bit longer than I expected it to, but that’s usually the way things work when you don’t know what you’re doing. Now that it’s done, it works quite well: I’ve got Winlink Express running on my MacBook under High Sierra with Wine, no Parallels or VMware Fusion needed!

Here are the steps…

Friday Fun (Thanksgiving 24-Nov-2017 Edition)

  • Can an AI be taught to explain itself? Cliff Kuang, New York Times Magazine
    This is a good account of some of the problems we face with machine learning today. There is a clear disconnect between the results you get with good applications of ML, and understanding why they work the way they do. I am not convinced, however, that just adding a second network on the side to explain the first really will solve the problem; it raises the question of how we will understand what that network is doing.
  • Come On Eileen, Dexys Midnight Runners. It’s worth finding different versions of this song and listening, because there are some fun intros and exits you don’t hear on the usual radio mix! See the Wikipedia page for a nice discussion.

Friday Fun (17-Nov-2017 edition)

Friday Fun (10-Nov-2017 edition)

  • Sixty Years of Software Development Life Cycle Models, Kneuper, Ralf. IEEE Annals of the History of Computing. The Hegelian account of software development life cycles is apparent to anyone who’s been around for more than a decade, or even worked in different sectors of the industry. In my mind, what Kneuper brings to the discussion in this case is not a simple account of the thesis, antithesis, and synthesis of software development life cycles, but interesting facts about their early development. Prototypes played a role much earlier in lifecycle planning than I think many have been aware of, as did an iterative approach with feedback loops in general.
  • The Worst Day Since Yesterday, Flogging Molly. It’s been that kind of a week around here. I highly recommend you go out, get a Guinness, and crank up Flogging Molly as loud as your speaker will allow. You can’t go wrong with that on a Friday evening.

Friday Fun (03-Nov-2017 edition)

  • Idea of Order at Kyson Point, Brian Eno. Brian Eno needs no introduction; this is a nice short recent work he put out this year.
  • Deep Reinforcement Learning: Pong from Pixels. As promised, here’s a bit of a flashback on reinforcement learning, a neat older result on using reinforcement learning to train a network to play Atari video games. It’s important to recognize in this work, too, just like with the AlphaGo Zero work, that the resulting network does not understand what it’s doing. It can’t explain the rules, doesn’t have any abstractions. It’s just very, very, very good at pattern recognition.

Friday Fun (27-Oct-2017 edition)

  • More Than, Au Revoir Simone. Dreamy synthpop at its very best.
  • Mastering the game of Go without human knowledge, David Silver, et al.; good Nature summary coverage as Self-Taught AI is best yet at Strategy Game of Go, Elizabeth Gibney. This is a very important result, although I think it’s been a little too widely hyped by the popular press as evidence of the coming singularity. Go is an interesting problem domain, because the combinatorial explosion of moves leaves it intractable for traditional game-playing approaches. Reinforcement learning, used by the team, is essentially how humans learn to play Go, albeit far, far faster than we learn to play. I am looking forward to seeing discussions in the coming months of the new strategies AlphaGo Zero teaches human players.

Friday Fun! (20-Oct-2017 Edition)

(So, yeah, last week’s promise of a post tomorrow didn’t quite pan out. Anyway, without further ado…)

  • Resonant Expanse, Max Cooper & Tom Hodge
    I love almost everything I’ve heard by Max Cooper. He takes traditional trance to a whole new level with his use of modulation on minimalist melodies and percussion. This work by him and Tom Hodge is on several of my “music to think by” playlists.
  • A Preliminary Analysis of Sleep-Like States in the Cuttlefish Sepia officinalis, Marcos G. Frank, Robert H. Waldrop, Michelle Dumoulin, Sara Aton, Jean G. Boal.
    The punch line is in the abstract: “In addition, cuttlefish transiently display a quiescent state with rapid eye movements, changes in body coloration and twitching of the arms, that is possibly analogous to REM sleep.” Cephalopods and mammals diverged some five hundred million years ago — like, twice as long ago as when dinosaurs and mammals were hanging together. If this holds true, it’s amazing. I can’t even really call it convergent evolution, because I’m not convinced we can articulate what evolutionary pressures would generate the need for REM sleep in such different ecosystems, unless it’s actually a requirement for brain function. But human and cephalopod brains are very, very different — the common ancestor was probably something like a sea worm with a brain similar to C. elegans. So there’s an awful lot of room for divergent evolution, which we see in things like the gross structures.

Anyway, thinking of cuttlefish and perhaps octopuses dreaming of Max Cooper’s and Tom Hodge’s music makes me very, very happy.

New! Friday fun!

For years I’ve been in the habit of occasionally sending coworkers interesting things that I find on the web on Fridays. It may be a paper, blog post, or newspaper story that I found particularly interesting and that they might not have encountered. Often, but not always, it’s tech related: a paper on ML, a history of computing paper, a blog post on keeping organized or using agile, or something like that. More recently, I’ve added a link to a single song I’d like to share. With both of these I provide a bit of commentary, no more than a few sentences, explaining why I thought it was worth a look.

I’m going to start cross-posting the content here, because there’s never anything proprietary about it, and it occurs to me that it might be of interest to a broader audience.

One remark is in order. I make no apologies that some of what I reference is behind one paywall or another, and no, I won’t send copies of what I link if you can’t access it. I am a member of the IEEE, ACM, and ARRL in part because I value the editorial work they do in their journals and want access to their digital libraries; the fee I pay for membership in part enables them to do that editorial work. The same goes for the few online news sources I pay fees to access. In all cases, there are ways for motivated non-members to access the works these organizations provide or curate: typically there’s free access for a number of works a month, or the opportunity to buy a reprint of a particular paper at a nominal cost. If what you see really interests you, it’s worth your effort to support the sources that curate that material for you.

Look forward to the first post tomorrow!