A Review of Ben Forta’s “Learning Regular Expressions”

Being able to whip up a regular expression in the ordinary course of data wrangling is one of those skills that separates the computing neophyte from the skilled. Years ago, documentation on the subject was scanty at best, relegated to man(1) pages and the occasional ‘NIX book footnote or appendix. With the widespread adoption of regular expressions in programming languages and the rise of the web, there’s been an explosion of poorly written Web tutorials that purport to teach you regular expressions but seldom do more than give your shift and number keys a workout.

Ben Forta’s Learning Regular Expressions, published by Addison-Wesley, aims to change this, and for the most part does an excellent job. Situated somewhere between the first page of Google search results for “regular expression” and Jeffrey Friedl’s Mastering Regular Expressions, it provides a step-by-step tutorial on using regular expressions to match text, illustrated with common problems including matching phone numbers, postal codes (from three different countries, a nice addition to the tried-and-true example), email addresses, URLs, and snippets of HTML.

The book’s chapters are structured as lessons, each building on what you’ve learned in previous chapters. Careful study of the material and working through the examples will bring you not just a basic understanding of word and pattern matching, but also more advanced uses of regular expressions, including position matching, backreferences, look-ahead and look-behind, and even conditional embedding. I’ve been using regular expressions (poorly, I admit, for the most part) for thirty years, and I had several “Oh, that’s how you can do that” moments when reading, especially in the last few chapters.
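For readers who have not met the more advanced features named above, here is a minimal sketch of a backreference, a look-ahead, and a look-behind. It uses Python’s re module rather than anything from the book; the sample text and the variable names are my own assumptions, chosen only for illustration.

```python
import re

# Backreference: \1 must repeat whatever group 1 captured (a doubled word).
doubled = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Look-ahead: match digits only when " USD" follows, without consuming it.
usd_amount = re.compile(r"\d+(?= USD)")

# Look-behind: match digits only when "$" precedes them, without consuming it.
dollar_amount = re.compile(r"(?<=\$)\d+")

# Hypothetical sample text, not taken from the book.
text = "This is is a test: 30 USD, $45, and 60 EUR."

doubled_words = [m.group(1) for m in doubled.finditer(text)]
usd_amounts = usd_amount.findall(text)
dollar_amounts = dollar_amount.findall(text)

print(doubled_words)   # ['is']
print(usd_amounts)     # ['30']
print(dollar_amounts)  # ['45']
```

Note that the look-around patterns return only the digits: the surrounding “ USD” and “$” are checked but never become part of the match.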

Each chapter provides a set of increasingly sophisticated examples. I hesitate to call this a cookbook, because a competitor I will not name has largely co-opted that format, and truthfully, it’s more didactic than culinary. The presentation works, however; for each section, you’re presented with a problem, some sample text, a regular expression that may or may not solve the problem, and then a discussion of the regular expression and why it did or did not work. Including regular expressions that do not satisfy your goal permits Forta to link one section to another, moving you from your expectation of how things might work to how they actually do work. It turns out to be an effective way to present the material, and it’s easy to follow along using a tool such as grep.
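To give a flavor of that problem, sample text, attempt, and discussion pattern, here is a small sketch of my own. It is not an example from the book, and it uses Python’s re module instead of grep. The task is to pull US ZIP codes out of a line of text; the sample string and variable names are assumptions for illustration.

```python
import re

# Hypothetical sample text containing two ZIP codes and one longer number.
sample = "Ship to 94043, ref 1234567, or to 10001-4321."

# First attempt: any run of five digits.
# This also matches the first five digits of the longer reference number.
naive = re.findall(r"\d{5}", sample)

# Refinement: require word boundaries on both sides (position matching)
# and allow an optional -NNNN suffix via a non-capturing group.
refined = re.findall(r"\b\d{5}(?:-\d{4})?\b", sample)

print(naive)    # ['94043', '12345', '10001']
print(refined)  # ['94043', '10001-4321']
```

The first pattern fails in an instructive way: it finds the ZIP codes but also a false positive, and it truncates the ZIP+4 form. Seeing why is what motivates the boundary and the optional group in the second pattern.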

The book closes with an appendix on the differences between several popular environments’ implementations of regular expressions. This is helpful, because almost every reader will come to the book with a slightly different expectation of where they will be using what they learn.

I found Forta’s step-by-step presentation refreshing without being condescending. I would have preferred perhaps a few additional examples on some of the more advanced topics, like look-ahead and look-behind, and even subexpressions. However, he does the job and does it quickly; a motivated reader can go from knowing nothing about the subject to being proficient in just a few evenings, and I found it a quick read.

Winlink on Mac OS X with a TH-D74 over Bluetooth…

This took a bit longer than I expected it to, but that’s usually the way things work when you don’t know what you’re doing. Now that it’s done, it works quite well: I’ve got Winlink Express running on my MacBook under High Sierra with Wine, no Parallels or VMware Fusion needed!

Here are the steps…

Friday Fun (Thanksgiving 24-Nov-2017 Edition)

  • Can an AI be taught to explain itself? Cliff Kuang, New York Times Magazine
    This is a good account of some of the problems we face with machine learning today. There is a clear disconnect between the results you get with good applications of ML and an understanding of why they work the way they do. I am not convinced, however, that just adding a second network on the side to explain the first will really solve the problem; it raises the question of how we will understand what that second network is doing.
  • Come On Eileen, Dexys Midnight Runners. It’s worth finding different versions of this song and listening, because there are some fun intros and exits you don’t hear on the usual radio mix! See the Wikipedia page for a nice discussion.

Friday Fun (17-Nov-2017 edition)

Friday Fun (10-Nov-2017 edition)

  • Sixty Years of Software Development Life Cycle Models, Kneuper, Ralf. IEEE Annals of the History of Computing. The Hegelian account of software development life cycles is apparent to anyone who’s been around for more than a decade, or even worked in different sectors of the industry. In my mind, what Kneuper brings to the discussion in this case is not a simple account of the thesis, antithesis, and synthesis of software development life cycles, but interesting facts about their early development. Prototypes played a role much earlier in lifecycle planning than I think many have been aware of, as did an iterative approach with feedback loops in general.
  • The Worst Day Since Yesterday, Flogging Molly. It’s been that kind of a week around here. I highly recommend you go out, get a Guinness, and crank up Flogging Molly as loud as your speaker will allow. You can’t go wrong with that on a Friday evening.

Friday Fun (03-Nov-2017 edition)

  • Idea of Order at Kyson Point, Brian Eno. Brian Eno needs no introduction; this is a nice short recent work he put out this year.
  • Deep Reinforcement Learning: Pong from Pixels. As promised, here’s a bit of a flashback on reinforcement learning, a neat older result on using reinforcement learning to train a network to play Atari video games. It’s important to recognize in this work, too, just like with the AlphaGo Zero work, that the resulting network does not understand what it’s doing. It can’t explain the rules, and it doesn’t have any abstractions. It’s just very, very, very good at pattern recognition.