Show HN: A shell-native cd-compatible directory jumper using power-law frecency

(github.com)

26 points · jghub · 3mo ago · 16 comments

I have used this tool privately since 2011 to manage directory jumping. While it is conceptually similar to tools like z or zoxide, the underlying ranking model is different. It uses a power-law convolution with the time series of cd actions to calculate a history-aware "frecency" metric instead of the standard heuristic counters and multipliers.

This approach moves away from point-estimates for recency. Most tools look only at the timestamp of the last visit, which can allow a "one-off" burst of activity to clobber long-term habits. By convolving a configurable history window (typically the last 1,000+ events), the score balances consistent habits against recent flukes.

On performance: Despite the O(N) complexity of calculating decay for 1,000+ events, query time is ~20-30ms (Real Time) in ksh/bash, which is well below the threshold of perceived lag.

I intentionally chose a Logical Path (pwd -L) model. Preserving symlink names ensures that the "Name" remains the primary searchable key. Resolving to physical paths often strips away the very keyword the user intends to use for searching.
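To illustrate the point about logical vs. physical paths, here is a minimal Python sketch (not part of sd, which is a shell tool). The directory names `acme-frontend` and `releases/v2.3.1` are hypothetical, and the example assumes a POSIX system where symlinks can be created. `os.path.realpath` plays the role of `pwd -P`, while the unresolved link path corresponds to `pwd -L`.

```python
import os
import tempfile

def logical_vs_physical():
    """Return (logical, physical) paths for a hypothetical symlinked project dir."""
    with tempfile.TemporaryDirectory() as root:
        # The real location on disk, with a name nobody would search for.
        target = os.path.join(root, "releases", "v2.3.1")
        os.makedirs(target)
        # The symlink carries the name the user actually remembers.
        link = os.path.join(root, "acme-frontend")
        os.symlink(target, link)
        # realpath resolves the symlink, like `pwd -P` would.
        return link, os.path.realpath(link)

logical, physical = logical_vs_physical()
print("keyword in logical path: ", "acme-frontend" in logical)
print("keyword in physical path:", "acme-frontend" in physical)
```

A fragment such as `acme` can only match the logical path here; once the path is resolved, the keyword is gone.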

10 comments · 3 top-level

zahlman · 3mo ago · 3 in thread

Sorry, there's a lot here about the technical implementation details but much less I can understand about the problem being solved. What exactly do you mean by "manage"? What happens differently when you use this command, versus when you use built-in `cd`?

jghub (OP) · 3mo ago

To "manage" in this context refers to maintaining a ranked database of your directory history so you can navigate without providing full or relative paths. The difference in practice:

With built-in `cd`:

Navigation is manual and location-dependent. To reach a deeply nested project, you must provide the exact path: `cd ~/src/work/clients/acme/frontend/src/components`

With sd:

Navigation is intent-based. Once `sd` has indexed a directory, you provide a fragment of the name: `sd comp`

How it works differently:

1. Passive Indexing: `sd` remembers all your cd actions (up to a configurable limit, typically several thousand) and, after each cd action, computes a weighted score for each path you visit.

2. Intent Resolution: When you run `sd <string>`, it doesn't check your current working directory. It queries the database for the most "frecent" (frequent + recent) path matching that string and moves you there.

3. Ambiguity Handling: If "comp" matches multiple paths (e.g., `frontend/components` and `backend/components`), the power-law model I mentioned calculates which one is more relevant to your current "attention span" to resolve the tie.
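The three steps above can be sketched roughly as follows. This is an illustrative Python model, not sd's actual shell code: the function name `rank`, the sample paths, and in particular the weight `(1 - age/window)**p` are assumptions chosen only to show the idea of "log every cd, match by substring, pick the highest weighted score".

```python
from collections import defaultdict

def rank(history, pattern, window=1000, p=9.97):
    """Return the paths containing `pattern`, best match first.

    history: visited paths, oldest first (one entry per cd action).
    The kernel (1 - age/window)**p is an assumed stand-in for sd's scoring.
    """
    recent = history[-window:]          # only the "attention span" is scored
    newest = len(recent) - 1
    scores = defaultdict(float)
    for i, path in enumerate(recent):
        age = newest - i                # 0 for the most recent cd
        scores[path] += (1.0 - age / window) ** p
    hits = [path for path in scores if pattern in path]
    return sorted(hits, key=scores.get, reverse=True)

# Hypothetical history: five older visits to backend, two recent ones to frontend.
history = ["/p/backend/components"] * 5 + ["/p/frontend/components"] * 2
print(rank(history, "comp", window=1000))  # wide window: visit counts dominate
print(rank(history, "comp", window=7))     # narrow window: recency dominates
```

With the wide window the five backend visits outweigh the two frontend visits; shrinking the window to 7 makes the decay steep enough that the two most recent visits win instead.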

The problem being solved is the cognitive load of remembering and typing long, nested file paths. It replaces a "search and find" task with a "recall" task.

So the core functionality is very similar to tools like z/zoxide, but the ranking uses the full visit history, and the detailed behaviour therefore differs.

1718627440 · 3mo ago

> To reach a deeply nested project, you must provide the exact path: `cd ~/src/work/clients/acme/frontend/src/components`

In key presses that is 'cd ~TABsTABwTABTABaTABfTABsTABcENTER', if it is an often used project, I likely have it in CDPATH, so it's 'c acmTABfTABsTABc' or if I am actively working 'CTRL-Rfro CTRL-R CTRL-R'.

To me it sounds like a solution looking for a problem. Why should I give up predictability when it won't really save keypresses? Not that keypresses are that expensive anyway.

jghub (OP) · 3mo ago

That workflow is perfectly valid -- if you have a stable set of 5-10 projects, CDPATH and manual Tab-completion are hard to beat.

I built sd for a different scenario: when your 'active' set of locations to reach is large and/or shifts frequently. When managing dozens of repos, e.g., where sub-directories have identical names (e.g., client-a/src vs client-b/src), CDPATH becomes noisy and Tab-completion requires more prefix-typing.

Tools like zoxide (33k+ stars or so on GitHub) or z have proven this is a widespread need. sd is my take on providing that functionality in a shell-native way, with a qualitatively different ranking approach: it uses the full history of cd events and a power-law "aging" model that I find more accurate for fast context-switching.

In my experience, this saves significant keypresses by allowing for shorthand matching. For example, I can reach .../client-a/src just by typing 'sd ent-a'. The algorithm disambiguates based on the ranking; if the top match is ever wrong (it rarely is for me), you can just do ^P <CR> (recall and re-run) to jump to the 2nd highest match.

But it’s definitely a matter of taste -- manual curation vs. an automated 'attention span' model.

1 more reply

ekropotin · 3mo ago · 3 in thread

> While it is conceptually similar to tools like z or zoxide, the underlying ranking model is different.

I mean, cool stuff. But does it really matter from a usability perspective?

jghub (OP) · 3mo ago

The difference in usability is most apparent during context switching and burst activity.

The Problem with Heuristics: Most tools use simple counters and multipliers. If you spend 10 minutes performing a repetitive task in a temporary directory (e.g., a "one-off" burst of cd actions into a build folder), a heuristic model can "overlearn" that path. This often clobbers your long-term habits, making your primary project directory harder to reach until you have manually "re-trained" the tool.

The Usability of the Power-Law Model: Because sd calculates the score by convolving a fixed window of history (the "Attention Span"), it distinguishes between a long-term habit and a recent fluke more effectively.

Stability: Your main work directories remain at the top of the results even if you briefly go "off-track" in other folders.

Decay Precision: As the project README explains, the current default exponent of the power-law decay is p=9.97. This means that if the last 1000 cd actions enter the ranking computation, the 500th of those (in the middle of the window) gets a weight of 1/2^9.97 ≈ 1/1000. Effectively, only visits to a given dir within the last 500 actions contribute to its score (the window width controls how long a cd remains within the "attention span"), and only the last 100 or so really influence it. So the default (which the user can easily change) is much "steeper" than a linear or simple exponential decay. It prioritizes what you are doing right now without forgetting what you do consistently.

Zero Maintenance: You rarely have to "purge" or "delete" paths from the database because the math naturally suppresses "noise" while amplifying "signals."

In short: The math matters because it reduces the number of times the tool teleports you to the wrong place after a change in workflow.
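The numbers quoted under "Decay Precision" can be checked with a few lines of Python. The exact kernel is not spelled out in this comment, so the sketch assumes a weight of `(1 - age/window)**p`, where `age` counts cd actions back from the most recent one; this form is an assumption that reproduces the 1/2^9.97 mid-window figure, and sd's actual formula may differ.

```python
def weight(age, window=1000, p=9.97):
    """Assumed kernel: 1.0 for the newest cd, falling towards 0 at the window edge."""
    return (1.0 - age / window) ** p

def share_of_total(last_k, window=1000, p=9.97):
    """Fraction of the total window weight carried by the newest `last_k` events."""
    total = sum(weight(a, window, p) for a in range(window))
    return sum(weight(a, window, p) for a in range(last_k)) / total

for age in (0, 100, 250, 500, 900):
    print(f"age {age:4d}: weight {weight(age):.6f}")
print(f"newest 100 events: {share_of_total(100):.1%} of total weight")
print(f"newest 500 events: {share_of_total(500):.2%} of total weight")
```

Under this assumed kernel the mid-window weight is exactly 0.5^9.97, and nearly all of the total weight sits in the newer half of the window, which is consistent with the description above.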

The difference in detailed behaviour boils down to this: sd logs the individual cd event history, so it knows _when_ each visit happened, not only how _often_ visits happened plus when the single most recent one did (which, to the best of my understanding, is what z/zoxide track). So there is more information to utilize for the ranking, and I believe the resulting difference in behaviour is noticeable.

The rest is a matter of taste: sd lets you alter the ranking transiently in the running shell if your current task briefly requires emphasising or de-emphasising recency (alter the power exponent) or a smaller or larger attention span (alter the window size). I rarely need that, but it happens.

Last but not least, sd also provides an interactive selection facility to pick any dir currently within the attention span (i.e. "on the dir stack"). If you have fzf installed, it uses that as the UI for this selection; otherwise it falls back to simple index-based selection in the terminal. Again, I don't use this often, but sometimes one does want it (e.g. when you can't recall the path and need to read it to remember it).

cap11235 · 3mo ago

Is this not just equivalent to clearing out zoxide's db periodically?

jghub (OP) · 3mo ago

Not really, because zoxide's "aging" is a global multiplicative downscale of cumulative scores. It doesn't solve the problem of "Historical Rigidity."

In zoxide, every visit you have ever made contributes to a monolithic frequency score. Because the "aging" is a global multiplier (0.9), it preserves the relative proportions of your history. If "Project A" has 1000 visits from three months ago, its score can remain so high that it still outranks "Project B," which you started this week and visited 50 times. This is why zoxide provides a manual remove command -- users sometimes have to intervene when the "ghosts" of old projects won't stop winning matches. I have never felt the need to do something like that in sd (although the option is there, mostly as a historic artifact).

In sd, the ranking is based on the density of visits within a fixed window (the "Attention Span").

If you haven't visited "Project A" in your last ≈1000 moves, it has exited the window and is ignored for ranking purposes (as long as a pattern match is found elsewhere on the stack -- otherwise fallback logic kicks in and "Project A" is still discoverable). So it doesn't matter if you visited it 1000 times earlier in the year; it no longer occupies your "attention." Conversely, if you've visited "Project B" 50 times in the last two days, those visits occupy a high-weight portion of the power-law curve.

sd essentially uses a sliding window weighted summation approach. It prioritizes what you are doing now without being weighed down by the "debt" of what you were doing months ago. This provides "Zero Maintenance" because the math naturally suppresses old signals as they exit the window, whereas cumulative models might eventually require manual pruning to fix a sluggish ranking.
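The contrast drawn in this reply can be made concrete with a small Python sketch. Both halves are simplified models of what the comment describes, not the actual implementation of zoxide or sd: `age_globally` just multiplies every cumulative score by the same factor, and `windowed_scores` sums an assumed `(1 - age/window)**p` weight over the newest events. The project names and visit counts are the hypothetical ones from the text.

```python
def age_globally(scores, factor=0.9):
    """Scale every cumulative score by the same factor (simplified aging)."""
    return {path: score * factor for path, score in scores.items()}

def windowed_scores(history, window=1000, p=9.97):
    """Sum an assumed power-law weight over the newest `window` events only."""
    scores = {}
    for age, path in enumerate(reversed(history[-window:])):
        scores[path] = scores.get(path, 0.0) + (1.0 - age / window) ** p
    return scores

# Cumulative model: old project A (1000 visits) vs. new project B (50 visits).
cumulative = {"project-a": 1000.0, "project-b": 50.0}
for _ in range(10):
    cumulative = age_globally(cumulative)
ratio = cumulative["project-a"] / cumulative["project-b"]
print(f"A/B ratio after 10 aging rounds: {ratio:.1f}")  # the ratio is unchanged

# Windowed model: A's visits are followed by 950 other moves, then 50 to B.
history = ["project-a"] * 1000 + ["elsewhere"] * 950 + ["project-b"] * 50
scores = windowed_scores(history)
print("project-a still scored:", "project-a" in scores)
print("project-b score:", round(scores["project-b"], 2))
```

A uniform multiplier shrinks all scores but leaves their ratio untouched, whereas in the windowed model project A simply drops out once its last visit is more than `window` moves in the past.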

Leftium · 3mo ago · 1 in thread

I plan to use frecency in my bookmarking app.

Although you don't have any problems with lag, it is possible to efficiently compute frecency in O(1) complexity:

> But with an additional trick, no recomputation is necessary. The trick is to store in the database something with units of date...

Full details: https://wiki.mozilla.org/User:Jesse/NewFrecency#Efficient_co...

jghub (OP) · 3mo ago

The Mozilla approach is a clever optimization for large-scale databases (like browser history), but it relies on a specific decay model that can be represented as a point-in-time value I believe.

I chose a different path for sd for two reasons:

Mathematical Fidelity: Convolving the cd event "time series" with a power-law kernel, S = sum_i 1/(t - t_i)^p, provides a more nuanced ranking of "burst" activity vs. "long-term" habits than a single-point frecency score. Calculating this over a window (N=1280) allows for a "steep" decay (the p≈10 default) that handles context-switching more naturally.

The computational overhead is real but modest: in my tests I see about 9ms for stack recomputation with the default window (N=1280), compared to a total real time for the cd action (including pattern matching and executing the cd) of about 22ms (I am on ksh; in bash it is about 30ms). Even when increasing the window to 2^13 (8192), the stack recomputation (scoring/ranking) takes only about 20ms. So the basic algorithm is indeed O(N), but the bottleneck is the sub-shell overhead rather than the math. Moving to O(1) would not save much (5ms?), I guess.

The "Attention Span" Logic: By using a fixed window of ≈1000 events for the "attention span" (short-term memory) and falling back to the full ≈10000-event log (the long-term memory) only if no match is found, sd almost always finds a matching dir (and usually picks the right one).
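The two-tier lookup described here might look like the following Python sketch. It is a model under stated assumptions, not sd's code: the function name `resolve`, the `(1 - age/window)**p` weight, and especially the fallback policy (return the most recently visited match from the older part of the log) are guesses made for illustration, since the comment does not say how fallback matches are ordered.

```python
def resolve(log, pattern, window=1000, p=9.97):
    """Pick a directory for `pattern`: attention window first, full log second.

    log: visited paths, oldest first. Returns a path, or None if nothing matches.
    """
    # Tier 1: score matches inside the attention window (assumed kernel).
    scores = {}
    for age, path in enumerate(reversed(log[-window:])):
        if pattern in path:
            scores[path] = scores.get(path, 0.0) + (1.0 - age / window) ** p
    if scores:
        return max(scores, key=scores.get)
    # Tier 2 (assumed policy): newest match among events older than the window.
    for path in reversed(log[:-window]):
        if pattern in path:
            return path
    return None

# Hypothetical log: one old visit to project-a, then five visits to project-b.
log = ["/old/project-a"] + ["/x/project-b"] * 5
print(resolve(log, "project-b", window=3))  # found inside the window
print(resolve(log, "project-a", window=3))  # found only via the full-log fallback
print(resolve(log, "missing", window=3))    # no match anywhere
```

The fallback keeps old directories discoverable without letting them compete with anything that matches inside the window.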

So you are right: the O(N) computation does impose a cost, but it is modest and negligible for the task at hand (I think I am quite sensitive to "command lag", but 30 or even 60ms is still "prompt" for me).
