stat parsing bugs

(openwall.com)

88 points · ototot · 3y ago · 81 comments

xeeeeeeeeeeenu3y ago

In my opinion, the fact that procfs is the only API for so many things is one of the biggest problems with Linux. BSDs have sysctl(), macOS has mach_* functions and, of course, Windows has a real API too.

Plain text interfaces lead to complicated, potentially insecure code (especially in C!), they're prone to race conditions and slow.

I wish it was possible to retrieve that information using real syscalls. I think it's a better approach than, for example, inventing a faster way to read procfs: https://lwn.net/Articles/813827/

xxpor3y ago

Even if they insist on a file based interface (it is a UNIX, so fair enough), in modern times it would be nice if they used a "real" data format. Yeah, it's not like JSON parsers have never had bugs, but on average they'll be MUCH better than everyone and their mother hand rolling a C based bespoke parser. Obviously you'd need a new name to not break backwards compatibility.

rdtsc3y ago

> Yeah, it's not like JSON parsers have never had bugs, but on average they'll be MUCH better than everyone and their mother hand rolling a C based bespoke parser.

Currently, but if this idea had started when Linux was becoming popular, the real data format would have been XML. It might have been nice at the time, but today we would probably have laughed at it and said how outdated and silly it looks.


kbrazil3y ago

jc[0] supports proc files. Converts them to JSON or YAML. (I am the author)

[0] https://kellyjonbrazil.github.io/jc/docs/parsers/proc


yuuta3y ago

Indeed. Parsing files is a less robust way compared to calling some APIs or at least parsing some files with a schema (e.g. JSON or XML). For example, uptime(1) on Linux:

    % strace uptime 2> /tmp/strace && grep proc /tmp/strace
     17:35:24 up 3 days, 7:47, 1 user, load average: 2.29, 1.85, 1.56
    openat(AT_FDCWD, "/usr/lib/libprocps.so.8", O_RDONLY|O_CLOEXEC) = 3
    openat(AT_FDCWD, "/proc/self/auxv", O_RDONLY) = 3
    openat(AT_FDCWD, "/proc/sys/kernel/osrelease", O_RDONLY) = 3
    openat(AT_FDCWD, "/proc/self/auxv", O_RDONLY) = 3
    openat(AT_FDCWD, "/proc/uptime", O_RDONLY) = 3
    openat(AT_FDCWD, "/proc/loadavg", O_RDONLY) = 4
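To make concrete what a tool like uptime(1) has to do with those two files, here is a minimal Python sketch. It is not how procps implements it; the function names and the sample contents are assumptions for illustration. It relies only on the documented layout: /proc/uptime holds two numbers (uptime and idle time in seconds), and /proc/loadavg starts with the three load averages.

```python
def parse_uptime(text: str) -> tuple[float, float]:
    """Parse /proc/uptime contents: (uptime_seconds, idle_seconds)."""
    up, idle = text.split()[:2]
    return float(up), float(idle)


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Parse the three load averages at the start of /proc/loadavg contents."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


# Sample contents, made up to match the uptime output shown above.
# On Linux you would read the real files instead, e.g. open("/proc/uptime").read().
sample_uptime = "287220.45 1100000.12\n"
sample_loadavg = "2.29 1.85 1.56 3/1234 56789\n"

uptime_seconds, idle_seconds = parse_uptime(sample_uptime)
load_1, load_5, load_15 = parse_loadavg(sample_loadavg)
```

Even this trivial case is a hand-rolled parser with no schema to check against, which is the point being made in this comment.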


marcodiego3y ago

I love the fact that I can play with the leds of my device without specialized tools. Yes, it is sysfs, not procfs, but it is the same idea.

nijave3y ago

Many of the APIs on Windows are pretty trivial to interact with using PowerShell commandlets. Similarly, many SaaS based tools have CLIs to interact with their arbitrarily complex APIs.

You can still have easy abstractions while providing a way around them for times they don't work well (acquiring structured data)

CamJN3y ago

MacOS’ KERN_PROCARGS2 sysctl is an exception to this, it is very unintuitive to parse and every single piece of code that tries to parse the results that I’ve found on the internet has been wrong, including those from Apple, Google, and Microsoft. I wound up making a library to do it (https://getargv.narzt.cam/) because apparently people need help.

convolvatron3y ago

I just ran into this and it's not documented and there are very few examples. I will definitely be looking at your library, thank you.

it sounds fishy, but just because sysctl is a mess doesn't necessarily imply that structured kernel interfaces are a bad idea

woodruffw3y ago

Linux does have libproc, which is meant (IIUC) to mirror the BSD-style libproc. It wouldn't surprise me if it's just parsing the same files under the hood, however, and correspondingly has the same bugs. But then again, bugs in one place is potentially a better state of affairs than bugs in many places (?).

hamburglar3y ago

Totally agreed. Any time I’ve found myself parsing proc, I’ve felt like I was doing something foolish and unsafe in lieu of a “real” api.

the84723y ago

Having C functions isn't all that much better. You have replaced a crufty text format with crufty data structures full of paddings, unions, bitfields, VLAs, unaligned nested structs and other crazy stuff. Look at ioctls or cmsg. With C structs + 3rd-party kernel drivers you can even get UB because the driver returns data that is invalid under the struct definition (e.g. incorrect alignment, invalid bools).

touisteur3y ago

I think a proper formal grammar would do the trick, with maybe a canonical implementation...

mdaverde3y ago

This is changing! (or technically, has changed!)

eBPF recently added the ability to look through internal data structures through iterators [0] so instead of parsing text we can run a program that traverses through all the task_structs and pushes the exact information we want to userspace in the form the developer wants.

So, alongside other tradeoffs, it's more flexible than syscalls.

[0] https://developers.facebook.com/blog/post/2022/03/31/bpf-ite...

touisteur3y ago

Netlink is the place to look for some of this info, https://twitter.com/dvyukov/status/1605539242506997765 . Loads and loads of stuff in netlink.

ilyt3y ago

I wish /proc|/sys would just agree on a serialization format and serialize the data into some defined format instead of having a bunch of files that all need their own parser

st_goliath3y ago

While procfs has a lot of historical baggage, sysfs is rather specific about the layout and providing only a single value per file, as plain ASCII, rather than using anything complex that has to be parsed. Structure is implemented via the filesystem.

In return, the kernel side API for sysfs is also a lot cleaner and allows to more-or-less expose individual variables as tuning knobs for a driver.

Of course there are edge cases, and there are e.g. some binary interfaces as well (e.g. for providing direct register access, or implementing a firmware upload interface for a device).

ABI compat issues aside, I think that implementing "a standardized [structured] record format" as suggested in the comments here is a rather bad idea, going into exactly the wrong direction by adding complexity rather than reducing it, which would definitely cause even more parsing related issues in the long run.

ilyt3y ago

>While procfs has a lot of historical baggage, sysfs is rather specific about the layout and providing only a single value per file, as plain ASCII, rather than using anything complex that has to be parsed. Structure is implemented via the filesystem.

I'd rather have a structured file than have to open 30k files (for, say, conntrack)

Hell, just example from the article, /proc/<PID>/stat has 52 parameters. That would be 52 opens and reads with single value per file.

> ABI compat issues aside, I think that implementing "a standardized [structured] record format" as suggested in the comments here is a rather bad idea, going into exactly the wrong direction by adding complexity rather than reducing it, which would definitely cause even more parsing related issues in the long run.

It's literally the opposite. You have to implement it once on the kernel side and once in userspace, versus a separate parser for every special format, as is currently needed


eminence323y ago

I've been working on a library[1] that aims to have fairly complete support for the procfs filesystem, so that you can hide away these annoying parsing quirks. But for some casual usage of /proc/ where you only need one tiny bit of information, it's often better to just roll your own parser instead of bringing in a 3rd party library. It's these small one-off cases that would really benefit from a standardized serialization format like you propose.

[1] https://github.com/eminence/procfs

idealmedtech3y ago

It would be great if the kernel itself provided a header only definition of such a format, so you could focus on the data and not the parsing. Would also be able to integrate into their extensive testing infrastructure.


ajross3y ago

FWIW: sysfs tried to do this already. In general each node corresponds to one "thing", with a reasonably standard set of stringification schemes, and with a path that acts as a self-describing schema. Obviously in practice it ends up that every driver or subsystem ends up doing funny nonsense (e.g. uevent nodes have their own sub-schema with shell-style variables, etc...).

You can't really prevent that. People do funny nonsense in other self-describing data formats like JSON and XML all the time too. There's only so much you can do with a framework.

But /proc is... extremely old, and very heavily used by userspace. In practice it's never going to change.

ilyt3y ago

> You can't really prevent that. People do funny nonsense in other self-describing data formats like JSON and XML all the time too. There's only so much you can do with a framework.

Sure, but you will get more of that if the convention is too simplistic. "One file per value" breaks really fast: just cat /proc/net/nf_conntrack or even just /proc/<pid>/stat and see how many values a single entry (file/connection) has.

Doesn't need to be some ASN.1 monstrosity, could be simple conventions like "this is how key/value proc/sys file should look, this is how tabular file should look etc."

Make all escaping use same syntax, make every table separator be \t etc.

> But /proc is... extremely old, and very heavily used by userspace. In practice it's never going to change.

eh, just mount it in /proc2

saalweachter3y ago

While we're wishing in one hand, how is it that our programs still take an input of an array of strings, that get escaped and unescaped and split randomly by our shell scripts?

ilyt3y ago

That is entirely due to sh/bash and friends being a terrible programming language.

All sensible ones allow you to just pass an array of parameters to command execution and not worry about spaces in them


bradfitz3y ago

Advent of Proc

stefan_3y ago

Well it's too late now. But I thought the plan was for all of that stuff to move to Netlink? Not that that isn't a terrible very horrible API either.

emmelaich3y ago

osquery covers those fwiw, and can produce json.

jcelerier3y ago

we could even name the tool to query such serialized data, procctl, provided by the systemd-proc package

mzs3y ago

> sudo was bitten by this back in the day (CVE-2017-1000367):

> https://www.openwall.com/lists/oss-security/2017/05/30/16

https://www.openwall.com/lists/oss-security/2022/12/22/5

idealmedtech3y ago

> This allows any sudoers user to obtain full root privileges

The way most sudoers files are set up, if you're in the wheel or sudo group, you're only a "sudo -i" from a root command prompt, so I'm not sure I see why this is a vulnerability. Can anyone elaborate?

woodruffw3y ago

The /proc/<pid>/* hierarchy has always been a bit of a mess to parse.

/proc/<pid>/maps is similarly frustrating: there's no clear distinction between "special" maps (like the stack) and a file that might just happen to be named `[stack]`. Similarly, the handling for a mapped region on a deleted file is simply to append " (deleted)"[1].
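A short Python sketch of the ambiguity described above. This is not the linked procmaps.rs code; the function name and both sample lines are made up for illustration. It assumes only the documented column layout of a maps line: address range, permissions, offset, device, inode, then an optional pathname.

```python
def parse_maps_line(line: str) -> dict:
    """Split one /proc/<pid>/maps line into its documented columns."""
    # maxsplit=5 keeps any spaces inside the pathname intact.
    parts = line.rstrip("\n").split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    path = parts[5] if len(parts) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return {
        "start": start,
        "end": end,
        "perms": perms,
        "offset": int(offset, 16),
        "dev": dev,
        "inode": int(inode),
        "path": path,
    }


# Hypothetical sample lines.
stack_line = "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0                          [stack]"
deleted_line = "7f0000000000-7f0000001000 r-xp 00000000 08:01 1234                       /tmp/lib.so (deleted)"

stack = parse_maps_line(stack_line)
deleted = parse_maps_line(deleted_line)

# From the text alone, the parser cannot tell whether " (deleted)" was appended
# by the kernel or is literally part of the file name.
looks_deleted = deleted["path"].endswith(" (deleted)")
```

The columns parse cleanly; the trouble is entirely in the last field, where kernel annotations and user-controlled names share one unescaped string.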

[1]: https://github.com/woodruffw/procmaps.rs/blob/79bd474104e9b3...

esprehn3y ago

The system level fix is to create a structured record format. That could mean quoting all the records or maybe Linux should finally adopt a standardized format like JSON.

zokier3y ago

Strictly speaking it is structured; the structure is described in the man page and it is machine-parseable

jwilk3y ago

It's not described correctly. The man page says you can parse it with scanf(), which is wrong.

jbverschoor3y ago

Why do you have to parse this kind of stuff at all?

Time to let go of the everything is a stream of unorganized characters

kbrazil3y ago

Fortunately `jc`[0] does parse `/proc/<pid>/stat` correctly. I, of course, originally implemented it the naive/incorrect way until a contributor fixed it. :)

    $ cat /proc/2001/stat | jc --proc
    {"pid":2001,"comm":"my program with\nsp","state":"S","ppid":1888,"pgrp":2001,"session":1888,"tty_nr":34816,"tpg_id":2001,"flags":4202496,"minflt":428,"cminflt":0,"majflt":0,"cmajflt":0,"utime":0,"stime":0,"cutime":0,"cstime":0,"priority":20,"nice":0,"num_threads":1,"itrealvalue":0,"starttime":75513,"vsize":115900416,"rss":297,"rsslim":18446744073709551615,"startcode":4194304,"endcode":5100612,"startstack":140737020052256,"kstkeep":140737020050904,"kstkeip":140096699233308,"signal":0,"blocked":65536,"sigignore":4,"sigcatch":65538,"wchan":18446744072034584486,"nswap":0,"cnswap":0,"exit_signal":17,"processor":0,"rt_priority":0,"policy":0,"delayacct_blkio_ticks":0,"guest_time":0,"cguest_time":0,"start_data":7200240,"end_data":7236240,"start_brk":35389440,"arg_start":140737020057179,"arg_end":140737020057223,"env_start":140737020057223,"env_end":140737020059606,"exit_code":0,"state_pretty":"Sleeping in an interruptible wait"}

[0] https://kellyjonbrazil.github.io/jc/docs/parsers/proc_pid_st...

smasher1643y ago

makes you wonder if it's really that valuable to have all our infrastructure built on parsing text

raldi3y ago

Or to have filesystems that support names with every character but slash and NUL.

capitol_3y ago

I agree, not supporting NUL is really a historical c-ism that we should get rid of.

avar3y ago

I noticed this around a year ago when writing a /proc/<pid>/stat parser for git (for logging the chain of parent processes).

Here's that commit, it has a comment with an overview of the kernel limits and caveats involved: https://github.com/git/git/commit/2d3491b117c6dd08e431acc390...

jwilk3y ago

> Finally the maximum length of the "comm" name itself is 15 characters

As pointed out in https://news.ycombinator.com/item?id=34098360, and contrary to the proc(5) man page, this assumption is incorrect for kernel threads.

avar3y ago

Interesting. That code will only need to read the "stat" files of processes in userspace, so it's correct for its use-case.

But you're right that a more general parser would need to ignore what proc(5) has to say about the limit, and parse up to a limit of 64.

As far as I can tell the difference is because when you call prctl(2) with "PR_SET_NAME" it will get truncated to the "TASK_COMM_LEN" that proc(5) discusses. See this code in kernel/sys.c: https://github.com/torvalds/linux/blob/493ffd6605b2d3d4dc700...

This is the linux.git commit that changed it, before that kernel worker threads had to obey the same limit, it was first released with linux v4.18: https://github.com/torvalds/linux/commit/6b59808bfe482642287...


bigcat123456783y ago

We at pixie.io ran into this exact problem; we fixed it by parsing the parentheses. Ugly, but it seems to work.

https://github.com/pixie-io/pixie/blob/bd82bb48ef4da7d6b05f2...

idealmedtech3y ago

That's exactly how it should be done! Also subtle but important that you find the last closing parenthesis, as an attacker could just include a paren in their process name to terminate your parse early.

horstschneider3y ago

That is how psmisc does it:

https://gitlab.com/psmisc/psmisc/-/blob/master/src/pstree.c#...

qwertox3y ago

These comments here need more visibility.

jwilk3y ago

> if (std::getline(ifs, line)) {

But what if comm contains newlines?
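The concern can be illustrated in Python rather than C++ (the quoted line uses std::getline, but any line-based read behaves the same way). The sample bytes below are hypothetical contents of a stat file for a process whose comm is "evil\nname"; an in-memory buffer stands in for the real file.

```python
import io

# Hypothetical stat contents; the comm field contains an embedded newline.
raw = b"1234 (evil\nname) S 1 1234 1234 0 -1\n"

# Line-based read: stops at the newline inside comm, so the record is cut short.
first_line = io.BytesIO(raw).readline()

# Whole-file read: the record stays intact and the last ')' can still be found.
whole_file = io.BytesIO(raw).read()
last_paren = whole_file.rfind(b")")
```

After the line-based read, the closing parenthesis is not in the buffer at all, so no amount of careful parsing afterwards can recover the record.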

jwilk3y ago

Reported this, and a few more parsing bugs:

https://github.com/pixie-io/pixie/issues/678

cryptonector3y ago

The process name should have been last. Now parsers have to split on space and then take the first token and the last N-2 tokens to leave behind the tokens that make up the second field, then join those with spaces to reconstruct the second field (or use the length of the first and the offset of the third fields to re-parse the second).

tatref3y ago

If you do this, then you can't add new fields

cryptonector3y ago

Correct. I guess you could split on parens instead.

inetknght3y ago

It's almost as if there should be an API for procfs instead of having everyone write their own reader and parser...

YesThatTom23y ago

If there is exactly one field with the “may contain spaces” problem there’s a better solution: parse the line forwards for the fields up to that one, parse the line backwards for the remaining.

jwilk3y ago

How is that better than looking for the last ")" character?

Besides, it wouldn't work, because you don't know in advance how many fields there are.

mort963y ago

I've been bitten by and tried to work around this as well. From what I can tell, the best you can really do is to parse by matching up parens, but someone could totally make a program with parentheses in its name. If I make a binary called "foo) R 10 20 30", the /proc/<pid>/stat entry will contain "1715376 (foo) R 10 20 30) 1544883 1715376 1544883...". It's terribly non-obvious how to deal with correctly.
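A small Python sketch of how a whitespace-splitting parser goes wrong on a name like that. The sample line is adapted from the example above; the state field ("S") and the trailing numbers are made up so that the record is complete.

```python
# comm is "foo) R 10 20 30"; everything after the real closing paren is invented.
line = "1715376 (foo) R 10 20 30) S 1544883 1715376 1544883 0 -1"

# Naive approach: treat the record as whitespace-separated fields.
fields = line.split()
naive_comm = fields[1]        # "(foo)"  -> only part of the real name
naive_state = fields[2]       # "R"      -> attacker-chosen, real state is "S"
naive_ppid = int(fields[3])   # 10       -> attacker-chosen, real ppid is 1544883
```

Every value after the pid is now controlled by whoever chose the process name, which is what makes this a security problem rather than just a cosmetic one.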

st_goliath3y ago

> It's terribly non-obvious how to deal with correctly.

Like the post says: read the whole thing into memory and do a reverse search for the last ')', i.e. strrchr

Once you are aware of the problem, it's obvious how to solve it, but I do agree that the hidden danger here is not immediately obvious at first.
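The reverse-search idea, sketched in Python (bytes.rindex plays the role of strrchr here). The function name, the returned dictionary layout, and the sample record are assumptions for illustration, not taken from any of the projects linked in this thread.

```python
def parse_stat(data: bytes) -> dict:
    """Parse the whole contents of a /proc/<pid>/stat file read into memory."""
    open_paren = data.index(b"(")      # first '(' starts comm
    close_paren = data.rindex(b")")    # LAST ')' ends comm, like strrchr
    pid = int(data[:open_paren])
    comm = data[open_paren + 1:close_paren]   # kept as bytes: may hold spaces, parens, newlines
    rest = data[close_paren + 1:].split()     # remaining fields are plain tokens
    return {
        "pid": pid,
        "comm": comm,
        "state": rest[0].decode(),
        "ppid": int(rest[1]),
        "rest": rest[2:],
    }


# Hypothetical record for a process named "foo) R 10 20 30".
sample = b"1715376 (foo) R 10 20 30) S 1544883 1715376 1544883 0 -1\n"
parsed = parse_stat(sample)
```

This works because comm is the only field that can contain a ')' and every field after it is a plain token, so the last ')' in the buffer always closes comm. Fields after the first few are left unparsed in `rest`, which also sidesteps the question of how many fields a given kernel emits.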

zokier3y ago

Note that man page says that the name is truncated to 16 chars, so if for whatever reason you don't want to do unbounded length read then you can use that


graymatters3y ago

Aside from bashing a paradigm one is not used to/doesn’t like/didn’t grow up with, what are the real cases where dealing with the textual output of procfs creates serious realistic performance issues? The argument about insecurity of a hand rolled C parser for that is utterly unconvincing.
