Segmenting comic book frames

(vrroom.github.io)

198 points matroid 2y ago 47 comments

jameshart2y ago

Next AI challenge: try to infer the intended panel reading sequence, and the flow of speech bubbles/narrative.

Would potentially be a useful augmentation to a digital comic book reader, refocusing from panel to panel in sequence. Not to mention making comic book content more accessible.

famouswaffles2y ago

https://github.com/ragavsachdeva/magi

matroidOP2y ago

This is such a good resource! Thank you!

yjftsjthsd-h2y ago

Is that solvable? It's my personal experience that humans can't reliably do that; I could imagine a machine doing better, but I'm not sure what information it would be working with.

Sardtok2y ago

I feel the problem is you often read some panels in the wrong order, but when you read them you realize the order is wrong after the fact, based on the text. The AI could probably use the text as additional information for determining the order.

Would probably never be perfect, but if it gets it right in all but a few outlying cases, that should be good enough.

Cthulhu_2y ago

Counterpoint; isn't reading and re-reading the page after figuring out the intended order part of the experience?

Take video games. FromSoftware games tell stories through breadcrumbs: locations, speech lines, item descriptions. Only over time can you start to connect the dots.

Sure, you can watch a youtuber connect the dots for you, show you what you've missed, and make you aware of your own mental limitations. But it's not the same.

TheAceOfHearts2y ago

Crunchyroll used to offer a guided reading experience for some of their manga.

You could probably build a tool that tags each panel and attempts to figure out the order, and then have a human editor do a validation pass. If you have enough people reading a series you can probably crowdsource the panel sequence.
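As a rough illustration of the "attempt to figure out the order" step, here is a minimal sketch of a geometric baseline: group panel boxes into rows by vertical overlap, then sort each row horizontally. The `(x, y, width, height)` box format, the `reading_order` name, and the `row_tolerance` threshold are assumptions made for this example, not part of any tool mentioned in the thread, and a heuristic like this will get unusual layouts wrong, which is where the human validation pass comes in.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def reading_order(
    panels: List[Box],
    right_to_left: bool = False,
    row_tolerance: float = 0.5,
) -> List[Box]:
    """Order panels row by row, top to bottom.

    Two panels share a row when their vertical overlap is at least
    `row_tolerance` times the height of the shorter one.
    Set right_to_left=True for manga-style pages.
    """
    rows: List[List[Box]] = []
    # Visit panels from top to bottom so rows are created in page order.
    for box in sorted(panels, key=lambda b: b[1]):
        _, y, _, h = box
        for row in rows:
            # Compare against the first panel that opened this row.
            _, ry, _, rh = row[0]
            overlap = min(y + h, ry + rh) - max(y, ry)
            if overlap >= row_tolerance * min(h, rh):
                row.append(box)
                break
        else:
            rows.append([box])

    ordered: List[Box] = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0], reverse=right_to_left))
    return ordered
```

The output of something like this could serve as the default guess that an editor, or crowdsourced reader feedback, then corrects.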

astrange2y ago

Kindle just added one. I find it annoying because I can't remember how to exit the single-panel mode once you get into it.

Loranubi2y ago

Files in ACBF format include panel metadata. So there should be lots of training data.

- https://launchpad.net/acbf

makeitdouble2y ago

That sounds similar to having an AI explain where you should be looking in a painting, or where to pay attention in a movie. This might be genuinely useful as an accessibility feature, but I'd also see strong sentiment against it, potentially from the creators of the art.

doesnt_know2y ago

I'm more of a manga reader than a western comic one, but why would there be sentiment against this? In what way is the reading order of panels up to interpretation?

makeitdouble2y ago

Focusing on manga, the shoujo space for instance has a reputation for using out-of-frame character positioning and not shying away from composing pages with a visual flow that doesn't follow the actual character speech.

It would be complex to pin a given panel order as canonical when the author is playing games with the reader. I don't think authors would oppose an accessibility feature, but I could see the debate if it was a more prominent, sanctioned way of reading.

dylan6042y ago

You've now focused too much on manga/comics. Stepping back a bit and looking at art more generally, and using the provided example of looking at a painting: who are you to tell me where I should be looking? Yeah yeah, I see the obvious thing your AI is telling me to look at, but I'm looking at this less obvious thing that really strikes my fancy. Maybe it's a blemish. Maybe it's a unique brush stroke/technique that others might not care about, but an aspiring artist might. (Even if it is another stupid AI.)

daemonologist2y ago

I think aside from accessibility the main use case is for e-ink devices where the resolution isn't high enough to comfortably display the entire page at once and the refresh rate isn't high enough to pan/zoom quickly. With those limitations, automatically "paging" through the panels probably makes for a much better reading experience (I haven't tried it myself).

theresistor2y ago

Guided View was/is a thing in Comixology / Kindle comics.

gryn2y ago

A lot of youtube AI manga recap channels have popped up in the last year. I assumed they have some open source tool that does the panel segmentation for them.

karaterobot2y ago

I always thought a fun side gig (or even volunteer opportunity) would be defining panel areas for digital comics. Both because I'd get to read a lot of comics, and because a lot of comics I read are frustratingly bad at it, and it makes it much harder to enjoy them. Well, there goes AI taking our jerbs.

matroidOP2y ago

The author here. I would just like to say that this project is definitely work-in-progress and the AI elements often fail miserably.

As amazing as recent AI progress has been, we do overrate it a lot (I'm including myself in that).

daemonologist2y ago

You could label a bunch for OP and contribute to your own demise :D (It sounds like there isn't a large existing dataset since the author had to label P&C for testing and generate data for training.)

awdii2y ago

Awesome stuff! We're also working on comic segmentation @ https://toona.io and other stuff for motion comic generation. The synthetic dataset approaches are really interesting, I'm curious if you could use an algorithm like https://en.wikipedia.org/wiki/Flood_fill to aid in segmentation (especially for manga).
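To make the flood-fill idea concrete, here is a minimal sketch under simplifying assumptions: the page is already binarized into a grid where 1 marks gutter (background) pixels and 0 marks everything else, and panels do not touch each other. It floods the gutter inward from the page border, then treats each remaining connected region as one panel. The `find_panels` name and the grid format are made up for this example; it is not code from any project mentioned in the thread, and real scans would need thresholding and noise handling first.

```python
from collections import deque
from typing import List, Tuple


def find_panels(page: List[List[int]]) -> List[Tuple[int, int, int, int]]:
    """Return panel bounding boxes as (top, left, bottom, right), inclusive.

    page[r][c] == 1 means gutter/background, 0 means panel content.
    """
    rows, cols = len(page), len(page[0])
    seen = [[False] * cols for _ in range(rows)]

    def flood(r0: int, c0: int, gutter_only: bool) -> List[Tuple[int, int]]:
        """4-connected BFS flood fill; returns the cells it visited."""
        queue = deque([(r0, c0)])
        seen[r0][c0] = True
        cells = []
        while queue:
            r, c = queue.popleft()
            cells.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if not (0 <= nr < rows and 0 <= nc < cols) or seen[nr][nc]:
                    continue
                if gutter_only and page[nr][nc] != 1:
                    continue
                seen[nr][nc] = True
                queue.append((nr, nc))
        return cells

    # Step 1: flood the gutter from every background pixel on the page border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and page[r][c] == 1 and not seen[r][c]:
                flood(r, c, gutter_only=True)

    # Step 2: whatever the gutter flood could not reach belongs to a panel,
    # including white areas enclosed by a panel's border.
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not seen[r][c]:
                cells = flood(r, c, gutter_only=False)
                rs = [cell[0] for cell in cells]
                cs = [cell[1] for cell in cells]
                boxes.append((min(rs), min(cs), max(rs), max(cs)))
    return boxes
```

Flooding from the border rather than from inside each panel means white regions inside a panel (speech bubbles, sky) are not mistaken for gutter, as long as the panel outline is closed.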

matroidOP2y ago

The original blog post by Max Halford (https://maxhalford.github.io/blog/comic-book-panel-segmentat...) does exactly that. I love his approach because, unlike mine, it is simple, yet it goes a long way. I'd encourage you to check it out.

Can you explain what you mean by motion comic generation? Sounds interesting!

awdii2y ago

Oh wow, I'll check that out! We're basically working on tools for turning normal comics into semi-animated versions with things like puppet animation (similar to Live2D style) and Stable Diffusion models. A lot of the work for traditional approaches is segmentation and inpainting, so that's what we're working on now, but we're also making tools for colorization, effects, and controllable animations!

matroidOP2y ago

All of that definitely sounds like a lot of fun. I was also working on colorization with SD ControlNet recently (https://vrroom.github.io/blog/2024/02/16/interactive-colorin...).

bigbillheck2y ago

This is, as they say, an insult to life itself.

liampulles2y ago

There are some graphic novels where segments are a very loose concept.

Cerebus's Reads volume is basically a book with illustrations, for example.

Wonder what SAM would do in such cases...

LegitShady2y ago

Segmenting panels is a graphic design element of storytelling that's part of the artist's job. Doing it programmatically ignores its actual nature in storytelling - establishing the amount and direction of the story beats, helping the reader understand the right reading order, and establishing the important elements on the page (Which beat is the most important, etc).

It's an interesting tech but giving such an important creative job to a computer instead of an artist is a bad idea for any comic artist who cares about their work.

probably_wrong2y ago

I half agree with you. Yes, the panel segmentation can be an element of the storytelling, but the truth is that lots of comics out there are not exploiting this ability, sticking to a square grid instead.

Yes, Bill Watterson used his fame to get out of the standard grid [1], but in a world where people read comics on their phones this technology is necessary. And if this stuff helps comic artists reach more readers, so be it. We can always hope that making simple tasks easy today will encourage artists to try harder things tomorrow.

[1] https://www.leaderonomics.com/articles/personal/bill-watters...

LegitShady2y ago

> I half agree with you. Yes, the panel segmentation can be an element of the storytelling, but the truth is that lots of comics out there are not exploiting this ability, sticking to a square grid instead.

It is an element of storytelling. If the artist isn't doing it, it's because they're doing a bad job of storytelling, or publishing in a standard format where the beats are always the same (like saturday morning comics in the newspaper, if those still exist).

austin_atchley2y ago

Sure, but this is an article about detecting the edges of the panels

0cf8612b2e1e2y ago

I have been wanting to do this exact thing! Super excited to look into this later.

tehnub2y ago

There’s a high quality, free comic reader app that doesn’t collect any data that implements this feature quite nicely: https://apps.apple.com/us/app/smart-comic-reader/id151117521...

hanlec2y ago

I haven't used Smart Comic before. Thanks for the recommendation. Definitely going to give it a try. (I admit that panel handling is a nice-to-have feature, but what makes or breaks a comic reader for me is how easy it is to sync comics, though that is a long departure from the thread.)

YAC Reader [1] has great panel recognition. My other favorite comic reader has attempted it but never produced something that I could use.

[1]: https://www.yacreader.com/

thih92y ago

> free comic reader app

Your link points to a paid app store entry. Is this the correct app?

tehnub2y ago

Oh I didn't realize, sorry about that. I think it's free on iOS, or it was at least when I downloaded it a few years ago.

Michelangelo112y ago

> ... it is often easier to see how to improve the dataset than to design new heuristics. Once you do that, you almost have a guarantee that the Neural Network machinery will get you the results.

Money quote. Applicable to so many areas of ML/AI.

runamuck2y ago

Next up, AI that adds feet to Rob Liefeld's 90's artwork. :-) (Note: I still love his work)

inamberclad2y ago

I'd love to see an algorithm try and segment this: https://dresdencodak.com/2009/07/12/fabulous-prizes/

matroidOP2y ago

Wow! Thanks for sharing this.

inamberclad2y ago

One of my favorite series from back in the day!

Solvency2y ago

On the topic of AI and comic books: since ChatGPT was trained on Wikipedia and thousands of other properties with complete records of comic book lore, why does it get so many relatively basic comic book questions completely wrong? For example, I've asked ChatGPT several times how Psylocke temporarily gained the ability to move through shadows. It was a side effect of drinking the Crimson Dawn elixir, which saved her life after she was nearly killed by Sabretooth. All of this information is readily available online. But ChatGPT makes up hallucinated explanations for this every time I ask it. Why is that?

crq-yml2y ago

Although comic book lore has a canon defined by intellectual property, the topics they explore tend to be interchangeable fictions. That is: there are a lot of characters that have superpowers, characters who drink magic potions or elixirs, and characters who use these things to avoid death. Sometimes these things are defining backstories, other times they are alternate continuities, non-canon one-offs or fanfic posted on Reddit. Because GPT is inferring a "plausible next guess" for any particular piece of knowledge, the likelihood that it understands the specific causal relationships and their valuation is very low: Psylocke is a type of comic book character, therefore the ability is because of <a type of comic book event>, not <the specific event in issue such and such>.

GPT does similarly poorly if you ask it historical questions like "who were the most influential art educators of the 19th century?" It will respond with a jumble of people from different eras and books that those people did not write.

famouswaffles2y ago

Is this 3.5 or 4?

At any rate, if it keeps hallucinating answers then it means it simply doesn't know. Either it wasn't a part of the dataset or it wasn't mentioned often enough to be memorised.

Legend24402y ago

I just tried it and it says she "...gained the ability to move through shadows due to her interaction with the Crimson Dawn. This storyline occurred when she was mortally wounded, and her allies sought the help of the Crimson Dawn to save her life."
