Show HN: I stripped DALL·E Mini to its bare essentials and converted it to Torch

https://openai.com/dall-e-2/

shakna3y ago

Only in the same manner that GPT-3 eliminates the need for writers. Or influencers remove the need for advertising.

That is, a surface-level view might show these things as equivalent, but the skills required to produce a decent result are not encapsulated in the averages that models contain.

mola3y ago

I'm sure a lot of "content writers" for SEO spam will become obsolete. The content level is already rock bottom, so it is easily replaced by brainless machines.

But I'm more bothered by the societal effects of automating art. I believe it'll accelerate what we saw when the internet short-circuited the feedback loop for creators, killing any gaps where a humane creative force that doesn't optimize for revenue could thrive. Not to mention the crazy mimetic positive feedback loops tearing the discourse apart.

kortex3y ago

I dunno. I've read a lot of GPT output. It lacks a certain consistency over medium scales. The big picture checks out, and the word-by-word grammar checks out, but the sentence-by-sentence information often isn't cohesive, or certain entity references change over time.

Text-to-image algos did the same thing for a while, but you look at the latest full-size DALL-E and it's pretty much flawless.

If I were considering art school, I'd certainly be reconsidering my options. Maybe there are some defects in the output, but nothing photoshop can't fix.

I think where humans win out (for now) is where a high degree of specificity/precision is needed (e.g. graphic design). Or certain legal requirements are present - AI art can't be copyrighted at this time - such as logo design.

marban3y ago

Taste and empathy are tough to emulate.

_flux3y ago

Dall-e2 sometimes pulls through big time, though: https://www.reddit.com/r/dalle2/comments/vbtqkw/dalle_really...

But it's not going to tell you in clear words if your prompt was bad to begin with, like a human would, hopefully :).

https://softology.pro/tutorials/tensorflow/tensorflow.htm

mrtranscendence3y ago

I really don’t think that this will be the case anytime soon. Images can be generated from zany prompts, but making a coherent, fits-together-well set of images for a product like a web page or an illustrated book is far off.

Further, artists have a host of skills that DALL-E doesn’t, like “take that image, but change the colors a bit to make it more acceptable to the client, and move the cartoon bird a little further down”. Or “make an image that will look as good in a print as it does on a small screen”.

dbddv013y ago

"an illustrated book is far off". Hi, just to mention that I'm using mini DALL-E for graphic novel experiments... Admittedly not really human quality, but ... https://twitter.com/Dbddv01

andybak3y ago· 8 in thread

If people just want to run text-to-image models locally, by far the easiest way I've found on Windows is to install Visions of Chaos.

It was originally a fractal image generation app but it's expanded over time and now has a fairly foolproof installer for all the models you're likely to have heard of (those that have been released anyway).

songeater3y ago

also, https://pollinations.ai/

Needs a google sub to run colab (DALL-E itself needs Colab Pro, but other models run on free version).

edit: not local! but very handy.

sAbakumoff3y ago

I can't get any of the image generation to work; they all fail with an error message. Did you have luck with any of them?

ccbccccbbcccbb3y ago

not really surprising, but

bare minimum GPU: NVIDIA 2080 with 8GB VRAM

300 GB of disk space

</caveat emptor>

leereeves3y ago

It should probably have an option to download just one model instead of all of them. 300GB is a lot.

I thought they don't use the gpu for inference?

where did you get these? couldnt find it, mightve missed smth

https://youtu.be/4_LgrAL7EWg?t=163

leereeves3y ago

Thank you, I've been looking for something like that and it looks very cool, judging by this tutorial that shows it in action, as it creates an image and displays the results in progress:

jquaint3y ago

Visions of Chaos is amazing.

It's great if you want to run more "classic" AI algorithms as well!

kertoip_13y ago· 6 in thread

Does it just download pre-trained DALL-E Mini models and generate images using them? Because I can't seem to find any logic in that repo other than that. I'm not into that field, just curious if I'm missing something.

joshvm3y ago

To add to the sibling comment. The challenge is not converting the weights as such. Pre-trained model weights are just arrays of numbers with labels that identify which layer/operation they correspond to in the source model. The challenge is expressing the model in code identically between two frameworks and programmatically loading the original weights in, since these models can have hundreds of individual ops. Hence why you can't just load a PyTorch model in Tensorflow or vice versa.

There are tools to convert to intermediate formats, like ONNX, but they are limited and don't work all the time. The automatic conversion tools usually assume that you can trace execution of the model for a dummy input and usually only work well if there isn't any complex logic (e.g. conditions can be problematic). Some operations aren't supported well, etc.

This isn't always technically difficult, but it's tedious because it usually involves double checking that at all steps, the model produces identical outputs for a given input. An additional challenge when transferring weights is that models are fragile and minor differences might have large effects on the predictions (even though if you trained from scratch, you might get similar results).

Also for deployment, the less cruft in the repository the better. A lot of research repositories end up pulling in all kinds of crazy dependencies to do evaluation, multiple big frameworks etc.
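To make the "re-express the model and load the weights" step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the parameter names (`dense/kernel`, `dense.weight`), the tiny shapes, and the convention that the source framework stores a dense kernel as (in, out) while the target stores it as (out, in). It is not the conversion code from the repository; it only shows the rename, the layout fix, and the output comparison described above.

```python
# Hypothetical "source framework" checkpoint: flat names -> nested lists.
# Assumption: the source stores a dense kernel as (in_features, out_features),
# while the target expects (out_features, in_features).
source_params = {
    "dense/kernel": [[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]],   # shape (2, 3)
    "dense/bias": [0.5, -0.5, 0.0],
}

def transpose(matrix):
    """Swap the two axes of a nested-list matrix."""
    return [list(row) for row in zip(*matrix)]

def convert_params(params):
    """Rename keys and fix layouts for the (hypothetical) target convention."""
    return {
        "dense.weight": transpose(params["dense/kernel"]),  # shape (3, 2)
        "dense.bias": list(params["dense/bias"]),
    }

def forward_source(params, x):
    """y = x @ kernel + bias, with kernel laid out as (in, out)."""
    kernel, bias = params["dense/kernel"], params["dense/bias"]
    return [sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def forward_target(params, x):
    """y = weight @ x + bias, with weight laid out as (out, in)."""
    weight, bias = params["dense.weight"], params["dense.bias"]
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def max_abs_diff(a, b):
    """Largest element-wise difference between two output vectors."""
    return max(abs(p - q) for p, q in zip(a, b))

target_params = convert_params(source_params)
x = [1.0, -2.0]  # dummy input used only for the comparison
diff = max_abs_diff(forward_source(source_params, x),
                    forward_target(target_params, x))
print("max abs difference:", diff)
```

In a real port the same comparison would be repeated layer by layer, which is where the tedium comes from.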

mFixman3y ago

I don't understand why execution of a model with the same layers and weights would be different between PyTorch and Tensorflow.

Is it a problem of accumulation of floating-point errors in operations that are done in a different order and with different kinds of arithmetic optimisations (so that they would be identical if they used un-optimised symbolic operations), or is there something else in the implementation of a neural network that I'm missing?
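The floating-point part of the question is easy to demonstrate in isolation. The sketch below only shows that the grouping of additions changes the result in double precision; it does not claim that this is the cause of any particular mismatch between frameworks.

```python
import math

values = [0.1, 0.2, 0.3]

# Same numbers, two different groupings of the additions.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)        # the groupings disagree
print(abs(left_to_right - right_to_left))    # by roughly one unit in the last place

# math.fsum tracks intermediate rounding error and returns a correctly rounded sum.
print(math.fsum(values))
```

A tiny discrepancy like this is harmless for one addition, but a network performs many reductions, and libraries are free to order them differently.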

nonameiguess3y ago

Seems like it'll be a serious issue for people hoping we can someday upload human brains into machines if we can't even transfer models from TensorFlow to PyTorch reliably.

ShamelessC3y ago

They converted the original JAX weights to the format that Pytorch uses. Because JAX is still fairly new, it can be a lot easier to get Pytorch to run on e.g. CPU. I do find the number of upvotes interesting and I imagine many people just upvote things that have DALLE in the title, to a degree.

Not to discourage the OP of course, great work.

OJFord3y ago

Look how much easier it is to install & run, people are interested in and up-voting the result, not how much work was (or wasn't) required to achieve it.

avhon13y ago

It still seems to require JAX somewhere to work.

On my desktop, running the example

> python image_from_text.py --text='alien life' --seed=7

results in

> RuntimeError: This version of jaxlib was built using AVX instructions, which your CPU and/or operating system do not support. You may be able work around this issue by building jaxlib from source.

Unfortunately, following the instructions to build jaxlib from source (https://jax.readthedocs.io/en/latest/developer.html#building...) results in several 404 Not Found errors, which later cause the build to stop when it tries to use the missing files.

Unfortunately, it looks like I won't be running this today.
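For anyone who wants to check for AVX before installing, a minimal sketch that reads the CPU feature flags on Linux. It assumes an x86 machine where `/proc/cpuinfo` has a `flags` line; other platforms expose this differently, and the helper names here are made up for the example.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags

def has_avx(cpuinfo_text):
    """True only if the plain 'avx' flag is present (not just 'avx2' etc.)."""
    return "avx" in cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        print("AVX supported:", has_avx(path.read_text()))
    else:
        print("No /proc/cpuinfo here; check the CPU flags another way.")
```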

mikewarot3y ago· 5 in thread

What is wandb.ai, and why does it keep asking for an API key?

It's not listed in the requirements

I've posted it as an issue

slewis3y ago

Founder of Weights & Biases here (wandb). There are a couple issues raised in this thread: api key shouldn’t be required to download a public model, cache in home directory is annoying for this case. We will fix them.

kuprelOP3y ago

Thanks for the tip! I just updated the colab to log in anonymously

nl3y ago

> What is wandb.ai

Weights & Biases

> Why does it keep asking for an API key

From the README:

the Weight & Biases python package is used to download the DALL·E Mini and DALL·E Mega transformer models

It might not be obvious you need an account if you aren't in the field though.

Aardwolf3y ago

Why is this needed to download the model?

I'd prefer to download it myself and choose where I put it too.

It now uses a hashed filename in a config directory in your home directory. I dislike this and want control over where I put models: keep it self-contained instead of spreading random directories all over the OS, and take the models as input by file path.

This feedback is really about dalle mini playground, but this project does the same thing. If this one is stripped to bare essentials, I'd expect this type of dependency to be stripped too.

Edit: I don't want to seem like complaining too much though and am very happy with these open models and tooling for them. Thanks!
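The "give them as input by file path" request could look roughly like the sketch below. The `--model-dir` flag, the `MODEL_DIR` environment variable, and the `pretrained` default are all hypothetical names chosen for this example; the project does not necessarily offer any of them.

```python
import argparse
import os
from pathlib import Path

def resolve_model_dir(argv=None, environ=None):
    """Pick the model directory: CLI flag first, then env var, then ./pretrained.

    Both the flag name and the variable name are hypothetical.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", default=None,
                        help="directory holding the downloaded model files")
    # parse_known_args lets the script keep its other flags (--text, --seed, ...)
    args, _unknown = parser.parse_known_args(argv)
    chosen = args.model_dir or environ.get("MODEL_DIR") or "pretrained"
    return Path(chosen).expanduser()

if __name__ == "__main__":
    print("models will be read from:", resolve_model_dir())
```

With something like this, nothing needs to be written to a hidden cache directory: the user downloads the files once and points the script at them.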

h0mie3y ago

SaaS dashboard for monitoring/metrics

langitbiru3y ago· 4 in thread

I guess we will have text-to-image startups in the next batch of YC.

upupandup3y ago

Pornhub is working on something I hear

harpersealtako3y ago

It's only a matter of time before we get a good NSFW image generation algorithm. Text erotica generation is already a not-insignificant part of the public AI world (remember AI dungeon? Most people used it for porn). The question is whether it's going to come out of the established adult industry or not. There are clear benefits (no real humans needed anymore, personalized fetish material forever, etc.), the only issue is whether they're willing to deal with the inevitable bad press (e.g. fake images of real people, taboos like underage content or disturbing imagery). I wouldn't be surprised if any models from the adult industry will basically be heavily gimped from the start to avoid liability.

nudpiedo3y ago

wandb forbids reusing the models and other information, independently of their usage, so they should find another source for their models

EDIT: Since I am being accused of inventing this, I will quote the terms of agreement and license. Perhaps the founder has not read them, or they were written by someone without training in drafting terms; either way, the restrictive usage of "Material" does apply to the hosted software.

Note that there is no formal definition of "Materials" or "Service", so that it applies to all the contents of the webpage including the software stored there: https://wandb.ai/site/terms

I quote it:

2. Use License

Whether you are accessing the Services for personal, non-commercial transitory viewing only (our free license for individuals only), for academic use, or for commercial purposes (our subscription package for businesses), permission is granted to temporarily download one copy of the information or software (the “Materials”) from our website. This is the grant of a license, not a transfer of title, and under this license, you may not:

a. Modify or copy the Materials;

b. Use the Materials for any commercial purpose, or for any public display (commercial or non-commercial);

c. Attempt to decompile or reverse engineer any software contained in the Materials;

d. Remove any copyright or other proprietary notations from the Materials; or

e. Transfer the Materials to another person or "mirror" the Materials on any other server.

This license shall automatically terminate if you violate any of these restrictions and may be terminated by us at any time. Upon terminating your viewing of these materials or upon the termination of this license, you must destroy any downloaded materials in your possession whether in electronic or printed format.

f. Utilize our personal license for individuals for commercial purposes and any such use of our personal license for commercial purposes (e.g. using your corporate email) may result in immediate termination of your license.

slewis3y ago

Founder of Weights & Biases here (wandb). We don’t forbid anything, models are property of the people who created them.

Why do you think that?

EDIT: I'll respond with an edit, since you did. Look at sections 3b and 3c in the terms; they cover Models and other user content specifically. Those are user property, not our property. But I can see how this is confusing. We will clarify it.

https://www.smithsonianmag.com/smart-news/us-copyright-offic...

JanSt3y ago· 4 in thread

What is the license of the generated artwork?

leereeves3y ago

Still an unsettled question, but for now: U.S. Copyright Office Rules A.I. Art Can’t Be Copyrighted

iggldiggl3y ago

I think that decision refers to an attempt to get it copyrighted with the AI/computer program as the author.

[0] https://www.copyright.gov/rulings-filings/review-board/docs/...

witheld3y ago

You pressed the button, it's owned by you, all rights are reserved by default.

Dangeranger3y ago

This statement is not accurate[0].

> But copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the [human] mind.” COMPENDIUM (THIRD) § 306 (quoting Trade-Mark Cases, 100 U.S. 82, 94 (1879)); see also COMPENDIUM (THIRD) § 313.2 (the Office will not register works “produced by a machine or mere mechanical process” that operates “without any creative input or intervention from a human author” because, under the statute, “a work must be created by a human being”). So Thaler must either provide evidence that the Work is the product of human authorship or convince the Office to depart from a century of copyright jurisprudence.

punk_ihaq3y ago· 4 in thread

Comment deleted

pilotneko3y ago

I think there might be some session leakage. I typed “A pig with a bowler hat.” and the model returned a picture of a half moon.

ubertaco3y ago

No matter what I typed, it always generated the same image of a moon half-covered in shadow. I think something might be a bit buggy with this.

buf3y ago

It always generates an image of a moon for me.

mensetmanusman3y ago

That’s a feature not a bug…

fartcannon3y ago· 3 in thread

Surely wandb is not a bare essential?

Borgz3y ago

You can download the models yourself if you don't want to use it.

capableweb3y ago

Where?

https://github.com/kuprel/min-dalle/issues/1#issuecomment-11...

jjallen3y ago

I wish this requirement was in the README

smcleod3y ago· 2 in thread

Almost worked on my 2021 Macbook Pro (M1 Pro) - https://github.com/kuprel/min-dalle/issues/1#issue-128676523...

bart3r3y ago

I had exactly the same issue

sam1r3y ago

Seems like there is a fix.

etaioinshrdlu3y ago· 2 in thread

Anyone have some stats on inference time and RAM requirements? (on specific hardware)

bart3r3y ago

I have a 2019 MacBook Pro with a 2.4GHz quad-core i5, 8GB RAM and Intel integrated graphics

  python3 image_from_text.py --text='a happy giraffe eating the world' --seed=7  154.61s user 22.18s system 262% cpu 1:07.40 total

  WARNING:absl:No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

As you can see, it took 1 min 7 seconds to complete.

I assume it would be much faster with a grunty graphics card
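The `262% cpu` figure in that `time` output follows from the other three numbers: CPU time (user plus system) divided by wall-clock time. A quick check, using only the values quoted above:

```python
user_s, system_s = 154.61, 22.18   # CPU seconds reported by `time`
wall_s = 1 * 60 + 7.40             # "1:07.40 total" as seconds

# Values above 100% mean more than one core was busy on average.
cpu_percent = (user_s + system_s) / wall_s * 100
print(f"{cpu_percent:.0f}% cpu")
```

So on average a bit more than two and a half cores were kept busy during the run.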

chrisa3y ago

Using an rtx 3090 (NVIDIA gpu with 24GB of RAM):

Mini = 5.33 s

Mega = 14.7 s

Update: about 1/2 that time is just loading the model, so if you load the model and then generate multiple images, it drops to:

Mini = 3.91 s

Mega = 8.86 s
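The "load once, generate many" pattern can be sketched as follows. `load_model` and `generate` are stand-ins invented for this example (the sleep fakes the cost of reading weights); they are not functions from the repository.

```python
import functools
import time

@functools.lru_cache(maxsize=None)
def load_model(name):
    """Stand-in for an expensive load; the sleep fakes reading weights from disk."""
    time.sleep(0.05)
    return {"name": name}

def generate(name, prompt):
    """Stand-in for inference; only the first call per model pays the load cost."""
    model = load_model(name)  # served from the cache after the first call
    return f"{model['name']}:{prompt}"

for prompt in ["alien life", "a happy giraffe eating the world"]:
    start = time.perf_counter()
    generate("mini", prompt)
    print(f"{prompt!r}: {time.perf_counter() - start:.3f}s")
```

A long-running process (a notebook session or a small server) gets this benefit naturally, whereas a script invoked once per image pays the load cost every time.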

JacobiX3y ago· 2 in thread

Interesting when testing with inputs like "Oscar Wilde photo" or "marilyn monroe photo" and comparing to a Google image search. After some iterations we can have quite similar images but the faces are always blurry.

bogwog3y ago

That's intentional. When training the models, they filter out human faces and adult content among other things.

l33t23283y ago

That’s annoying

mg3y ago· 2 in thread

What is the maximum resolution possible with this?

If it depends on the hardware, what would be the limit when one rents the biggest machine available in the cloud?

jwitthuhn3y ago

Fixed size of 256x256. It cannot go any bigger or smaller.

01acheru3y ago

Out of curiosity: why can't it be changed? I know nothing about this field so... thanks!
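One likely explanation, sketched as arithmetic. The two constants are assumptions based on how DALL·E Mini is commonly described (a 16 x 16 grid of image tokens, each decoded by the VQGAN into a 16 x 16 pixel patch); they are not taken from this thread or verified against the repository.

```python
TOKENS_PER_SIDE = 16    # assumed: the decoder emits a 16 x 16 grid of image tokens
PIXELS_PER_TOKEN = 16   # assumed: the VQGAN turns each token into a 16 x 16 patch

num_tokens = TOKENS_PER_SIDE ** 2
side = TOKENS_PER_SIDE * PIXELS_PER_TOKEN
print(f"{num_tokens} tokens -> {side}x{side} pixels")
```

Under those assumptions the output size is baked into the trained weights: the transformer was trained to produce exactly that many tokens, so a different resolution would need a differently trained model or a separate upscaler.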

jcims3y ago· 2 in thread

The Google Colab link works if you replace the computed path to flax_model.msgpack on line 10 in load_params.py with ‘/content/pretrained/vqgan/flax_model.msgpack’

Edit: actually it's easier to open a terminal and move /content/pretrained/vqgan to /content/min-dalle/pretrained/vqgan
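The same move can be done from a notebook cell instead of a terminal. A minimal sketch using the two paths from the comment above as defaults; the `move_vqgan` helper itself is made up for this example.

```python
import shutil
from pathlib import Path

def move_vqgan(src="/content/pretrained/vqgan",
               dst="/content/min-dalle/pretrained/vqgan"):
    """Move the cloned vqgan directory to where the loader looks for it."""
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"nothing to move at {src}")
    if dst.exists():
        # shutil.move would nest src inside an existing directory; refuse instead.
        raise FileExistsError(f"{dst} already exists")
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst
```

Calling `move_vqgan()` with no arguments uses the Colab paths; elsewhere, pass your own source and destination.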

kuprelOP3y ago

Thanks for figuring this out. The problem was that the vqgan repository was being cloned to the wrong directory. The updated colab should work now

DivineTraube3y ago

The updated version gives me this (after successful setup with the example alien thing):

  UnfilteredStackTrace  Traceback (most recent call last)
  <ipython-input-2-0e20e3adf861> in <module>()
        2
  ----> 3 image = generate_image_from_text("alien life", seed=7)
        4 display(image)

  67 frames

  UnfilteredStackTrace: TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float16, float32.

  The stack trace below excludes JAX-internal frames.
  The preceding is the original exception that occurred, unmodified.

  --------------------

  The above exception was the direct cause of the following exception:

  TypeError  Traceback (most recent call last)
  /content/min-dalle/min_dalle/models/dalle_bart_decoder_flax.py in __call__(self, decoder_state, keys_state, values_state, attention_mask, state_index)
       38 keys_state,
       39 self.k_proj(decoder_state).reshape(shape_split),
  ---> 40 state_index
       41 )
       42 values_state = lax.dynamic_update_slice(

  TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float16, float32.

https://github.com/THUDM/CogVideo

epicureanideal3y ago· 2 in thread

Free idea: Same but for making short video clips, and then eventually producing entire movies.

cnity3y ago

Not sure why this is downvoted: seems like the inevitable endpoint of AI vision/image generation research and it warrants consideration and discussion.

danielbln3y ago

sydthrowaway3y ago· 2 in thread

Gamechanger

Clone before kill.

etaioinshrdlu3y ago

I don't think this is infringing on anyone's rights.

sydthrowaway3y ago

openai?

dalle-world3y ago· 1 in thread

I spun up an aws ubuntu ec2 with 2 Tesla M60. When I run python3 image_from_text.py --text='alien life' --seed=7

I get this error:

  detokenizing image
  Traceback (most recent call last):
    File "/home/ubuntu/work/min-dalle/image_from_text.py", line 44, in <module>
      image = generate_image_from_text(
    File "/home/ubuntu/work/min-dalle/min_dalle/generate_image.py", line 74, in generate_image_from_text
      image = detokenize_torch(image_tokens)
    File "/home/ubuntu/work/min-dalle/min_dalle/min_dalle_torch.py", line 107, in detokenize_torch
      params = load_vqgan_torch_params(model_path)
    File "/home/ubuntu/work/min-dalle/min_dalle/load_params.py", line 11, in load_vqgan_torch_params
      params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
    File "/usr/local/lib/python3.10/dist-packages/flax/serialization.py", line 350, in msgpack_restore
      state_dict = msgpack.unpackb(
    File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
  msgpack.exceptions.ExtraData: unpack(b) received extra data.

pmarreck3y ago

I get a similar error running it locally (not sure if related, but it also can't find my GPU, which is a 3080ti and should be sufficient):

  Traceback (most recent call last):
    File "/home/pmarreck/Documents/min-dalle/image_from_text.py", line 44, in <module>
      image = generate_image_from_text(
    File "/home/pmarreck/Documents/min-dalle/min_dalle/generate_image.py", line 75, in generate_image_from_text
      image = detokenize_torch(torch.tensor(image_tokens))
    File "/home/pmarreck/Documents/min-dalle/min_dalle/min_dalle_torch.py", line 108, in detokenize_torch
      params = load_vqgan_torch_params(model_path)
    File "/home/pmarreck/Documents/min-dalle/min_dalle/load_params.py", line 12, in load_vqgan_torch_params
      params: Dict[str, numpy.ndarray] = serialization.msgpack_restore(f.read())
    File "/home/pmarreck/anaconda3/lib/python3.9/site-packages/flax/serialization.py", line 350, in msgpack_restore
      state_dict = msgpack.unpackb(
    File "msgpack/_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
  msgpack.exceptions.ExtraData: unpack(b) received extra data.

pja3y ago· 1 in thread

NB: Needs a weights & biases account in order to download the models.

lars_francke3y ago

You can CTRL+C that prompt and it'll download them anyway but it'll tell you that you can't visualize your results then.

amar-laksh3y ago· 1 in thread

People might also like this one: https://github.com/saharmor/dalle-playground Really easy to work with

enlyth3y ago

I couldn't get this to pick up my graphics card when running it with WSL 2. It just says no CUDA devices found or something, so I gave up. Not sure if anyone had any luck.

sAbakumoff3y ago· 1 in thread

The results are amazingly poor. Try "biden plays chess against napoleon"

tgv3y ago

I tried a few non-descriptive statements from random tweets. As it turns out, nobody's made a random tweet since 2016, but for the few that exist, the results are great. E.g. "Good Morning Everyone , Happy Nice Day :D" generates something that can only be described as bored-ape meets Picasso in kindergarten. Probably the next-gen 1M$ NFT. If anybody needs proof that these models don't think, this is it.

lastdong3y ago

Now we just need to containerise it (there are a few docker python nvidia images)

WheelsAtLarge3y ago

Good Job. What are the hardware requirements?

bjarneh3y ago

   TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16.

godmode20193y ago

That dinosaur image has fantastic meme potential

jwitthuhn3y ago

Love this, before I only ever saw code that ran this model through jax. This seems to perform much better on my m1.

mahastore3y ago

WTF is Wandb.ai? This seems like a sneaky way to get people to sign up for this wandb thingy.

ramesh313y ago

Has anyone got this running on M1?

ausbah3y ago

has anyone applied compression techniques to large models like dalle-2?

coding1233y ago

Cloning....


daenz3y ago· 11 in thread

mdp20213y ago

No differently from translators etc.: not required for every small task, still required for doing things professionally.

jokethrowaway3y ago

There is no equivalent for illustrators.

My friends who studied some specific language are all unemployed or doing unqualified jobs. Their peers from a generation before are teachers or work in some embassy.

londons_explore3y ago

Lots of professional translators moved into language tuition.

I guess lots of artists will move into teaching art.

I can see tools like this increasing public interest in making art with the help of new tools, and some people will want to be taught.

mola3y ago

Yeah, so a lot less work...
