Someone can correct me here, but AFAIK we don't even know which datasets were used to train these models, so why should we use "open" to describe Llama? This is closer to freeware than to an open-source project.
[1] https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/202...