um... please tell me devs don't have access to production data in a healthcare environment (of all places!).
I mean, I understand the need for a representative dataset to develop and test against, but this is people's lives they're playing with!
And, you know, if you had a decent set of anonymised or fictitious customer data to work with, you wouldn't need to run your IDE in docker, and there would be less surface area for attackers to get to the data.
If developers don't have access to production data, then the solution is useless. How do people not understand this in 2019?
Millions of dollars have been dumped into various products centred around the idea of synthesizing 'production-like' data and all have failed. Because synthesizing fake data destroys the signal that makes the original data useful in the first place. If the engineers don't have access to it, then they can't extract the value from it, then what the hell are you doing in the first place?
You think that if you give engineers a synthetic dataset and they build a blind solution around it, the users will be able to extract value out of that data? That myth was dispelled a decade ago, and there hasn't been a single synthetic data success story since then.
I've had clients coming to us with the notion that they can give us fake imaging data and we can generate diagnostic insights from it. This crap needs to stop. If you can't trust engineers with your data to extract value out of it, then go ahead and munge it in excel.
When I had this same discussion, the response I got was that it's not a matter of trust. Apparently the terms & conditions (which consumers, myself included, do not read anyway) contain a clause saying the data will not be viewed by anyone who is not using it for diagnosis or treatment of the patient. Basically, only someone with a medical degree who is part of the team treating the patient can access the data, and the clause says nothing about an engineer using the data to build a better system for healthcare. So, ideally they want a candidate with a double degree in software and medicine. This kind of term applies in other domains too.
I do not accept the reason, but that is what it is.
... what?
The best they get are aggregations (large enough that you can't identify a person). Specific user information is only available if the user explicitly opts in to share it, and is scoped to the specific case for analysis.
It makes training/diagnosing ML models a challenge, to be sure.
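To make the "large enough aggregations" idea concrete, here's a minimal sketch of small-group suppression. The threshold `MIN_GROUP_SIZE`, the function name, and the record fields are all made up for illustration; a real cutoff comes from your compliance people, not from a snippet like this.

```python
from collections import Counter

# Hypothetical threshold: groups smaller than this are never reported.
MIN_GROUP_SIZE = 5


def aggregate_counts(records, key, min_group_size=MIN_GROUP_SIZE):
    """Count records per value of `key`, dropping groups too small to report."""
    counts = Counter(record[key] for record in records)
    return {value: n for value, n in counts.items() if n >= min_group_size}
```

With seven records for one region and two for another, only the group of seven would come back; the group of two is suppressed rather than exposed.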
Even anonymised data (which most healthcare data is, btw) has strict requirements on access control and copying.
As for “develop and test against”: there’s the entire field of data science and modeling that’s focused not on developing systems to expose the data, but on doing analysis on the data.
That doesn't have to mean it's worse though. Testing with 10,000,000 fictitious accounts generated to cover lots of permutations of user data is a great idea even if the real data won't ever have some of those permutations. It's a testing technique called "fuzzing", and it's quite common.
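A rough sketch of generating fictitious accounts that cover every permutation of a few fields. Strictly speaking this is exhaustive combination testing rather than random fuzzing, and the field names and values below are invented for the example, not taken from any real schema.

```python
import itertools

# Hypothetical field values; a real schema would have many more dimensions.
AGE_BANDS = ["0-17", "18-64", "65+"]
PLANS = ["basic", "premium"]
NAME_STYLES = ["ascii", "accented", "empty"]

SAMPLE_NAMES = {"ascii": "Alex Smith", "accented": "Zoë Müller", "empty": ""}


def generate_accounts():
    """Yield one fictitious account per combination of field values."""
    combos = itertools.product(AGE_BANDS, PLANS, NAME_STYLES)
    for account_id, (age_band, plan, name_style) in enumerate(combos, start=1):
        yield {
            "id": account_id,
            "age_band": age_band,
            "plan": plan,
            "name": SAMPLE_NAMES[name_style],
        }
```

Three fields with 3, 2, and 3 values give 18 accounts; adding dimensions multiplies the count, which is how you get to millions of records without touching real data.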
Not necessarily, there is a lot of work done on synthetic data which closely matches the properties of the real data both in academia and increasingly in industry.
I see this particular development environment as an advantage for this particular situation. I work in health care, and in R&D we often have to help debug issues with a client implementation, which means having an anonymized snapshot of their data in our reporting system. Typically, we end up passing around DB backups and zip files of the client-specific code. It would be significantly easier to fire up a Docker container that was ready to go.
Have seen it quite a few times, but there needed to be a good reason and other options had to be exhausted. A few examples were due to zero-width characters which were sanitized by whatever tool that anonymized data, but broke something in the program.
There are brittle systems in healthcare.
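For the zero-width character case, a small check like the following could be run over both the original and the anonymized snapshot to see what the sanitizer removed. This is only a sketch; the set of code points is my own pick of common offenders, and `find_zero_width` is a made-up helper name.

```python
# Common zero-width code points: ZWSP, ZWNJ, ZWJ, word joiner, BOM.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def find_zero_width(text):
    """Return (index, code point) pairs for each zero-width character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]
```

If the original field reports hits and the anonymized copy reports none, that is a hint the bug only reproduces against real data.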
I made the same mistake once. We should be glad we weren't fired.
On the other hand, you do need devs to be able to look at real data when it's absolutely necessary (generally with a code-enforced gatekeeper and an audit trail). And you need to make sure a dev can SSH into a node for repair/maintenance (again with an audit trail).
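A toy sketch of what a code-enforced gatekeeper with an audit trail might look like. Everything here is hypothetical (the `read_record` name, the rule that a second person must approve, the in-memory `AUDIT_LOG`); a real setup would sit behind your IAM system and write to an append-only store.

```python
import datetime
import json

AUDIT_LOG = []  # Stand-in for an append-only audit store.


class AccessDenied(Exception):
    """Raised when a read is attempted without a reason and a second approver."""


def read_record(store, record_id, user, reason, approved_by=None):
    """Return a record only if access is approved, logging every attempt."""
    granted = (
        bool(reason.strip())
        and approved_by is not None
        and approved_by != user  # Nobody approves their own access.
    )
    AUDIT_LOG.append(json.dumps({
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "record_id": record_id,
        "reason": reason,
        "approved_by": approved_by,
        "granted": granted,
    }))
    if not granted:
        raise AccessDenied(f"{user} may not read record {record_id}")
    return store[record_id]
```

The point of the design is that denied attempts are logged too, so the audit trail shows who tried, not just who succeeded.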
That's the argument for using one of the many Identity and Access Management tools/providers out there, including the systems that come baked into the cloud providers.
Code on your Chromebook, tablet, and laptop with a consistent dev environment.
If you have a Windows or Mac workstation, more easily develop for Linux.
Take advantage of large cloud servers to speed up tests, compilations, downloads, and more.
Preserve battery life when you're on the go.
All intensive computation runs on your server.
You're no longer running excess instances of Chrome.
I imagine not everybody is going to want to run this on some Kubernetes cluster. The ability to do this locally seems like it could be really productive, actually. And having it in Docker can provide snapshotting via `docker commit` as well as the ability to cap its CPU/RAM resources. I might actually try this and a Docker registry to get some semblance of an editor per project. In some contexts I want to run many, many extensions, but for other work I'd rather not have that bloat to contend with. Also I've been really feeling the pain of navigating a PC running Unraid (lots of bare metal VMs) and a Mac laptop, trying to do development on each. My desktop is beefy, but I need to work on the go sometimes, and at times I need to use a Windows box. Right now they all have different VSCode setups. I've been meaning to get around to setting up some scheme for making my config portable, but with different paths across Ubuntu, macOS, and Windows it seems a bit daunting to get all of my dep paths straight, like eslint and phpcs.
Okay, enough comment writing, I'm giving this a go.
1. I first tried to install this on my Win10 VM, which needed to have Docker installed. That was a terrible idea. I completely broke my VM as Docker tried to enable Hyper-V. Friends don't let friends attempt nested virtualization. I should have just run the container on the host instead, which it supports quite well.
2. The repo worked as the blog post described on my Mac. It's quick and has been able to run some tricky extensions. I still need to experiment with running some external dependencies.
3. Docker commit worked nicely, making a layer for the changes I made. Still playing with this, but wow that could be very productive if it enabled me to roll back to a tested base environment, or share a full IDE image with somebody on my team.
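For anyone wanting to script the resource caps and snapshots, a sketch that just builds the command lines. The image and container names are placeholders I made up, and the limits are arbitrary; `--cpus`, `--memory`, and `docker commit` are the standard Docker CLI pieces.

```python
import shlex


def docker_run_cmd(image, name, cpus=2.0, memory="4g"):
    """Build a `docker run` command that caps CPU and memory for the container."""
    return [
        "docker", "run", "-d",
        "--name", name,
        f"--cpus={cpus}",
        f"--memory={memory}",
        image,
    ]


def docker_commit_cmd(container, snapshot_tag):
    """Build a `docker commit` command that snapshots a container as an image."""
    return ["docker", "commit", container, snapshot_tag]


if __name__ == "__main__":
    # Placeholder names; substitute your own image, container, and tag.
    print(shlex.join(docker_run_cmd("my-ide:base", "ide-projectx")))
    print(shlex.join(docker_commit_cmd("ide-projectx", "my-ide:projectx-v1")))
```

Pass the lists to `subprocess.run` if you actually want to execute them; pushing the committed tag to a registry is what would let a teammate pull the same IDE image.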
E.g. you can't (easily) have autocomplete in C++ on VS Code on your Mac if your project doesn't target Mac and can't build there (or doesn't have the dependencies etc).
But you can do it inside a Docker image.
That being said it's still an idea, but here's hoping it works!
You can develop all your code in a fully specified environment, which makes it much easier to reproduce and deploy models and analysis.
You can (after enabling security) move your IDE to the data. Instead of transferring data back and forth you can develop where your data is stored.
Last - and most important for me - in industries like my own (healthcare), you work with highly regulated data that has to be stored securely, where having multiple copies of data on multiple laptops can pose an unacceptably large risk.
Running containers like this within a secure environment with access to the data helps us to have an ideal development environment, while ensuring the protected data remains in a secure, single location with no unnecessary duplication.
The article says so right there, whereas you haven't explained why this would be a bad use case. Maybe it's wasteful, but if a person wants additional security via ephemerality then it seems fine: https://blog.docker.com/2013/09/docker-can-now-run-within-do...
Repeat ad infinitum until you feel secure and ephemeral enough.
I am running emacs with spacemacs.org
And if you haven't tried Org Mode, it is no exaggeration to say it is life changing. It can help you organize notes, todos, agendas, etc.
For ultimate multi-workplace setup, you can run Emacs in server mode on a cloud instance, and allow network connections to it, or port-forward to its Unix socket via SSH.
Now you can run Emacs in client mode from whatever machine you may have, several of them, or ssh to the cloud box and run Emacs in terminal mode in a crunch. All your sessions will share the same set of files, but workspace layout is per client, so you can work comfortably both from an 11" laptop screen and from a 27" 4K screen.
As said above, you can use tramp to access whatever other remote files are accessible via ssh, and also run a decent (though a bit limited) terminal right from Emacs, to say nothing of running REPLs of all kinds directly, and excellent git integration with Magit.
This can even give you a sort of VPN-like access, when the cloud box where the Emacs server runs has access to machines that are not directly accessible to you from the machine you're connecting from.
OTOH VS Code likely can be run in a similar setup.
In general I very much like the modularization of IDEs: instead of a monolith from the 1990s you can mix and match your favorite editor with language servers, REPLs, build servers, etc., all separate and in many cases running remotely.
I want to love it. It makes a very specific use case I use much nicer. I can leave code on a remote server with all the compute power I need to build and run my project, and edit the file I'm working on with VS code's editor without having to sync files around. It does, however, have a few big caveats that killed it for me.
It doesn't block any of the browser things that would leave the webpage. Notably, if you hit Ctrl-W to close a file tab because of muscle memory, you'll close the browser tab. Also, if you hit back on accident like I apparently do all the time, you'll go back to the blank tab page. In both of these cases, you'll lose any unsaved state.
Also, the extension repo it's pointing at isn't MS's live repo. There are apparently reasons for this, but it means you don't get the latest version of extensions, which was annoying for a specific extension I've gotten used to.
I also had issues with VS Code getting confused about state when my connection to the remote box was less than ideal.
All in all, I really wanted to like it, but for truly remote cases, I'm back to using Mosh to interact with the remote box, and a simple tool I wrote ages ago to handle rsyncing the local files to the remote box to build and run them there.
It's promising, but it's got a little ways to go.
Edit: I use CMAKE_TOOLCHAIN_FILE to describe the target env.
I've only used it briefly for some node.js work but as the whole thing is just sitting in a normal linux docker container, you should be able to do anything docker/linux can do.
I really don't understand the localhost use case though. I'm on MacOS. Why would I spawn a VM (docker for macos) with limited access to my system (container promise) to run an editor already running in a VM?
Don't I just end up with a resource- and disk-hungry, slow, and inconvenient editor?
It would lead one to believe that VSCode and thus by extension VSCodium could be run in Docker and accessed from a web browser
In fact, you can run "Coder" (https://coder.com/); a product, which according to their GitHub had some non-trivial effort put into it to make it run as such.
Not least of all, looking through their issue list, is the fact they compile extensions themselves and they are therefore somewhat outdated (according to issue comments from their users).
It's nice, but it's not VSCode per se, and sadly that means no dice for Codium users.