Rice University leads $11M effort in big data software analytics

(news.rice.edu)

50 points | ColinCera | 11y ago | 33 comments

ColinCera (OP) 11y ago

“We envision a system where the programmer writes a few of lines of code, hits a button and the rest of the code appears. And not only that, the rest of the code should work seamlessly with the code that’s already been written.”

I'm skeptical...

'Writing computer programs could become as easy as searching the Internet. A Rice University-led team of software experts has launched an $11 million effort to create a sophisticated tool called PLINY that will both “autocomplete” and “autocorrect” code for programmers, much like the software that completes search queries and corrects spelling on today’s Web browsers and smartphones.'

Interested, but very, very skeptical...

teraflop 11y ago

I'm reminded of one of Alan Perlis's "epigrams in programming":

"When someone says 'I want a programming language in which I need only say what I wish done,' give him a lollipop."

simplemath 11y ago

I've seen plenty of platforms that write their own code from UI-type elements. That code is scary.

AngrySkillzz 11y ago

More or less scary than the code generated by a compiler? The JavaScript you get out of a ClojureScript compiler is pretty scary, but it still runs reasonably well. Though I suppose it would be pretty hard to write additional JavaScript to interact with the generated code.

simplemath 11y ago

>Though I suppose it would be pretty hard to write additional JavaScript to interact with the generated code.

otoburb 11y ago

Rather like Adobe Dreamweaver back in the day. The code it generated was atrocious, but it really helped a lot of non-technical people build sites quickly with minimal "fuss".

bhhaskin 11y ago

I am right there with you. We already have "auto-complete". I am not sure what "auto-correct" would entail. My guess is fixing syntax errors, but I don't see how it could even address logic errors.

mojobot 11y ago

I am curious about how they will define correctness. With many parts in a system, code can look correct in its small unit, and only when it is running in the real world does it become obviously wrong.

atwebb 11y ago

Able to identify common methods/algos/calls and correct a programmer's use of them, add in missing checks, remove unnecessary blocks/lines?

bhhaskin 11y ago

A number of IDEs already do this. Eclipse has some nice features that do this.

melling 11y ago

A lot of programming is repetitive. It would be great if we could make some advances. On iOS, for example, Apple will generate a lot of code for you if, say, you want to create a UITableView. I could easily imagine we could tell it a few things and the entire method could be generated.

If we look at duplicate code across all projects on the Internet, maybe we can simply pull from a larger database.

http://stackoverflow.com/questions/191614/how-to-detect-code...

Anyway, it's easy to say that it can't be done. It's probably more worthwhile to try and tackle the problem and make some forward progress.

[Update]

Just saw this HN submission.

https://news.ycombinator.com/item?id=8562635

Microsoft can now autocomplete C# code by using Bing Code Search Engine based off of your comments.

///how to read file pth line by line<TAB>

Pretty slick.

lesterbuck 11y ago

"We envision a system where all Defense Department computers worldwide are connected with reliable networks based on packet switching." I'm sure language like that was in the original proposal for ARPAnet circa 1968, and I'm sure most professionals looked at that and were "very, very skeptical". DARPA has that "advanced research" in its name. If most professionals aren't "very, very skeptical" about the ultimate commercial success of their projects, I'd say they are doing it wrong.

hollerith 11y ago

I wish the people here would upvote more posts by actual researchers and programmers and fewer press releases from university PR departments.

mring33621 11y ago

This is easy. IDE bot scrapes stackoverflow as you type. Ctrl-space to paste 'best' solution.

Igglyboo 11y ago

http://gkoberger.github.io/stacksort/

danabenson 11y ago

VisualStudio has a similar plugin: https://visualstudiogallery.msdn.microsoft.com/a1166718-a2d9...

j_s 11y ago

Invencia analyzes stack exchange sites to classify malware based on method calls:

http://www.youtube.com/watch?v=u6a7afsD39A

ep103 11y ago

That's basically exactly what they're talking about. Though deployed in house it could be a nice way to see if the code you want to write already exists in a similar function elsewhere (and therefore perhaps should be refactored to run in both places). It might also catch copy pasters before they ruin a code base too. But otherwise yeah, shitty SO bot.

jimbokun 11y ago

From this article, I have absolutely no idea what this software will actually do.

Can anyone enlighten me as to what they are actually trying to do, from the perspective of an actual software developer?

Or did they just successfully string together the correct sequence of buzz words to unlock the grant money?

leeber 11y ago

> Or did they just successfully string together the correct sequence of buzz words to unlock the grant money?

This, absolutely. Big data, data mining, and machine learning are really cool topics but the words became overused, overhyped, used out of context, especially by people who don't really understand what these are.

I have an old co-worker who spent a lot of time working with large Excel spreadsheets, using some formulas, sorting them to look for things, etc.

He lists "data mining" on his LinkedIn, and has a ton of people who endorsed him for it.

This became a little off topic, but I hate how venture capital, grant funds, the media, and misinformed people completely butcher these topics.

borplk 11y ago

"big data" .... because my Excel spreadsheets were like ... really big ... like you had to scroll down for 5 seconds big.

VLM 11y ago

As usual, the future is already here, just unevenly distributed.

https://github.com/capitaomorte/yasnippet

I do wonder if you gave $11M to João Távora what the end result would be. Probably pretty cool.

bhudman 11y ago

Grants like these baffle me. Having worked for the university, I know it is a lot about how you phrase the proposal so you can get the approvers to issue the grant. Someone had suggested yasnippet, and it already solves many of their goals. I am skeptical about the auto-correct feature.

mafribe 11y ago

One of the PIs is a Gödel Prize winner (M. Vardi). Track record matters in getting funding.

lostpixel 11y ago

I half-remember they tried to do a subset of this with some LISP/SCHEME, but it didn't pan out too well.

http://en.wikipedia.org/wiki/DWIM

boardstretcher 11y ago

It's a great idea. But, what code base is this autocomplete going to run off of?

If they are thinking of sourcing the internet itself, there had better be some kind of omniscient, all powerful proofreader in place, because there are a lot of people that submit a lot of code that is HORRIBLY insecure, inaccurate, prone to breakage or just plain spaghetti.

I'd hate to be working on a missile guidance system, only to press <tab> to complete a code block and end up getting some Intel Pentium FDIV instructions.

saurabh20n 11y ago

The announcement itself is pretty sparse on the proposed approach, but given the research interests of Swarat Chaudhuri [1] and Moshe Vardi [2], I would guess they will attempt to combine recent advancements in program synthesis, program verification, and code mining.

Program synthesis: There has been a lot of interest in the formal methods community to automatically generate programs (for small instances) with the target specification coming from input-output examples (e.g., Excel Flash Fill [3]), program templates or holes (called Sketches [4]), reactive models of adversarial environments, formal invariants etc. Also the solution techniques used vary considerably: from game theoretic solving, SAT solvers, model checkers, to version-space algebras and others. The community has not yet fixated on a specification language, or a solving technology. The industrial nature of the tools being leveraged (e.g., model checkers and SAT solvers from the hardware community) gives hope for promising developments. A Berkeley course [5] covers a good spectrum of the current developments.
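
As a rough illustration of the input-output-example style of synthesis mentioned above (the idea behind tools like Flash Fill, not their actual algorithm), here is a minimal Python sketch. It brute-force enumerates compositions over a tiny made-up DSL; the primitive names, `run`, and `synthesize` are all hypothetical and chosen only for this example.

```python
from itertools import product


def _first_word(s):
    words = s.split()
    return words[0] if words else ""


def _last_word(s):
    words = s.split()
    return words[-1] if words else ""


# Hypothetical toy DSL: each primitive is a named str -> str function.
PRIMITIVES = {
    "upper": str.upper,
    "lower": str.lower,
    "strip": str.strip,
    "first_word": _first_word,
    "last_word": _last_word,
    "initial": lambda s: s[:1],
}


def run(program, text):
    """Apply the named primitives to text, left to right."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_len=3):
    """Return the first (shortest) primitive sequence consistent with every
    (input, output) example, or None if none exists up to max_len."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None


# The "specification" is just a pair of examples: full name -> surname initial.
examples = [("ada lovelace", "L"), ("alan turing", "T")]
program = synthesize(examples)
print(program)
print(run(program, "grace hopper"))
```

The search space here grows exponentially with program length, which is why real synthesizers swap this brute-force loop for the solving techniques listed above (SAT solvers, version-space algebras, and so on).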

If I were to guess, maybe the Rice researchers are approaching the code completion/correction problem as mining for fragments of large codebases that are incomplete/incorrect and applying program synthesis to fill those fragments. Of course that would mean that they would also need to mine the specification requirements for those fragments. All of this is easier said than done, and it would be an ambitious project. Swarat has also done some really cool work on "probabilistic reasoning for programs" and "verification of probabilistic programs", so that might be part of it too. (Of course, I may be completely off-base! After all, we are commenting on a non-technical funding announcement here.)

[1] Swarat's publications: http://www.cs.rice.edu/~sc40/pubs/

[2] Moshe's publications: http://www.cs.rice.edu/~vardi/papers/index.html

[3] Excel's FlashFill from Sumit Gulwani, researcher@MSR: http://research.microsoft.com/en-us/um/people/sumitg/flashfi...

[4] The Sketch program synthesizer: https://bitbucket.org/gatoatigrado/sketch-frontend/wiki/Home

[5] Ras Bodik/Emina Torlak: Berkeley course material on Program Synthesis: http://www.cs.berkeley.edu/~bodik/cs294fa12

infinite8s 11y ago

I put my money on this comment being reflective of the actual research award vs all the other comments complaining that it's an $11M grant to build an autocomplete that scrapes the internet for code snippets.

geobmx540 11y ago

Posted just a bit later on HN: http://codesnippet.research.microsoft.com/

chadmckenna 11y ago

"PLINY is part of DARPA’s Mining and Understanding Software Enclaves (MUSE) program, an initiative that seeks to gather hundreds of billions of lines of publicly available open-source computer code and to mine that code to create a searchable database of properties, behaviors and vulnerabilities."

I feel that the reason DARPA is willing to fund this is because of that last part: "vulnerabilities".

RA_Fisher 11y ago

Hopefully they publish in public journals!

m3sh 11y ago

All those millions are given for explaining stuff with those papers and not laughing once.

blktiger 11y ago

Autocomplete + the Internet already erodes many programmers' skills to the point where they can barely write code without help. I can't imagine what this kind of tool would do.

Not that there is anything wrong with autocomplete. I certainly use it, but I've seen a lot of programmers that barely understand the code they are writing.
