> Given: A protein string P of length at most 1000 aa.
> Return: The total weight of P. Consult the monoisotopic mass table.
The "monoisotopic mass table" appears to be a link. I get a pop-up, but nothing appears in it, other than a spinner. I had to do a web search to find http://rosalind.info/glossary/monoisotopic-mass-table/ .
The page continues:
> Sample Dataset - SKADYEK
> Sample Output - 821.392
Using the monoisotopic mass table I computed:
>>> d = {'A': 71.03711, 'C': 103.00919, 'D': 115.02694,
...      'E': 129.04259, 'F': 147.06841, 'G': 57.02146,
...      'H': 137.05891, 'I': 113.08406, 'K': 128.09496,
...      'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
...      'P': 97.05276, 'Q': 128.05858, 'R': 156.10111,
...      'S': 87.03203, 'T': 101.04768, 'V': 99.06841,
...      'W': 186.07931, 'Y': 163.06333}
>>> sum(d[c] for c in "SKADYEK")
821.3919199999999
This matches the example. BUT!!!! This is NOT the correct answer because, as the expanded text says, "the mass of a protein is the sum of masses of all its residues plus the mass of a single water molecule."
The table says "the monoisotopic mass of water is considered to be 18.01056" so
>>> 821.3919199999999 + 18.01056
839.40248
This latter number matches the value given by https://web.expasy.org/cgi-bin/compute_pi/pi_tool, which means the example answer ... is wrong. Yes?
How (in)correct are the other answers? I-am-not-a-bioinformatics-programmer.
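To put the two totals above side by side, here is a minimal sketch. The function names `residue_sum` and `intact_mass` are made up for illustration; the masses are the ones quoted from the Rosalind table, and nothing here claims to be Rosalind's own checker.

```python
# Monoisotopic residue masses in Da, copied from the table quoted above.
RESIDUE_MASS = {
    'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
    'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
    'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
    'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
    'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
}
WATER_MASS = 18.01056  # value quoted from the Rosalind table


def residue_sum(protein):
    """Sum of residue masses only (this reproduces the sample output)."""
    return sum(RESIDUE_MASS[aa] for aa in protein)


def intact_mass(protein):
    """Residue sum plus one water, as the expanded problem text describes."""
    return residue_sum(protein) + WATER_MASS


print(f"{residue_sum('SKADYEK'):.3f}")  # 821.392
print(f"{intact_mass('SKADYEK'):.3f}")  # 839.402
```

The only difference between the two numbers is the single water term, so which one is "right" depends entirely on whether the peptide is treated as a free molecule or as a fragment from the middle of a chain.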
Re: the problem - not a hundred percent on this, but I think the issue is that they are vague on the fact that this is a theoretical question, not a practical one. The key is that the question itself does not mention the addition of the water molecule, just that you have a sequence P with a dictionary of weights.
Edit 1: If memory serves me correctly, after the initial ionization phase of mass spectrometry, the additional water molecule is discarded, making it insignificant in the analysis of your peptide sequences.
Edit 2: If anyone is interested in following through this site, I would highly recommend using the existing problem tracks http://rosalind.info/problems/list-view/?location=bioinforma... These will help lay out the problems in a logical order and ensure you have the skills you need to progress. Alignment problems are a great way to learn dynamic programming and will allow you to move on to some of these other problems (like mass spec and HMMs) more reasonably (at least, in my experience!). Good luck!
In high school I tried to build a mass spectrometer. It didn't work - I couldn't get a high enough vacuum, and only a few years later, as a physics undergrad, did I find out that this was just one of several problems I had. It was fun to try though.
But I do know that the ionized particle has a charge, and that the electron affects the overall mass by about 1/1836 Dalton. That's 0.00054 Dalton, while the table lists masses to an even finer precision, like 71.03711.
The example output gives a value to three decimal places, so at that precision there's roughly a 50% chance that the electron mass will affect the result.
Isn't this problem therefore implicitly teaching an excessive trust in significant digits?
Now, I suspect that the mass spectrometers they use aren't that accurate. But it's bugging me now.
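The "50% chance" estimate can be checked with a little integer arithmetic. This is only a sketch under two assumptions that are not in the problem text: that exactly one electron mass (about 0.0005486 Da) is removed, and that the digits beyond the third decimal place are uniformly distributed. The names below are hypothetical.

```python
# Work in integer units of 1e-7 Da so that rounding is exact.
UNITS_PER_MILLIDALTON = 10_000   # 0.001 Da, the precision of the sample output
ELECTRON_MASS_UNITS = 5_486      # ~0.0005486 Da (assumed: one electron removed)


def round_to_millidalton(mass_units):
    """Round half up to the nearest 0.001 Da; returns a count of millidaltons."""
    return (mass_units + UNITS_PER_MILLIDALTON // 2) // UNITS_PER_MILLIDALTON


def fraction_changed():
    """Fraction of sub-millidalton offsets whose rounded mass shifts
    when one electron mass is subtracted."""
    base = 821 * 1000 * UNITS_PER_MILLIDALTON  # arbitrary whole-dalton mass
    changed = sum(
        round_to_millidalton(base + offset)
        != round_to_millidalton(base + offset - ELECTRON_MASS_UNITS)
        for offset in range(UNITS_PER_MILLIDALTON)
    )
    return changed / UNITS_PER_MILLIDALTON


print(fraction_changed())  # 0.5486
```

Under those assumptions the third decimal changes a bit more than half the time, which is in line with the rough 50% figure.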
As mbreese wrote elsewhere here, I'm (clearly) reading too much into the problem. I don't think bioinformatics is the right field for me.
The first paragraph of the expanded question text has: "every pair of adjacent amino acids has lost one molecule of water, meaning that a polypeptide containing n amino acids has had n−1 water molecules removed"
The second paragraph has: "Thus, the mass of a protein is the sum of masses of all its residues plus the mass of a single water molecule."
The fifth paragraph has: "The mass of a protein is the sum of the monoisotopic masses of its amino acid residues plus the mass of a single water molecule"
And the monoisotopic mass table says "Note: the monoisotopic mass of water is considered to be 18.01056 Da."
So I thought that the water molecule was important in the calculation.
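The two quoted statements ("n−1 water molecules removed" and "residues plus a single water molecule") describe the same total. A small sketch of that bookkeeping, assuming a free amino acid weighs its residue mass plus one water; the helper names are invented for illustration and only the residues needed for the sample peptide are listed.

```python
import math

WATER = 18.01056  # Da, from the Rosalind table
# Residue masses (Da) for the letters in the sample peptide only.
RESIDUE_MASS = {
    'S': 87.03203, 'K': 128.09496, 'A': 71.03711,
    'D': 115.02694, 'Y': 163.06333, 'E': 129.04259,
}


def mass_from_free_amino_acids(peptide):
    """n free amino acids, minus the n-1 waters lost forming peptide bonds."""
    free = sum(RESIDUE_MASS[aa] + WATER for aa in peptide)
    return free - (len(peptide) - 1) * WATER


def mass_from_residues(peptide):
    """Sum of residue masses plus a single water molecule."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


a = mass_from_free_amino_acids("SKADYEK")
b = mass_from_residues("SKADYEK")
print(math.isclose(a, b, abs_tol=1e-9))  # True
```

Adding n waters and removing n−1 of them leaves exactly one, which is where the "plus a single water molecule" comes from.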
However, the last paragraph (which I only now closely read) says it isn't important, with "In the following several problems on applications of mass spectrometry, we avoid the complication of having to distinguish between residues and non-residues by only considering peptides excised from the middle of the protein. This is a relatively safe assumption because in practice, peptide analysis is often performed in tandem mass spectrometry."
Since it didn't mention "water", and instead used the specialist term "residue", I missed the connection earlier.
That said, the text seems to use "residue" inconsistently. There's the definition "a residue is a molecule from which a water molecule has been removed; every amino acid in a protein are residues except the leftmost and the rightmost ones."
but there's also the usage: "the mass of a protein is the sum of masses of all its residues plus the mass of a single water molecule"
Surely that should be "the mass of a protein is the sum of masses of all its residues plus the mass of its leftmost and rightmost amino acids minus the mass of a single water molecule", yes?
So I looked up the definition of "amino acid residue". It appears to be https://goldbook.iupac.org/terms/view/A00279 "α-Amino-acid residues are therefore structures that lack a hydrogen atom of the amino group (–NH–CHR–COOH), or the hydroxyl moiety of the carboxyl group (NH2–CHR–CO–), or both (–NH–CHR–COO–); all units of a peptide chain are therefore amino-acid residues".
https://en.wikipedia.org/wiki/Protein_sequencing#Whole-mass_... also agrees that "residue" includes the two amino acids at the ends, saying "The protein’s whole mass is the sum of the masses of its amino-acid residues plus the mass of a water molecule and adjusted for any post-translational modifications"
Which means ... I don't think the author uses the term "residue" correctly?
Or, more likely, I'm confused by the specialist terminology. Can someone clear up my confusion?
I would be happy if I were you though. The point of this exercise is to learn, and I'll bet you'll remember that water molecule for a long time :-)
In the real world you are dealing with a mixture of isotopes, so it's better to use the average mass (the average of the different isotope masses, weighted by abundance) if you want to compare to experimentally determined masses - say from mass spec.
It's not as if average mass is more complex - for the sake of these calculations it's still just a number looked up from a table...
i.e. why oh why use the wrong value when it's just as easy to use the right one (*)?
(*) True, it's biology, so there isn't a right one in all circumstances - lots of interesting effects, e.g. enzymes having slightly different rates of incorporation for different isotopes - however it's closer to the truth than monoisotopic.
> The monoisotopic mass is not used frequently in fields outside of mass spectrometry because other fields cannot distinguish molecules of different isotopic composition. For this reason, mostly the average molecular mass or even more commonly the molar mass is used. For most purposes such as weighing out bulk chemicals only the molar mass is relevant since what one is weighing is a statistical distribution of varying isotopic compositions.
> This concept is most helpful in mass spectrometry because individual molecules (or atoms, as in ICP-MS) are measured, and not their statistical average as a whole. Since mass spectrometry is often used for quantifying trace-level compounds, maximizing the sensitivity of the analysis is usually desired. By choosing to look for the most abundant isotopic version of a molecule, the analysis is likely to be most sensitive, which enables even smaller amounts of the target compounds to be quantified. Therefore, the concept is very useful to analysts looking for trace-level residues of organic molecules, such as pesticide residue in foods and agricultural products.
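To make the monoisotopic-versus-average distinction concrete, here is a rough sketch that rebuilds a few table entries from elemental formulas. The atomic masses are standard values typed in by hand (most abundant isotope versus abundance-weighted atomic weight), and `formula_mass` is a hypothetical helper, not anything from Rosalind.

```python
# Atomic masses in Da. MONO uses the most abundant isotope of each element
# (1H, 12C, 14N, 16O); AVERAGE uses standard atomic weights to three decimals.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Elemental compositions: water, and the glycine and alanine residues.
FORMULAS = {
    "water": {"H": 2, "O": 1},
    "G residue": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A residue": {"C": 3, "H": 5, "N": 1, "O": 1},
}


def formula_mass(formula, atomic_masses):
    """Sum atomic masses weighted by how often each element occurs."""
    return sum(atomic_masses[el] * count for el, count in formula.items())


for name, formula in FORMULAS.items():
    mono = formula_mass(formula, MONO)
    avg = formula_mass(formula, AVERAGE)
    print(f"{name}: monoisotopic {mono:.5f}, average {avg:.3f}")
# water: monoisotopic 18.01056, average 18.015
# G residue: monoisotopic 57.02146, average 57.052
# A residue: monoisotopic 71.03711, average 71.079
```

The monoisotopic column reproduces the table values quoted earlier in the thread (18.01056, 57.02146, 71.03711), while the average masses come out slightly higher because the heavier isotopes are counted in.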
[1] https://discourse.julialang.org/t/biostar-handbook-computati...
[1]: https://en.wikipedia.org/wiki/Rosalind_Franklin
It was her X-ray image that led to the discovery of the molecular structure of DNA.
(Reposts are fine after a year: https://news.ycombinator.com/newsfaq.html)
The second thing is picking the first example (the character counting problem). Clicking on the thing, it told me that the important words are highlighted, and that the words 'figure N' refer to the figures on the right -- which felt unnecessary, because it's something that anyone visiting wikipedia, or browsing a book, would know.
The form of learning which I call “problem based” learning is a great format for me. You learn from reading up on a topic. You learn from trying different solutions. Finally, you learn from seeing other people’s answers once you’ve solved it.
Also check out:
Hackerrank.com - all-around focus
Project Euler - math focus
Leetcode - more oriented towards interview training but still useful and fun.
The problems are interesting and fun to solve. They didn’t have a lot of context, though they seem to have added some at the start of each problem.