The instructions for building any living organism is encoded in its DNA. We now have the ability to sequence (that is, get a full listing) of a person's DNA and look to see which particular genes that person carries, genes which might correspond to specific characteristics or diseases. A DNA sequence is a very long string composed only of the letters, A, C, G, T. For this homework you will use use following sample sequence:
String dna = "ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCC" + "CCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGC" + "CTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGG" + "AAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCC" + "CTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAG" + "TTTAATTACAGACCTGAA";
Your program will ask the user for a query, which will be a combination of the letters ACGT, and you will program will then print out the index position (in the dna
string) of all the places where that query matches. But, you will also need to allow for one character errors. That is, if there is only one letter difference, any letter, then the match should still count. Allowing for these errors is important in Biology because the DNA sequences do contain many errors because of limitations on the sequencing methods.
Enter query:ACAAGAT Match at position 0 Enter query:ACAAGAA Match at position 0 Enter query:CCTGGAGGG Match at position 70 Enter query:CCTGGGGGG Match at position 70 Enter query:ACAA Match at position 0 Match at position 96 Match at position 125 Match at position 131 Match at position 253 Match at position 270 Match at position 272 Match at position 315 Match at position 319 Match at position 321 Match at position 345 Match at position 357 Enter query:CCCCCCC Match at position 82 Match at position 243 Match at position 244 Match at position 245 Match at position 246
Hint: You will need to use a nested loop.
This homework is due Tuesday, September 22 @noon. As always, email your program to your TAs.
For Hackers only: If you feel this problem was too easy and want a little bit of a challenge, you can try to solve the welcome to code jam problem from the Google Code Jam 2009. It is similar to our problem but yet soooo much harder to solve. And, NO, you will not get any bonus points for solving it. If you can solve it, you should not be taking 145. You should be in 350.
2 comments:
any more hints?
hint 2: compare the query string character-by-character, that way you can count mismatches.
Post a Comment