Cancer Patent

Compositions and methods for the therapy and diagnosis of lung cancer

Cancer Patent Abstract



Compositions and methods for the therapy and diagnosis of cancer, particularly lung cancer, are disclosed. Illustrative compositions comprise one or more lung tumor polypeptides, immunogenic portions thereof, polynucleotides that encode such polypeptides, antigen presenting cells that express such polypeptides, and T cells that are specific for cells expressing such polypeptides. The disclosed compositions are useful, for example, in the diagnosis, prevention and/or treatment of diseases, particularly lung cancer.

Cancer Patent Claims
What is claimed:

1. An isolated polynucleotide comprising a sequence selected from the group consisting of: (a) the sequence provided in SEQ ID No. 160; and (b) degenerate variants of the sequence provided in SEQ ID No. 160, wherein the degenerate variants encode the polypeptide sequence provided in SEQ ID No. 161.

2. An expression vector comprising the polynucleotide of claim 1 operably linked to an expression control sequence.

3. An isolated host cell transformed or transfected with the expression vector according to claim 2.

4. A composition comprising a first component selected from the group consisting of physiologically acceptable carriers and immunostimulants, and a second component comprising the isolated polynucleotide according to claim 1.

5. A method for stimulating an immune response in a patient, comprising administering to the patient a suitable dose of the composition of claim 4, thereby stimulating an immune response in the patient.

6. An isolated polynucleotide comprising the complement of the sequence of SEQ ID No. 160.

Cancer Patent Description
TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to therapy and diagnosis of cancer, such as lung cancer. The invention is more specifically related to polypeptides, comprising at least a portion of a lung tumor protein, and to polynucleotides encoding such polypeptides. Such polypeptides and polynucleotides are useful in pharmaceutical compositions, e.g., vaccines, and other compositions for the diagnosis and treatment of lung cancer.

BACKGROUND OF THE INVENTION

1. Field of the Invention

Cancer is a significant health problem throughout the world. Although advances have been made in detection and therapy of cancer, no vaccine or other universally successful method for prevention and/or treatment is currently available. Current therapies, which are generally based on a combination of chemotherapy or surgery and radiation, continue to prove inadequate in many patients.

2. Description of Related Art

Lung cancer is the primary cause of cancer death among both men and women in the U.S., with an estimated 172,000 new cases being reported in 1994. The five-year survival rate among all lung cancer patients, regardless of the stage of disease at diagnosis, is only 13%. This contrasts with a five-year survival rate of 46% among cases detected while the disease is still localized. However, only 16% of lung cancers are discovered before the disease has spread.

In spite of considerable research into therapies for these and other cancers, lung cancer remains difficult to diagnose and treat effectively. Accordingly, there is a need in the art for improved methods for detecting and treating such cancers. The present invention fulfills these needs and further provides other related advantages.

SUMMARY OF THE INVENTION

In one aspect, the present invention provides polynucleotide compositions comprising a sequence selected from the group consisting of:

(a) sequences provided in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489;

(b) complements of the sequences provided in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489;

(c) sequences consisting of at least 10, 15, 20, 25, 30, 35, 40, 45, 50, 75 or 100 contiguous residues of a sequence provided in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489;

(d) sequences that hybridize to a sequence provided in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489, under moderate or highly stringent conditions;

(e) sequences having at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98% or 99% identity to a sequence of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489; and

(f) degenerate variants of a sequence provided in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489.
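The relationships named in items (b), (c), (e) and (f) above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the function names and the toy 9-residue sequences are hypothetical and are not taken from the sequence listing, "complement" is read here as the reverse complement, percent identity is computed over an ungapped, equal-length comparison rather than by any particular alignment program, and the standard genetic code is assumed for translation.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering (assumption:
# the standard code applies; "*" marks a stop codon).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Item (b): complementary strand, written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def contiguous_fragments(seq, length):
    """Item (c): every run of `length` contiguous residues in `seq`."""
    seq = seq.upper()
    return {seq[i:i + length] for i in range(len(seq) - length + 1)}


def percent_identity(a, b):
    """Item (e): identity over an ungapped, equal-length comparison (simplification)."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def translate(seq):
    """Translate complete codons of an A/C/G/T sequence, reading from position 0."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(reference, candidate):
    """Item (f): a different nucleotide sequence encoding the same polypeptide."""
    return (reference.upper() != candidate.upper()
            and translate(reference) == translate(candidate))


# Hypothetical toy sequences, not taken from any SEQ ID NO.
reference = "ATGGCTTGA"   # Met-Ala-stop
candidate = "ATGGCCTGA"   # Met-Ala-stop, third codon position changed

print(reverse_complement(reference))
print(sorted(contiguous_fragments(reference, 4)))
print(round(percent_identity(reference, candidate), 1))
print(is_degenerate_variant(reference, candidate))
```

In practice, identity between sequences of different lengths would be assessed with a gapped alignment; the ungapped comparison is used here only to keep the sketch short.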

In one preferred embodiment, the polynucleotide compositions of the invention are expressed in at least about 20%, more preferably in at least about 30%, and most preferably in at least about 50% of lung tumor samples tested, at a level that is at least about 2-fold, preferably at least about 5-fold, and most preferably at least about 10-fold higher than that for normal tissues.
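The overexpression criterion in the preceding paragraph combines two thresholds: a fold increase over normal tissue and the fraction of tumor samples that reach it. The following Python sketch shows one way such a check might be computed. The function names, the use of a single normal-tissue reference level, and the example expression values are all assumptions made for illustration; the word "about" in the text is treated here as a strict numeric threshold.

```python
def fraction_overexpressing(tumor_levels, normal_level, fold):
    """Fraction of tumor samples whose level is at least `fold` times normal."""
    if not tumor_levels:
        raise ValueError("at least one tumor sample is required")
    if normal_level <= 0:
        raise ValueError("normal_level must be positive")
    hits = sum(1 for level in tumor_levels if level >= fold * normal_level)
    return hits / len(tumor_levels)


def meets_criterion(tumor_levels, normal_level, min_fraction=0.20, min_fold=2.0):
    """True if enough tumor samples reach the required fold overexpression.

    Defaults mirror the least stringent pair in the text (20% of samples, 2-fold).
    """
    return fraction_overexpressing(tumor_levels, normal_level, min_fold) >= min_fraction


# Hypothetical expression values in arbitrary units (not data from the document).
tumor = [1.0, 2.5, 12.0, 0.8, 6.0]
normal = 1.0

print(fraction_overexpressing(tumor, normal, 2.0))    # 3 of 5 samples
print(meets_criterion(tumor, normal))                 # 20% at 2-fold
print(meets_criterion(tumor, normal, 0.50, 10.0))     # 50% at 10-fold
```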

The present invention, in another aspect, provides polypeptide compositions comprising an amino acid sequence that is encoded by a polynucleotide sequence described above.

The present invention further provides polypeptide compositions comprising an amino acid sequence selected from the group consisting of sequences recited in SEQ ID NO:152, 155, 156, 165, 166, 169, 170, 172, 174, 176, 226-252, 338-344, 346, 350, 357, 361, 363, 365, 367, 369, 376-382, 387-419, 423, 427, 430, 433, 441, 443, 446, 449, 451-466, 468-477, 480-482, 484, 486, and 490-560.

In certain preferred embodiments, the polypeptides and/or polynucleotides of the present invention are immunogenic, i.e., they are capable of eliciting an immune response, particularly a humoral and/or cellular immune response, as further described herein.

The present invention further provides fragments, variants and/or derivatives of the disclosed polypeptide and/or polynucleotide sequences, wherein the fragments, variants and/or derivatives preferably have a level of immunogenic activity of at least about 50%, preferably at least about 70% and more preferably at least about 90% of the level of immunogenic activity of a polypeptide sequence set forth in SEQ ID NO:152, 155, 156, 165, 166, 169, 170, 172, 174, 176, 226-252, 338-344, 346, 350, 357, 361, 363, 365, 367, 369, 376-382, 387-419, 423, 427, 430, 433, 441, 443, 446, 449, 451-466, 486, and 490-560, or a polypeptide sequence encoded by a polynucleotide sequence set forth in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489.

The present invention further provides polynucleotides that encode a polypeptide described above, expression vectors comprising such polynucleotides and host cells transformed or transfected with such expression vectors.

Within other aspects, the present invention provides pharmaceutical compositions comprising a polypeptide or polynucleotide as described above and a physiologically acceptable carrier.

Within a related aspect of the present invention, the pharmaceutical compositions, e.g., vaccine compositions, are provided for prophylactic or therapeutic applications. Such compositions generally comprise an immunogenic polypeptide or polynucleotide of the invention and an immunostimulant, such as an adjuvant.

The present invention further provides pharmaceutical compositions that comprise: (a) an antibody or antigen-binding fragment thereof that specifically binds to a polypeptide of the present invention, or a fragment thereof; and (b) a physiologically acceptable carrier.

Within further aspects, the present invention provides pharmaceutical compositions comprising: (a) an antigen presenting cell that expresses a polypeptide as described above and (b) a pharmaceutically acceptable carrier or excipient. Illustrative antigen presenting cells include dendritic cells, macrophages, monocytes, fibroblasts and B cells.

Within related aspects, pharmaceutical compositions are provided that comprise: (a) an antigen presenting cell that expresses a polypeptide as described above and (b) an immunostimulant.

The present invention further provides, in other aspects, fusion proteins that comprise at least one polypeptide as described above, as well as polynucleotides encoding such fusion proteins, typically in the form of pharmaceutical compositions, e.g., vaccine compositions, comprising a physiologically acceptable carrier and/or an immunostimulant. The fusion proteins may comprise multiple immunogenic polypeptides or portions/variants thereof, as described herein, and may further comprise one or more polypeptide segments for facilitating the expression, purification and/or immunogenicity of the polypeptide(s).

Within further aspects, the present invention provides methods for stimulating an immune response in a patient, preferably a T cell response in a human patient, comprising administering a pharmaceutical composition described herein. The patient may be afflicted with lung cancer, in which case the methods provide treatment for the disease, or a patient considered at risk for such a disease may be treated prophylactically.

Within further aspects, the present invention provides methods for inhibiting the development of a cancer in a patient, comprising administering to a patient a pharmaceutical composition as recited above. The patient may be afflicted with lung cancer, in which case the methods provide treatment for the disease, or a patient considered at risk for such a disease may be treated prophylactically.

The present invention further provides, within other aspects, methods for removing tumor cells from a biological sample, comprising contacting a biological sample with T cells that specifically react with a polypeptide of the present invention, wherein the step of contacting is performed under conditions and for a time sufficient to permit the removal of cells expressing the protein from the sample.

Within related aspects, methods are provided for inhibiting the development of a cancer in a patient, comprising administering to a patient a biological sample treated as described above.

Methods are further provided, within other aspects, for stimulating and/or expanding T cells specific for a polypeptide of the present invention, comprising contacting T cells with one or more of: (i) a polypeptide as described above; (ii) a polynucleotide encoding such a polypeptide; and/or (iii) an antigen presenting cell that expresses such a polypeptide; under conditions and for a time sufficient to permit the stimulation and/or expansion of T cells. Isolated T cell populations comprising T cells prepared as described above are also provided.

Within further aspects, the present invention provides methods for inhibiting the development of a cancer in a patient, comprising administering to a patient an effective amount of a T cell population as described above.

The present invention further provides methods for inhibiting the development of a cancer in a patient, comprising the steps of: (a) incubating CD4+ and/or CD8+ T cells isolated from a patient with one or more of: (i) a polypeptide comprising at least an immunogenic portion of a polypeptide disclosed herein; (ii) a polynucleotide encoding such a polypeptide; and (iii) an antigen-presenting cell that expresses such a polypeptide; and (b) administering to the patient an effective amount of the proliferated T cells, and thereby inhibiting the development of a cancer in the patient. Proliferated cells may, but need not, be cloned prior to administration to the patient.

Within further aspects, the present invention provides methods for determining the presence or absence of a cancer, preferably a lung cancer, in a patient comprising: (a) contacting a biological sample obtained from a patient with a binding agent that binds to a polypeptide as recited above; (b) detecting in the sample an amount of polypeptide that binds to the binding agent; and (c) comparing the amount of polypeptide with a predetermined cut-off value, and therefrom determining the presence or absence of a cancer in the patient. Within preferred embodiments, the binding agent is an antibody, more preferably a monoclonal antibody.

The present invention also provides, within other aspects, methods for monitoring the progression of a cancer in a patient. Such methods comprise the steps of: (a) contacting a biological sample obtained from a patient at a first point in time with a binding agent that binds to a polypeptide as recited above; (b) detecting in the sample an amount of polypeptide that binds to the binding agent; (c) repeating steps (a) and (b) using a biological sample obtained from the patient at a subsequent point in time; and (d) comparing the amount of polypeptide detected in step (c) with the amount detected in step (b) and therefrom monitoring the progression of the cancer in the patient.
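The comparison steps of the two preceding methods (comparing a detected amount with a predetermined cut-off value, and comparing amounts detected at two points in time) can be sketched in Python as follows. This is a simplified illustration only. The names `Measurement`, `exceeds_cutoff` and `progression_trend`, the convention that an amount above the cut-off is the positive result, the `tolerance` parameter, and the example values are assumptions introduced here and are not specified in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Amount of polypeptide detected in a sample taken on a given day."""
    day: int
    amount: float


def exceeds_cutoff(amount, cutoff):
    """Step (c) of the detection method: compare the amount with the cut-off.

    Assumption: an amount above the cut-off is treated as the positive result.
    """
    return amount > cutoff


def progression_trend(earlier, later, tolerance=0.0):
    """Step (d) of the monitoring method: compare amounts from two time points.

    `tolerance` is a hypothetical margin below which a change is ignored.
    """
    if later.day <= earlier.day:
        raise ValueError("the second sample must be taken at a subsequent time")
    delta = later.amount - earlier.amount
    if delta > tolerance:
        return "increasing"
    if delta < -tolerance:
        return "decreasing"
    return "stable"


# Hypothetical measurements in arbitrary units.
first = Measurement(day=0, amount=1.8)
second = Measurement(day=90, amount=2.6)

print(exceeds_cutoff(first.amount, cutoff=1.5))
print(progression_trend(first, second, tolerance=0.2))
```

The same comparison logic would apply to the polynucleotide-based methods, with the detected mRNA level taking the place of the polypeptide amount.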

The present invention further provides, within other aspects, methods for determining the presence or absence of a cancer in a patient, comprising the steps of: (a) contacting a biological sample obtained from a patient with an oligonucleotide that hybridizes to a polynucleotide that encodes a polypeptide of the present invention; (b) detecting in the sample a level of a polynucleotide, preferably mRNA, that hybridizes to the oligonucleotide; and (c) comparing the level of polynucleotide that hybridizes to the oligonucleotide with a predetermined cut-off value, and therefrom determining the presence or absence of a cancer in the patient. Within certain embodiments, the amount of mRNA is detected via polymerase chain reaction using, for example, at least one oligonucleotide primer that hybridizes to a polynucleotide encoding a polypeptide as recited above, or a complement of such a polynucleotide. Within other embodiments, the amount of mRNA is detected using a hybridization technique, employing an oligonucleotide probe that hybridizes to a polynucleotide that encodes a polypeptide as recited above, or a complement of such a polynucleotide.

In related aspects, methods are provided for monitoring the progression of a cancer in a patient, comprising the steps of: (a) contacting a biological sample obtained from a patient with an oligonucleotide that hybridizes to a polynucleotide that encodes a polypeptide of the present invention; (b) detecting in the sample an amount of a polynucleotide that hybridizes to the oligonucleotide; (c) repeating steps (a) and (b) using a biological sample obtained from the patient at a subsequent point in time; and (d) comparing the amount of polynucleotide detected in step (c) with the amount detected in step (b) and therefrom monitoring the progression of the cancer in the patient.

Within further aspects, the present invention provides antibodies, such as monoclonal antibodies, that bind to a polypeptide as described above, as well as diagnostic kits comprising such antibodies. Diagnostic kits comprising one or more oligonucleotide probes or primers as described above are also provided.

These and other aspects of the present invention will become apparent upon reference to the following detailed description. All references disclosed herein are hereby incorporated by reference in their entirety as if each was incorporated individually.

A BRIEF DESCRIPTION OF THE SEQUENCE IDENTIFIERS

SEQ ID NO:1 is the determined cDNA sequence for LST-S1-2

SEQ ID NO:2 is the determined cDNA sequence for LST-S1-28

SEQ ID NO:3 is the determined cDNA sequence for LST-S1-90

SEQ ID NO:4 is the determined cDNA sequence for LST-S1-144

SEQ ID NO:5 is the determined cDNA sequence for LST-S1-133

SEQ ID NO:6 is the determined cDNA sequence for LST-S1-169

SEQ ID NO:7 is the determined cDNA sequence for LST-S2-6

SEQ ID NO:8 is the determined cDNA sequence for LST-S2-11

SEQ ID NO:9 is the determined cDNA sequence for LST-S2-17

SEQ ID NO:10 is the determined cDNA sequence for LST-S2-25

SEQ ID NO:11 is the determined cDNA sequence for LST-S2-39

SEQ ID NO:12 is a first determined cDNA sequence for LST-S2-43

SEQ ID NO:13 is a second determined cDNA sequence for LST-S2-43

SEQ ID NO:14 is the determined cDNA sequence for LST-S2-65

SEQ ID NO:15 is the determined cDNA sequence for LST-S2-68

SEQ ID NO:16 is the determined cDNA sequence for LST-S2-72

SEQ ID NO:17 is the determined cDNA sequence for LST-S2-74

SEQ ID NO:18 is the determined cDNA sequence for LST-S2-103

SEQ ID NO:19 is the determined cDNA sequence for LST-S2-N-1-1F

SEQ ID NO:20 is the determined cDNA sequence for LST-S2-N-1-2A

SEQ ID NO:21 is the determined cDNA sequence for LST-S2-N-1-4H

SEQ ID NO:22 is the determined cDNA sequence for LST-S2-N-1-5A

SEQ ID NO:23 is the determined cDNA sequence for LST-S2-N-1-6B

SEQ ID NO:24 is the determined cDNA sequence for LST-S2-N-1-7B

SEQ ID NO:25 is the determined cDNA sequence for LST-S2-N-1-7H

SEQ ID NO:26 is the determined cDNA sequence for LST-S2-N-1-8A

SEQ ID NO:27 is the determined cDNA sequence for LST-S2-N-1-8D

SEQ ID NO:28 is the determined cDNA sequence for LST-S2-N-1-9A

SEQ ID NO:29 is the determined cDNA sequence for LST-S2-N-1-9E

SEQ ID NO:30 is the determined cDNA sequence for LST-S2-N-1-10A

SEQ ID NO:31 is the determined cDNA sequence for LST-S2-N1-10G

SEQ ID NO:32 is the determined cDNA sequence for LST-S2-N1-11A

SEQ ID NO:33 is the determined cDNA sequence for LST-S2-N1-12C

SEQ ID NO:34 is the determined cDNA sequence for LST-S2-N-1-12E

SEQ ID NO:35 is the determined cDNA sequence for LST-S2-B1-3D

SEQ ID NO:36 is the determined cDNA sequence for LST-S2-B1-6C

SEQ ID NO:37 is the determined cDNA sequence for LST-S2-B1-5D

SEQ ID NO:38 is the determined cDNA sequence for LST-S2-B1-5F

SEQ ID NO:39 is the determined cDNA sequence for LST-S2-B1-6G

SEQ ID NO:40 is the determined cDNA sequence for LST-S2-B1-8A

SEQ ID NO:41 is the determined cDNA sequence for LST-S2-B1-8D

SEQ ID NO:42 is the determined cDNA sequence for LST-S2-B1-10A

SEQ ID NO:43 is the determined cDNA sequence for LST-S2-B1-9B

SEQ ID NO:44 is the determined cDNA sequence for LST-S2-B1-9F

SEQ ID NO:45 is the determined cDNA sequence for LST-S2-B1-12D

SEQ ID NO:46 is the determined cDNA sequence for LST-S2-I2-2B

SEQ ID NO:47 is the determined cDNA sequence for LST-S2-I2-5F

SEQ ID NO:48 is the determined cDNA sequence for LST-S2-I2-6B

SEQ ID NO:49 is the determined cDNA sequence for LST-S2-I2-7F

SEQ ID NO:50 is the determined cDNA sequence for LST-S2-I2-8G

SEQ ID NO:51 is the determined cDNA sequence for LST-S2-I2-9E

SEQ ID NO:52 is the determined cDNA sequence for LST-S2-I2-12B

SEQ ID NO:53 is the determined cDNA sequence for LST-S2-H2-2C

SEQ ID NO:54 is the determined cDNA sequence for LST-S2-H2-1G

SEQ ID NO:55 is the determined cDNA sequence for LST-S2-H2-4G

SEQ ID NO:56 is the determined cDNA sequence for LST-S2-H2-3H

SEQ ID NO:57 is the determined cDNA sequence for LST-S2-H2-5G

SEQ ID NO:58 is the determined cDNA sequence for LST-S2-H2-9B

SEQ ID NO:59 is the determined cDNA sequence for LST-S2-H2-10H

SEQ ID NO:60 is the determined cDNA sequence for LST-S2-H2-12D

SEQ ID NO: 61 is the determined cDNA sequence for LST-S3-2

SEQ ID NO: 62 is the determined cDNA sequence for LST-S3-4

SEQ ID NO: 63 is the determined cDNA sequence for LST-S3-7

SEQ ID NO: 64 is the determined cDNA sequence for LST-S3-8

SEQ ID NO: 65 is the determined cDNA sequence for LST-S3-12

SEQ ID NO: 66 is the determined cDNA sequence for LST-S3-13

SEQ ID NO: 67 is the determined cDNA sequence for LST-S3-14

SEQ ID NO: 68 is the determined cDNA sequence for LST-S3-16

SEQ ID NO: 69 is the determined cDNA sequence for LST-S3-21

SEQ ID NO: 70 is the determined cDNA sequence for LST-S3-22

SEQ ID NO: 71 is the determined cDNA sequence for LST-S1-7

SEQ ID NO: 72 is the determined cDNA sequence for LST-S1-A-1E

SEQ ID NO: 73 is the determined cDNA sequence for LST-S1-A-1G

SEQ ID NO: 74 is the determined cDNA sequence for LST-S1-A-3E

SEQ ID NO: 75 is the determined cDNA sequence for LST-S1-A-4E

SEQ ID NO: 76 is the determined cDNA sequence for LST-S1-A-6D

SEQ ID NO: 77 is the determined cDNA sequence for LST-S1-A-8D

SEQ ID NO: 78 is the determined cDNA sequence for LST-S1-A-10A

SEQ ID NO: 79 is the determined cDNA sequence for LST-S1-A-10C

SEQ ID NO: 80 is the determined cDNA sequence for LST-S1-A-9D

SEQ ID NO: 81 is the determined cDNA sequence for LST-S1-A-10D

SEQ ID NO: 82 is the determined cDNA sequence for LST-S1-A-9H

SEQ ID NO: 83 is the determined cDNA sequence for LST-S1-A-11D

SEQ ID NO: 84 is the determined cDNA sequence for LST-S1-A-12D

SEQ ID NO: 85 is the determined cDNA sequence for LST-S1-A-11E

SEQ ID NO: 86 is the determined cDNA sequence for LST-S1-A-12E

SEQ ID NO: 87 is the determined cDNA sequence for L513S (T3).

SEQ ID NO: 88 is the determined cDNA sequence for L513S contig 1.

SEQ ID NO: 89 is a first determined cDNA sequence for L514S.

SEQ ID NO: 90 is a second determined cDNA sequence for L514S.

SEQ ID NO: 91 is a first determined cDNA sequence for L516S.

SEQ ID NO: 92 is a second determined cDNA sequence for L516S.

SEQ ID NO: 93 is the determined cDNA sequence for L517S.

SEQ ID NO: 94 is the extended cDNA sequence for LST-S1-169 (also known as L519S).

SEQ ID NO: 95 is a first determined cDNA sequence for L520S.

SEQ ID NO: 96 is a second determined cDNA sequence for L520S.

SEQ ID NO: 97 is a first determined cDNA sequence for L521S.

SEQ ID NO: 98 is a second determined cDNA sequence for L521S.

SEQ ID NO: 99 is the determined cDNA sequence for L522S.

SEQ ID NO: 100 is the determined cDNA sequence for L523S.

SEQ ID NO: 101 is the determined cDNA sequence for L524S.

SEQ ID NO: 102 is the determined cDNA sequence for L525S.

SEQ ID NO: 103 is the determined cDNA sequence for L526S.

SEQ ID NO: 104 is the determined cDNA sequence for L527S.

SEQ ID NO: 105 is the determined cDNA sequence for L528S.

SEQ ID NO: 106 is the determined cDNA sequence for L529S.

SEQ ID NO: 107 is a first determined cDNA sequence for L530S.

SEQ ID NO: 108 is a second determined cDNA sequence for L530S.

SEQ ID NO: 109 is the determined full-length cDNA sequence for L531S short form.

SEQ ID NO: 110 is the amino acid sequence encoded by SEQ ID NO: 109.

SEQ ID NO: 111 is the determined full-length cDNA sequence for L531S long form.

SEQ ID NO: 112 is the amino acid sequence encoded by SEQ ID NO: 111.

SEQ ID NO: 113 is the determined full-length cDNA sequence for L520S.

SEQ ID NO: 114 is the amino acid sequence encoded by SEQ ID NO: 113.

SEQ ID NO: 115 is the determined cDNA sequence for contig 1.

SEQ ID NO: 116 is the determined cDNA sequence for contig 3.

SEQ ID NO: 117 is the determined cDNA sequence for contig 4.

SEQ ID NO: 118 is the determined cDNA sequence for contig 5.

SEQ ID NO: 119 is the determined cDNA sequence for contig 7.

SEQ ID NO: 120 is the determined cDNA sequence for contig 8.

SEQ ID NO: 121 is the determined cDNA sequence for contig 9.

SEQ ID NO: 122 is the determined cDNA sequence for contig 10.

SEQ ID NO: 123 is the determined cDNA sequence for contig 12.

SEQ ID NO: 124 is the determined cDNA sequence for contig 11.

SEQ ID NO: 125 is the determined cDNA sequence for contig 13 (also known as L761P).

SEQ ID NO: 126 is the determined cDNA sequence for contig 15.

SEQ ID NO: 127 is the determined cDNA sequence for contig 16.

SEQ ID NO: 128 is the determined cDNA sequence for contig 17.

SEQ ID NO: 129 is the determined cDNA sequence for contig 19.

SEQ ID NO: 130 is the determined cDNA sequence for contig 20.

SEQ ID NO: 131 is the determined cDNA sequence for contig 22.

SEQ ID NO: 132 is the determined cDNA sequence for contig 24.

SEQ ID NO: 133 is the determined cDNA sequence for contig 29.

SEQ ID NO: 134 is the determined cDNA sequence for contig 31.

SEQ ID NO: 135 is the determined cDNA sequence for contig 33.

SEQ ID NO: 136 is the determined cDNA sequence for contig 38.

SEQ ID NO: 137 is the determined cDNA sequence for contig 39.

SEQ ID NO: 138 is the determined cDNA sequence for contig 41.

SEQ ID NO: 139 is the determined cDNA sequence for contig 43.

SEQ ID NO: 140 is the determined cDNA sequence for contig 44.

SEQ ID NO: 141 is the determined cDNA sequence for contig 45.

SEQ ID NO: 142 is the determined cDNA sequence for contig 47.

SEQ ID NO: 143 is the determined cDNA sequence for contig 48.

SEQ ID NO: 144 is the determined cDNA sequence for contig 49.

SEQ ID NO: 145 is the determined cDNA sequence for contig 50.

SEQ ID NO: 146 is the determined cDNA sequence for contig 53.

SEQ ID NO: 147 is the determined cDNA sequence for contig 54.

SEQ ID NO: 148 is the determined cDNA sequence for contig 56.

SEQ ID NO: 149 is the determined cDNA sequence for contig 57.

SEQ ID NO: 150 is the determined cDNA sequence for contig 58.

SEQ ID NO: 151 is the full-length cDNA sequence for L530S.

SEQ ID NO: 152 is the amino acid sequence encoded by SEQ ID NO: 151.

SEQ ID NO: 153 is the full-length cDNA sequence of a first variant of L514S.

SEQ ID NO: 154 is the full-length cDNA sequence of a second variant of L514S.

SEQ ID NO: 155 is the amino acid sequence encoded by SEQ ID NO: 153.

SEQ ID NO: 156 is the amino acid sequence encoded by SEQ ID NO: 154.

SEQ ID NO: 157 is the determined cDNA sequence for contig 59.

SEQ ID NO: 158 is the full-length cDNA sequence for L763P (also referred to as contig 22).

SEQ ID NO: 159 is the amino acid sequence encoded by SEQ ID NO: 158.

SEQ ID NO: 160 is the full-length cDNA sequence for L762P (also referred to as contig 17).

SEQ ID NO: 161 is the amino acid sequence encoded by SEQ ID NO: 160.

SEQ ID NO: 162 is the determined cDNA sequence for L515S.

SEQ ID NO: 163 is the full-length cDNA sequence of a first variant of L524S.

SEQ ID NO: 164 is the full-length cDNA sequence of a second variant of L524S.

SEQ ID NO: 165 is the amino acid sequence encoded by SEQ ID NO: 163.

SEQ ID NO: 166 is the amino acid sequence encoded by SEQ ID NO: 164.

SEQ ID NO: 167 is the full-length cDNA sequence of a first variant of L762P.

SEQ ID NO: 168 is the full-length cDNA sequence of a second variant of L762P.

SEQ ID NO: 169 is the amino acid sequence encoded by SEQ ID NO: 167.

SEQ ID NO: 170 is the amino acid sequence encoded by SEQ ID NO: 168.

SEQ ID NO: 171 is the full-length cDNA sequence for L773P (also referred to as contig 56).

SEQ ID NO: 172 is the amino acid sequence encoded by SEQ ID NO: 171.

SEQ ID NO: 173 is an extended cDNA sequence for L519S.

SEQ ID NO: 174 is the amino acid sequence encoded by SEQ ID NO: 173.

SEQ ID NO: 175 is the full-length cDNA sequence for L523S.

SEQ ID NO: 176 is the amino acid sequence encoded by SEQ ID NO: 175.

SEQ ID NO: 177 is the determined cDNA sequence for LST-sub5-7A.

SEQ ID NO: 178 is the determined cDNA sequence for LST-sub5-8G.

SEQ ID NO: 179 is the determined cDNA sequence for LST-sub5-8H.

SEQ ID NO: 180 is the determined cDNA sequence for LST-sub5-10B.

SEQ ID NO: 181 is the determined cDNA sequence for LST-sub5-10H.

SEQ ID NO: 182 is the determined cDNA sequence for LST-sub5-12B.

SEQ ID NO: 183 is the determined cDNA sequence for LST-sub5-11C.

SEQ ID NO: 184 is the determined cDNA sequence for LST-sub6-1c.

SEQ ID NO: 185 is the determined cDNA sequence for LST-sub6-2f.

SEQ ID NO: 186 is the determined cDNA sequence for LST-sub6-2G.

SEQ ID NO: 187 is the determined cDNA sequence for LST-sub6-4d.

SEQ ID NO: 188 is the determined cDNA sequence for LST-sub6-4e.

SEQ ID NO: 189 is the determined cDNA sequence for LST-sub6-4f.

SEQ ID NO: 190 is the determined cDNA sequence for LST-sub6-3h.

SEQ ID NO: 191 is the determined cDNA sequence for LST-sub6-5d.

SEQ ID NO: 192 is the determined cDNA sequence for LST-sub6-5h.

SEQ ID NO: 193 is the determined cDNA sequence for LST-sub6-6h.

SEQ ID NO: 194 is the determined cDNA sequence for LST-sub6-7a.

SEQ ID NO: 195 is the determined cDNA sequence for LST-sub6-8a.

SEQ ID NO: 196 is the determined cDNA sequence for LST-sub6-7d.

SEQ ID NO: 197 is the determined cDNA sequence for LST-sub6-7e.

SEQ ID NO: 198 is the determined cDNA sequence for LST-sub6-8e.

SEQ ID NO: 199 is the determined cDNA sequence for LST-sub6-7g.

SEQ ID NO: 200 is the determined cDNA sequence for LST-sub6-9f.

SEQ ID NO: 201 is the determined cDNA sequence for LST-sub6-9h.

SEQ ID NO: 202 is the determined cDNA sequence for LST-sub6-11b.

SEQ ID NO: 203 is the determined cDNA sequence for LST-sub6-11c.

SEQ ID NO: 204 is the determined cDNA sequence for LST-sub6-12c.

SEQ ID NO: 205 is the determined cDNA sequence for LST-sub6-12e.

SEQ ID NO: 206 is the determined cDNA sequence for LST-sub6-12f.

SEQ ID NO: 207 is the determined cDNA sequence for LST-sub6-11g.

SEQ ID NO: 208 is the determined cDNA sequence for LST-sub6-12g.

SEQ ID NO: 209 is the determined cDNA sequence for LST-sub6-12h.

SEQ ID NO: 210 is the determined cDNA sequence for LST-sub6-II-1a.

SEQ ID NO: 211 is the determined cDNA sequence for LST-sub6-II-2b.

SEQ ID NO: 212 is the determined cDNA sequence for LST-sub6-II-2g.

SEQ ID NO: 213 is the determined cDNA sequence for LST-sub6-II-1h.

SEQ ID NO: 214 is the determined cDNA sequence for LST-sub6-II-4a.

SEQ ID NO: 215 is the determined cDNA sequence for LST-sub6-II-4b.

SEQ ID NO: 216 is the determined cDNA sequence for LST-sub6-II-3e.

SEQ ID NO: 217 is the determined cDNA sequence for LST-sub6-II-4f.

SEQ ID NO: 218 is the determined cDNA sequence for LST-sub6-II-4g.

SEQ ID NO: 219 is the determined cDNA sequence for LST-sub6-II-4h.

SEQ ID NO: 220 is the determined cDNA sequence for LST-sub6-II-5c.

SEQ ID NO: 221 is the determined cDNA sequence for LST-sub6-II-5e.

SEQ ID NO: 222 is the determined cDNA sequence for LST-sub6-II-6f.

SEQ ID NO: 223 is the determined cDNA sequence for LST-sub6-II-5g.

SEQ ID NO: 224 is the determined cDNA sequence for LST-sub6-II-6g.

SEQ ID NO: 225 is the amino acid sequence for L528S.

SEQ ID NO: 226-251 are synthetic peptides derived from L762P.

SEQ ID NO: 252 is the expressed amino acid sequence of L514S.

SEQ ID NO: 253 is the DNA sequence corresponding to SEQ ID NO: 252.

SEQ ID NO: 254 is the DNA sequence of a L762P expression construct.

SEQ ID NO: 255 is the determined cDNA sequence for clone 23785.

SEQ ID NO: 256 is the determined cDNA sequence for clone 23786.

SEQ ID NO: 257 is the determined cDNA sequence for clone 23788.

SEQ ID NO: 258 is the determined cDNA sequence for clone 23790.

SEQ ID NO: 259 is the determined cDNA sequence for clone 23793.

SEQ ID NO: 260 is the determined cDNA sequence for clone 23794.

SEQ ID NO: 261 is the determined cDNA sequence for clone 23795.

SEQ ID NO: 262 is the determined cDNA sequence for clone 23796.

SEQ ID NO: 263 is the determined cDNA sequence for clone 23797.

SEQ ID NO: 264 is the determined cDNA sequence for clone 23798.

SEQ ID NO: 265 is the determined cDNA sequence for clone 23799.

SEQ ID NO: 266 is the determined cDNA sequence for clone 23800.

SEQ ID NO: 267 is the determined cDNA sequence for clone 23802.

SEQ ID NO: 268 is the determined cDNA sequence for clone 23803.

SEQ ID NO: 269 is the determined cDNA sequence for clone 23804.

SEQ ID NO: 270 is the determined cDNA sequence for clone 23805.

SEQ ID NO: 271 is the determined cDNA sequence for clone 23806.

SEQ ID NO: 272 is the determined cDNA sequence for clone 23807.

SEQ ID NO: 273 is the determined cDNA sequence for clone 23808.

SEQ ID NO: 274 is the determined cDNA sequence for clone 23809.

SEQ ID NO: 275 is the determined cDNA sequence for clone 23810.

SEQ ID NO: 276 is the determined cDNA sequence for clone 23811.

SEQ ID NO: 277 is the determined cDNA sequence for clone 23812.

SEQ ID NO: 278 is the determined cDNA sequence for clone 23813.

SEQ ID NO: 279 is the determined cDNA sequence for clone 23815.

SEQ ID NO: 280 is the determined cDNA sequence for clone 25298.

SEQ ID NO: 281 is the determined cDNA sequence for clone 25299.

SEQ ID NO: 282 is the determined cDNA sequence for clone 25300.

SEQ ID NO: 283 is the determined cDNA sequence for clone 25301.

SEQ ID NO: 284 is the determined cDNA sequence for clone 25304.

SEQ ID NO: 285 is the determined cDNA sequence for clone 25309.

SEQ ID NO: 286 is the determined cDNA sequence for clone 25312.

SEQ ID NO: 287 is the determined cDNA sequence for clone 25317.

SEQ ID NO:288 is the determined cDNA sequence for clone 25321.

SEQ ID NO:289 is the determined cDNA sequence for clone 25323.

SEQ ID NO:290 is the determined cDNA sequence for clone 25327.

SEQ ID NO:291 is the determined cDNA sequence for clone 25328.

SEQ ID NO:292 is the determined cDNA sequence for clone 25332.

SEQ ID NO:293 is the determined cDNA sequence for clone 25333.

SEQ ID NO:294 is the determined cDNA sequence for clone 25336.

SEQ ID NO:295 is the determined cDNA sequence for clone 25340.

SEQ ID NO:296 is the determined cDNA sequence for clone 25342.

SEQ ID NO:297 is the determined cDNA sequence for clone 25356.

SEQ ID NO:298 is the determined cDNA sequence for clone 25357.

SEQ ID NO:299 is the determined cDNA sequence for clone 25361.

SEQ ID NO:300 is the determined cDNA sequence for clone 25363.

SEQ ID NO:301 is the determined cDNA sequence for clone 25397.

SEQ ID NO:302 is the determined cDNA sequence for clone 25402.

SEQ ID NO:303 is the determined cDNA sequence for clone 25403.

SEQ ID NO:304 is the determined cDNA sequence for clone 25405.

SEQ ID NO:305 is the determined cDNA sequence for clone 25407.

SEQ ID NO:306 is the determined cDNA sequence for clone 25409.

SEQ ID NO:307 is the determined cDNA sequence for clone 25396.

SEQ ID NO:308 is the determined cDNA sequence for clone 25414.

SEQ ID NO:309 is the determined cDNA sequence for clone 25410.

SEQ ID NO:310 is the determined cDNA sequence for clone 25406.

SEQ ID NO:311 is the determined cDNA sequence for clone 25306.

SEQ ID NO:312 is the determined cDNA sequence for clone 25362.

SEQ ID NO:313 is the determined cDNA sequence for clone 25360.

SEQ ID NO:314 is the determined cDNA sequence for clone 25398.

SEQ ID NO:315 is the determined cDNA sequence for clone 25355.

SEQ ID NO:316 is the determined cDNA sequence for clone 25351.

SEQ ID NO:317 is the determined cDNA sequence for clone 25331.

SEQ ID NO:318 is the determined cDNA sequence for clone 25338.

SEQ ID NO:319 is the determined cDNA sequence for clone 25335.

SEQ ID NO:320 is the determined cDNA sequence for clone 25329.

SEQ ID NO:321 is the determined cDNA sequence for clone 25324.

SEQ ID NO:322 is the determined cDNA sequence for clone 25322.

SEQ ID NO:323 is the determined cDNA sequence for clone 25319.

SEQ ID NO:324 is the determined cDNA sequence for clone 25316.

SEQ ID NO:325 is the determined cDNA sequence for clone 25311.

SEQ ID NO:326 is the determined cDNA sequence for clone 25310.

SEQ ID NO:327 is the determined cDNA sequence for clone 25302.

SEQ ID NO:328 is the determined cDNA sequence for clone 25315.

SEQ ID NO:329 is the determined cDNA sequence for clone 25308.

SEQ ID NO:330 is the determined cDNA sequence for clone 25303.

SEQ ID NO:331-337 are the cDNA sequences of isoforms of the p53 tumor suppressor homologue, p63 (also referred to as L530S).

SEQ ID NO:338-344 are the amino acid sequences encoded by SEQ ID NO:331-337, respectively.

SEQ ID NO:345 is a second cDNA sequence for the antigen L763P.

SEQ ID NO:346 is the amino acid sequence encoded by the sequence of SEQ ID NO: 345.

SEQ ID NO:347 is a determined full-length cDNA sequence for L523S.

SEQ ID NO:348 is the amino acid sequence encoded by SEQ ID NO: 347.

SEQ ID NO:349 is the cDNA sequence encoding the N-terminal portion of L773P.

SEQ ID NO:350 is the amino acid sequence of the N-terminal portion of L773P.

SEQ ID NO:351 is the DNA sequence for a fusion of Ra12 and the N-terminal portion of L763P.

SEQ ID NO:352 is the amino acid sequence of the fusion of Ra12 and the N-terminal portion of L763P.

SEQ ID NO:353 is the DNA sequence for a fusion of Ra12 and the C-terminal portion of L763P.

SEQ ID NO:354 is the amino acid sequence of the fusion of Ra12 and the C-terminal portion of L763P.

SEQ ID NO:355 is a primer.

SEQ ID NO:356 is a primer.

SEQ ID NO:357 is the protein sequence of expressed recombinant L762P.

SEQ ID NO:358 is the DNA sequence of expressed recombinant L762P.

SEQ ID NO:359 is a primer.

SEQ ID NO:360 is a primer.

SEQ ID NO:361 is the protein sequence of expressed recombinant L773P A.

SEQ ID NO:362 is the DNA sequence of expressed recombinant L773P A.

SEQ ID NO:363 is an epitope derived from clone L773P polypeptide.

SEQ ID NO:364 is a polynucleotide encoding the polypeptide of SEQ ID NO:363.

SEQ ID NO:365 is an epitope derived from clone L773P polypeptide.

SEQ ID NO:366 is a polynucleotide encoding the polypeptide of SEQ ID NO:365.

SEQ ID NO:367 is an epitope consisting of amino acids 571-590 of SEQ ID NO:161, clone L762P.

SEQ ID NO:368 is the full-length DNA sequence for contig 13 (SEQ ID NO:125), also referred to as L761P.

SEQ ID NO:369 is the protein sequence encoded by the DNA sequence of SEQ ID NO:368.

SEQ ID NO:370 is an L762P DNA sequence from nucleotides 2071-2130.

SEQ ID NO:371 is an L762P DNA sequence from nucleotides 1441-1500.

SEQ ID NO:372 is an L762P DNA sequence from nucleotides 1936-1955.

SEQ ID NO:373 is an L762P DNA sequence from nucleotides 2620-2679.

SEQ ID NO:374 is an L762P DNA sequence from nucleotides 1801-1860.

SEQ ID NO:375 is an L762P DNA sequence from nucleotides 1531-1591.

SEQ ID NO:376 is the amino acid sequence of the L762P peptide encoded by SEQ ID NO:373.

SEQ ID NO:377 is the amino acid sequence of the L762P peptide encoded by SEQ ID NO:370.

SEQ ID NO:378 is the amino acid sequence of the L762P peptide encoded by SEQ ID NO:372.

SEQ ID NO:379 is the amino acid sequence of the L762P peptide encoded by SEQ ID NO:374.

SEQ ID NO:380 is the amino acid sequence of the L762P peptide encoded by SEQ ID NO:371.

SEQ ID NO:381 is the amino acid sequence of the L762P peptide encoded by SEQ ID NO:375.

SEQ ID NO:382 is the amino acid sequence of an epitope of L762P.

SEQ ID NO:383-386 are PCR primers.

SEQ ID NO:387-395 are the amino acid sequences of L773P peptides.

SEQ ID NO:396-419 are the amino acid sequences of L523S peptides.

SEQ ID NO:420 is the determined cDNA sequence for clone #19014.

SEQ ID NO:421 is the forward primer PDM-278 for the L514S-13160 coding region.

SEQ ID NO:422 is the reverse primer PDM-278 for the L514S-13160 coding region.

SEQ ID NO:423 is the amino acid sequence for the expressed recombinant L514S.

SEQ ID NO:424 is the DNA coding sequence for the recombinant L514S.

SEQ ID NO:425 is the forward primer PDM-414 for the L523S coding region.

SEQ ID NO:426 is the reverse primer PDM-414 for the L523S coding region.

SEQ ID NO:427 is the amino acid sequence for the expressed recombinant L523S.

SEQ ID NO:428 is the DNA coding sequence for the recombinant L523S.

SEQ ID NO:429 is the reverse primer PDM-279 for the L762PA coding region.

SEQ ID NO:430 is the amino acid sequence for the expressed recombinant L762PA.

SEQ ID NO:431 is the DNA coding sequence for the recombinant L762PA.

SEQ ID NO:432 is the reverse primer PDM-300 for the L773P coding region.

SEQ ID NO:433 is the amino acid sequence of the expressed recombinant L773P.

SEQ ID NO:434 is the DNA coding sequence for the recombinant L773P.

SEQ ID NO:435 is the forward primer for TCR Valpha8.

SEQ ID NO:436 is the reverse primer for TCR Valpha8.

SEQ ID NO:437 is the forward primer for TCR Vbeta8.

SEQ ID NO:438 is the reverse primer for TCR Vbeta8.

SEQ ID NO:439 is the TCR Valpha DNA sequence of the TCR clone specific for the lung antigen L762P.

SEQ ID NO:440 is the TCR Vbeta DNA sequence of the TCR clone specific for the lung antigen L762P.

SEQ ID NO:441 is the amino acid sequence of L763 peptide #2684.

SEQ ID NO:442 is the predicted full-length cDNA for the cloned partial sequence of clone L529S (SEQ ID NO:106).

SEQ ID NO:443 is the deduced amino acid sequence encoded by SEQ ID NO:442.

SEQ ID NO:444 is the forward primer PDM-734 for the coding region of clone L523S.

SEQ ID NO:445 is the reverse primer PDM-735 for the coding region of clone L523S.

SEQ ID NO:446 is the amino acid sequence for the expressed recombinant L523S.

SEQ ID NO:447 is the DNA coding sequence for the recombinant L523S.

SEQ ID NO:448 is another forward primer PDM-733 for the coding region of clone L523S.

SEQ ID NO:449 is the amino acid sequence for a second expressed recombinant L523S.

SEQ ID NO:450 is the DNA coding sequence for a second recombinant L523S.

SEQ ID NO:451 corresponds to amino acids 86-110 of L514S, an epitope used in the generation of L514S-specific antibodies.

SEQ ID NO:452 corresponds to amino acids 21-45 of L514S, an epitope used in the generation of L514S-specific antibodies.

SEQ ID NO:453 corresponds to amino acids 121-135 of L514S, an epitope used in the generation of L514S-specific antibodies.

SEQ ID NO:454 corresponds to amino acids 440-460 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:455 corresponds to amino acids 156-175 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:456 corresponds to amino acids 326-345 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:457 corresponds to amino acids 40-59 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:458 corresponds to amino acids 80-99 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:459 corresponds to amino acids 160-179 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:460 corresponds to amino acids 180-199 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:461 corresponds to amino acids 320-339 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:462 corresponds to amino acids 340-359 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:463 corresponds to amino acids 370-389 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:464 corresponds to amino acids 380-399 of L523S, an epitope used in the generation of L523S-specific antibodies.

SEQ ID NO:465 corresponds to amino acids 37-55, an epitope of L523S recognized by the L523S-specific CTL line 6B1.

SEQ ID NO:466 corresponds to amino acids 41-51, the mapped antigenic epitope of L523S recognized by the L523S-specific CTL line 6B1.

SEQ ID NO:467 corresponds to the DNA sequence which encodes SEQ ID NO:466.

SEQ ID NO:468 corresponds to the amino acids of peptide 16, 17 of hL523S.

SEQ ID NO:469 corresponds to the amino acids of peptide 16, 17 of mL523S.

SEQ ID NO:470 corresponds to the amino acids of the 20-mer peptide #4 of L523S.

SEQ ID NO:471 corresponds to the amino acids of the overlapping 20-mer peptides #14-#19 of L523S.

SEQ ID NO:472 corresponds to the amino acids of the overlapping 20-mer peptides #20-#25 of L523S.

SEQ ID NO:473 corresponds to the amino acids of the overlapping 20-mer peptides #26-#30.5 of L523S.

SEQ ID NO:474 corresponds to the amino acids of the overlapping 20-mer peptides #31-#36 of L523S.

SEQ ID NO:475 corresponds to the amino acids of the overlapping 20-mer peptides #37-#40.5 of L523S.

SEQ ID NO:476 corresponds to the amino acids of the overlapping 20-mer peptides #41-#46.5 of L523S.

SEQ ID NO:477 corresponds to the amino acids of the overlapping 20-mer peptides #47-#53 of L523S.

SEQ ID NO:478 is the cDNA encoding the full length ORF of L523S.

SEQ ID NO:479 is the cDNA sequence of Adenovirus-L523S, an Adenovirus vector containing the cDNA encoding the full-length ORF of L523S.

SEQ ID NO:480 is the amino acid sequence of the full-length L523S protein as expressed from the Adenovirus vector set forth in SEQ ID NO:479.

SEQ ID NO:481 is amino acids 9-27 of L523S containing a CD8 T cell epitope as described in example 37.

SEQ ID NO:482 is amino acids 33-75 of L523S containing a CD4 T cell epitope as described in example 37.

SEQ ID NO:483 is the determined cDNA sequence for the Rhesus macaque L523S homologue.

SEQ ID NO:484 is the predicted amino acid sequence for the Rhesus macaque L523S homologue, encoded by the polynucleotide sequence set forth in SEQ ID NO:483.

SEQ ID NO:485 is the full-length L523S cDNA, together with its Kozak consensus sequence and a C-terminal 10× His Tag for expression in insect cells using a baculovirus system.

SEQ ID NO:486 is the full-length L523S amino acid sequence encoded by the polynucleotide set forth in SEQ ID NO:485.

SEQ ID NO:487 is the L523F1 PCR primer.

SEQ ID NO:488 is the L523RV1 PCR primer.

SEQ ID NO:489 is the cDNA encoding the minimal epitope of L514S set forth in SEQ ID NO:490.

SEQ ID NO:490 is the amino acid sequence of peptide #10 minimal epitope of L514S.

SEQ ID NO:491 is a minimal 9-mer CTL epitope of L523S.

SEQ ID NO:492 is the amino acid sequence of peptide #2 of NY-ESO-1.

SEQ ID NO:493 is the amino acid sequence of peptide #3 of NY-ESO-1.

SEQ ID NO:494 is the amino acid sequence of peptide #10 of NY-ESO-1.

SEQ ID NO:495 is the amino acid sequence of peptide #17 of NY-ESO-1.

SEQ ID NO:496 is the amino acid sequence of peptide #5 of NY-ESO-1.

SEQ ID NO:497 is the amino acid sequence of peptide #42 of L523S.

SEQ ID NO:498 is the amino acid sequence of IMP-1 peptide #42.

SEQ ID NO:499 is the amino acid sequence of IMP-2 peptide #42.

SEQ ID NO:500 is the amino acid sequence of IMP-1.

SEQ ID NO:501 is the amino acid sequence of IMP-2.

SEQ ID NO:502 is the amino acid sequence of IMP-1 peptide #32.

SEQ ID NO:503 is the amino acid sequence of IMP-2 peptide #32.

SEQ ID NO:504 is the amino acid sequence of peptide #1 of L523S.

SEQ ID NO:505 is the amino acid sequence of peptide #2 of L523S.

SEQ ID NO:506 is the amino acid sequence of peptide #3 of L523S.

SEQ ID NO:507 is the amino acid sequence of peptide #4 of L523S.

SEQ ID NO:508 is the amino acid sequence of peptide #5 of L523S.

SEQ ID NO:509 is the amino acid sequence of peptide #6 of L523S.

SEQ ID NO:510 is the amino acid sequence of peptide #7 of L523S.

SEQ ID NO:511 is the amino acid sequence of peptide #8 of L523S.

SEQ ID NO:512 is the amino acid sequence of peptide #9 of L523S.

SEQ ID NO:513 is the amino acid sequence of peptide #10 of L523S.

SEQ ID NO:514 is the amino acid sequence of peptide #11 of L523S.

SEQ ID NO:515 is the amino acid sequence of peptide #12 of L523S.

SEQ ID NO:516 is the amino acid sequence of peptide #13 of L523S.

SEQ ID NO:517 is the amino acid sequence of peptide #14 of L523S.

SEQ ID NO:518 is the amino acid sequence of peptide #15 of L523S.

SEQ ID NO:519 is the amino acid sequence of peptide #16 of L523S.

SEQ ID NO:520 is the amino acid sequence of peptide #17 of L523S.

SEQ ID NO:521 is the amino acid sequence of peptide #18 of L523S.

SEQ ID NO:522 is the amino acid sequence of peptide #19 of L523S.

SEQ ID NO:523 is the amino acid sequence of peptide #20 of L523S.

SEQ ID NO:524 is the amino acid sequence of peptide #21 of L523S.

SEQ ID NO:525 is the amino acid sequence of peptide #22 of L523S.

SEQ ID NO:526 is the amino acid sequence of peptide #23 of L523S.

SEQ ID NO:527 is the amino acid sequence of peptide #24 of L523S.

SEQ ID NO:528 is the amino acid sequence of peptide #25 of L523S.

SEQ ID NO:529 is the amino acid sequence of peptide #26 of L523S.

SEQ ID NO:530 is the amino acid sequence of peptide #27 of L523S.

SEQ ID NO:531 is the amino acid sequence of peptide #28 of L523S.

SEQ ID NO:532 is the amino acid sequence of peptide #29 of L523S.

SEQ ID NO:533 is the amino acid sequence of peptide #30 of L523S.

SEQ ID NO:534 is the amino acid sequence of peptide #30.5 of L523S.

SEQ ID NO:535 is the amino acid sequence of peptide #31 of L523S.

SEQ ID NO:536 is the amino acid sequence of peptide #32 of L523S.

SEQ ID NO:537 is the amino acid sequence of peptide #33 of L523S.

SEQ ID NO:538 is the amino acid sequence of peptide #34 of L523S.

SEQ ID NO:539 is the amino acid sequence of peptide #35 of L523S.

SEQ ID NO:540 is the amino acid sequence of peptide #36 of L523S.

SEQ ID NO:541 is the amino acid sequence of peptide #37 of L523S.

SEQ ID NO:542 is the amino acid sequence of peptide #38 of L523S.

SEQ ID NO:543 is the amino acid sequence of peptide #38.5 of L523S.

SEQ ID NO:544 is the amino acid sequence of peptide #39 of L523S.

SEQ ID NO:545 is the amino acid sequence of peptide #40 of L523S.

SEQ ID NO:546 is the amino acid sequence of peptide #40.5 of L523S.

SEQ ID NO:547 is the amino acid sequence of peptide #41 of L523S.

SEQ ID NO:548 is the amino acid sequence of peptide #42 of L523S.

SEQ ID NO:549 is the amino acid sequence of peptide #43 of L523S.

SEQ ID NO:550 is the amino acid sequence of peptide #44 of L523S.

SEQ ID NO:551 is the amino acid sequence of peptide #45 of L523S.

SEQ ID NO:552 is the amino acid sequence of peptide #46 of L523S.

SEQ ID NO:553 is the amino acid sequence of peptide #46.5 of L523S.

SEQ ID NO:554 is the amino acid sequence of peptide #47 of L523S.

SEQ ID NO:555 is the amino acid sequence of peptide #48 of L523S.

SEQ ID NO:556 is the amino acid sequence of peptide #49 of L523S.

SEQ ID NO:557 is the amino acid sequence of peptide #50 of L523S.

SEQ ID NO:558 is the amino acid sequence of peptide #51 of L523S.

SEQ ID NO:559 is the amino acid sequence of peptide #52 of L523S.

SEQ ID NO:560 is the amino acid sequence of peptide #53 of L523S.

DETAILED DESCRIPTION OF THE INVENTION

U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet, are incorporated herein by reference, in their entirety.

The present invention is directed generally to compositions and their use in the therapy and diagnosis of cancer, particularly lung cancer. As described further below, illustrative compositions of the present invention include, but are not restricted to, polypeptides, particularly immunogenic polypeptides, polynucleotides encoding such polypeptides, antibodies and other binding agents, antigen presenting cells (APCs) and immune system cells (e.g., T cells).

The practice of the present invention will employ, unless indicated specifically to the contrary, conventional methods of virology, immunology, microbiology, molecular biology and recombinant DNA techniques within the skill of the art, many of which are described below for the purpose of illustration. Such techniques are explained fully in the literature. See, e.g., Sambrook, et al. Molecular Cloning: A Laboratory Manual (2nd Edition, 1989); Maniatis et al. Molecular Cloning: A Laboratory Manual (1982); DNA Cloning: A Practical Approach, vol. I & II (D. Glover, ed.); Oligonucleotide Synthesis (N. Gait, ed., 1984); Nucleic Acid Hybridization (B. Hames & S. Higgins, eds., 1985); Transcription and Translation (B. Hames & S. Higgins, eds., 1984); Animal Cell Culture (R. Freshney, ed., 1986); Perbal, A Practical Guide to Molecular Cloning (1984).

All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety.

As used in this specification and the appended claims, the singular forms "a," "an" and "the" include plural references unless the content clearly dictates otherwise.

Polypeptide Compositions

As used herein, the term "polypeptide" is used in its conventional meaning, i.e., as a sequence of amino acids. The polypeptides are not limited to a specific length of the product; thus, peptides, oligopeptides, and proteins are included within the definition of polypeptide, and such terms may be used interchangeably herein unless specifically indicated otherwise. This term also does not refer to or exclude post-expression modifications of the polypeptide, for example, glycosylations, acetylations, phosphorylations and the like, as well as other modifications known in the art, both naturally occurring and non-naturally occurring. A polypeptide may be an entire protein, or a subsequence thereof. Particular polypeptides of interest in the context of this invention are amino acid subsequences comprising epitopes, i.e., antigenic determinants substantially responsible for the immunogenic properties of a polypeptide and being capable of evoking an immune response.

Particularly illustrative polypeptides of the present invention comprise those encoded by a polynucleotide sequence set forth in any one of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479 and 483, or a sequence that hybridizes under moderately stringent conditions, or, alternatively, under highly stringent conditions, to a polynucleotide sequence set forth in any one of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479 and 483. Certain illustrative polypeptides of the invention comprise amino acid sequences as set forth in any one of SEQ ID NO:152, 155, 156, 165, 166, 169, 170, 172, 174, 176, 226-252, 338-344, 346, 350, 357, 361, 363, 365, 367, 369, 376-382, 387-419, 423, 427, 430, 433, 441, 443, 446, 449, 451-466, 468-477, 480-482, and 484.

The polypeptides of the present invention are sometimes herein referred to as lung tumor proteins or lung tumor polypeptides, as an indication that their identification has been based at least in part upon their increased levels of expression in lung tumor samples. Thus, a "lung tumor polypeptide" or "lung tumor protein," refers generally to a polypeptide sequence of the present invention, or a polynucleotide sequence encoding such a polypeptide, that is expressed in a substantial proportion of lung tumor samples, for example preferably greater than about 20%, more preferably greater than about 30%, and most preferably greater than about 50% or more of lung tumor samples tested, at a level that is at least two fold, and preferably at least five fold, greater than the level of expression in normal tissues, as determined using a representative assay provided herein. A lung tumor polypeptide sequence of the invention, based upon its increased level of expression in tumor cells, has particular utility both as a diagnostic marker as well as a therapeutic target, as further described below.
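By way of illustration only, the expression criterion stated above (expression in more than about 20% of lung tumor samples at a level at least two fold greater than in normal tissues) may be sketched as follows. The function names, the use of a single normal-tissue reference level, and the handling of an undetectable normal baseline are assumptions made for this sketch and are not taken from the representative assays provided herein.

```python
from typing import Sequence


def is_overexpressed(tumor_level: float, normal_level: float,
                     min_fold: float = 2.0) -> bool:
    """Return True when the tumor level is at least min_fold times normal."""
    if normal_level <= 0:
        # Assumed convention: with an undetectable normal baseline,
        # any detectable tumor signal counts as overexpression.
        return tumor_level > 0
    return tumor_level / normal_level >= min_fold


def fraction_overexpressing(tumor_levels: Sequence[float], normal_level: float,
                            min_fold: float = 2.0) -> float:
    """Fraction of tumor samples meeting the fold-change threshold."""
    if not tumor_levels:
        raise ValueError("at least one tumor sample is required")
    hits = sum(is_overexpressed(t, normal_level, min_fold) for t in tumor_levels)
    return hits / len(tumor_levels)


def meets_lung_tumor_criterion(tumor_levels: Sequence[float], normal_level: float,
                               min_fraction: float = 0.20,
                               min_fold: float = 2.0) -> bool:
    """True when more than min_fraction of samples are overexpressed."""
    return fraction_overexpressing(tumor_levels, normal_level, min_fold) > min_fraction
```

The more preferred thresholds recited above (about 30% or 50% of samples, or at least five fold) correspond to passing `min_fraction=0.30`, `min_fraction=0.50`, or `min_fold=5.0`.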

In certain preferred embodiments, the polypeptides of the invention are immunogenic, i.e., they react detectably within an immunoassay (such as an ELISA or T-cell stimulation assay) with antisera and/or T-cells from a patient with lung cancer. Screening for immunogenic activity can be performed using techniques well known to the skilled artisan. For example, such screens can be performed using methods such as those described in Harlow and Lane, Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 1988. In one illustrative example, a polypeptide may be immobilized on a solid support and contacted with patient sera to allow binding of antibodies within the sera to the immobilized polypeptide. Unbound sera may then be removed and bound antibodies detected using, for example, ¹²⁵I-labeled Protein A.

As would be recognized by the skilled artisan, immunogenic portions of the polypeptides disclosed herein are also encompassed by the present invention. An "immunogenic portion," as used herein, is a fragment of an immunogenic polypeptide of the invention that itself is immunologically reactive (i.e., specifically binds) with the B-cells and/or T-cell surface antigen receptors that recognize the polypeptide. Immunogenic portions may generally be identified using well known techniques, such as those summarized in Paul, Fundamental Immunology, 3rd ed., 243-247 (Raven Press, 1993) and references cited therein. Such techniques include screening polypeptides for the ability to react with antigen-specific antibodies, antisera and/or T-cell lines or clones. As used herein, antisera and antibodies are "antigen-specific" if they specifically bind to an antigen (i.e., they react with the protein in an ELISA or other immunoassay, and do not react detectably with unrelated proteins). Such antisera and antibodies may be prepared as described herein, and using well-known techniques.

In one preferred embodiment, an immunogenic portion of a polypeptide of the present invention is a portion that reacts with antisera and/or T-cells at a level that is not substantially less than the reactivity of the full-length polypeptide (e.g., in an ELISA and/or T-cell reactivity assay). Preferably, the level of immunogenic activity of the immunogenic portion is at least about 50%, preferably at least about 70% and most preferably greater than about 90% of the immunogenicity for the full-length polypeptide. In some instances, preferred immunogenic portions will be identified that have a level of immunogenic activity greater than that of the corresponding full-length polypeptide, e.g., having greater than about 100% or 150% or more immunogenic activity.
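As a non-limiting sketch, the relative immunogenic activity of a portion may be expressed as a percentage of the signal obtained with the full-length polypeptide and compared against the thresholds recited above (about 50%, 70% and 90%). The function names and tier labels are hypothetical, and the sketch assumes that both signals come from the same assay (e.g., ELISA readings) after background subtraction.

```python
def relative_activity(portion_signal: float, full_length_signal: float) -> float:
    """Activity of the portion as a percentage of the full-length signal."""
    if full_length_signal <= 0:
        raise ValueError("full-length signal must be positive")
    return 100.0 * portion_signal / full_length_signal


def activity_tier(percent: float) -> str:
    """Map a relative activity (percent) onto the recited thresholds."""
    if percent > 90.0:
        return "most preferred"   # greater than about 90%
    if percent >= 70.0:
        return "preferred"        # at least about 70%
    if percent >= 50.0:
        return "acceptable"       # at least about 50%
    return "below threshold"
```

A portion whose activity exceeds that of the full-length polypeptide simply yields a value above 100 and falls in the highest tier.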

In certain other embodiments, illustrative immunogenic portions may include peptides in which an N-terminal leader sequence and/or transmembrane domain have been deleted. Other illustrative immunogenic portions will contain a small N- and/or C-terminal deletion (e.g., 1-30 amino acids, preferably 5-15 amino acids), relative to the mature protein.

In another embodiment, a polypeptide composition of the invention may also comprise one or more polypeptides that are immunologically reactive with T cells and/or antibodies generated against a polypeptide of the invention, particularly a polypeptide having an amino acid sequence disclosed herein, or to an immunogenic fragment or variant thereof.

In another embodiment of the invention, polypeptides are provided that comprise one or more polypeptides that are capable of eliciting T cells and/or antibodies that are immunologically reactive with one or more polypeptides described herein, or one or more polypeptides encoded by contiguous nucleic acid sequences contained in the polynucleotide sequences disclosed herein, or immunogenic fragments or variants thereof, or to one or more nucleic acid sequences which hybridize to one or more of these sequences under conditions of moderate to high stringency.

The present invention, in another aspect, provides polypeptide fragments comprising at least about 5, 10, 15, 20, 25, 50, or 100 contiguous amino acids, or more, including all intermediate lengths, of a polypeptide composition set forth herein, such as those set forth in SEQ ID NO:152, 155, 156, 165, 166, 169, 170, 172, 174, 176, 226-252, 338-344, 346, 350, 357, 361, 363, 365, 367, 369, 376-382 and 387-419, 441, 443, 446, 449, 451-466, 468-477, 480-482, and 484, or those encoded by a polynucleotide sequence set forth in a sequence of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479 and 483.
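For purposes of illustration, the set of contiguous fragments of a given length may be enumerated with a sliding window over the amino acid sequence. The function name and the ten-residue example sequence are hypothetical and do not correspond to any sequence of the Sequence Listing.

```python
from typing import List


def contiguous_fragments(sequence: str, length: int) -> List[str]:
    """Return every contiguous fragment of the given length, in order."""
    if length < 1:
        raise ValueError("fragment length must be at least 1")
    # A sequence shorter than the requested length yields no fragments.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


# Hypothetical 10-residue sequence used only to show the windowing.
example = "MKTAYIAKQR"
five_mers = contiguous_fragments(example, 5)
```

A sequence of N residues yields N - L + 1 fragments of length L, so intermediate lengths can be covered by calling the function once per length of interest.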

In another aspect, the present invention provides variants of the polypeptide compositions described herein. Polypeptide variants generally encompassed by the present invention will typically exhibit at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or more identity (determined as described below), along its length, to a polypeptide sequence set forth herein.

In one preferred embodiment, the polypeptide fragments and variants provided by the present invention are immunologically reactive with an antibody and/or T-cell that reacts with a full-length polypeptide specifically set forth herein.
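As a simplified sketch only, percent identity between two sequences that have already been aligned to equal length may be computed by counting matching positions. This is not necessarily the determination referred to above; the function name, the use of "-" as the gap character, and the choice of the alignment length as the denominator are assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    # A gap never counts as a match, but gap columns stay in the denominator.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def is_variant(aligned_a: str, aligned_b: str, min_identity: float = 70.0) -> bool:
    """True when identity meets the chosen threshold (about 70% by default)."""
    return percent_identity(aligned_a, aligned_b) >= min_identity
```

Other conventions for the denominator (for example, the length of the shorter sequence) give different values for gapped alignments, so the convention should be stated whenever a figure is reported.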

In another preferred embodiment, the polypeptide fragments and variants provided by the present invention exhibit a level of immunogenic activity of at least about 50%, preferably at least about 70%, and most preferably at least about 90% or more of that exhibited by a full-length polypeptide sequence specifically set forth herein.

A polypeptide "variant," as the term is used herein, is a polypeptide that typically differs from a polypeptide specifically disclosed herein in one or more substitutions, deletions, additions and/or insertions. Such variants may be naturally occurring or may be synthetically generated, for example, by modifying one or more of the above polypeptide sequences of the invention and evaluating their immunogenic activity as described herein and/or using any of a number of techniques well known in the art.

For example, certain illustrative variants of the polypeptides of the invention include those in which one or more portions, such as an N-terminal leader sequence or transmembrane domain, have been removed. Other illustrative variants include variants in which a small portion (e.g., 1-30 amino acids, preferably 5-15 amino acids) has been removed from the N- and/or C-terminal of the mature protein.

In many instances, a variant will contain conservative substitutions. A "conservative substitution" is one in which an amino acid is substituted for another amino acid that has similar properties, such that one skilled in the art of peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to be substantially unchanged. As described above, modifications may be made in the structure of the polynucleotides and polypeptides of the present invention and still obtain a functional molecule that encodes a variant or derivative polypeptide with desirable characteristics, e.g., with immunogenic characteristics. When it is desired to alter the amino acid sequence of a polypeptide to create an equivalent, or even an improved, immunogenic variant or portion of a polypeptide of the invention, one skilled in the art will typically change one or more of the codons of the encoding DNA sequence according to Table 1.

For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as, for example, antigen-binding regions of antibodies or binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence, and, of course, its underlying DNA coding sequence, and nevertheless obtain a protein with like properties. It is thus contemplated that various changes may be made in the peptide sequences of the disclosed compositions, or corresponding DNA sequences which encode said peptides without appreciable loss of their biological utility or activity.

TABLE I

Amino Acids                  Codons
Alanine         Ala   A      GCA GCC GCG GCU
Cysteine        Cys   C      UGC UGU
Aspartic acid   Asp   D      GAC GAU
Glutamic acid   Glu   E      GAA GAG
Phenylalanine   Phe   F      UUC UUU
Glycine         Gly   G      GGA GGC GGG GGU
Histidine       His   H      CAC CAU
Isoleucine      Ile   I      AUA AUC AUU
Lysine          Lys   K      AAA AAG
Leucine         Leu   L      UUA UUG CUA CUC CUG CUU
Methionine      Met   M      AUG
Asparagine      Asn   N      AAC AAU
Proline         Pro   P      CCA CCC CCG CCU
Glutamine       Gln   Q      CAA CAG
Arginine        Arg   R      AGA AGG CGA CGC CGG CGU
Serine          Ser   S      AGC AGU UCA UCC UCG UCU
Threonine       Thr   T      ACA ACC ACG ACU
Valine          Val   V      GUA GUC GUG GUU
Tryptophan      Trp   W      UGG
Tyrosine        Tyr   Y      UAC UAU
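By way of illustration only, the following minimal Python sketch shows how the codon assignments of Table I may be used to count and enumerate degenerate coding sequences for a short peptide. The function names and the example peptide "MKW" are hypothetical and are not part of the disclosure; only the codon assignments are taken from Table I.

```python
from itertools import product

# Codon assignments copied from Table I (RNA alphabet, one-letter amino acid codes).
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCU"],
    "C": ["UGC", "UGU"],
    "D": ["GAC", "GAU"],
    "E": ["GAA", "GAG"],
    "F": ["UUC", "UUU"],
    "G": ["GGA", "GGC", "GGG", "GGU"],
    "H": ["CAC", "CAU"],
    "I": ["AUA", "AUC", "AUU"],
    "K": ["AAA", "AAG"],
    "L": ["UUA", "UUG", "CUA", "CUC", "CUG", "CUU"],
    "M": ["AUG"],
    "N": ["AAC", "AAU"],
    "P": ["CCA", "CCC", "CCG", "CCU"],
    "Q": ["CAA", "CAG"],
    "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGU"],
    "S": ["AGC", "AGU", "UCA", "UCC", "UCG", "UCU"],
    "T": ["ACA", "ACC", "ACG", "ACU"],
    "V": ["GUA", "GUC", "GUG", "GUU"],
    "W": ["UGG"],
    "Y": ["UAC", "UAU"],
}


def count_degenerate_variants(peptide):
    """Number of distinct coding sequences that encode the given peptide."""
    total = 1
    for amino_acid in peptide:
        total *= len(CODONS[amino_acid])
    return total


def degenerate_variants(peptide):
    """Yield every coding sequence that encodes the given peptide."""
    for combination in product(*(CODONS[aa] for aa in peptide)):
        yield "".join(combination)


# Hypothetical example peptide: Met-Lys-Trp has 1 * 2 * 1 = 2 coding sequences.
print(count_degenerate_variants("MKW"))
print(list(degenerate_variants("MKW")))
```

The count grows multiplicatively with peptide length, so enumeration is practical only for short peptides; for longer sequences the count alone is usually the more useful quantity.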

In making such changes, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte and Doolittle, 1982, incorporated herein by reference). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics (Kyte and Doolittle, 1982). These values are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8); tryptophan (-0.9); tyrosine (-1.3); proline (-1.6); histidine (-3.2); glutamate (-3.5); glutamine (-3.5); aspartate (-3.5); asparagine (-3.5); lysine (-3.9); and arginine (-4.5).

It is known in the art that certain amino acids may be substituted by other amino acids having a similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still obtain a biologically functionally equivalent protein. In making such changes, the substitution of amino acids whose hydropathic indices are within ±2 is preferred, those within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred. It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Pat. No. 4,554,101 (specifically incorporated herein by reference in its entirety) states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein.

As detailed in U.S. Pat. No. 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4). It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent, and in particular, an immunologically equivalent protein. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
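As a non-limiting illustration, the hydropathic index criterion may be sketched in Python as follows. The index values are those of Kyte and Doolittle recited above; the function name and the treatment of the tolerance as an inclusive bound are assumptions made for the purpose of the example.

```python
# Hydropathic indices (Kyte and Doolittle, 1982) as recited in the text.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def substitution_candidates(residue, tolerance=2.0):
    """Return residues whose hydropathic index lies within +/- tolerance
    of the index of `residue` (inclusive bound, an assumption of this sketch)."""
    reference = HYDROPATHY[residue]
    return sorted(
        amino_acid
        for amino_acid, value in HYDROPATHY.items()
        # A small epsilon guards against floating-point rounding at the bound.
        if amino_acid != residue and abs(value - reference) <= tolerance + 1e-9
    )


# Candidates for isoleucine at the three tolerance levels named in the text.
for tolerance in (2.0, 1.0, 0.5):
    print(tolerance, substitution_candidates("I", tolerance))
```

Narrowing the tolerance from 2 to 0.5 shrinks the candidate list, which mirrors the progression from preferred to even more particularly preferred substitutions.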
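The notion of a greatest local average hydrophilicity may be illustrated with the following Python sketch, which averages the hydrophilicity values recited above over a sliding window and reports the window with the highest average. The window length, the use of the central value for residues recited with a ±1 range, the function name and the example sequence are all assumptions made for illustration and are not taken from U.S. Pat. No. 4,554,101.

```python
# Hydrophilicity values as recited in the text; for aspartate, glutamate and
# proline the central value of the recited range is used (an assumption).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def most_hydrophilic_window(sequence, window=6):
    """Return (start_index, segment, average) for the window with the
    greatest average hydrophilicity. start_index is 0-based; on ties the
    first such window is returned."""
    if window < 1 or len(sequence) < window:
        raise ValueError("sequence must be at least as long as the window")
    best_start, best_average = 0, float("-inf")
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        average = sum(HYDROPHILICITY[aa] for aa in segment) / window
        if average > best_average:
            best_start, best_average = start, average
    return best_start, sequence[best_start:best_start + window], best_average


# Hypothetical peptide; a window of 3 is used only to keep the example short.
print(most_hydrophilic_window("AAKDEAA", window=3))
```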

As outlined above, amino acid substitutions are generally therefore based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Exemplary substitutions that take various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.

In addition, any polynucleotide may be further modified to increase stability in vivo. Possible modifications include, but are not limited to, the addition of flanking sequences at the 5' and/or 3' ends; the use of phosphorothioate or 2' O-methyl rather than phosphodiester linkages in the backbone; and/or the inclusion of nontraditional bases such as inosine, queuosine and wybutosine, as well as acetyl-, methyl-, thio- and other modified forms of adenine, cytidine, guanine, thymine and uridine.

Amino acid substitutions may further be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity and/or the amphipathic nature of the residues. For example, negatively charged amino acids include aspartic acid and glutamic acid; positively charged amino acids include lysine and arginine; and amino acids with uncharged polar head groups having similar hydrophilicity values include leucine, isoleucine and valine; glycine and alanine; asparagine and glutamine; and serine, threonine, phenylalanine and tyrosine. Other groups of amino acids that may represent conservative changes include: (1) ala, pro, gly, glu, asp, gln, asn, ser, thr; (2) cys, ser, tyr, thr; (3) val, ile, leu, met, ala, phe; (4) lys, arg, his; and (5) phe, tyr, trp, his. A variant may also, or alternatively, contain non-conservative changes. In a preferred embodiment, variant polypeptides differ from a native sequence by substitution, deletion or addition of five amino acids or fewer. Variants may also (or alternatively) be modified by, for example, the deletion or addition of amino acids that have minimal influence on the immunogenicity, secondary structure and hydropathic nature of the polypeptide.

As noted above, polypeptides may comprise a signal (or leader) sequence at the N-terminal end of the protein, which co-translationally or post-translationally directs transfer of the protein. The polypeptide may also be conjugated to a linker or other sequence for ease of synthesis, purification or identification of the polypeptide (e.g., poly-His), or to enhance binding of the polypeptide to a solid support. For example, a polypeptide may be conjugated to an immunoglobulin Fc region.

When comparing polypeptide sequences, two sequences are said to be "identical" if the sequence of amino acids in the two sequences is the same when aligned for maximum correspondence, as described below. Comparisons between two sequences are typically performed by comparing the sequences over a comparison window to identify and compare local regions of sequence similarity. A "comparison window" as used herein, refers to a segment of at least about 20 contiguous positions, usually 30 to about 75, 40 to about 50, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

Optimal alignment of sequences for comparison may be conducted using the Megalign program in the Lasergene suite of bioinformatics software (DNASTAR, Inc., Madison, Wis.), using default parameters. This program embodies several alignment schemes described in the following references: Dayhoff, M. O. (1978) A model of evolutionary change in proteins--Matrices for detecting distant relationships. In Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington D.C. Vol. 5, Suppl. 3, pp. 345-358; Hein J. (1990) Unified Approach to Alignment and Phylogenies pp. 626-645 Methods in Enzymology vol. 183, Academic Press, Inc., San Diego, Calif.; Higgins, D. G. and Sharp, P. M. (1989) CABIOS 5:151-153; Myers, E. W. and Miller W. (1988) CABIOS 4:11-17; Robinson, E. D. (1971) Comb. Theor 11:105; Saitou, N. and Nei, M. (1987) Mol. Biol. Evol. 4:406-425; Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy--the Principles and Practice of Numerical Taxonomy, Freeman Press, San Francisco, Calif.; Wilbur, W. J. and Lipman, D. J. (1983) Proc. Natl. Acad. Sci. USA 80:726-730.

Alternatively, optimal alignment of sequences for comparison may be conducted by the local identity algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482, by the identity alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443, by the search for similarity methods of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85:2444, by computerized implementations of these algorithms (GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wis.), or by inspection.
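For illustration, a global alignment score in the manner of Needleman and Wunsch may be computed by dynamic programming as sketched below in Python. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for simplicity; they are not the parameters of any of the cited programs, which typically use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b under a simple
    match/mismatch/linear-gap scheme (illustrative parameters only)."""
    # previous[j] holds the best score for aligning a[:i-1] with b[:j].
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current
    return previous[-1]


# Hypothetical sequences: three matches and one gap give a score of 2.
print(global_alignment_score("ACGT", "ACT"))
```

Only two rows of the scoring matrix are kept, which suffices for the score; recovering the alignment itself would require retaining the full matrix or a traceback structure.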

Preferred examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al. (1997) Nucl. Acids Res. 25:3389-3402 and Altschul et al. (1990) J. Mol. Biol. 215:403-410, respectively. BLAST and BLAST 2.0 can be used, for example with the parameters described herein, to determine percent sequence identity for the polynucleotides and polypeptides of the invention. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information. For amino acid sequences, a scoring matrix can be used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment.

In one preferred approach, the "percentage of sequence identity" is determined by comparing two optimally aligned sequences over a window of comparison of at least 20 positions, wherein the portion of the polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) of 20 percent or less, usually 5 to 15 percent, or 10 to 12 percent, as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the reference sequence (i.e., the window size) and multiplying the result by 100 to yield the percentage of sequence identity.
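The three halting conditions for extension of a word hit may be illustrated with the following simplified Python sketch of an ungapped extension in one direction. This is not the BLAST implementation: the match and mismatch scores, the seed score argument, the inclusive treatment of the drop-off X, and all names are assumptions made for the purpose of the example.

```python
def extend_right(query, subject, q_start, s_start,
                 x_drop=3, match=1, mismatch=-2, seed_score=0):
    """Extend an ungapped hit to the right, one residue pair at a time.

    Returns (best_score, best_length), where best_length is the number of
    extended positions at which the maximum cumulative score was reached.
    """
    score = best_score = seed_score
    best_length = 0
    offset = 0
    # Condition 3: stop when the end of either sequence is reached.
    while q_start + offset < len(query) and s_start + offset < len(subject):
        same = query[q_start + offset] == subject[s_start + offset]
        score += match if same else mismatch
        offset += 1
        if score > best_score:
            best_score, best_length = score, offset
        # Condition 1: score has fallen off by X from its maximum value.
        if best_score - score >= x_drop:
            break
        # Condition 2: cumulative score has gone to zero or below.
        if score <= 0:
            break
    return best_score, best_length


# Hypothetical sequences sharing a 7-residue run before diverging.
print(extend_right("ACGTACGTTTTT", "ACGTACGAAAAA", 0, 0))
```

A larger value of X lets the extension continue through short mismatched stretches at the cost of more computation, which is one way the parameter trades sensitivity against speed.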
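The calculation just described may be sketched in Python as follows, assuming the two sequences have already been optimally aligned and that gaps are written as "-". The function name, the gap symbol and the example sequences are hypothetical; the sketch does not itself enforce the minimum window of 20 positions.

```python
def percent_identity(aligned_query, aligned_reference, gap="-"):
    """Percentage of sequence identity for two pre-aligned sequences.

    Matched positions are divided by the number of positions in the
    reference sequence (the window size) and multiplied by 100.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    matched = sum(
        1 for q, r in zip(aligned_query, aligned_reference)
        if q == r and q != gap
    )
    window_size = sum(1 for r in aligned_reference if r != gap)
    if window_size == 0:
        raise ValueError("reference contains no residues")
    return 100.0 * matched / window_size


# Hypothetical 20-residue window with one deletion and one substitution
# in the query: 18 matched positions out of 20.
reference = "MKTAYIAKQRQISFVKSHFS"
query = "MKTAYIAKQRQ-SFVKSHYS"
print(percent_identity(query, reference))  # 90.0
```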

Within other illustrative embodiments, a polypeptide may be a fusion polypeptide that comprises multiple polypeptides as described herein, or that comprises at least one polypeptide as described herein and an unrelated sequence, such as a known tumor protein. A fusion partner may, for example, assist in providing T helper epitopes (an immunological fusion partner), preferably T helper epitopes recognized by humans, or may assist in expressing the protein (an expression enhancer) at higher yields than the native recombinant protein. Certain preferred fusion partners are both immunological and expression enhancing fusion partners. Other fusion partners may be selected so as to increase the solubility of the polypeptide or to enable the polypeptide to be targeted to desired intracellular compartments. Still further fusion partners include affinity tags, which facilitate purification of the polypeptide.

Fusion polypeptides may generally be prepared using standard techniques, including chemical conjugation. Preferably, a fusion polypeptide is expressed as a recombinant polypeptide, allowing the production of increased levels, relative to a non-fused polypeptide, in an expression system. Briefly, DNA sequences encoding the polypeptide components may be assembled separately, and ligated into an appropriate expression vector. The 3' end of the DNA sequence encoding one polypeptide component is ligated, with or without a peptide linker, to the 5' end of a DNA sequence encoding the second polypeptide component so that the reading frames of the sequences are in phase. This permits translation into a single fusion polypeptide that retains the biological activity of both component polypeptides.
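The in-phase joining of coding sequences described above may be illustrated, in silico only, by the following Python sketch. It assumes each component is supplied as a DNA coding sequence beginning in frame, removes a terminal stop codon from the upstream components, and checks that the joined sequence contains no premature stop codon. The function names and the example sequences (including the Gly-Ser linker) are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(sequence):
    """Split an in-frame DNA sequence into successive codons."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


def fuse_in_frame(first_cds, second_cds, linker_cds=""):
    """Join the 3' end of first_cds to the 5' end of second_cds, with an
    optional linker, so that the reading frames are in phase."""
    parts = [first_cds.upper(), linker_cds.upper(), second_cds.upper()]
    for part in parts:
        if len(part) % 3 != 0:
            raise ValueError("each component must be a whole number of codons")
    # Remove a terminal stop codon from the upstream components so that
    # translation reads through into the downstream component.
    for index in (0, 1):
        if parts[index][-3:] in STOP_CODONS:
            parts[index] = parts[index][:-3]
    fused = "".join(parts)
    # Any stop codon before the final codon would truncate the fusion.
    if any(codon in STOP_CODONS for codon in split_codons(fused)[:-1]):
        raise ValueError("premature stop codon in fused sequence")
    return fused


# Hypothetical components: Met-Ala-stop, Gly-Ser linker, Lys-Trp-stop.
print(fuse_in_frame("ATGGCTTAA", "AAATGGTGA", linker_cds="GGTTCT"))
```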

A peptide linker sequence may be employed to separate the first and second polypeptide components by a distance sufficient to ensure that each polypeptide folds into its secondary and tertiary structures. Such a peptide linker sequence is incorporated into the fusion polypeptide using standard techniques well known in the art. Suitable peptide linker sequences may be chosen based on the following factors: (1) their ability to adopt a flexible extended conformation; (2) their inability to adopt a secondary structure that could interact with functional epitopes on the first and second polypeptides; and (3) the lack of hydrophobic or charged residues that might react with the polypeptide functional epitopes. Preferred peptide linker sequences contain Gly, Asn and Ser residues. Other near neutral amino acids, such as Thr and Ala may also be used in the linker sequence. Amino acid sequences which may be usefully employed as linkers include those disclosed in Maratea et al., Gene 40:39-46, 1985; Murphy et al., Proc. Natl. Acad. Sci. USA 83:8258-8262, 1986; U.S. Pat. No. 4,935,233 and U.S. Pat. No. 4,751,180. The linker sequence may generally be from 1 to about 50 amino acids in length. Linker sequences are not required when the first and second polypeptides have non-essential N-terminal amino acid regions that can be used to separate the functional domains and prevent steric interference.

The ligated DNA sequences are operably linked to suitable transcriptional or translational regulatory elements. The regulatory elements responsible for expression of DNA are located only 5' to the DNA sequence encoding the first polypeptide. Similarly, stop codons required to end translation and transcription termination signals are only present 3' to the DNA sequence encoding the second polypeptide.

The fusion polypeptide can comprise a polypeptide as described herein together with an unrelated immunogenic protein, such as an immunogenic protein capable of eliciting a recall response. Examples of such proteins include tetanus, tuberculosis and hepatitis proteins (see, for example, Stoute et al. New Engl. J. Med., 336:86-91, 1997).

In one preferred embodiment, the immunological fusion partner is derived from a Mycobacterium sp., such as a Mycobacterium tuberculosis-derived Ra12 fragment. Ra12 compositions and methods for their use in enhancing the expression and/or immunogenicity of heterologous polynucleotide/polypeptide sequences are described in U.S. Patent Application 60/158,585, the disclosure of which is incorporated herein by reference in its entirety. Briefly, Ra12 refers to a polynucleotide region that is a subsequence of a Mycobacterium tuberculosis MTB32A nucleic acid. MTB32A is a serine protease of 32 KD molecular weight encoded by a gene in virulent and avirulent strains of M. tuberculosis. The nucleotide sequence and amino acid sequence of MTB32A have been described (for example, U.S. Patent Application 60/158,585; see also, Skeiky et al., Infection and Immun. (1999) 67:3998-4007, incorporated herein by reference). C-terminal fragments of the MTB32A coding sequence express at high levels and remain as soluble polypeptides throughout the purification process. Moreover, Ra12 may enhance the immunogenicity of heterologous immunogenic polypeptides with which it is fused. One preferred Ra12 fusion polypeptide comprises a 14 KD C-terminal fragment corresponding to amino acid residues 192 to 323 of MTB32A.

Other preferred Ra12 polynucleotides generally comprise at least about 15 consecutive nucleotides, at least about 30 nucleotides, at least about 60 nucleotides, at least about 100 nucleotides, at least about 200 nucleotides, or at least about 300 nucleotides that encode a portion of a Ra12 polypeptide.

Ra12 polynucleotides may comprise a native sequence (i.e., an endogenous sequence that encodes a Ra12 polypeptide or a portion thereof) or may comprise a variant of such a sequence. Ra12 polynucleotide variants may contain one or more substitutions, additions, deletions and/or insertions such that the biological activity of the encoded fusion polypeptide is not substantially diminished, relative to a fusion polypeptide comprising a native Ra12 polypeptide. Variants preferably exhibit at least about 70% identity, more preferably at least about 80% identity and most preferably at least about 90% identity to a polynucleotide sequence that encodes a native Ra12 polypeptide or a portion thereof.

Within other preferred embodiments, an immunological fusion partner is derived from protein D, a surface protein of the gram-negative bacterium Haemophilus influenzae B (WO 91/18926). Preferably, a protein D derivative comprises approximately the first third of the protein (e.g., the first N-terminal 100-110 amino acids), and a protein D derivative may be lipidated. Within certain preferred embodiments, the first 109 residues of a Lipoprotein D fusion partner are included on the N-terminus to provide the polypeptide with additional exogenous T-cell epitopes and to increase the expression level in E. coli (thus functioning as an expression enhancer). The lipid tail ensures optimal presentation of the antigen to antigen presenting cells. Other fusion partners include the non-structural protein from influenza virus, NS1 (haemagglutinin). Typically, the N-terminal 81 amino acids are used, although different fragments that include T-helper epitopes may be used.

In another embodiment, the immunological fusion partner is the protein known as LYTA, or a portion thereof (preferably a C-terminal portion). LYTA is derived from Streptococcus pneumoniae, which synthesizes an N-acetyl-L-alanine amidase known as amidase LYTA (encoded by the LytA gene; Gene 43:265-292, 1986). LYTA is an autolysin that specifically degrades certain bonds in the peptidoglycan backbone. The C-terminal domain of the LYTA protein is responsible for the affinity to the choline or to some choline analogues such as DEAE. This property has been exploited for the development of E. coli C-LYTA expressing plasmids useful for expression of fusion proteins. Purification of hybrid proteins containing the C-LYTA fragment at the amino terminus has been described (see Biotechnology 10:795-798, 1992). Within a preferred embodiment, a repeat portion of LYTA may be incorporated into a fusion polypeptide. A repeat portion is found in the C-terminal region starting at residue 178. A particularly preferred repeat portion incorporates residues 188-305.

Yet another illustrative embodiment involves fusion polypeptides, and the polynucleotides encoding them, wherein the fusion partner comprises a targeting signal capable of directing a polypeptide to the endosomal/lysosomal compartment, as described in U.S. Pat. No. 5,633,234. An immunogenic polypeptide of the invention, when fused with this targeting signal, will associate more efficiently with MHC class II molecules and thereby provide enhanced in vivo stimulation of CD4+ T-cells specific for the polypeptide.

Polypeptides of the invention are prepared using any of a variety of well known synthetic and/or recombinant techniques, the latter of which are further described below. Polypeptides, portions and other variants generally less than about 150 amino acids can be generated by synthetic means, using techniques well known to those of ordinary skill in the art. In one illustrative example, such polypeptides are synthesized using any of the commercially available solid-phase techniques, such as the Merrifield solid-phase synthesis method, where amino acids are sequentially added to a growing amino acid chain. See Merrifield, J. Am. Chem. Soc. 85:2149-2154, 1963. Equipment for automated synthesis of polypeptides is commercially available from suppliers such as Perkin Elmer/Applied BioSystems Division (Foster City, Calif.), and may be operated according to the manufacturer's instructions.

In general, polypeptide compositions (including fusion polypeptides) of the invention are isolated. An "isolated" polypeptide is one that is removed from its original environment. For example, a naturally-occurring protein or polypeptide is isolated if it is separated from some or all of the coexisting materials in the natural system. Preferably, such polypeptides are also purified, e.g., are at least about 90% pure, more preferably at least about 95% pure and most preferably at least about 99% pure.

Polynucleotide Compositions

The present invention, in other aspects, provides polynucleotide compositions. The terms "DNA" and "polynucleotide" are used essentially interchangeably herein to refer to a DNA molecule that has been isolated free of total genomic DNA of a particular species. "Isolated," as used herein, means that a polynucleotide is substantially away from other coding sequences, and that the DNA molecule does not contain large portions of unrelated coding DNA, such as large chromosomal fragments or other functional genes or polypeptide coding regions. Of course, this refers to the DNA molecule as originally isolated, and does not exclude genes or coding regions later added to the segment by the hand of man.

As will be understood by those skilled in the art, the polynucleotide compositions of this invention can include genomic sequences, extra-genomic and plasmid-encoded sequences and smaller engineered gene segments that express, or may be adapted to express, proteins, polypeptides, peptides and the like. Such segments may be naturally isolated, or modified synthetically by the hand of man.

As will be also recognized by the skilled artisan, polynucleotides of the invention may be single-stranded (coding or antisense) or double-stranded, and may be DNA (genomic, cDNA or synthetic) or RNA molecules. RNA molecules may include HnRNA molecules, which contain introns and correspond to a DNA molecule in a one-to-one manner, and mRNA molecules, which do not contain introns. Additional coding or non-coding sequences may, but need not, be present within a polynucleotide of the present invention, and a polynucleotide may, but need not, be linked to other molecules and/or support materials.

Polynucleotides may comprise a native sequence (i.e., an endogenous sequence that encodes a polypeptide/protein of the invention or a portion thereof) or may comprise a sequence that encodes a variant or derivative, preferably an immunogenic variant or derivative, of such a sequence.

Therefore, according to another aspect of the present invention, polynucleotide compositions are provided that comprise some or all of a polynucleotide sequence set forth in any one of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489, complements of a polynucleotide sequence set forth in any one of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489, and degenerate variants of a polynucleotide sequence set forth in any one of SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489. In certain preferred embodiments, the polynucleotide sequences set forth herein encode immunogenic polypeptides, as described above.

In other related embodiments, the present invention provides polynucleotide variants having substantial identity to the sequences disclosed herein in SEQ ID NO:1-3, 6-8, 10-13, 15-27, 29, 30, 32, 34-49, 51, 52, 54, 55, 57-59, 61-69, 71, 73, 74, 77, 78, 80-82, 84, 86-96, 107-109, 111, 113, 125, 127, 128, 129, 131-133, 142, 144, 148-151, 153, 154, 157, 158, 160, 167, 168, 171, 179, 182, 184-186, 188-191, 193, 194, 198-207, 209, 210, 213, 214, 217, 220-224, 253-337, 345, 347, 349, 358, 362, 364, 365, 368, 370-375, 420, 424, 428, 431, 434, 442, 447, 450, 467, 478, 479, 483, 485, and 489, for example those comprising at least 70% sequence identity, preferably at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or 99% or higher, sequence identity compared to a polynucleotide sequence of this invention using the methods described herein, (e.g., BLAST analysis using standard parameters, as described below). One skilled in this art will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like.

Typically, polynucleotide variants will contain one or more substitutions, additions, deletions and/or insertions, preferably such that the immunogenicity of the polypeptide encoded by the variant polynucleotide is not substantially diminished relative to a polypeptide encoded by a polynucleotide sequence specifically set forth herein. The term "variants" should also be understood to encompass homologous genes of xenogenic origin.

In additional embodiments, the present invention provides polynucleotide fragments comprising various lengths of contiguous stretches of sequence identical to or complementary to one or more of the sequences disclosed herein. For example, polynucleotides are provided by this invention that comprise at least about 10, 15, 20, 30, 40, 50, 75, 100, 150, 200, 300, 400, 500 or 1000 or more contiguous nucleotides of one or more of the sequences disclosed herein as well as all intermediate lengths therebetween. It will be readily understood that "intermediate lengths", in this context, means any length between the quoted values, such as 16, 17, 18, 19, etc.; 21, 22, 23, etc.; 30, 31, 32, etc.; 50, 51, 52, 53, etc.; 100, 101, 102, 103, etc.; 150, 151, 152, 153, etc.; including all integers through 200-500; 500-1,000, and the like.
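As a non-limiting illustration, whether a candidate polynucleotide comprises a contiguous stretch of a given length that is identical to, or complementary to, a reference sequence may be checked as sketched below in Python. Complementarity is modeled here as identity with the reverse complement of the reference; that modeling choice, the function names and the example sequences are assumptions of this sketch.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def longest_shared_run(a, b):
    """Length of the longest contiguous stretch common to a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of positions.
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def has_contiguous_fragment(candidate, reference, min_length=10):
    """True if candidate shares at least min_length contiguous nucleotides
    with the reference or with its reverse complement."""
    candidate, reference = candidate.upper(), reference.upper()
    longest = max(
        longest_shared_run(candidate, reference),
        longest_shared_run(candidate, reverse_complement(reference)),
    )
    return longest >= min_length


# Hypothetical sequences: the candidate carries a 12-nucleotide stretch
# of the reference flanked by unrelated bases.
reference = "ATGGCTGGTTCTAAATGGTGA"
candidate = "CCGGTTCTAAATGGCC"
print(has_contiguous_fragment(candidate, reference, min_length=10))
```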

In another embodiment of the invention, polynucleotide compositions are provided that are capable of hybridizing under moderate to high stringency conditions to a polynucleotide sequence provided herein, or a fragment thereof, or a complementary sequence thereof. Hybridization techniques are well known in the art of molecular biology. For purposes of illustration, suitable moderately stringent conditions for testing the hybridization of a polynucleotide of this invention with other polynucleotides include prewashing in a solution of 5× SSC, 0.5% SDS, 1.0 mM EDTA (pH 8.0); hybridizing at 50°C-60°C, 5× SSC, overnight; followed by washing twice at 65°C for 20 minutes with each of 2×, 0.5× and 0.2× SSC containing 0.1% SDS. One skilled in the art will understand that the stringency of hybridization can be readily manipulated, such as by altering the salt content of the hybridization solution and/or the temperature at which the hybridization is performed. For example, in another embodiment, suitable highly stringent hybridization conditions include those described above, with the exception that the temperature of hybridization is increased, e.g., to 60-65°C or 65-70°C.

In certain preferred embodiments, the polynucleotides described above, e.g., polynucleotide variants, fragments and hybridizing sequences, encode polypeptides that are immunologically cross-reactive with a polypeptide sequence specifically set forth herein. In other preferred embodiments, such polynucleotides encode polypeptides that have a level of immunogenic activity of at least about 50%, preferably at least about 70%, and more preferably at least about 90% of that for a polypeptide sequence specifically set forth herein.

The polynucleotides of the present invention, or fragments thereof, regardless of the length of the coding sequence itself, may be combined with other DNA sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol. For example, illustrative polynucleotide segments with total lengths of about 10,000, about 5000, about 3000, about 2,000, about 1,000, about 500, about 200, about 100, about 50 base pairs in length, and the like, (including all intermediate lengths) are contemplated to be useful in many implementations of this invention.

When comparing polynucleotide sequences, two sequences are said to be "identical" if the sequence of nucleotides in the two sequences is the same when aligned for maximum correspondence, as described below. Comparisons between two sequences are typically performed by comparing the sequences over a comparison window to identify and compare local regions of sequence similarity. A "comparison window" as used herein, refers to a segment of at least about 20 contiguous positions, usually 30 to about 75, 40 to about 50, in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.

Optimal alignment of sequences for comparison may be conducted using the Megalign program in the Lasergene suite of bioinformatics software (DNASTAR, Inc., Madison, Wis.), using default parameters. This program embodies several alignment schemes described in the following references: Dayhoff, M. O. (1978) A model of evolutionary change in proteins--Matrices for detecting distant relationships. In Dayhoff, M. O. (ed.) Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington D.C. Vol. 5, Suppl. 3, pp. 345-358; Hein J. (1990) Unified Approach to Alignment and Phylogenies pp. 626-645 Methods in Enzymology vol. 183, Academic Press, Inc., San Diego, Calif.; Higgins, D. G. and Sharp, P. M. (1989) CABIOS 5:151-153; Myers, E. W. and Miller W. (1988) CABIOS 4:11-17; Robinson, E. D. (1971) Comb. Theor 11:105; Saitou, N. and Nei, M. (1987) Mol. Biol. Evol. 4:406-425; Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy-