Thursday, September 1, 2011

On Dynamically Typed Languages

In programming languages, types are fundamental.  Every expression has a type, or at least can be assigned one.  For example, in the expression "1 + 2", where 1 and 2 are integers, the type of the expression is most likely that of an integer.  I say "most likely" because languages are usually defined so that "+" applied to two integers results in an integer.  (In more formal terms, the type signature of "+" is integer -> integer -> integer, meaning it takes two integers as parameters and returns an integer.)

Typing information also matters for a language's implementation.  In general, information about types must be known in order to execute a program.  For example, the expression "'moo' + 1" is invalid with respect to the previous definition of the "+" operator, as 'moo' is not an integer.  This is an example of a type error: the expected types do not match the given types.
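To make this concrete, here is how those same two expressions behave in Python, one of the dynamically typed languages mentioned later in this post (a minimal sketch; the exact error message text may vary by Python version):

```python
# "+" on two integers yields an integer, matching the
# integer -> integer -> integer signature described above.
result = 1 + 2
print(result, type(result))   # 3 <class 'int'>

# "'moo' + 1" is a type error: "+" has no meaning for a str and an int.
try:
    "moo" + 1
except TypeError as err:
    print("type error:", err)
```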

To illustrate that type errors are a significant problem, consider assembly language.  In general, assembly languages have no concept of types.  Data is passed around, and the type of the data depends only on the context it is used in.  The same data can represent a variety of things: an integer, a floating point number, a string of characters, a memory address, etc.  Everything is considered "valid", though it is very easy to write a nonsensical command.  Such errors show themselves only as incorrect program behavior, which makes for some of the most difficult bugs to track down.

Clearly, typing is important.  What is also important, and getting back to the title of this post, is when typing information is determined.  In a statically typed language, typing information is assigned at compile time, before a program is executed.  In these languages, type errors reveal themselves as compile time errors.  In a dynamically typed language, typing information is assigned on the fly as the program runs.  In these languages, type errors reveal themselves as runtime errors of varying severity.  (The severity is determined by other language properties not relevant to this discussion.)
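The difference in when type errors surface can be seen in a short Python sketch (the function and values here are hypothetical, for illustration only).  The ill-typed branch below is wrong the whole time, but the runtime only notices when that branch actually executes; a statically typed compiler would reject the equivalent program before it ever ran:

```python
def label(count):
    if count >= 0:
        return "count: " + str(count)
    # Ill-typed: str + int.  A static type checker would flag this
    # at compile time; Python only notices if this branch runs.
    return "count: " + count

print(label(3))      # works fine: the bad branch never executes

try:
    label(-1)        # the type error surfaces only now, at runtime
except TypeError:
    print("runtime type error")
```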

Both types of languages exist in the real world, and there are a number in both camps.  For example, C, C++, and Java are all statically typed languages.  Perl, Python, and JavaScript are all examples of dynamically typed languages.

There are pros and cons to both methods.  For statically typed languages, determining types at compile time means that more efficient code can be generated.  There is no overhead at runtime in verifying types, as such verification is performed at compile time.  Additionally, many common programming errors reveal themselves as type errors, resulting in theoretically more reliable code.  The downside is that certain problems are difficult to express in statically typed languages.  One often focuses more on getting the types just right so the code will compile than on actually solving the problem at hand.  Additionally, in many languages, specifying type information requires bulky annotations that take up a significant chunk of the code, and writing them out becomes tedious.

The second listed drawback of statically typed languages, that of bulky type annotations, is frequently heralded as the key advantage of dynamically typed languages.  I will be the first to admit that such annotations are an utter pain in languages like Java, accounting for probably 10-20% of the characters typed.  Imagine code being 10-20% shorter, just because the language is dynamically typed.

There is, however, a middle ground: type inferencers.  These are built into the compilers of certain statically typed languages, such as Scala, Haskell, and OCaml.  Type inferencers allow the programmer to skip specifying most, if not all, type annotations in code.  When code is compiled, the type inferencer goes to work, filling in the gaps. 

That's the theory, anyway.  Different inferencers have different capabilities.  Personally, I'm most familiar with Scala's inferencer.  In Scala, type annotations must be specified for the input parameters of functions.  Additionally, annotations must be provided for the return values of overloaded functions and of recursive functions.  There are also a few edge cases where annotations must be specified, though these are quite rare.  Even with all of these cases requiring annotations, the inferencer cuts their number down significantly; comparing Scala code to Java code, I would say safely in half.  Scala's inferencer pales in comparison to Haskell's, which is so powerful that frequently all type annotations can be omitted.  Haskell code often looks like types are determined dynamically, precisely because explicit type information is notably absent.

One might have an aversion to such inferencers, at least in theory.  What if the inference is wrong, for example?  In my experience with Scala, this is rare, but it does happen.  That said, when it happens, it usually means I'm doing something REALLY wrong: specifically, I'm consistently doing the wrong thing.  Any inconsistency results in a failure to infer a type, as the inferred type differs between contexts.  As such, the code is still valid, but it suffers from a logical error.  One can make this mistake in a traditional statically typed language as well, but it is even rarer there, given how frequently typing information must be specified.  The repetition makes it so one tends to notice.

The way I view it, typing is a continuum between statically typed languages and dynamically typed languages, with various type inferencing mechanisms in between.

I will admit that true dynamic typing has advantages.  For example, one of Lisp's greatest strengths is its macros, which allow for the creation of entirely new language-level abstractions.  Such macros expand to arbitrary code, and can usually accept arbitrary code.  With a perfect type inferencer that never requires type annotations, this isn't a problem.  The issue is that there is no such thing; inevitably typing information will need to be specified.  With a macro, this type information may need to be inserted in the final expanded code, instead of as a parameter to the macro.  If this is true, then macros can't be implemented properly: one would need a way to manipulate the expanded code, which largely defeats the purpose of macros. 

Personally, I prefer statically typed languages with type inferencers.  I tend to hit a wall with truly dynamically typed languages after about a thousand lines.  By this point, I have defined several simple data structures, along with operations that act on them.  In a dynamically typed language, I can pass the wrong data structure to the wrong operation quite easily.  Worse yet, depending on how the data structure is implemented, I may never know, except through incorrect program behavior.  For example, if all the data structures are lists of length >= 1, and I have an operation that looks only at the first element and performs some very general operation on it, then probably every data structure can be passed to this operation without a resulting type error.  By the thousand-line mark, I can't keep it all in my head at the same time, and I make these simple errors.  In a statically typed language, this is revealed immediately at compile time, but not so in a dynamically typed language.  At this point I spend more time testing code than writing it, and those are tests I get "for free" from a statically typed language.
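A small Python sketch of this failure mode (the data structures and operation are hypothetical): both structures are lists whose first element is a number, so a very general operation on the first element accepts either one without complaint.

```python
# Two conceptually different structures, both plain lists
# with a number in the first position (hypothetical examples).
account = [100.0, "alice"]       # [balance, owner]
reading = [98.6, "2011-04-11"]   # [temperature, date]

def double_first(structure):
    # A very general operation: looks only at the first element.
    return structure[0] * 2

print(double_first(account))   # 200.0 -- the intended use
# Passing the wrong structure raises no type error at all;
# the mistake shows up only as incorrect program behavior.
print(double_first(reading))   # 197.2 -- silently wrong
```

In a statically typed language, giving the two structures distinct types would make the second call a compile-time error.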

To put it shortly, most of my programming errors reveal themselves as type errors, so static typing is important to me.  This issue is so important that I feel that people with a strong preference for dynamically typed languages must think in a way fundamentally different than I do.  I know people who can write correct code just as fast in dynamically typed languages as I can in statically typed languages, so there must be something to it.  Maybe it's practice, or maybe something else is going on.  That would make an interesting case study...

Monday, May 16, 2011

Meta-Research

...or the research of research, as I see it.  I recently read an article published in PLoS arguing that most published research findings are actually false (link).

This very idea makes me cringe.  If that's true, why do we even bother?  Why spend enormous amounts of money to support something that's false? 

The paper doesn't really address either of those points; rather, it explains how it can make such a claim.  Statistically, it is difficult to confirm things without absolutely enormous amounts of data.  Of course, getting data sets large enough for arbitrary experiments can range from difficult to impossible, with infeasibility being common.  This can put statisticians at odds with scientists.  The author's argument drives at this, pointing out that the data sets typically used are not large enough to yield a statistically valid answer.
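A toy illustration of the sample-size problem, assuming a simple coin-flip setup (the numbers are mine, not the paper's): an observed 55% effect rate looks unremarkable in 100 trials but overwhelming in 10,000.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, k):
    # Exact P(X >= k) for X ~ Binomial(n, 1/2): the chance of seeing
    # at least k "successes" in n trials if there is no real effect.
    numerator = sum(comb(n, i) for i in range(k, n + 1))
    return float(Fraction(numerator, 2**n))

# A 55% success rate in 100 trials: quite likely by chance alone.
print(tail_probability(100, 55))      # ~0.18

# The same 55% rate in 10,000 trials: essentially impossible by chance.
print(tail_probability(10000, 5500))  # tiny, far below any threshold
```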

There is another problem.  People are people.  We are inherently biased.  It has been said that data is objective, and I used to believe that was true, at least in theory.  But then the question was posed to me: why don't scientists measure _everything measurable_ regarding an experiment?  Of course, this would mean an enormous amount of data, of which most of it is probably irrelevant.  But do we really know it's irrelevant?  The answer is no.  Our bias isn't shown so much in what we measure, but rather in what we choose not to measure - those things we think are irrelevant. 

Research costs money.  This usually means getting a grant.  Getting a grant usually means convincing someone that your research is going to do some good, be it cure a disease or (more commonly) make money.  With that in mind, why would anyone pay any amount of money for an application that reads "We want to do X.  We drew it out of a hat.  We have no idea what it does, and we have no idea what could come of this."  That is mostly unbiased (who put the ideas in the hat?).  It is also the least convincing argument I have ever heard for giving someone money. 

Now try "We want to research X, because it seems to have an effect on weight retention.  If this is true, we could develop an effective drug for weight loss."  Now we have something profitable.  But here's the problem: everyone involved wants it to be true.  More than likely, even someone working within ethical bounds is going to act differently when the desired outcome is known ahead of time.  I've been told repeatedly that one should never do data analysis until all the data is in.  However, we do not do this.  I have watched people stare at long-running experiments that appear to deviate from expectations.  Frequently the target is personified: "Why are you putting a band there?  You're supposed to put it over here!"  I do this type of thing myself.  We already know what the experiment is "going" to do; we just need the formality of it actually doing it.

All in all, I think the paper is particularly interesting.  It gives a feel for how heterogeneous science really is, and it illustrates the ever-present (though shunned) human element.

Research shows that most research is wrong.

Monday, May 9, 2011

Vaccinations and Autism

And now for something completely different.  I read the original paper that supposedly linked autism to the measles/mumps/rubella (MMR) vaccine (link).  I know that this can get to be a heated topic, but for the moment I'm going to try to focus on the paper itself.  (Of course I'm going to be biased, but I'll try not to be!)

The paper suggests that a new disorder has been discovered.  Characteristic of this disorder is a combination of inflammatory bowel disease (IBD)-like symptoms and autism-like symptoms.  The most notable feature of this disorder is that sufferers consistently presented with it within 24 hours to 18 months of receiving the MMR vaccine; most presented within two weeks.  Such a disorder would be quite interesting, as the gastrointestinal tract and the brain are two very different areas.  The authors' original data in support of this disorder was a case study of 12 people.  Shortly after the paper was published with the original 12, an additional 40 patients were observed, of whom 39 were found to have this new syndrome.

Those are the facts, as presented by the authors.  Without going beyond the paper, this is not very convincing evidence of a new disorder.  Within the paper itself, complete data is presented only for the original 12.  Of these 12, there is still considerable variability between patients.  Additionally, there is no control group; these 12 were hand-picked by the authors.  The authors openly acknowledge this.  This was published as an "Early Report", and was more or less intended to be a springboard from which further research could be conducted.  To directly quote the paper, "We did not prove an association between measles, mumps, and rubella vaccine and the syndrome described."  Though the evidence suggests an association, there is simply not enough data to make a scientifically valid determination.  Even with sufficient data to back an association, one must still determine whether the relationship is causative or mere correlation.  (For example, when hot cocoa drinking is up, the crime rate goes down.  The reason is that it's typically cold when people drink hot cocoa, and the crime rate is known to drop in cold weather.)  Medical case studies need hundreds if not thousands of patients to draw any hard and fast conclusions; 12 patients is not enough to make such a claim.

Now I'll go beyond the paper.  For one, the main author (Dr. Wakefield) was covertly being paid by a law firm that was intending to sue the MMR vaccine manufacturers.  This is a conflict of interest.  Generally, conflicts of interest are rare in published research.  If they exist at all, they should be openly acknowledged.  (Here is a link to a paper with an open acknowledgment of a conflict of interest.)  This is a red flag.  Science is supposed to be as objective as possible, but with a conflict of interest it can be disadvantageous to be objective.

The more troubling problem is that most of the data itself is just plain not true.  Although 10/12 patients were listed as having something classifiable as autism (9/12 if you ignore data with question marks next to it), it was revealed that 3 of them never had a formal diagnosis.  Only a professional can make such a diagnosis.  Many of the symptoms of autism appear in other disorders, and only someone skilled in seeing all these disorders can actually make this judgment.  (I'm sorry, you cannot diagnose yourself as having a complex disorder just by reading a few pages on Wikipedia.)  As such, this is fraud. 

Another point is that earlier drafts of the paper reported longer intervals between exposure to MMR and the first signs of symptoms.  As the paper came closer to its final draft, these time intervals shrank dramatically.

A third point is that much of the data was acquired not directly by doctors at the time of visit, but rather by parents at other times.  In the case of one of the children, such data was not acquired until 2 1/2 years after symptoms first appeared.  For something as complex as autism, nonspecific data acquisition is not sufficient.  There are particular things that professionals look for, preferably directly as opposed to through a medical file.

I could go on and on about the different kinds of fraud and deception that occur in this paper.  A complete description of all these things can be found here.  Note that this is from BMJ, which is a peer-reviewed source of legitimate medical information.  This is not some random website that some anonymous person made.  I must make that point clear, as there is a lot of misinformation on the Internet regarding this situation.

There have been a substantial number of follow-up peer-reviewed publications that have shown no link between autism and vaccinations, including this one.  However, the damage has already been done.  Many members of the general public think that there is a link because of this paper.  It has left a bad taste in people's mouths, with big bad science coming along to give our children autism.  This blog post is just going to be part of the fodder in this battle, which will likely continue without merit for years to come. 

People who still think there is a link will likely associate me with some evil corporate machine, and dismiss me.  Fine.  It would not be the first time someone has written me off that easily.  Let's assume there is a link, that this paper was correct, that it should never have been retracted, and that this is all part of some conspiracy to cover up the truth.  If that's all true, then why does no one relate vaccinations to inflammatory bowel disease?  The bulk of the paper's data is in support of IBD, not autism.  Dr. Wakefield is neither a psychologist nor a pediatrician, though he does specialize in the gastrointestinal tract.  The paper does not suggest a link between autism and MMR - it suggests a link between autism, MMR, and IBD.  It brands the combination of these three as a new disorder.  Removing one element means something else entirely, something the authors were not discussing.  In other words, if one believes what this paper is claiming, then it is self-contradictory to say that there is a link between autism and MMR without IBD involved.  How it happened to be autism and not IBD that was picked up by the media, I'll never know.

Monday, May 2, 2011

Recycling: It Can Save Your Life

...assuming you're a lung cell.  I recently read an article published in the Public Library of Science (link) about how Pseudomonas aeruginosa infects people.  The bacterium can infect the lungs of people with other preexisting lung conditions, including pneumonia and chronic obstructive pulmonary disease (COPD).

Pseudomonas aeruginosa causes an interesting infection, mostly because it requires a bit of sophistication on the part of the bacterium.  In the lungs, there is a protective mucous membrane that coats the outer layer of cells.  This outer layer of cells is known as the epithelium.  The mucus prevents most everything foreign to the lungs from directly contacting the epithelium, which can prevent many kinds of damage and infection.  Pseudomonas aeruginosa can't break through this layer, so it employs a strategy: send specially manufactured vesicles that can.  These vesicles have surface proteins that allow them to bind and fuse with cells in the epithelium, and they contain proteins that cause cellular change.  They are somewhat analogous to so-called "bunker buster" bombs, which penetrate a formidable outer shell and deliver a payload to the inside of the structure.  Only these are released with little guidance.


As for the payload, Pseudomonas aeruginosa causes a slight but severe change in infected cells.  In healthy cells there is a protein, namely CFTR, that regulates the amount of mucus in the lungs.  The protein must reside on the surface of cells to have any effect.  As part of normal cellular activity, this protein is occasionally ubiquitinated, meaning a ubiquitin group is bound to it.  This triggers a pipeline of events.  The ubiquitinated protein is first sequestered from the cell membrane.  It can then follow one of two paths.  In one path, the ubiquitin group is removed, and the protein returns to the cell membrane.  In the other, the ubiquitin group remains bound, and the protein is eventually degraded.  In healthy cells, these two paths run in tandem.  This is necessary to remove CFTR proteins that no longer function and are essentially just wasting space on the membrane.

What was previously known is that Pseudomonas aeruginosa infection somehow selectively shuts down the path that returns CFTR to the cell membrane.  As such, all the sequestered CFTR ends up being degraded.  The cell ends up degrading more CFTR than it can spare, and proper function is lost.  Without CFTR to regulate mucus properly, mucus builds up.  This is beneficial to Pseudomonas aeruginosa, as the once-protective mucus ends up being its home, but it is to the detriment of its victim.  This mucus buildup is how people can literally drown in their own lung fluids, and it also makes for a friendly environment for other opportunistic pathogens.

This paper investigated exactly how Pseudomonas aeruginosa is able to shut down the recycling pathway, forcing all ubiquitinated CFTR to be degraded.  The authors found that the vesicle payload contains a protein called Cif.  Although they were unable to determine exactly how, they found that Cif prevents the enzyme that deubiquitinates CFTR from functioning properly.  The mechanism is somewhat complicated.  There is another protein, namely G3BP1, that normally inhibits the deubiquitination enzyme.  This protein occurs naturally in lung cells, and it is presumably necessary for normal function.  G3BP1 can bind to the deubiquitination enzyme, temporarily preventing it from functioning.  In healthy cells, G3BP1 does not bind with very high affinity, presumably without other naturally occurring factors to help it along, so the net effect on the deubiquitination enzyme is minimal.

This is where Cif comes in for infected lung cells.  Cif stabilizes the interaction between G3BP1 and the deubiquitination enzyme, preventing the enzyme from functioning for much longer than with G3BP1 alone.  The effect is that the overwhelming majority of ubiquitinated CFTR never ends up getting deubiquitinated, as the deubiquitination enzyme has been inhibited by the interaction between G3BP1 and Cif.
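The recycling logic above can be summarized as a toy Python model (all probabilities here are illustrative inventions of mine, not measurements from the paper): Cif tips the balance between the two paths so that almost no ubiquitinated CFTR is recycled.

```python
import random

def cftr_fate(cif_present, rng):
    # One ubiquitinated CFTR molecule is sequestered, then either
    # deubiquitinated and recycled to the membrane, or degraded.
    # G3BP1 alone only weakly inhibits the deubiquitination enzyme;
    # Cif stabilizes that inhibition, so recycling rarely happens.
    p_recycle = 0.05 if cif_present else 0.5   # made-up numbers
    return "recycled" if rng.random() < p_recycle else "degraded"

rng = random.Random(42)
healthy = sum(cftr_fate(False, rng) == "recycled" for _ in range(10000))
infected = sum(cftr_fate(True, rng) == "recycled" for _ in range(10000))
print(healthy, infected)   # far more CFTR survives without Cif present
```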

I have a few questions regarding this mechanism, which could make for good future work.  For one, I suspect that some people are naturally immune to Pseudomonas aeruginosa infection, simply because they have mutations in either G3BP1 or the deubiquitination enzyme that prevent Cif from binding well.  It should be possible to conduct a clinical study on people with preexisting lung disorders, looking for those who for some reason never develop Pseudomonas aeruginosa infections, despite the significant likelihood.

I also think that knowledge of this mechanism could lead to a novel drug treatment that prevents Cif from working properly.  Such a drug would somehow induce a conformational change in Cif that would prevent its proper binding to G3BP1.

The overall infection mechanism could be exploited for other purposes, as well.  Classically, specific drug delivery is a problem.  However, Pseudomonas aeruginosa is able to release vesicles that seem specific to lung tissues and contain specific payloads for said tissues.  With genetic engineering, it should be possible to change the payload to be whatever is necessary at the moment.  The result would be a targeted drug delivery system, injecting a specific drug into a specific tissue at a (below) microscopic level.  Perhaps we could even deliver an anti-Cif drug via the same mechanism used to inject Cif in the first place, of all ironic things.

Monday, April 25, 2011

Chatty Bacteria

I recently read an article published in the Public Library of Science (link) about biofilm development on nematodes.  Before getting into the article, some background is needed.

Bacteria are classically seen as unicellular organisms that exist independently of one another.  These cells do not communicate with each other, and are really just a large group of individuals.  Cells in multicellular organisms, in contrast, communicate with each other extensively through a variety of means.  There are individuals, but the individuals exist for the good of the whole.  (Cancer is an example of individuals acting in their own interest, as opposed to acting in the interest of the whole organism.)

This model is nice and simple, but untrue.  Different species and strains of bacteria show certain levels of communication.  Though none of these forms of communication are as extensive as those seen in multicellular organisms, they are still significant.  A fairly common type of bacterial communication is known as quorum sensing.  In quorum sensing, bacterial cells are able to send a message to each other that essentially reads "we have reached a certain size".

How bacteria respond to this message depends on the particular species.  For certain pathogenic bacteria, it is interpreted as an attack message.  For a small group of bacteria, attacking a host would be certain death.  The numbers are too small to cause significant damage to the host, minimizing the gain from an attack.  More importantly, the host will mount defenses in the form of an immune response, and a small group could very quickly be eradicated.  For a small group, it is much more advantageous to sit and wait.  The group's numbers slowly build, but the bacteria stay proverbially under the radar of the host.  As long as the bacteria are not actually harming the host, the host gains nothing by expending energy to attack them.  At some point, the bacterial numbers become significant, to the point where an immune response could no longer dispatch them so quickly.  It is at this point that the size signal is sent, triggering the bacteria to attack the host.  Such behavior is quite advantageous, showing the power of a seemingly simple signal.
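The attack-threshold logic can be sketched as a toy model (the threshold and signal units are invented for illustration; real quorum sensing works through concentrations of signal molecules such as the AHLs discussed later in this post):

```python
ATTACK_THRESHOLD = 1000   # hypothetical population size where attack pays off

def bacterial_behavior(population):
    # Each cell secretes one unit of signal, so the shared signal
    # concentration tracks the population size.  Cells switch behavior
    # only once the signal implies the quorum has been reached.
    signal = population
    return "attack" if signal >= ATTACK_THRESHOLD else "lie low"

print(bacterial_behavior(50))     # lie low -- too few to overwhelm the host
print(bacterial_behavior(5000))   # attack -- the quorum has been reached
```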

In the paper, the authors looked at biofilm development in a certain group of bacteria, namely Yersinia.  (Yersinia includes the infamous Yersinia pestis, which causes the black plague.)  Biofilms are the closest bacteria get to being multicellular.  Within a biofilm, bacteria live in close quarters with each other, producing a variety of compounds that benefit the group as a whole.  Biofilms act as a platform for growth, and as a whole they tend to be resistant to things that would otherwise kill off bacteria, including antibiotics.  The creation of biofilms is no simple feat for bacteria, and it is often mediated by chemical signals sent between cells.

Enter the poor nematode.  This is a simple, very tiny worm, often used as a model organism in biology.  Yersinia can actually make its home on nematodes, and is even capable of making biofilms on them.  The authors investigated how such biofilms are made.  Given that nematodes are capable of moving around (and do), these biofilms seem to be an interesting area of study, as many biofilms tend to develop on static surfaces.  Sure enough, the construction of these motile biofilms is mediated by the same quorum sensing signals seen in other bacteria.  The biofilms are loaded with the quorum sensing signal, namely N-acylhomoserine lactone (AHL).  The authors genetically engineered bacteria that were incapable of making AHL, and the resulting bacteria were unable to develop substantial biofilms.  In addition to biofilm production, they also found that quorum sensing signals trigger pathogenesis in general, as evidenced by the need for AHLs to make virulon proteins.

Though quorum sensing appears to be widely utilized by bacteria, there appears to be a large amount of variation on the common theme.  There are a lot of different ways in which a "we number this many" signal can be used advantageously by bacteria.  Life, through evolution, tends to explore many niches, and experimentally it seems that quorum sensing is no exception.  The authors note how a number of other pathogens utilize quorum sensing in their own specific ways.

This leads to an interesting topic for experimental drugs.  Without the quorum sensing signal, certain pathogens never actually express pathogenic behavior.  If we can develop a drug that prevents this signal from ever reaching its target, be it through destroying the signal, blocking its receptor, or some other means, then the bacteria in question never mount an attack.  While they are still there, they are effectively harmless.  It seems that quorum sensing is specific to bacteria, so presumably such drugs would target bacteria specifically.  Additionally, being that quorum sensing is a common theme for pathogens, such drugs may specifically target pathogenic bacteria, sparing "good" bacteria.  This is unlike modern broad spectrum antibiotics, which usually kill off everything.  (Many of the negative side effects of antibiotics are due to good bacteria getting killed.)  There seems to be a lot of good that could come of quorum sensing research, and I'm excited to see what the future holds for it in terms of medicine.

Monday, April 18, 2011

Recursive Pathogens

I recently read an article published as a letter in Nature (link).  The topic of the article is a newly discovered pathogen: the virophage.

The virophage is something completely out of the ordinary, compared to usual pathogens.  Virophages, like viruses, are not actually alive.  They lack their own molecular machinery for reproduction, and must rely on the host's machinery for this purpose.  For a typical virus, this is fairly simple conceptually.  A typical virus hijacks the molecular machinery of the cell, using it to produce viral proteins and induce other behavior advantageous to the virus.  The cell is forced to create new viruses with its own machinery, allowing for the creation and spreading of even more viruses.

With respect to hijacking a host's machinery, the virophage is no different from a typical virus.  What is atypical, however, is that virophages hijack already hijacked cellular machinery.  That is, a virophage requires that some other virus has already modified the molecular machinery of a cell in a way the virophage can use.  The virophage alone cannot infect a cell; it requires both the cell and another virus infecting that cell.

For that matter, it may be wrong to say the virophage infects the cell at all.  Based on the results of the paper, it seems more accurate to say that the virophage infects the other virus, which happens to reside in a cell.  Infection with the virophage caused many of the normal viral components produced to be nonfunctional.  That is, the virophage impeded the spread of the infecting virus.  The virophage actually had a beneficial effect on cells: significantly fewer cells died when infected with virus + virophage instead of just virus (virophage + cells was no different from cells alone).

Although this is not too difficult to understand, it's a very different way of thinking.  The common terms "pathogen" and "host", which used to have clear definitions, become blurred.  The virophage is not a cellular pathogen, but rather a viral pathogen.  Given that viruses are not alive, this is a paradox: how can something nonliving be a pathogen to something else that is nonliving?  This gets at the very root of what it means to be "alive", which has been hotly debated by people across a wide variety of fields.

I think there are a lot of directions in which this research could go.  For one, it is suggested that virophages are extremely common in oceans, and perhaps elsewhere.  So far, all virophages discovered have come from common cooling towers, so they exist outside the ocean as well.  I wonder how many different kinds of virophages there are.  Perhaps we could find a virophage for existing viral human pathogens, although this is probably jumping the gun.

A logical next step is to determine exactly how the virophage is hijacking the other virus.  The nonfunctional viral particles produced are very strange, and it does not seem obvious how they come about. 

Another question that comes to mind is selection advantage and the evolution of virophages.  Consider an extremely virulent virus.  This virus usually kills its host.  For a virus, it is unfavorable to kill off the host, since the host is required for reproduction.  Additionally, it is unfavorable to significantly harm the host.  Generally, very sick people partially quarantine themselves from the rest of the population, namely through bedrest.  It is in the virus' best interest to spread to as many people as possible, and a very sick host cannot do that.  This is partially why the cold virus is so ubiquitous - people rarely get sick to the point of avoiding others, which in turn spreads the virus.  In sum, a highly virulent virus is bad both for the host it infects and for the virus itself.

This is where I see a virophage coming in.  Although the virophage is a viral pathogen, in this case, it is actually in the virus' best interest not to be so virulent.  If the virophage prevents the host virus from being so pathogenic, then the end result is that the host virus can spread to more people.  Granted, much less of it is spreading, but considering that only one virus is theoretically needed to start an infection, this reduction may be acceptable.  The virophage is also beneficial to the cell, as cells simultaneously infected with virophage and virus usually do much better than cells infected with only virus.

That's my suspicion anyway.  As stated before, there are a lot of paths this research can take from here, and I only scratched the surface with these ideas.  Time to revise the textbooks.

Monday, April 11, 2011

Congenitally Blind "Sight"

I recently read an article in PNAS about how sounds are processed in the brains of humans who are blind from birth (link).  Before this study, it was already known that the brains of people who are blind from birth process sounds in a fundamentally different way than those of sighted people.  In sighted people, a part of the brain known as the visual cortex deals with image processing.  Of course, in blind people, there are no images to process.  Studies have shown that this does not, however, mean that this region of the brain is inactive in congenitally blind people (people blind from birth).  The area has been shown to be active instead in the processing of sound, in addition to the normal parts of the brain that deal with sound.  One may hypothesize that this additional activity forms the basis of improved sound perception in congenitally blind people, although that is speculation.

While it has been previously shown that the visual cortex processes sounds in the congenitally blind, this is not very specific.  Several neural pathways pass through the visual cortex; two are relevant to this paper.  One is a "what" pathway that is involved in object recognition.  The other is a "where" pathway that is responsible for understanding the spatial relationships between objects.

This study picks up where others have left off, and attempts to determine which pathways are active for different sounds in congenitally blind people.  The authors manipulated the pitches and locations of sounds, carefully recording which pathways became active via fMRI.  The authors found that when pitch was varied, the "what" visual pathway became active.  Given that pitch is one of the distinguishing properties of a sound, and that the "what" visual pathway more generically analyzes the properties of objects, this result makes sense.  When the location of a given sound was varied, the "where" visual pathway became active.  This is also not a major cognitive leap; the location a sound came from is analogous to where an object is currently located in one's field of vision.

I find it fascinating that there is sufficient plasticity in the brain for these two similar systems to be mapped so well.  For me, this study presents many more questions than answers.  For example, most of the brain's neuronal connections in the visual cortex are at their adult state by age 11.  This implies that if someone were to become blind after this age, the brain could not rewire itself in as elegant or effective a way as seen in the congenitally blind.  Presumably such people would not be able to achieve better hearing, or at least hearing comparable to the congenitally blind.  It would also be interesting to see the opposite case - a congenitally blind human achieving sight after age 11.  In this case, the brain would have to rewire itself for vision, and it may not be capable of this.  Again, "may" is the operative word; actual studies would need to be conducted to verify such hypotheses.  The plasticity of the brain is fascinating; it seems that the brain is somehow able to map onto related systems when the primary system does not work.  On the same note, this study makes it look like the brain attempts to achieve better utilization of existing inputs when some inputs are nonexistent.  Fascinating stuff!

Sunday, April 3, 2011

Sources of Anxiety

I recently read an article in Nature that explores the nature of anxiety, found here.  The paper notes previous studies on conditioned anxiety - that is, anxiety that isn't inborn, but rather learned.  However, the paper instead explores anxiety that is hard-wired into the brains of mice, which are assumed to have anxiety pathways similar to those of humans.  They looked at the amygdala, which is known to mediate emotional learning, or the attaching of certain emotions to memories.  Given that many anxieties have a root in some traumatic, memorable experience, it makes sense that this region was explored.  But again, they were not looking for evidence of anxiety due to such memories, but rather anxiety without a root memory.

In mice, there are some documented memory-less anxieties.  Specifically, the authors looked at a mouse behavior that is reminiscent of agoraphobia.  Mice naturally tend to avoid wide open areas, and show anxiety when in such areas.  Given that mice are small prey animals, one can see the selective advantage conferred by such a behavior.  With anxiety of open areas, mice avoid such exposed positions, limiting their chances of becoming the next meal for a predator.  Of course, one can usually be a meal only once, so this would not really work as a learned behavior.

The authors constructed experimental mazes which would make it apparent whether or not mice were showing anxiety.  In addition, they conducted cell physiology work, in which it is possible to measure the activity of a single protein pump on a single neuron.  Using these techniques, it is possible to very accurately quantify neuronal activity.  Using the constructed apparatus, they were able to trigger the anxious agoraphobic behavior in mice.  Watching videos of the mice moving in the experiments (included in the paper's supplemental material) makes it very clear that the authors are, in fact, able to control the anxiety.  The mice show the anxiety within a second or two of activation of the apparatus, and they return to normal just as quickly when the procedure is stopped.

I would like to go into more detail than this, but I honestly do not understand the methods in their entirety.  I do not think the methods are the important take-home point anyway.  The authors were able to prove that there is such a thing as totally inborn anxiety, and they were able to map out a significant part of the neuronal circuitry involved with it.  This could lead to a dramatic improvement in anti-anxiety medications.  The authors point out that we currently do not understand anxiety in its entirety, and that current anti-anxiety medications do not directly target the pathways that trigger anxiety.  Given that we do not know the pathways, it is no wonder that the medications cannot be that specific, eliciting broad, undesirable side effects.

For that matter, there is another subtle point of the paper: there are certain anxieties that are beyond our direct control.  Directly controlling this region of the brain is like directly controlling heart rate.  Yes, it is possible to both reduce and increase heart rate temporarily depending on the actions one performs, but it is not as simple as saying one wants a change in heart rate.  Who knows what inborn fears are lurking in the deep recesses of our amygdala?

Type Erasure Part II: Bytecode and You

My last post ended in a mystery.  It seems that type erasure can occasionally be circumvented using different return types, despite the fact that return types are not actually part of a method's signature.  I asked this question on Stack Overflow, and I got an excellent response from "irreputable".  

For all of these examples, I'm importing java.util.*.  So here's something that doesn't compile:
public class Test {
        public int method( List< Integer > list ) {
                return 0;
        }
        public int method( List< Double > list ) {
                return 1;
        }
}



The error is that they "have the same erasure": after erasure, both parameters are just the raw type List, with the Integer and Double type arguments discarded.  This is how type erasure works.  With this in mind, these methods are clearly the same.
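The erasure is easy to observe directly.  Here's a small sketch (class name is my own) showing that the element type of a List is simply gone at run time:

```java
import java.util.ArrayList;
import java.util.List;

public class SameErasure {
    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<Integer>();
        List<Double> doubles = new ArrayList<Double>();
        // After erasure, both are plain ArrayLists; the type parameters
        // exist only at compile time.
        System.out.println(ints.getClass() == doubles.getClass()); // prints "true"
    }
}
```

With the element types gone, the two versions of method take arguments of exactly the same erased type, hence the conflict.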

Here's something else that doesn't work (note the changed parameter and return types relative to the last example):
public class Test {
        public int method( List< Integer > list ) {
                return 0;
        }
        public double method( List< Integer > list ) {
                return 1;
        }
}


In this case, the error is actually different: "method is already defined".  This is where things start to get weird.  Return types are not part of method signatures, so that part should be ignorable.  Even before type erasure, these two methods have identical parameter lists, so this shouldn't compile regardless.  Hence "method is already defined" is more appropriate here than "have the same erasure".


But here's something that does work:
public class Test {
        public int method( List< Integer > list ) {
                return 0;
        }
        public double method( List< Double > list ) {
                return 1;
        }
}


No errors, no warnings, and some simple testing shows that it works exactly as it looks like it should.  Return types aren't part of method signatures, so they can be ignored.  Ignoring return types, this is the exact same code as the first example, which didn't compile for type erasure reasons.  But this inexplicably works.

It turns out that method signatures in bytecode actually do include return types, and that this information can be utilized for overloading, despite the fact that the more basic tutorial-style documentation makes it seem like this doesn't happen.  With this in mind, going through all the examples, the bytecode signatures are as follows (factoring in type erasure):
  1. List -> int; List -> int
  2. List -> int; List -> double
  3. List -> int; List -> double
For #1, this very clearly isn't going to work.  But for #2 and #3, this does appear possible, since there is enough information to differentiate the methods.  #2 doesn't work because the compiler seems to look for duplicate methods before type erasure occurs, and at this stage List< Integer > and List< Integer > are clearly the same type.  #3 gets past this point because List< Integer > and List< Double > are different before type erasure.  Sometime after this point, type erasure occurs, and the conflict of a traditional method signature appears here.  But since return types are technically part of the method signature in bytecode, this isn't an issue - it's possible to store both methods in the bytecode without having a conflict of signatures.  

So the bytecode contains the correct methods.  But how can the compiler know which method is referred to when one calls it?  One could infer this based on how the return type is used, as in double myDouble = method( doubleList ).  However, we could just as easily call method( doubleList ) and discard the return value, leaving the compiler without any additional typing information.  So if we can't choose the correct method based on the return type, then how can we choose the method if type erasure makes everything an Object?

The answer is that it's only an Object at run time, not compile time.  Since overloading occurs at compile time, this information is available to the compiler.  To my knowledge, however, there is no way to access this information at compile time through code, despite the fact that it is available somewhere.
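As an aside, the compile-time generic information isn't discarded entirely: the compiler records the generic signature as metadata in the class file, and reflection can read it back at run time even though the types themselves are erased.  A small sketch (class and method names are mine):

```java
import java.lang.reflect.Method;
import java.util.List;

public class SignatureDemo {
    public int method(List<Integer> list) { return 0; }

    public static void main(String[] args) throws Exception {
        Method m = SignatureDemo.class.getMethod("method", List.class);
        // The erased parameter type is the raw interface...
        System.out.println(m.getParameterTypes()[0]);
        // ...but the generic signature survives as class-file metadata.
        System.out.println(m.getGenericParameterTypes()[0]);
    }
}
```

The first line prints the raw java.util.List interface; the second recovers the full List&lt;Integer&gt; parameterization from the stored signature.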

So what's the problem?  Why can't I overload with different generic type parameters if the compiler can make this judgement call?  The answer is you can, as long as the return types differ.  Again, the problem isn't in the compiler, it's in the actual bytecode/JVM.  It doesn't store generic types in the type signature, so there is no way to differentiate methods by the same name that differ only in generic type parameters.  But since it also stores return type in the method signature (unlike the typical language-neutral definition of "method signature"), it can use this return type to separate the otherwise identical methods.
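Concretely, the JVM identifies a method by its name plus a descriptor, and the descriptor encodes both the (erased) parameter types and the return type.  For the working example, the two descriptors would look roughly like this:

```
method (Ljava/util/List;)I    // takes a List, returns int
method (Ljava/util/List;)D    // takes a List, returns double
```

Since the descriptors differ, both methods can coexist in the same class file.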


I've tested this with both Java and Scala, both of which show this behavior, and both of which run on the JVM.  If all of this is correct, then this behavior should hold for all languages that run on the JVM, assuming they allow for method overloading and static typing.  (If they don't, then this doesn't apply to you anyway.)


Long story short: return types are part of the method signature in the JVM bytecode.

Saturday, April 2, 2011

Anonymous Functions and Type Erasure in Scala

Before I get into anything, I would like to make it clear that I'm using Scala 2.7.7.final, not the newest (and significantly different) 2.8.1.final.  I'm using it for my thesis, and I learned the hard way long ago that you should avoid changing your toolset once you're underway.

Recently, I've started to really get into the use of anonymous functions.  I'm writing my own higher-order functions, and I like the simplicity of everything.  They can be used as alternatives to a number of design patterns, including Template Method, and tend to have a lot less superfluous syntax associated with them.

However, today I encountered a bit of an issue.  I wrote two higher order functions in the same class whose type signatures differed only in the types used for the anonymous functions.  I was greeted with a double definition error message.  What?

So I delved a bit into the implementation of anonymous functions.  The Scala library defines a series of traits called Function0 through Function22.  These appear to refer to anonymous functions, where the number refers to the number of parameters the function takes.

First things first: let's see what happens when you have more than 22 parameters, because that sounds like it'd be entertaining.  So...
scala> ( ( a: Int, b: Int, c: Int, d: Int, e: Int, f: Int, g: Int, h: Int, i: Int, j: Int, k: Int, l: Int, m: Int, n: Int, o: Int, p: Int, q: Int, r: Int, s: Int, t: Int, u: Int, v: Int, w: Int ) => "hahaha" )
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 23
    at scala.tools.nsc.typechecker.Typers$Typer.decompose$2(Typers.scala:1502)
    at scala.tools.nsc.typechecker.Typers$Typer.typedFunction(Typers.scala:1504)
    at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:3153)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3358)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3406)
    at scala.tools.nsc.typechecker.Typers$Typer.computeType(Typers.scala:3457)
    at scala.tools.nsc.typechecker.Namers$Namer.typeSig(Namers.scala:859)
    at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$typeCompleter$1.apply(Namers.scala:415)
    at scala.tools.nsc.typechecker.Namers$Namer$$anonfun$typeCompleter$1.apply(Namers.scala:413)
    at scala.tools.nsc.typechecker.Namers$$anon$1.complete(Namers.scala:982)
    at scala.tools.nsc.symtab.Symbols$Symbol.info(Symbols.scala:555)
    at scala.tools.nsc.symtab.Symbols$Symbol.initialize(Symbols.scala:669)
    at scala.tools.nsc.typechecker.Typers$Typer.addGetterSetter(Typers.scala:1139)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$10.apply(Typers.scala:1219)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$10.apply(Typers.scala:1219)
    at scala.List.flatMap(List.scala:1132)
    at scala.tools.nsc.typechecker.Typers$Typer.typedTemplate(Typers.scala:1219)
    at scala.tools.nsc.typechecker.Typers$Typer.typedModuleDef(Typers.scala:1114)
    at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:3091)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3358)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3395)
    at scala.tools.nsc.typechecker.Typers$Typer.typedStat$1(Typers.scala:1598)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$19.apply(Typers.scala:1643)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$19.apply(Typers.scala:1643)
    at scala.List$.loop$1(List.scala:300)
    at scala.List$.mapConserve(List.scala:317)
    at scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:1643)
    at scala.tools.nsc.typechecker.Typers$Typer.typedTemplate(Typers.scala:1221)
    at scala.tools.nsc.typechecker.Typers$Typer.typedModuleDef(Typers.scala:1114)
    at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:3091)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3358)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3395)
    at scala.tools.nsc.typechecker.Typers$Typer.typedStat$1(Typers.scala:1598)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$19.apply(Typers.scala:1643)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$19.apply(Typers.scala:1643)
    at scala.List$.loop$1(List.scala:300)
    at scala.List$.mapConserve(List.scala:317)
    at scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:1643)
    at scala.tools.nsc.typechecker.Typers$Typer.typedTemplate(Typers.scala:1221)
    at scala.tools.nsc.typechecker.Typers$Typer.typedModuleDef(Typers.scala:1114)
    at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:3091)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3358)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3395)
    at scala.tools.nsc.typechecker.Typers$Typer.typedStat$1(Typers.scala:1598)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$19.apply(Typers.scala:1643)
    at scala.tools.nsc.typechecker.Typers$Typer$$anonfun$19.apply(Typers.scala:1643)
    at scala.List$.loop$1(List.scala:300)
    at scala.List$.mapConserve(List.scala:317)
    at scala.tools.nsc.typechecker.Typers$Typer.typedStats(Typers.scala:1643)
    at scala.tools.nsc.typechecker.Typers$Typer.typed1(Typers.scala:3084)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3358)
    at scala.tools.nsc.typechecker.Typers$Typer.typed(Typers.scala:3395)
    at scala.tools.nsc.typechecker.Analyzer$typerFactory$$anon$2.apply(Analyzer.scala:41)
    at scala.tools.nsc.Global$GlobalPhase.applyPhase(Global.scala:267)
    at scala.tools.nsc.Global$GlobalPhase$$anonfun$run$1.apply(Global.scala:246)
    at scala.tools.nsc.Global$GlobalPhase$$anonfun$run$1.apply(Global.scala:246)
    at scala.Iterator$class.foreach(Iterator.scala:414)
    at scala.collection.mutable.ListBuffer$$anon$1.foreach(ListBuffer.scala:266)
    at scala.tools.nsc.Global$GlobalPhase.run(Global.scala:246)
    at scala.tools.nsc.Global$Run.compileSources(Global.scala:574)
    at scala.tools.nsc.Interpreter$Request.compile(Interpreter.scala:820)
    at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:505)
    at scala.tools.nsc.Interpreter.interpret(Interpreter.scala:494)
    at scala.tools.nsc.InterpreterLoop.interpretStartingWith(InterpreterLoop.scala:242)
    at scala.tools.nsc.InterpreterLoop.command(InterpreterLoop.scala:230)
    at scala.tools.nsc.InterpreterLoop.repl(InterpreterLoop.scala:142)
    at scala.tools.nsc.InterpreterLoop.main(InterpreterLoop.scala:298)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:141)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
 


This crashed the REPL, sending me back to the command line.  Yup, entertaining.  (If you need more than 22 parameters, or any number of parameters even close to 22 for that matter, you shouldn't be allowed within 500 feet of a computer.)


Back to actual work.  I constructed the simplest example that showed the same error:
class MyClass {
  def method( fun: Int => Int ) {}
  def method( fun: Double => Double ) {}
}


...which fails at the REPL with...
<console>:6: error: double definition:
method method:((Double) => Double)Unit and
method method:((Int) => Int)Unit at line 5
have same type after erasure: (Function1)Unit
       def method( fun: Double => Double ) {}
           ^


Type erasure.  My old nemesis.  These traits do their magic with generic types.  The syntax Int => Int is really just syntactic sugar for Function1[Int,Int], which explains the type erasure error.  Ho hum.

But wait.  Here's something that does work:
class MyClass {
  def method( fun: Int => Int ) = fun( 0 )
  def method( fun: Double => Double ) = fun( 1.0 )
}

...what?  The only change from the nonfunctional version is that the return types have changed from Unit to Int and Double, respectively.  Although the parameters are specified with generics, the return types (based on those parameters) are not.  However, this still shouldn't matter, as the return type doesn't get factored into the signature.  After all, one can simply not have anything grab the returned value, in which case there is no way to extract return type information.


I probed into this a little further.  Calling getDeclaredMethods on MyClass's class gets two relevant methods with the following signatures:
public double MyClass.method(scala.Function1)
public int MyClass.method(scala.Function1)


Ok, now I'm really confused.  It doesn't appear that there is any additional magic that Scala's performing behind the scenes.  So the next move was to take the Scala compiler out of the picture, and do this in Java with:
import scala.*;

public class Test {
    public int method( Function1< Integer, Integer > fun ) {
        return 0;
    }
    public double method( Function1< Double, Double > fun ) {
        return 1.0;
    }
}



This compiles.  What?  What?  So the next move was to get Scala out entirely.  Two classes were needed:

public class Pair< T, U > {
    public T first;
    public U second;
}

public class Test2 {
    public int method( Pair< Integer, Integer > pair ) {
        return 0;
    }
    public double method( Pair< Double, Double > pair ) {
        return 1.0;
    }
}
This compiles.  However, if you change the return types to be the same, it doesn't.  I was actually at a loss at this point.  I went back and read a number of basic materials from Oracle, including materials on type erasure and method overloading (note that the method overloading material includes type signature information).  Based on the documentation, this shouldn't work.  If anyone has any insights, that would be great.  (Update: I posted the reason why in my next post here.)  Tricky, tricky!

Sunday, March 20, 2011

Diamonds: A Girl's Best Friend, and Cancer's Worst Enemy

I recently read an article about the potential use of nanodiamonds in the treatment of certain cancers, published in Science Translational Medicine (link).  Before getting into the diamonds themselves, a little background into why they are needed in the first place is necessary.

All things considered, the body is typically very good at getting rid of toxins.  Our world is surrounded by all sorts of things which could be considered toxic, but the typical healthy person is usually unaffected.  This is thanks in part to certain proteins in the body that cells use to pump toxins out.  In a healthy person, this is certainly a useful function.  If the toxins are allowed to remain in cells, then they are permitted to run their course, doing damage on the cellular level.  Ultimately this can lead to things like cell death or cancer.

Such toxin-removing pumps seem to be a good thing.  However, consider now a chemotherapeutic drug, a drug intended to treat cancer.  Such drugs are usually themselves toxic.  In fact, most chemotherapeutic drugs work because they are disproportionately toxic to cancer cells as opposed to normal body cells.  Unlike the majority of antibiotics, which specifically target bacteria, chemotherapeutic drugs tend to be more general.  This is why there are often so many negative side effects of chemotherapy: normal body cells are killed along with cancer cells, though cancer cells are killed far more frequently.

With that in mind, consider a cancer cell exposed to a chemotherapeutic drug.  This is a toxin that threatens to destroy the cell.  Obviously, it is in the cell's best interest to get rid of such a toxin before it can run its course.  If the cell is equipped with pumps for getting rid of toxins, then suddenly this seemingly life-saving device becomes a means through which cancer can proliferate.  The cancer cells pump out the drug, preventing damage to themselves while the drug still damages surrounding healthy tissue.

At this point, the cancer cells have become drug resistant.  Worse yet, it's possible for the same pump to get rid of multiple different chemotherapeutic drugs.  In other words, if the cancer has built up a tolerance for drug X, then there is a significant chance that it has built up a tolerance to drugs Y and Z as well.  In the real world, this is an all too common occurrence; the paper cites that this occurs in more than 90% of treatment failures of metastatic cancers.

Ultimately, the problem isn't that the drug has lost effectiveness, but that the cancer is able to get rid of the drug before it can do any significant damage.  If there was a way to bypass the pumps and force the drug to stay alongside the cancer, then the treatment would still be effective.

This is where the nanodiamonds come in.  Through a specific process, one can attach chemotherapeutic drugs to nanodiamonds, and then inject the coated diamonds into patients.  Though the diamonds are small, they are large enough to become lodged in tissue for several days at a time.  While the coated diamonds are in tissues, they constantly expose the given tissue to the drug.  Simply pumping the drug out is not possible, as the drug is physically held in the tissue by the diamond.

Although it may sound dangerous to have such diamonds in the body, the authors of the paper found no negative side effects of diamonds alone.  They left the body typically within 10 days, with the vast majority of them leaving in under seven days.  The body showed no active immune response in trying to get rid of the diamonds, nor did it show any attempts to break down the diamonds.  It was almost as if the body didn't detect they were there, and they didn't seem to cause any harm on their own.

Using a certain chemotherapeutic drug, they ran a series of trials on mice.  They found that the drug alone is more effective than the drug bound to the diamonds on cancers that are not drug resistant.  Given that the entire point of the diamonds is to provide a means to treat drug-resistant cancers, this isn't particularly important treatment-wise.  What is important is that the drug with the diamonds was far more effective than the drug alone on resistant cancers.  In fact, for one group of mice with drug-resistant cancers, the effective dose of the drug without the diamonds was sufficient to kill the mice.  With the diamonds, an effective dose could be given at levels far less harmful to the mice, to the point where mice didn't die from the treatment but were still being treated effectively.

To me, the diamonds seem to have a lot of potential, though there is a long way to go.  The paper notes that people have been trying all sorts of similar delivery systems for chemotherapeutic drugs against drug-resistant cancers, with limited success.  Usually the delivery system itself is toxic, though in the case of nanodiamonds it tentatively looks safe.  I would have liked to see more indicators that the diamonds were nontoxic.  Additionally, a trial on humans is necessary before we can actually know whether this would be effective in the real world.  That said, things sound hopeful.

Sunday, February 20, 2011

"Cancer" is too Generic

An article was recently published in Nature that looked at genome-wide differences between seven different prostate cancer tumors (link).  The results are pretty amazing.  The authors took advantage of massively parallel sequencing, which makes it possible to analyze entire human genomes in a fraction of the time that previous methods required.  By looking at the whole genome at once, instead of individual genes as with the traditional approach, it becomes possible to see the proverbial forest among the trees.

By looking at the whole genome, the authors were able to see all the different kinds of mutations that were occurring in prostate tumor cells at once.  This revealed the general types of mutations that were occurring.  For one, it was found that a certain type of mutation, which causes a DNA strand to be off by around 2 basepairs, was fairly common in the tumor cells.  Such mutations are also seen in breast cancer tumor cells, though they are much less common there.  In fact, the most common type of mutation seen in breast cancer tumor cells is not as common in the prostate cancer tumor cells.  Considering that these are both cancerous tumors, this is somewhat surprising.  They look and act in similar ways, yet internally they are quite different.

They also found that chromosomes were frequently rearranged in a variety of ways in the prostate cancer tumor cells.  These rearrangements, although common, seemed to be a side effect of the cancer, instead of its cause.  I say this because there was little pattern among the tumors; the seven tumors showed seven very different patterns of chromosomal rearrangement.  If prostate cancer is being caused by some DNA repair mechanism being turned off, then these results are to be expected.  Such rearrangements may naturally occur by chance, but a cell with a functioning repair system would either be able to fix them, or destroy itself via apoptosis (sacrificing itself for the benefit of the person as a whole).  Without such functioning repair mechanisms, such gross mutations are able to occur largely unchecked.

While the rearrangements were seemingly random, certain genes were found to commonly be mutated.  Some of these genes are very crucial to the proper functioning of the cell, namely histones.  Histones wrap around DNA, and can prevent the genes in wrapped DNA from being expressed.  Considering that all your cells are genetically identical, expressing all genes at once for a single cell is almost certainly detrimental.  A liver cell is a liver cell because it expresses different genes than a lung cell.  With the mutation of histones, however, gene expression can radically change.  Cancer is a logical result of such a dramatic, mostly random change in gene expression.  That said, I still think the histones are more of a side-effect, or perhaps a co-effect.  They are at a deeper level than the chromosomal rearrangements, but I'm not sure if they are at the root of the cause.

The greatest take-home point of the paper is that prostate cancer is unique.  Given that the technology behind this study only recently became available, there is a lack of data on the genome-wide uniqueness of other cancer types.  Personally, I would expect such studies on other tumor cell types to reveal more uniqueness.  Even though cancer generally presents in the same sorts of ways across all tumor types, this type of study shows that the similarities may be more superficial than we think.  At a deeper level, there are great differences between the different kinds of cancers.  A cure for breast cancer or prostate cancer may be possible, but a general, all-purpose cure for cancer may be far-fetched.  Although this isn't the positive message one may have hoped for, it does open new lines of inquiry.  Besides this, it is helpful to know that these things are more different than they appear, lest we spend resources unnecessarily on finding a general cure that can't possibly exist.

Ethical Issues of Direct-to-Consumer Genetic Tests

I recently read a study that analyzed the effects of direct-to-consumer genomewide profiling on customers of such services (link).  Before I get into the study itself, some background is necessary.

Arguably all diseases and disorders have a genetic component to them.  For certain ones, such as cystic fibrosis, there is a clear genetic link.  For other disorders, such as Parkinson's disease, the link is not so clear.  These two disorders are at opposite ends of the spectrum. 

Now consider something like high cholesterol.  High cholesterol is a major health concern in the US.  Diet and exercise influence cholesterol levels greatly, but there is an established genetic component as well.  For some people, cholesterol levels are naturally elevated simply due to their genetic makeup.  Such people may not know this, or may not know what to do about it.

This is where a genetic test can be helpful.  Such tests can reveal underlying health information that would otherwise be unavailable.  In theory, one can gain a better understanding of certain health risks through these tests.  In the case of high cholesterol, if one sees that he is genetically at a higher risk than the general population, he could take action as a preventative measure.

Such is the theory, anyway.  The problem is that this isn't what people do.  The study found that few people made any life changes whatsoever after gaining insight into their genetic makeup.  It also found that in the majority of cases the tests caused no substantial increase in anxiety, although many people did share their test results with a doctor.  Part of this might be explained by the fact that the test group wasn't representative of the general population, but consisted of people with more than a passing familiarity with such tests.

It cannot be emphasized enough that this was not the general population.  I suspect that the general population would show a much greater degree of anxiety, with people taking improper actions in response to their test results.  Why do I think this?  Because the general public doesn't understand what the results mean.  Even for something relatively simple like cystic fibrosis, where one either has the disorder or doesn't (with little in between), there can be confusion.  Someone may find that he is a carrier of the allele that causes cystic fibrosis.  This means his offspring are at increased risk of having the disorder (assuming his mate's cystic fibrosis gene status is unknown).  It does not, however, mean he has the disease, nor does it mean he will ever get it.  Even so, the expected reaction of the general public to seeing a positive result for anything involving cystic fibrosis would be horror.  This is not the proper response, but without anything else to go on, panic tends to be common.

I'm not trying to say that people are stupid, or anything like that.  The study itself points out that even ~90% of doctors don't feel they have the necessary background to be able to understand and interpret the results of such genetic tests.  This is not some systematic failure of education, but simply due to the fact that most doctors never need to know such information.  It's just not part of the job description.

Everything previously mentioned assumes that the test makers are doing everything ethically and right.  Of course, reality is something far different.  Test makers know that there isn't a good general understanding of genetics, and they use this to their advantage to market products.  For example, many of the genomewide tests test for very hazy things, such as diabetes or Parkinson's.  We still don't know exactly what causes these diseases, and at best we have identified a fairly wide assortment of genes that seem to be linked to the disorders.  As to how they are linked, we usually don't know.  If we can't even decide whether a gene is involved with a disease, how is knowing which allele we have for that gene helpful in determining our risk factor for the disease?  Yes, there is a test, and it is looking for a specific allele, but we don't know what it means to have that allele.  In this case, we get a result, but no one really has any clue what it means.  Someone with no understanding sees "Diabetes....positive", and it leads to immediate panic and perhaps irrational decisions.

Many of these tests are in ethically murky territory.  The companies making the tests are in it to make a profit, not to actually help anyone, though this isn't clear without a deep analysis of what they are testing.  There are few lies, but lots of deception (personally I define lying as saying something one knows to be false, and deception as neglecting to say something one knows is true).  Perhaps to save face, many companies offer genetic counseling after one receives test results.  Now, the test may be a few hundred dollars, but usually the counseling is several hundred, if not over a thousand dollars.  Even in the study, where genetic counseling was offered for free, most people didn't take advantage of these services.  Personally, this baffles me, and the only thing that comes to mind is that the majority of the people in the (biased) sample set already had a thorough understanding of what the test results meant.

I think there is good that can come of such testing, but it has to be well-regulated.  The current system takes advantage of naive people for the sole purpose of profit, and its purveyors are very good at it.  Expecting the average person to understand genetics well enough to interpret such tests is absurd; others must get involved for the purposes of genetic counseling.  With regulation, there is hope for it, but in its current state it needs to be stopped completely.

(Random note: such testing is illegal in New York state, as one cannot see the results of a genetic test without being in the presence of a doctor, so that the doctor may explain the results.)

Thursday, February 10, 2011

The JVM and Type Erasure

Type erasure is the term for what is, in my opinion, one of the dirtiest hacks in existence.  Before getting into that, let's start with a story.

Originally, Java did not have support for generics.  That meant that code like:
List< Integer > list = new ArrayList< Integer >();
list.add( new Integer( 4 ) );
System.out.println( list.get( 0 ).intValue() + 5 );

...would instead be...
List list = new ArrayList();
list.add( new Integer( 4 ) );
System.out.println( ((Integer)list.get( 0 )).intValue() + 5 );

Ew.  Ew.  Granted, the type definition is shorter, but that's only because it has less information to store to begin with.  Then we have to do this ugly cast.  Not only that, we're free to do such terrible things as:
list.add( new Integer( 4 ) );
list.add( list );
list.add( new Random() ); 

The compiler puts absolutely no restriction on what goes in, as long as it's an object.  This can obviously lead to bugs: someone can end up with a heterogeneous collection of objects if one isn't careful.  A check that could have happened at compile time is instead deferred until the class cast at run time.  Just ew overall.
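To make that failure mode concrete, here's a minimal sketch (the class name RawListDemo is mine, purely for illustration) showing a raw list happily accepting a String, with the type error only surfacing at the cast:

```java
import java.util.ArrayList;
import java.util.List;

public class RawListDemo {
    // Returns true if reading a non-Integer back out of a raw list
    // fails with a ClassCastException, as described above.
    public static boolean castFails() {
        List list = new ArrayList();       // raw type: anything goes in
        list.add(Integer.valueOf(4));
        list.add("moo");                   // compiles without complaint
        try {
            // The cast is where the type error finally surfaces.
            Integer i = (Integer) list.get(1);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails());   // prints "true"
    }
}
```

The bug (adding "moo") and the crash (the cast) can be arbitrarily far apart in a real program, which is exactly what makes these errors so painful to track down.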

In Java J2SE 5.0, this all changed.  Support for generics was included, and the people cheered.  But, at what cost?

The JVM can't do this sort of magic.  (I've heard that .NET can, hence why this post is titled "The JVM and Type Erasure".  I've never dabbled with any of the .NET languages, though I'd consider this a plus for them.)  Java runs on the JVM, so then Java can't do it either.

But it does do it, right?  I mean, it has to.  We have type-safe collections and generic types!

No, it's lies, all lies!  Yes, there is a check performed, and this check asserts type safety.  However, the runtime semantics are identical.  In other words, your wonderful, generic-using, type-safe code below:
List< Integer > list = new ArrayList< Integer >();
list.add( new Integer( 1 ) );
list.add( new Integer( 2 ) );
list.add( new Integer( list.get( 0 ).intValue() + list.get( 1 ).intValue() ) );

...is actually converted into this at compile time:
List list = new ArrayList();
list.add( new Integer( 1 ) );
list.add( new Integer( 2 ) );
list.add( new Integer( ((Integer)list.get( 0 )).intValue() + ((Integer)list.get( 1 )).intValue() ) );


It's a trick!  A dirty trick!  Yes, it's type safe, but this safety is asserted by the compiler at a certain stage of compilation.  After that stage, the code technically isn't type safe anymore, but the earlier stage made sure that nothing bad can happen from these class casts.
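One way to see that the safety is purely compile-time: alias the generic list through a raw type, and the compiler's guarantee evaporates, leaving only an unchecked warning.  A minimal sketch (names are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class ErasedSafety {
    // Demonstrates that generic type safety exists only at compile time:
    // a raw-type alias lets a String sneak into a List<Integer>.
    public static boolean pollutionSucceeds() {
        List<Integer> list = new ArrayList<Integer>();
        List raw = list;              // raw alias; unchecked warning only
        raw.add("not an int");        // no runtime check stops this
        return list.size() == 1;      // the String really is in there
    }

    public static void main(String[] args) {
        System.out.println(pollutionSucceeds());   // prints "true"
    }
}
```

The String sits quietly inside the "List&lt;Integer&gt;" until some later read tries to cast it, at which point the ClassCastException blames entirely innocent-looking code.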

Maybe I'm being too hard on it.  It is type safe, and it accomplishes roughly the same thing as a C++ template.  Who cares that it's implemented in such a way, besides maybe someone obsessed with performance?

Well, the evil does descend into other places.  For example, even though one has to supply concrete types to instantiate a generic type, this typing information isn't actually available to the programmer.  Consider the type variable "T" in a generic class.  Even though "T" must be instantiated to some actual type to form instances of the class, the code written against "T" has no way to recover what "T" actually is; after erasure, it's gone.  As a consequence, consider the following overloaded method definition:
public int myMethod( List< Integer > list ) {...}
public int myMethod( List< String > list ) {...}


If you try to compile this, javac greets you with the following error message:
name clash: myMethod(java.util.List<java.lang.String>) and myMethod(java.util.List<java.lang.Integer>) have the same erasure

This is because after type erasure, these two methods have the exact same signature:
public int myMethod( List list ) {...}
 
...because erasure simply strips the type parameters, leaving the raw type.  (A type variable like "T" gets replaced with "Object", or with its bound if it has one.)
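Reflection makes the erasure directly visible: after compilation, a List&lt;Integer&gt; and a List&lt;String&gt; share the exact same runtime class.  A small sketch (class name mine):

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // After erasure, List<Integer> and List<String> are indistinguishable
    // at run time: both are plain java.util.ArrayList.
    public static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strs = new ArrayList<String>();
        return ints.getClass() == strs.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());   // prints "true"
    }
}
```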

A related issue is that Java doesn't let you make a generic array.  For instance, if you try:
public class Test< T > {
  public T[] array = new T[5];

  ...
}

...javac greets you with:
generic array creation

So what if you want to create a type safe, generic array?

You don't.  Yay Java.

Well, ok, that's not *quite* true, but it's close.  First, you're strongly encouraged to use something that's already type safe, like an ArrayList, which can do pretty much everything an array can do.  If you're still complaining, people tell you to use newInstance, part of the Array class.  newInstance will return a type safe array, but you need to pass it a Class<T> object.  How do you get this?  You explicitly specify it.  Seriously?  Seriously?  This is not a solution; this is just asking for more errors.  Effectively, you end up specifying the type twice: once as < ClassName > for the generic type, and a second time as ClassName.class for the class object corresponding to the class.  There is nothing stopping you from specifying < String > in one place and Integer.class in the other.  So you end up with a type safe array, but only by way of a very type unsafe operation.  So much ick.
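For what it's worth, here is a sketch of that newInstance workaround; GenericArrayDemo is a hypothetical name of mine, and the unchecked cast inside the constructor is precisely the type unsafe operation being complained about:

```java
import java.lang.reflect.Array;

public class GenericArrayDemo<T> {
    private final T[] array;

    // The caller must supply the Class<T> token explicitly -- the
    // "specify it twice" problem described above.
    @SuppressWarnings("unchecked")
    public GenericArrayDemo(Class<T> clazz, int size) {
        // Array.newInstance returns Object; the cast is unchecked,
        // but safe here because the component type really is clazz.
        array = (T[]) Array.newInstance(clazz, size);
    }

    public void set(int i, T value) { array[i] = value; }
    public T get(int i) { return array[i]; }

    public static boolean works() {
        GenericArrayDemo<String> d =
            new GenericArrayDemo<String>(String.class, 5);
        d.set(0, "moo");
        return "moo".equals(d.get(0));
    }

    public static void main(String[] args) {
        System.out.println(works());   // prints "true"
    }
}
```

Note that the class token and the type argument are stated independently; nothing ties String.class to &lt;String&gt; except programmer discipline.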

Don't get me wrong.  Generics are better than no generics.  I'm just saying that there are a tremendous number of associated gotchas.