Bug #4702: KAR to module conversion utility - Kepler - Ecoinformatics Redmine


Bug #4702

closed

KAR to module conversion utility

Added by Aaron Aaron over 14 years ago. Updated about 14 years ago.

Status:

Resolved

Priority:

Immediate

Assignee:

David Welker

Category:

core

Target version:

2.0.0

Start date:

01/27/2010

Bugzilla-Id:

4702

Description

Create a utility that converts KAR files (version 1.0 and 2.0) to Kepler modules.

This was previously bug 1750.


Updated by Chad Berkley about 14 years ago

What was the reason for needing this again? Do we need this for 2.0?


Updated by Aaron Aaron about 14 years ago

This bug is for backwards compatibility. Since 2.0 KARs do not allow Java classes or jars, this utility would allow people to convert any 1.0 KARs that contained jars or Java classes into modules that can be used in the 2.0 system.

It may also help solve the .kepler upgrade from version 1.0 to 2.0. A utility to write actors and workflows from the v1.0 .kepler "cache" to v1.0 KARs could be developed, and the output of that utility would be fed into the input of this utility to create modules that are 2.0 compatible.


Updated by David Welker about 14 years ago

Do any actual kar files from 1.0 actually contain jars? If so, is the number sufficient to justify building a utility to do this conversion automatically rather than having people just create modules using the tools already in place?

If the goal is backwards compatibility, such that a 1.0 KAR that uses the ability of 1.0 KARs to store jars actually loads in 2.0, this has significant implications. The main problem is determining where this automatically created module should be inserted into modules.txt. Perhaps it should be inserted at the highest priority position. But this risks havoc in case the jar provided by the KAR contains incompatible overrides, contains unstable code, or contains code that is semantically incompatible with the code that already exists in the modules specified in modules.txt. In effect, automatic conversion is another name for distributing untested suites to people. The result would be that Kepler crashes or behaves in an unpredictable manner in some cases. (Although, perhaps such overrides would be rare.)
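To make the ordering concern concrete, here is a minimal sketch. It assumes that modules.txt behaves like a first-match-wins search path, which is an assumption made for illustration and not a description of Kepler's actual loader. The class `ModuleOrderSketch`, its methods, and all module and class names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model, NOT Kepler's real loader: modules are kept in priority
// order and a class name resolves to the first module that provides it.
class ModuleOrderSketch {

    // Returns the name of the highest-priority module providing className,
    // or null if no module provides it.
    static String resolve(Map<String, Set<String>> modulesInPriorityOrder, String className) {
        for (Map.Entry<String, Set<String>> module : modulesInPriorityOrder.entrySet()) {
            if (module.getValue().contains(className)) {
                return module.getKey();
            }
        }
        return null;
    }

    // Returns a new ordering with the given module placed at the highest priority.
    // Assumes the module name is not already present in the existing ordering.
    static Map<String, Set<String>> insertAtTop(Map<String, Set<String>> modules,
                                                String name, Set<String> classes) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        result.put(name, classes);
        result.putAll(modules);
        return result;
    }

    public static void main(String[] args) {
        // A tested ordering (a "suite" in the thread's terms); names are made up.
        Map<String, Set<String>> tested = new LinkedHashMap<>();
        tested.put("gui", Set.of("org.example.Canvas"));
        tested.put("core", Set.of("org.example.Director"));

        System.out.println(resolve(tested, "org.example.Director"));

        // A module generated from a KAR whose jar happens to ship the same class name.
        Map<String, Set<String>> changed =
                insertAtTop(tested, "converted-kar", Set.of("org.example.Director"));

        System.out.println(resolve(changed, "org.example.Director"));
    }
}
```

In this model, inserting the generated module at the top silently changes which module supplies `org.example.Director`, even though nothing in the tested ordering was edited.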

Should a module automatically created from a 1.0 KAR that contains jars be inserted into modules.txt automatically? Should any module? Does it violate expectations to modify modules.txt without user or developer control? Up to now, modules.txt has been under the control of developers with the option for advanced users to tweak it using the Module Manager. To make these sorts of 1.0 KARS "automatically" compatible with 2.0 would require modifying modules.txt. At the very least, we would need to prompt the user asking them if they really want to do this. Keep in mind also that if such a 1.0 kar were loaded, this would require a restart of Kepler.

My own preference is for any 1.0 KARS that do contain jars to be converted to modules manually. I am not even sure any such 1.0 KARS exist. But, if they do, I believe that manual conversion is best, since that way there is an opportunity to test them as part of a working suite before distributing them. Automatic conversion and modification of modules.txt is basically a method of distributing untested software. It is true that the Module Manager allows the mixing and matching of modules to create unique modules.txt files that may or may not work. But this is an advanced feature which will be used only by advanced users/developers who have the time and skills to determine an appropriate ordering for modules.txt or determine the source of inevitable problems when they do occur OR by more casual users who are following the advice of more advanced users/developers who have already tested a particular combination of modules and found that it works. Basically, if you are playing around with modules.txt, you should not be surprised if Kepler crashes or exhibits other unexpected behavior. Therefore, playing around with modules.txt is not something that would be recommended for users who are not prepared to deal with unpredictability and frustration. Because an automatic conversion utility that allowed Kepler 2.0 to read 1.0 KARS that contain jars would modify modules.txt (in essence, as soon as the 1.0 KAR is loaded, Kepler becomes essentially "untested" software), I do not think that such a utility is a good idea.

On a related note, I think the most common use case for the Module Manager for everyday users will be to simply load a suite. A suite represents an ordering of modules in modules.txt that has been tested. A suite, therefore, has much more reliability than the other ad hoc orderings of modules.txt that are enabled by the Module Manager or that would be created by automatic mechanisms, as proposed here.

I think moving forward post 2.0, we might think of situations where automatic insertion of modules into modules.txt might be considered "safer" (although it will never be entirely safe, since a failure in the execution of code in one module can always bring down the JVM and, therefore, the rest of Kepler along with it). It may even be useful in the future to think about the situations where certain modules could be added to the mix without requiring a restart. (Such modules would need to be known to contain no overrides, to address a discrete area of functionality that is non-foundational, and to be semantically compatible with existing modules, i.e., they would need to conform to the principles that enable hot deployment.)

For now, the following points must be kept in mind:

(1) Any changes to modules.txt could destabilize the system. Therefore, such changes must not be invoked casually.

(2) A different ordering of modules.txt is a different software product. Therefore, any automatic change to modules.txt is, in effect, distributing new and untested software. Users accessing features that make automatic changes to modules.txt (if we decide to implement any such features) should not be surprised if Kepler crashes or exhibits unexpected behavior.

Even if, in the future, we were to research and make use of hot deployment to the greatest extent possible, these points would still be true. At best, hot deployment would be a feature that is more convenient for developers, as it would allow them to change the mix of modules without restarting Kepler. But, since a newly loaded module is capable of causing the JVM to crash, even hot deployment is not going to guarantee stability for our users. There is not, and never can be, any entirely safe way to change modules.txt, except by having developers actually test and certify particular orderings. This is precisely what makes a suite a suite. A suite is an ordering of modules.txt that has been purposely developed by a particular group or individual and tested. In that sense, a suite is as much or more a social construct as it is a technical construct.


Updated by David Welker about 14 years ago

I should point out that there is a less ambitious interpretation of this feature request. That is, we would not be developing a tool that allows Kepler to automatically load 1.0 KAR files that contain jars, but a utility that simply turns such KARs into modules. Such modules would then be tested in the context of a particular modules.txt file by a developer, who would then distribute an appropriate suite containing the new module. But I do not think there would be any justification for this less ambitious developer-centric tool, unless there existed a significant number of 1.0 KAR files that distribute their own jars. After all, the only work that would be saved in that case is the work it takes to extract the jar and put it in the lib folder of a new module, while putting the actual workflow or actor in an appropriate location within the module. The amount of work that it would take to learn the command that performed this task for you would be about the amount of time it would take to do it manually for one such 1.0 KAR file with jars. So, unless someone is facing the task of converting a lot of 1.0 KAR files with jars into separate modules, I do not think there would be any justification for the less ambitious interpretation of this feature request.
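For a sense of how small the manual step described above is, here is a rough sketch of the jar-extraction half. It assumes a KAR can be read as a zip/jar-format archive and that a module keeps its jars in a `lib` folder, as mentioned in the comment. The class `KarJarExtractorSketch` and its method are hypothetical and not part of Kepler. Placing the workflow or actor itself is left out, since the thread does not say where it belongs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: copies every *.jar entry of a KAR into <moduleDir>/lib.
// Assumption: the KAR is a zip/jar-format archive.
class KarJarExtractorSketch {

    static List<Path> extractJars(Path karFile, Path moduleDir) throws IOException {
        Path libDir = moduleDir.resolve("lib");
        Files.createDirectories(libDir);
        List<Path> extracted = new ArrayList<>();

        try (ZipFile zip = new ZipFile(karFile.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                    continue; // workflows, actors, and the manifest are not handled here
                }
                // Use only the file name so an entry path cannot escape libDir.
                Path target = libDir.resolve(Path.of(name).getFileName());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                extracted.add(target);
            }
        }
        return extracted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: KarJarExtractorSketch <file.kar> <module-dir>");
            return;
        }
        for (Path jar : extractJars(Path.of(args[0]), Path.of(args[1]))) {
            System.out.println("extracted " + jar);
        }
    }
}
```

The resulting module would still need to be tested in the context of a particular modules.txt before being distributed, which is the point the comment makes.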


Updated by Aaron Aaron about 14 years ago

David, thank you for explaining to us why having a centralized modules.txt that contains prioritized dependencies is undesirable for a modular system. This is also an issue of using the same Classloader to load all of the classes that are running in Kepler, as opposed to a separate Classloader for each module as other modular systems do (the inclusive approach as opposed to the exclusive approach that you and Tim argued for so long ago). But we have come too far down this path now to solve it at this point and we don't have the resources to design and develop such a system on our own. So perhaps documenting the issues is all we can do for now.


Updated by David Welker about 14 years ago

Aaron,

I don't believe that this is the proper forum for you to air your opinions about the desirability of options that you concede are not feasible.

But, since you have decided to, I will just say you're just plain WRONG.

There does not and cannot exist a module system where you can just unthinkingly load any module regardless of what it does. A separate class loader does not ensure semantic compatibility between modules nor does it protect against unstable modules. On the downside, it is less powerful since it does not give the option for overrides. What is the advantage exactly?

What CAN be done in both cases is this. You can design certain core components which provide a framework and THEN assert that non-core components (call them actors) will work, provided that they are stable and do not infringe upon any of the core functionality of the framework. That is, you can think of Kepler as a server, and then adding functionality within certain boundaries.

The thing is, as of now, we haven't defined those boundaries. And until we do, we have to assume that changes to the mix of modules could be problematic.

By the way, I have installed plug-ins in Eclipse that have caused Eclipse to crash, and that uses your precious OSGi. In fact, I have gotten Eclipse into such bizarre states that the easiest solution was simply to re-install Eclipse. So, instead of going off with useless comments, maybe you should recognize that these are non-trivial problems. The bottom line is this. If you want to add modules as opposed to switching between tested suites, you are going to have to define standards that modules that are meant to be added must meet in order to be considered safe. As is demonstrated by my experiences where plug-ins have crashed Eclipse, this is true for OSGi as well. A separate class loader for every module does not somehow make modules magically compatible with each other. And it certainly does not ensure that you can hot-load a module.

Any other questions? Oh wait, that wasn't a question, was it? It should have been.


Updated by Aaron Aaron about 14 years ago

Thanks David, I agree that it is a non-trivial problem. One that we shouldn't have tried to tackle ourselves. Best of luck though, the OSGi guys spent 10 years and many millions working on theirs. Pride is blinding.


Updated by David Welker about 14 years ago

Aaron,

This is an inappropriate conversation for this forum. This issue has already been decided. You have already had the opportunity to express your views and you failed to persuade others. Deal with it. When it comes to Kepler, no one always gets what they want or think best. This definitely includes me.

We had good reasons to take a different approach. We decided against OSGi for many reasons, including:

(1) The complexity that it would foist on developers. For example, take this comment from an OSGi-advocate about his experiences trying to implement OSGi at IBM:

<blockquote>
"As you may imagine, one of the most important goals was to hide OSGi as much as possible. No services, no Import-Package (but Require-Bundle on one aggregation bundle), basically reduce OSGi to its minimum. Well, the appreciation was… limited. All people I was working with were exceptional bright and determined researchers in MLP. They developed highly sophisticated algorithms on how to analyze unstructured data, but only a few were software engineers. So the code was… working, but not production ready. Enforcing them to apply rules that ultimately made it harder for them getting things done and as a result slowing them down wasn’t something they welcomed very much."
</blockquote>

Our primary mission at Kepler is to facilitate the development of software that advances the interests of scientists, not slow it down. In the quote above, the OSGi advocate explains the frustrations of trying to implement OSGi within a SINGLE institution. We at Kepler are responsible for creating a platform for MANY institutions. Adding the complexities of OSGi to the development process to achieve a small incremental advantage in hiding the implementation of modules from each other (while sacrificing the ability to do overrides) was discussed and rejected, given our mission. It makes more sense to stick to mainstream Java development rather than have kepler-dev become a place where a primary activity is fielding a constant barrage of OSGi-related questions.

(2) OSGi may have been developed over 10 years, but it was developed outside of the Java Community Process and the problems it was meant to address are different than the problems we are trying to address. The original context of OSGi was primarily cell phones, where the "modules" tend to be unrelated apps. Also, there is the issue of security, where you want to close off your module from others. In our context of scientific development, we want our modules to be more open, and it is both permissible and desirable for one module to modify the functioning of another in the context of a particular suite. The overheads of OSGi may be a great solution for cell phones and for development teams within a single organization. That does not mean it is a great solution for us.

(3) Before the flexibility of overrides, many participants in Kepler found it impossible to collaborate with each other. As a result, they tended to fork their own implementations of Kepler with whatever add-on functionality they wanted. If we had made the decision to go with OSGi, that would have been the end of overrides and would have led developers to fork. The simple fact of the matter is, the negotiation costs of making every minor change you need in someone else's code are simply too high when you have a deadline to solve a particular scientific problem that is not primarily about Kepler, but uses Kepler. That OSGi was developed over 10 years has another implication, which you probably would prefer not to talk about. It is old software. The thing about old software is that mistakes tend to be built into the software and are hard to get rid of. Take for example the issue of closures in Java. The reason that closures were not included in Java in the first place was because of time pressure related to doing the first release. But, it was found that callbacks in Java were simply too tedious. Having inadequate time to implement closures, the creators of Java settled on allowing anonymous class definitions with methods. This is an ugly solution, to say the least. But, it has remained in place. Only now, with the release of JDK 7 in September, is Java going to include support for closures. The thing about old software like OSGi or the JDK is that old mistakes or implementations built for a different context tend to persist. That OSGi is really, really old at 10 years of age is not necessarily an argument in its favor.

(4) With the release of JDK 7, which is currently scheduled for September, there will be built-in Java language features that facilitate modules. Also, JDK 7 itself is being broken up using Project Jigsaw, a project which has language features being developed to support it and which will also be available to others who are interested in taking a similar approach to modularization. IF we were to use a third-party rather than custom approach to modules, it would be much more sensible to compare OSGi to the approach taken by Project Jigsaw. After all, there is a REASON that Sun decided to break up the JDK itself using a new approach rather than using OSGi. Apparently, they didn't want to be saddled with all of the complexities and old decisions of OSGi either. This should be a huge red flag telling you to look before you leap.

(5) Have you installed something using Eclipse lately? I have. I just installed a tool for the Google App Engine. And guess what. I had to restart Eclipse in order for it to work properly. It only partially worked before restarting. Hot deployment using OSGi apparently doesn't always work out so well after all. I'm not blaming OSGi, but you would be a fool to think it magically solves our problems, really reduces complexity, or lessens the amount we need to understand with respect to such issues. As developers, we are going to have to research and understand in depth all the issues that make hot deployment successful or not successful and provide guidelines to module developers IF we even decide that this is that desirable of a feature, regardless.

The bottom-line is this. The decision to not go with OSGi was carefully thought through, whether you agree with it or not. You had your chance to make your case. You failed to persuade others. That should be the end of it. Certainly, using Bugzilla right before the 2.0 release to air your grievances about decisions that have already been made and that you participated in is simply not optimal.


Updated by Aaron Aaron about 14 years ago

This is an inappropriate conversation for this forum.

And what is the appropriate forum?

This issue has already been decided.

And who decided this issue? The managers didn't care either way as long as the requirements were met. Have we met the requirements? Can a User develop a module, download it and boot it up in Kepler with a high reliability of success?

You have already had the opportunity to express your views and you failed to persuade others.

I failed to persuade you, you mean. And tell me, have you ever built any OSGi modules? You certainly hadn't when I was proposing OSGi. You spent 4 months of development on the build system before I ever started evaluating OSGi as a solution and then you fought it vigorously because you had already decided to develop everything yourself from scratch. When we were supposed to be doing design work you were developing. You never truly evaluated any other technology and from what I can tell you still haven't (reading articles doesn't count, go and use the technology). You chose a solution on your own and you implemented it. The managers caved because I gave up! Adopting a standard is something we needed to decide as a team, I could not make that decision on my own. And quite frankly I couldn't stand to work with you anymore.

Deal with it. When it comes to Kepler, no one always gets what they want or think best. This definitely includes me.

I have to deal with it every day.

We had good reasons to take a different approach. We decided against OSGi for many reasons, including:

(1) The complexity that it would foist on developers. For example, take this comment from an OSGi-advocate about his experiences trying to implement OSGi at IBM:

<blockquote>
"As you may imagine, one of the most important goals was to hide OSGi as much
as possible. No services, no Import-Package (but Require-Bundle on one
aggregation bundle), basically reduce OSGi to its minimum. Well, the
appreciation was… limited. All people I was working with were exceptional
bright and determined researchers in MLP. They developed highly sophisticated
algorithms on how to analyze unstructured data, but only a few were software
engineers. So the code was… working, but not production ready. Enforcing them
to apply rules that ultimately made it harder for them getting things done and
as a result slowing them down wasn’t something they welcomed very much."
</blockquote>

You should really include references to your quotes. I found this one here:
http://osgi.mjahn.net/2009/07/01/osgi-vs-jigsaw-why-cant-we-talk/

Towards the bottom, the writer, Mirko Jahn, defines 6 "Properties of a true module system":
- Isolation
Our build system does not isolate modules; it combines them all together using the same classloader and a single classpath.
- Information Hiding
In the core module, I have written many public methods that I need in other classes in core but do not want exposed to other modules.
- Enabling Reuse
Can someone who creates modules in our module system reuse them elsewhere?
- Predictability
Overrides in the system cause unpredictability (btw, you can do overrides in OSGi, so please stop using that argument).
- Flexible binding
The author claims that no modular system has achieved this. Has ours?
- Robustness
You experience this in Eclipse, and every other piece of software you have ever run, including Kepler.

The fact that you include this quote makes me wonder if you think our software engineers are not capable of learning OSGi or some other modularization standard? I have worked with everyone here and I have developed simple OSGi applications. And I can assure you that it is not that hard to do. Everyone on our team could come up to speed on OSGi in a couple weeks and be proficient with it. The fact that Mirko's team of mostly non-software engineers had trouble with it is hardly a concern to me.

Our primary mission at Kepler is to facilitate the development of software that advances the interests of scientists, not slow it down. In the quote above, the OSGi advocate explains the frustrations of trying to implement OSGi within a SINGLE institution. We at Kepler are responsible for creating a platform for MANY institutions. Adding the complexities of OSGi to the development process to achieve a small incremental advantage in hiding the implementation of modules from each other (while sacrificing the ability to do overrides) was discussed and rejected, given our mission. It makes more sense to stick to mainstream Java development rather than have kepler-dev become a place where a primary activity is fielding a constant barrage of OSGi-related questions.

I am amazed that you have the hubris to make this prediction without knowing anything about OSGi! You are convinced that it is so hard because you never bothered to learn it. Your lack of understanding was evident to me when I asked about module version ranges in a meeting in October, which resulted in this thread: https://kepler-project.org/developers/kepler-development-forum/build-and-release-team/274120463#500806490
Do you think many people will take the time to learn how Kepler modules work? Do you think we won't get a constant barrage of Kepler module system-related questions? Especially when the system can't do what people want or expect? At least with OSGi (or whatever other standard that is well used out there) there are books and mailing lists and blogs and webpages and all kinds of other sources to help out.

(2) OSGi may have been developed over 10 years, but it was developed outside of the Java Community Process and the problems it was meant to address are different than the problems we are trying to address. The original context of OSGi was primarily cell phones, where the "modules" tend to be unrelated apps. Also, there is the issue of security, where you want to close off your module from others. In our context of scientific development, we want our modules to be more open, and it is both permissible and desirable for one module to modify the functioning of another in the context of a particular suite. The overheads of OSGi may be a great solution for cell phones and for development teams within a single organization. That does not mean it is a great solution for us.

The "original context" or how OSGi was developed is inconsequential. It is now a leading java modularization standard and is widely used.

(3) Before the flexibility of overrides, many participants in Kepler found it impossible to collaborate with each other. As a result, they tended to fork their own implementations of Kepler with whatever add-on functionality they wanted. If we had made the decision to go with OSGi, that would have been the end of overrides and would have led developers to fork. The simple fact of the matter is, the negotiation costs of making every minor change you need in someone else's code are simply too high when you have a deadline to solve a particular scientific problem that is not primarily about Kepler, but uses Kepler.

You can do overrides in OSGi. I am amazed that you don't remember when I discovered this! See the bottom of this page
https://kepler-project.org/developers/teams/framework/design-docs/trade-studies/osgi-adoption/steps-for-kepler-conversion-to-osgi
that links to this page
http://wiki.eclipse.org/Steps_to_use_Fragments_to_patch_a_plug-in
and there are probably better sources about how to do this by now.

Negotiation costs can be handled in many ways: overrides, certainly, but also forking (i.e., copying modules to new modules and modifying them) and, even better, actually talking to the original developers to help them improve their APIs (imagine that! people actually talking to each other). This is a fact of life: overrides are only a quick fix and spell disaster down the road when the overridden code is changed, breaking the override. We've already run into this several times, and it is a quick-fix solution for a problem that could have been handled easily and properly the first time.

That OSGi was developed over 10 years has another implication, which you probably would prefer not to talk about. It is old software. The thing about old software is that mistakes tend to be built into the software and are hard to get rid of. Take for example the issue of closures in Java. The reason that closures were not included in Java in the first place was because of time pressure related to doing the first release. But, it was found that callbacks in Java were simply too tedious. Having inadequate time to implement closures, the creators of Java settled on allowing anonymous class definitions with methods. This is an ugly solution, to say the least. But, it has remained in place. Only now, with the release of JDK 7 in September, is Java going to include support for closures. The thing about old software like OSGi or the JDK is that old mistakes or implementations built for a different context tend to persist. That OSGi is really, really old at 10 years of age is not necessarily an argument in its favor.

I agree. If there is a newer modularization system that has solved the problems encountered in OSGi (every system has problems, btw), then let's use that one. This is not about OSGi; it's about using a standard modularization system that can work in other systems, and that allows us to use other components in our system.

(4) With the release of JDK 7, which is currently scheduled for September, there will be built-in Java language features that facilitate modules. Also, JDK 7 itself is being broken up using Project Jigsaw, a project which has language features being developed to support it and which will also be available to others who are interested in taking a similar approach to modularization. IF we were to use a third-party rather than custom approach to modules, it would be much more sensible to compare OSGi to the approach taken by Project Jigsaw. After all, there is a REASON that Sun decided to break up the JDK itself using a new approach rather than using OSGi. Apparently, they didn't want to be saddled with all of the complexities and old decisions of OSGi either. This should be a huge red flag telling you to look before you leap.

I have no problem with using Jigsaw. Have you developed with it? How does it compare? I have no experience with it. Do you think it would be easy to convert to it from the system that we have now? Would it have been harder to convert to it from OSGi? I am glad you are still looking at these other technologies. I hope that you keep up on them and let us know periodically what is going on in that area.

(5) Have you installed something using Eclipse lately? I have. I just installed a tool for the Google App Engine. And guess what. I had to restart Eclipse in order for it to work properly. It only partially worked before restarting. Hot deployment using OSGi apparently doesn't always work out so well after all. I'm not blaming OSGi, but you would be a fool to think it magically solves our problems, really reduces complexity, or lessens the amount we need to understand with respect to such issues. As developers, we are going to have to research and understand in depth all the issues that make hot deployment successful or not successful and provide guidelines to module developers IF we even decide that this is that desirable of a feature, regardless.

Eclipse is not OSGi. Do you think that when we introduce a bug into our Java code that it is the fault of Java? Your bad experience with poorly written modules in Eclipse is not my concern. I have used many plugins for Eclipse that have worked just fine, and I have used several completely standalone applications built on the OSGi standard, the Equinox framework, and the Eclipse Workbench bundles that work just fine.

The bottom-line is this. The decision to not go with OSGi was carefully thought through, whether you agree with it or not. You had your chance to make your case. You failed to persuade others. That should be the end of it. Certainly, using Bugzilla right before the 2.0 release to air your grievances about decisions that have already been made and that you participated in is simply not optimal.

David, I am the only one that still tries to argue with you on these kinds of issues because I know that you have some training as a lawyer. You were trained to argue your case as either prosecutor or defense for one goal only, and that is to win your case to the satisfaction of a judge or jury. You have been trained to gloss over truths that hurt your case and expose every possible piece of evidence that helps your case. And this is what you do, to the point of making stuff up and saying things that just aren't true (like OSGi does not support overrides, when it does). I understand this and so I try my best not to get upset with you. But you should understand that in engineering, this is a very detrimental approach. The goal of a design engineer is to understand all of his options as best as he possibly can in the time that he has. He must then make a choice from those options, often without knowing all the details or exactly what will happen in the future. Often he must rely on his experience and use his "gut feeling" to make a decision. If he finds out later that his choice is a good one that solves his problem, then he continues down that path. If his choice does not solve the problem, then he must accept that another option should be followed. There is no prosecution or defense here, and the only jury is the users of the system.

You have worked extremely hard creating a very good build system and streamlining the way in which developers build Kepler. It has also allowed us to come a long way in dividing up our codebase and managing our dependencies in a top-down fashion. I really want to commend you on all that it has achieved. Like any system it has its quirks, but it works very well. The issue that has been poorly addressed is the runtime side of the module system. This is what we need to think about for the future. How can we get our modules to run smoothly at runtime such that a User can download a module and have it run reliably with their Kepler instance? Our module system kind of fakes that at the moment by performing classpath magic on restart (as I understand it), which leaves the system completely open to collisions (aka unintentional overrides). How does it address the 6 "Properties of a true module system" that Mirko outlined in his article? How can we get our system to perform robustly in that kind of plug and play environment? Is that something you want to develop from scratch?
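As an illustration of what a "collision" means on a single shared classpath, the following sketch reports class names that more than one module provides. It is a toy model under stated assumptions: the class `CollisionSketch`, its method, and the module and class names are hypothetical, and the input is a plain map rather than anything read from real Kepler modules or jars.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical check: which class names are provided by more than one module?
// On a single flat classpath, each such name is a potential unintentional override.
class CollisionSketch {

    // Maps each colliding class name to the modules that provide it,
    // listed in the iteration order of the input map.
    static Map<String, List<String>> findCollisions(Map<String, Set<String>> classesByModule) {
        Map<String, List<String>> providers = new TreeMap<>();
        for (Map.Entry<String, Set<String>> module : classesByModule.entrySet()) {
            for (String className : module.getValue()) {
                providers.computeIfAbsent(className, k -> new ArrayList<>())
                         .add(module.getKey());
            }
        }
        // Keep only names with two or more providers.
        providers.values().removeIf(modules -> modules.size() < 2);
        return providers;
    }

    public static void main(String[] args) {
        // Made-up modules in priority order; LinkedHashMap preserves that order.
        Map<String, Set<String>> modules = new LinkedHashMap<>();
        modules.put("downloaded-module", Set.of("org.example.Util", "org.example.NewActor"));
        modules.put("core", Set.of("org.example.Util", "org.example.Director"));

        findCollisions(modules).forEach((className, owners) ->
                System.out.println(className + " is provided by " + owners));
    }
}
```

A check like this only detects name clashes. It says nothing about whether a clash is an intended override or whether two modules are semantically compatible, which is the harder question raised in this thread.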
