I have now studied the general file format of the .EUG files. It applies not only to the .EUG files, but also to the event, AI and POP files. How the data gets interpreted depends entirely on the context in which the file is read. At its simplest, the format can be described as follows:
Once '#' appears on a line, the rest of that line is discarded (that is until a new line character is detected).
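A minimal sketch of the comment rule in Python. The function name is made up for illustration, and it assumes a '#' always starts a comment, even inside a quoted string, which has not been confirmed.

```python
def strip_comment(line: str) -> str:
    """Return the line with everything from the first '#' onward removed.

    Assumption: '#' is never protected by quotes.
    """
    return line.split("#", 1)[0]


print(repr(strip_comment("year = 1492 # start date")))
```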
Any number or text of undefined length is read until a delimiter is found. A delimiter is a space, a tab, a new line, an equal sign, or one of the '{' and '}' signs. However, the equal sign and the '{' and '}' signs (which will hereafter be incorrectly referred to as "parentheses") also have special meanings of their own, although they do terminate the fields above. For general separation a space or tab must therefore be used (or a new line if that's preferable). New lines act only as delimiters and do not disturb the general flow of text.
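The delimiter rules above could be sketched roughly like this in Python. The name `tokenize` is a placeholder, quoted strings are deliberately left out of this sketch, and any whitespace character is treated as a delimiter, which is slightly broader than the space, tab and new line listed above.

```python
def tokenize(text):
    """Split text into fields plus the special signs '=', '{' and '}'."""
    tokens = []
    current = []  # characters of the field currently being read

    def flush():
        # Emit the field collected so far, if there is one.
        if current:
            tokens.append("".join(current))
            current.clear()

    for line in text.splitlines():
        line = line.split("#", 1)[0]  # discard the comment part
        for ch in line:
            if ch in "={}":
                flush()            # these signs terminate a field...
                tokens.append(ch)  # ...and are tokens themselves
            elif ch.isspace():
                flush()            # plain separator, not kept
            else:
                current.append(ch)
        flush()  # a new line acts as a delimiter too
    return tokens
```

For example, `tokenize("a={b c}\nd = 1.5 # x")` yields the fields `a`, `b`, `c`, `d` and `1.5` with the special signs between them, regardless of how the text was spaced or broken over lines.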
Each .EUG file contains a number of root nodes. That is, not everything has to start from the same node. A node is defined as something which contains either other nodes or a data element array within it. Its syntax is: "node_name = data_element" or "node_name = { data_element_array/node_names }". A valid name for a node begins with an alphabetic character (case does not matter), and may include an underscore character '_'. A name may not include a number or any other character, however.
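The naming rule as stated above can be written as a small check. This is only a sketch of the rule as currently guessed (letters and underscores, starting with a letter, ASCII only as an extra assumption); the function name is hypothetical and the pattern would need loosening if real files turn out to use other characters.

```python
import re

# A letter first, then any mix of letters and underscores.
NODE_NAME = re.compile(r"[A-Za-z][A-Za-z_]*")


def is_valid_node_name(name: str) -> bool:
    """Check a node name against the guessed naming rule."""
    return NODE_NAME.fullmatch(name) is not None
```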
A data element is what is actually used by the game; nodes just tell where the data belongs. Data might come in a single piece (when directly following the equal sign after the node), or as part of an array. Data is either a number, which may include a decimal point '.', or a string of defined length (DL) or undefined length (UL). The difference between the two string types is that a UL string goes on until a general delimiter is encountered, while a DL string starts with a quote sign ' " ' and ends with another one. In addition to being declared differently, these string types are also used differently in the game engine. UL strings are often (if not always) read and then used as constants that stand for something other than the actual text data. DL strings, however, are used in such a way that the text data they contain is used directly within the engine. Accordingly, UL strings have not been observed to be case sensitive, while case often matters for a DL string. A DL string may also contain numbers and any other character until the closing ' " ' is reached (new lines, though, are forbidden).
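A rough Python sketch of telling the three kinds of data element apart, given one field that has already been split off. The names are placeholders. Two things here are assumptions rather than observations: a leading minus sign is accepted for numbers, and UL strings are folded to lower case because they appear to be case insensitive.

```python
import re

# Digits with an optional decimal part; the leading '-' is an assumption.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def classify(token: str):
    """Return a (kind, value) pair for a single data element."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # DL string: text is used as-is, so case is preserved.
        return ("dl_string", token[1:-1])
    if NUMBER.fullmatch(token):
        return ("number", float(token))
    # UL string: treated as a constant name, so case is folded (assumption).
    return ("ul_string", token.lower())
```

With this, `classify('"Sweden"')` keeps the text `Sweden` unchanged, while `classify("YES")` and `classify("yes")` give the same result.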
That is about it. I may be wrong about certain details (especially what valid names are for nodes and what characters are valid). That can easily be corrected when we learn more about the file format, however. Even though it all looks complex, it is not as complex as it looks, as can be seen above.
Now for the question of how I'll implement my test version of this (and how you may implement the Java version). Probably the file itself will have to be represented as an object, which can be saved to or loaded from a file. A loading function would read through the file and create a tree-like structure, where each branch would represent a number of nodes or data elements. If a node holds only a single data element, that means no parentheses were found after its equal sign. A corresponding saving function would then save the contents of this tree to a visually pleasing file with indents for every new level of the tree.
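One possible shape for such a loader and saver, sketched in Python rather than Java to keep it short. Everything here is an assumption made for illustration: the names `load` and `dump`, the choice of plain lists and tuples for the tree, and the four-space indent. A node is stored as a `(name, value)` pair where the value is either a single field or a nested list; bare array elements are stored as plain strings. Values are kept as raw text, and a stray unmatched quote is silently skipped.

```python
import re

# Comment | quoted string | special sign | undefined-length field.
TOKEN = re.compile(r'#[^\n]*|"[^"\n]*"|[={}]|[^\s={}#"]+')


def tokenize(text):
    return [t for t in TOKEN.findall(text) if not t.startswith("#")]


def parse_block(tokens, pos, top_level):
    """Parse items until the closing '}' (or end of input at the root)."""
    items = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == "}":
            if top_level:
                raise ValueError("unexpected '}'")
            return items, pos + 1
        if tok in ("=", "{"):
            raise ValueError(f"unexpected {tok!r}")
        if pos + 1 < len(tokens) and tokens[pos + 1] == "=":
            if pos + 2 >= len(tokens) or tokens[pos + 2] in ("=", "}"):
                raise ValueError(f"missing value for {tok!r}")
            if tokens[pos + 2] == "{":
                value, pos = parse_block(tokens, pos + 3, False)
            else:
                value, pos = tokens[pos + 2], pos + 3
            items.append((tok, value))   # a node
        else:
            items.append(tok)            # a bare array element
            pos += 1
    if not top_level:
        raise ValueError("missing '}'")
    return items, pos


def load(text):
    """Build the tree for a whole file; the result is the list of root nodes."""
    items, _ = parse_block(tokenize(text), 0, True)
    return items


def dump(items, depth=0):
    """Write the tree back as text, one indent step per tree level."""
    pad = "    " * depth
    out = []
    for item in items:
        if isinstance(item, tuple):
            name, value = item
            if isinstance(value, list):
                out.append(f"{pad}{name} = {{\n")
                out.append(dump(value, depth + 1))
                out.append(f"{pad}}}\n")
            else:
                out.append(f"{pad}{name} = {value}\n")
        else:
            out.append(f"{pad}{item}\n")
    return "".join(out)
```

Loading the text produced by `dump` gives back the same tree, so a file can be read, modified in memory and written out again without caring about its original spacing. Comments are lost in the process, since the tree does not store them.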
So how would one connect this generic structure to the game data? Well, we could add a validation object (which could also be loaded from a file we've defined ourselves). It would create a pseudo-tree structure with a list of valid nodes and, in some cases, valid data elements for each node. A function would then be added, either in this object or in the file object, to check whether the loaded data was valid or not. A clever communication system between the two objects would mean that we could look up in the validation object which node or data element to use for a certain thing we wanted to change, and then make the modification with the file object. Of course, different validation objects would have to be created for each game (EU2, Victoria), and several file objects could be created to load different files at the same time.
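A small Python sketch of what such a validation check might look like. The schema layout is invented for this example: a dict maps each valid node name to either another dict (valid child nodes), a set (valid single values) or `None` (anything goes). The tree is assumed to be made of `(name, value)` pairs with nested lists for blocks, and the node names and values in the usage example are made up, not taken from a real game file.

```python
def validate(items, schema, path=""):
    """Return a list of problems found; an empty list means the tree is valid."""
    errors = []
    for item in items:
        if not isinstance(item, tuple):
            continue  # bare array elements are not checked in this sketch
        name, value = item
        where = f"{path}/{name}"
        key = name.lower()  # node names are assumed to be case insensitive
        if key not in schema:
            errors.append(f"{where}: unknown node")
            continue
        rule = schema[key]
        if isinstance(rule, dict):
            # The node must be a block holding further nodes.
            if isinstance(value, list):
                errors.extend(validate(value, rule, where))
            else:
                errors.append(f"{where}: expected a block")
        elif isinstance(rule, set):
            # The node must be a single value from a fixed list.
            if isinstance(value, list):
                errors.append(f"{where}: expected a single value")
            elif value.lower() not in rule:
                errors.append(f"{where}: invalid value {value}")
        # rule is None: any value is accepted
    return errors


# Hypothetical schema and tree, for illustration only.
schema = {"country": {"tag": {"swe", "fra"}, "name": None}, "year": None}
tree = [("country", [("tag", "SWE"), ("capital", "1")]), ("year", "1492")]
print(validate(tree, schema))
```

A separate schema per game would then play the role of the per-game validation object, while the same `validate` function is shared.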
I find the idea of writing this loader very intriguing, and I'll start tomorrow when my head is not as dizzy as it is now.