Server configuration attributes are intended for developers and administrators and serve many purposes, such as configuration and monitoring. These attributes are simply name-value pairs. Some of the pairs actually affect how the network behaves; e.g., setting hashSpanFix: AllowBalance to false means that the particular server's data directories will not auto-balance.
Each server has its own set of name-value pairs (or attributes, as I will frequently call them throughout this document). Adding, modifying or deleting an attribute will only affect that particular server, if the change affects the server at all.
Note that some attributes can be edited (see the section on permissions). To save the attributes, though, the user must have permission to set the configuration for the specific server, which is enforced by our public/private key security model. This is handled automatically when a user signs in. Any attempt to save attributes by a user with insufficient permissions will simply fail.
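The per-server scoping can be pictured with a minimal sketch. It models each server's configuration as an independent dictionary of name-value pairs. The `configs` mapping, the second server URL and the helper `set_attribute` are hypothetical illustrations, not part of Tranche's API.

```python
# Hypothetical model: one independent attribute map per server URL.
configs = {
    "tranche://141.214.182.209:443": {"hashSpanFix: AllowBalance": "true"},
    "tranche://example-server:443": {"hashSpanFix: AllowBalance": "true"},  # made-up URL
}


def set_attribute(server_url, name, value):
    """Add or modify one attribute on a single server's configuration."""
    configs[server_url][name] = value


# Changing the attribute on one server leaves the other server untouched.
set_attribute("tranche://141.214.182.209:443", "hashSpanFix: AllowBalance", "false")
```

After the call, only the first server's data directories would stop auto-balancing; the second server keeps its own value.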
Any user, however, can easily start a server or their own network, and they will automatically be able to set the configuration for their server(s). For more information, see Setting up a Server.
Skip ahead to Directions & Screenshot to see how these server attributes are accessed and modified.
Our configuration allows administrators to add, edit and remove simple name-value pairs (with some exceptions: see the section on permissions). These can be arbitrary, as I could easily add my name to the configuration, assuming I have write permissions for a server's configuration:
This wouldn't serve much of a purpose, but it would appear in the Configuration anytime I or anyone else opens it.
However, say I added the following:
That value is used by various separate interfaces, such as our Google Maps page, to show a name for the server. Without this attribute, we would not be able to show a friendly name and would have to fall back on a less friendly Tranche URL, which is tranche://141.214.182.209:443.
Consider the following:
This value actually impacts server-side performance! It effectively disables the service on that particular server that grabs chunks it thinks it should have from other servers, repairs broken chunks, and performs other important server-side functions.
Any user with write permissions to the network (i.e., anyone who can log in) can open up a server configuration, though only certain users (our admins) can save changes to it. Any user can edit the configuration for any server on their computer (see the screenshot; it is a very simple editor), but unless they have permission to save the configuration, any attempt to save will fail.
If you start a server of your own, you will be able to save changes to the configuration. This is because the user who starts a server is automatically an admin. (An admin can also add other admins. See Setting up a Server.)
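The save rules above can be summarized in a minimal sketch. It is only an illustration: the real check is enforced by the public/private key security model, whereas this sketch compares plain user names. The class `ServerConfig`, its methods and the user names are all hypothetical.

```python
class ServerConfig:
    """Hypothetical stand-in for a server's configuration and its admin list."""

    def __init__(self, creator):
        # The user who starts a server is automatically an admin.
        self.admins = {creator}
        self.attributes = {}

    def add_admin(self, acting_user, new_admin):
        # Only an existing admin can add another admin.
        if acting_user not in self.admins:
            raise PermissionError(f"{acting_user} cannot add admins")
        self.admins.add(new_admin)

    def save(self, user, edited_attributes):
        # Anyone can edit a local copy, but saving requires admin rights.
        if user not in self.admins:
            raise PermissionError(f"{user} cannot save this configuration")
        self.attributes = dict(edited_attributes)


config = ServerConfig(creator="alice")
config.save("alice", {"hashSpanFix: AllowRun": "true"})
```

In this sketch a save by any other user raises `PermissionError` until an admin calls `add_admin` for that user.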
However, there is another layer of permissions, as individual attributes can have the following permissions (even an admin cannot bypass these, though a developer can modify permission rules between releases):
Again, for users who cannot save changes to the configuration, the most that can be done is simply read; these permissions apply primarily to admins. However, hidden applies to both admins and non-admins.
To view the attributes of a local server (one on your computer):
To view the attributes of a remote server (one on another computer; this is how we generally administer servers):
You can right-click (or control-click) on attributes to edit or remove them, and you can click on Attributes Menu > Add Attribute (scroll to top of attributes tab) to add a new attribute.
As we illustrated earlier, attributes play different roles. E.g.,
The following table explains the default values. Note that any can be changed if the user has the proper admin privileges. (If you start your own server, you have the proper admin privileges.)
| Name | Default value | Description |
|---|---|---|
| actualBytesUsed: /path/to/directory | Varies | Displays the number of bytes currently residing in a data directory used by this server. (Each server has one or more data directories, which can be on one or more disks or partitions.) If this server is using more than one directory, there will be multiple attributes of this kind. Example value: 107374182400 if a data directory has exactly 100GB. |
| currentlyConnectedUsers | Varies | Displays the number of internet connections the server is currently servicing. These represent concurrent users (either actual end-users or services such as backup and monitoring tools). |
| dataBlockUtil: KnownDataFileCount | Varies | The number of data chunks that are loaded into the server. These are the Tranche file system's way to save and retrieve data, and do not necessarily match the number of files that were uploaded or can be downloaded. |
| dataBlockUtil: KnownMetaDataFileCount | Varies | The number of meta data chunks that are loaded into the server. These are the Tranche file system's way to restore the original files from data chunks, as well as assemble the project's files back into a project. |
| dataBlockUtil: MergeQueueSize | Varies | The current size of the queue (line) of files that need to be merged back into the Tranche file system. This will generally be zero unless there are current or recent uploads, and the Tranche file system is cleaning up. This will only be non-zero when the file system is doing something called splitting (similar to how self-balancing trees split data when they are unbalanced). |
| flatFileTrancheServer: KnownProjectCount | Varies | The number of meta data chunks on the server that represent a project. Note that this does not reflect the number of total projects that can be downloaded from that server alone, which varies along many dimensions. This is mostly intended for internal use, but can be used to make an imperfect, rule-of-thumb observation about the percentage of the network on the server. |
| freeMemory | Varies | The number of free bytes of memory available to the server at any given time. This changes rapidly in real-time on the server. This can be useful for tracking down memory issues to alleviate swap or find conditions that cause the server to run out of memory. |
| hashSpanFix: AllowDelete | true | Setting to false disables the delete feature in the server's healing thread. A server will only delete a chunk if the server has a hash span and the chunk is not in that hash span but has enough copies on the overall network to remain available after a delete. |
| hashSpanFix: AllowRun | true | Setting to false disables the healing thread completely, meaning that the server will not repair its chunks nor look for chunks to download or delete. This is a safety switch that allows an admin to stop behavior that they think might cause problems on the network. It also allows the admin to stop the healing thread if they think it requires too many resources. |
| hashSpanFix: DataCopied | Varies | The number of data chunks this server has downloaded based on its hash span(s). A server without a hash span will not copy data, since the hash span is intended to mirror a slice of the network. |
| hashSpanFix: DataDeleted | Varies | The number of data chunks this server has deleted based on its hash span(s). Data is deleted from a server if there are enough copies on the network and the data does not fit in any of the server's hash spans (a server can have zero or more). (A server without a hash span will eventually delete everything if there are sufficient copies elsewhere.) |
| hashSpanFix: DataLocalChunkThrewException | Varies | The number of exceptions (in this case, errors) thrown when loading the data chunks. Unless there are I/O errors, this should remain zero. In the case of a non-zero value, there may be hardware (or software) issues that need to be addressed. This is used to monitor the servers to maintain data integrity and availability. |
| hashSpanFix: DataNotRepaired | Varies | This is a complex value, and a non-zero value is not necessarily bad (though zero is generally good). Every time the healing thread finds a bad chunk, it tries one server at a time to find a copy of the chunk. If it finds a good copy, it replaces the chunk. If it finds another bad copy, it increments this value by one to signify that another chunk was not replaced. However, this is normal if multiple bad chunks exist. As long as one good chunk exists, it will eventually be replaced. (We specifically test this with unit tests and stress tests.) |
| hashSpanFix: DataRepaired | Varies | A zero or non-zero value is not necessarily bad. If a bad chunk is replaced by a good copy of the chunk from another server in the server's healing thread, this value is incremented by one. In conditions where we know there are a lot of bad chunks on the network, this helps us know which servers are repairing themselves (along with hashSpanFix: DataNotRepaired). |
| hashSpanFix: DataSkipped | Varies | The number of chunks the server finds it already has when trying to mirror a slice of the network specified by its hash span. While the server does not have a hash span (which could be indefinite), this number will not change. |
| hashSpanFix: GlobalIterationCount | Varies | The total number of times that the healing thread has performed all of its operations once, including adding new data and meta data, deleting irrelevant data and meta data, and repairing any broken data and meta data. Depending on the amount of data and meta data on a server, a single iteration may take a long while, from hours to days, or even weeks or months. |
| hashSpanFix: GlobalLocalRepairCompleteIterations | Varies | In the case that the server does not have a hash span, this is the total number of times that the healing thread has checked all of its data and meta data for bad chunks and replaced any with any good copies it might find. This process continues as long as the server is running since bad chunks can appear (albeit rarely) at any time under any circumstances that involve I/O errors. |
| hashSpanFix: MetaDataCopied | Varies | The number of meta data chunks this server has downloaded based on its hash span(s). A server without a hash span will not copy any data (meta data or data), since the hash span is intended to mirror a slice of the network. |
| hashSpanFix: MetaDataDeleted | Varies | The number of meta data chunks this server has deleted based on its hash span(s). Meta data is deleted from a server if there are enough copies on the network and the meta data does not fit in any of the server's hash spans (a server can have zero or more). (A server without a hash span will eventually delete everything if there are sufficient copies elsewhere.) |
| hashSpanFix: MetaDataLocalChunkThrewException | Varies | The number of exceptions (in this case, errors) thrown when loading the meta data chunks. Unless there are I/O errors, this should remain zero. In the case of a non-zero value, there may be hardware (or software) issues that need to be addressed. This is used to monitor the servers to maintain data integrity and availability. |
| hashSpanFix: MetaDataNotRepaired | Varies | This is a complex value, and a non-zero value is not necessarily bad. Every time the healing thread finds a bad chunk, it tries one server at a time to find a copy of the chunk. If it finds a good copy, it replaces the chunk. If it finds another bad copy, it increments this value by one to signify that another chunk was not replaced. However, this is normal if multiple bad chunks exist. As long as one good chunk exists, it will eventually be replaced. (We specifically test this with unit tests and stress tests.) |
| hashSpanFix: MetaDataRepaired | Varies | A zero or non-zero value is not necessarily bad. If a bad chunk is replaced by a good copy of the chunk from another server in the server's healing thread, this value is incremented by one. In conditions where we know there are a lot of bad chunks on the network, this helps us know which servers are repairing themselves (along with hashSpanFix: MetaDataNotRepaired). |
| hashSpanFix: MetaDataSkipped | Varies | The number of chunks the server finds it already has when trying to mirror a slice of the network specified by its hash span. While the server does not have a hash span (which could be indefinite), this number will not change. |
| hashSpanFix: NumRepsInHashSpanRequiredForDelete | 1 | If a server finds a chunk (data or meta data) that does not belong (either doesn't fit in one of its hash spans or it doesn't have a hash span), it checks various conditions to determine whether the chunk should be deleted. This particular value is the number of required copies of the chunk in other servers that have a hash span that includes that chunk. (I.e., with the default value of one, at least one other server must have a hash span that holds that chunk AND that server must already have a copy of the chunk). Increasing this value results in more stringent criteria before a deletion will occur, and lessening it (not recommended) results in less stringent criteria for a deletion. |
| hashSpanFix: NumTotalRepsRequiredForDelete | 3 | If a server finds a chunk (data or meta data) that does not belong (either doesn't fit in one of its hash spans or it doesn't have a hash span), it checks various conditions to determine whether the chunk should be deleted. This particular value is the number of other copies of this chunk that must exist before the chunk is deleted, regardless of hash spans. (I.e., with the default value of three, at least three other servers must currently have a copy of the chunk before considering a deletion.) Increasing this value results in more stringent criteria before a deletion will occur, and lowering it (not recommended) results in less stringent criteria for a deletion. |
| serverURL | Varies | The Tranche URL of the server. Example value: tranche://141.214.182.209:443, which is one of the UM Biological Chemistry servers from the Phillip Andrews Lab. |
| totalMemory | Varies | The total amount of primary memory (i.e., RAM) available to the server. If a server runs out of memory, an OutOfMemoryError will occur, and the server might crash. |
| hashSpanFix: GlobalServerBeingChecked | Varies | The remote server with which the healing thread for the local server is currently communicating for downloading data or meta data to the local server, or for some other activity. (In essence, the remote server from which the particular server is currently pulling data.) This will be Unknown if not communicating with another server, which happens under a variety of conditions. |
| coreServer: mirrorEveryUpload | Varies | If set to true, the server will mirror every upload to the Tranche core Proteomics network. (Actually, every upload is sent to the server by the client's interpretation of this value. If a client cannot contact the server, the server will not receive a copy of the chunk until (actually, unless) the healing thread downloads a copy. For this reason, it is not an actual mirror.) |
| hashSpanFix: GlobalLocalDeleteCompleteIterations | Varies | The number of complete iterations that the healing thread has performed checking the server's chunks for data and meta data to delete, based on the configuration settings. All servers try to download data in their hash span(s) as well as delete chunks not in their hash spans (under ideal conditions and contingent on the various server configuration attributes). A server without any hash spans ideally deletes all of its chunks when they have sufficient copies. |
| hashSpanFix: BatchSizeForDownloadChunks | 50 | The size of a batch of chunks to check for potential new chunks in a server's hash span. One of four values that compete for the healing thread's time: downloading, deleting, balancing and healing. |
| hashSpanFix: BatchSizeForDeleteChunks | 50 | The size of a batch of chunks to check for potential chunks to delete from a server if not in the server's hash span and if there are sufficient copies. One of four values that compete for healing thread's time: downloading, deleting, balancing and healing. |
| hashSpanFix: BatchSizeForHealChunks | 50 | The size of a batch of chunks to check for chunks to repair (e.g., the chunks were corrupted). One of four values that compete for healing thread's time: downloading, deleting, balancing and healing. |
| hashSpanFix: BatchSizeForBalanceDirectories | 50 | The number of attempts to balance the data directories. This plays the important role of maintaining a relatively proportional use of a server's various data directories. (E.g., if a new data directory is added, it will be empty, even if another directory is full. This will slowly but steadily move data to less full directories.) One of four values that compete for healing thread's time: downloading, deleting, balancing and healing. |
| hashSpanFix: CurrentActivity | Varies | The current activity handled by the healing thread. This should generally involve deleting, downloading, balancing or healing, though it might mention the fact that it is preparing for startup or taking care of housekeeping duties. |
| hashSpanFix: TimeSpentDeleting | Varies | Time (actual time and percentage) spent by healing thread looking for chunks to delete. |
| hashSpanFix: TimeSpentDoingNothing | Varies | Time (actual time and percentage) spent by healing thread doing ancillary duties, such as waiting for startup. |
| hashSpanFix: TimeSpentDownloading | Varies | Time (actual time and percentage) spent by healing thread downloading chunks that belong on this server. |
| hashSpanFix: TimeSpentHealing | Varies | Time (actual time and percentage) spent by healing thread looking for corrupted chunks to heal. |
| hashSpanFix: TimeSpentBalancing | Varies | Time (actual time and percentage) spent by healing thread balancing the server's data directories. |
| hashSpanFix: PauseInMillisAfterEachOperation | 100 | To free up resources by slowing down the healing thread, you can set a value (in milliseconds) to pause between operations. (This is a candidate option that might be auto-adjusted once a server performance monitor daemon is developed.) |
| remoteTrancheServer: ServerSideCallbacks | Varies | Human-readable message about the number of callbacks, as well as the two servers causing the majority of callbacks. Helpful for determining troublesome servers on the network. (These not only slow down server-side operations, such as the healing thread, but can slow down client-side operations, such as uploading and downloading projects.) |
| flatFileTrancheServer: TotalSize | Varies | The total human-readable size available to the server as a sum of all the limits in the DataDirectoryConfiguration objects. (I.e., sum of all available space in configured data directories.) If overflow is detected, it will be noted. |
| flatFileTrancheServer: TotalSizeUsed | Varies | The total human-readable size used by the server as a sum of all the used space in the DataDirectoryConfiguration objects. (I.e., sum of all used space in configured data directories.) If overflow is detected, it will be noted. |
| flatFileTrancheServer: TotalSizeAvailable | Varies | Human-readable space remaining on the server, obtained by subtracting the value of flatFileTrancheServer: TotalSizeUsed from flatFileTrancheServer: TotalSize. If overflow is detected, it will be noted. |
| dataBlockUtil: MergedDataBlocksTotal | Varies | The number of data blocks that have been merged. This is the process by which the B-tree splits a leaf to become a node, and is mostly useful for debugging purposes. Note this may not be the sum of dataBlockUtil: MergedDataBlocksSuccesses and dataBlockUtil: MergedDataBlocksFailures if a failure occurs during a specific portion of the code. |
| dataBlockUtil: MergedDataBlocksSuccesses | Varies | The total number of successfully merged data blocks. See dataBlockUtil: MergedDataBlocksTotal for more information. |
| dataBlockUtil: MergedDataBlocksFailures | Varies | The total number of failed data block merges. This could happen if the data block were corrupted, there are permission problems, or any other I/O problems occur. Tranche servers are programmed to deal with this inevitability, and a small number of these should not cause alarm. However, if there are a large number (say, over 50), you should contact the development team for an investigation. See dataBlockUtil: MergedDataBlocksTotal for more information. |
| corruptedDataBlock: CorruptedDataBlockCount | Varies | The total number of corrupted data blocks detected on the server. Corrupted data blocks may happen under normal circumstances, such as a "hard shutdown" (whenever the process is killed), but may also occur when there are I/O errors (which might be a sign of failing hardware). |
| corruptedDataBlock: SalvagedChunksFromCorruptedBlockCount | Varies | The total number of chunks recovered from all corrupted data blocks by salvaging intact data. If something is not salvaged, it might be downloaded. |
| corruptedDataBlock: DownloadedChunksFromCorruptedBlockCount | Varies | The total number of chunks recovered from all corrupted data blocks by downloading the missing data from other servers. Anything not salvaged or downloaded is considered lost. However, salvaging and downloading should hopefully prevent lost data. (However, if information is lost from the header of a data block, it cannot be retried. The healing thread should make up for this sort of loss over time.) |
| corruptedDataBlock: LostChunksFromCorruptedDataBlockCount | Varies | If chunks from corrupted data blocks are not salvaged or downloaded, they are considered lost. There should not be many (if any) lost chunks, as salvaging and downloading combined should reconstitute the majority of lost data. The healing thread should make up for any other type of loss over time. |
| corruptedDataBlock: CorruptedDataBlockHeaderCount | Varies | Every corrupted data block is corrupted in either the header or the body. This is the total count of data blocks corrupted in the header. This variety is more troublesome because there is no way to determine what was lost. The healing thread, however, should make up for this sort of loss over time. |
| corruptedDataBlock: CorruptedDataBlockBodyCount | Varies | Every corrupted data block is corrupted in either the header or the body. This is the total count of data blocks corrupted in the body. |
| corruptedDataBlock: AllowedToFixCorruptedDataBlock | true | Whether the server is allowed to fix corrupted data blocks. Because this feature deletes data block files, the switch to turn it off (or on again) is available for troubleshooting. |
| coreServer: isServerReadOnly | false | If set to true, then a server will not accept new chunks, nor will the healing thread download chunks in its hash span. This allows a server to share its existing data, even if an administrator does not want the server to accept any new data. |
| hashSpanFix: AllowBalance | Currently undecided | If true, then the data directories will be balanced over time. This allows new (or under-used) data directories to assume data from other directories that are filling up. |
| hashSpanFix: RequiredPercentageDifferenceToBalanceDataDirectories | 15.0 | The required difference between a directory with a lot of used space and a directory with less. This is determined by calculating the percentage of space used by each and subtracting the values. Note that a higher value means less balancing will occur. A value at or near zero will simply shuffle data back and forth, which is non-productive. A good balance prevents non-productive shuffling and non-shuffling. |
| hashSpanFix: RequiredPercentageForMostUsedDataDirectoryBeforeBalance | 60.0 | This is the percentage of a directory's total space that must be used before it is a candidate for shuffling. This prevents early shuffling, which may serve little purpose. A new server (or a server that holds data only for short periods of time, like those without hash spans) does not need shuffling before a directory reaches a certain size. |
| hashSpanFix: TotalDataBlocksMovedToBalance | Varies | The total number of data blocks shuffled as a result of a server's balancing. Balancing is a slow process by design, and it takes resources. There is little reason to balance quickly (even moving one data block frees an average of 50MB, which means that a new upload could proceed for a while). However, a large number will result as the server accumulates enough data. (Unless the directories are well balanced by chance, which might not be uncommon.) |
* It is not a good idea to edit or remove an attribute unless you know what the impact will be. No changes will take place until the configuration is saved. Only some users can save configuration changes, and anyone who attempts to change them without proper permissions will get a simple message telling them that they do not have permissions to change the configuration.
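The two balancing thresholds in the table can be read as a pair of checks. The following is a minimal sketch of that reading, not Tranche's implementation: the function name `should_balance`, the use of the least-used directory for the comparison, and the inclusive comparisons are assumptions.

```python
def should_balance(used_percentages, required_difference=15.0, required_most_used=60.0):
    """Return True if the directories look unbalanced enough to move a data block.

    used_percentages: percent of configured space used by each data directory.
    required_difference: mirrors RequiredPercentageDifferenceToBalanceDataDirectories.
    required_most_used: mirrors RequiredPercentageForMostUsedDataDirectoryBeforeBalance.
    """
    if len(used_percentages) < 2:
        return False  # nothing to balance against
    most_used = max(used_percentages)
    least_used = min(used_percentages)
    if most_used < required_most_used:
        return False  # fullest directory is not yet a candidate (assumed inclusive bound)
    # Assumption: the difference is measured against the least-used directory.
    return (most_used - least_used) >= required_difference
```

With the default values, directories at 70% and 10% would qualify, while directories at 50% and 0% would not, because the fullest one has not yet reached 60%.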
The Tranche project has been modified extensively to accommodate the various failures that the developers and administrators have witnessed, like disk failures, disconnected servers, etc.
Each server on a Tranche network has a healing thread that runs as long as a server is running. This healing thread serves four primary functions:
Special case: If a server does not have a hash span, as a few smaller or older servers do not, they will only hold on to data until it is replicated sufficiently. These servers will delete (everything, if sufficient resources) and repair, but not download.
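The deletion rule, including the special case of a server without a hash span, can be sketched as a single predicate. This is a hedged illustration only: the function `may_delete` and its parameter names are hypothetical, and combining the two replication thresholds with a logical AND is an assumption based on the attribute descriptions in the table.

```python
def may_delete(chunk_in_own_hash_span, copies_in_covering_hash_spans, total_other_copies,
               allow_delete=True, reps_in_hash_span_required=1, total_reps_required=3):
    """Decide whether the healing thread may delete a local chunk.

    chunk_in_own_hash_span: always False for a server without any hash span.
    copies_in_covering_hash_spans: copies on other servers whose hash span includes the chunk.
    total_other_copies: copies on any other servers, regardless of hash spans.
    """
    if not allow_delete:
        return False  # mirrors hashSpanFix: AllowDelete set to false
    if chunk_in_own_hash_span:
        return False  # the chunk belongs on this server
    # Assumption: both replication thresholds must be met.
    return (copies_in_covering_hash_spans >= reps_in_hash_span_required
            and total_other_copies >= total_reps_required)
```

A server with no hash span always passes `chunk_in_own_hash_span=False`, so under this reading it deletes each chunk as soon as the replication thresholds are satisfied.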
Note that these four activities compete for the healing thread's time (i.e., in the parlance of programmers, they all run on a single thread). These were designed so that administrators can adjust the focus. (Eventually, we will design automatons to do this work intelligently.)
This is accomplished by assigning how much work each will do in succession. Each of the four handles a batch of chunks in turn. However, the batch size is changed if a user sets a different value for the following three configuration attributes:
So using the default values, every time 50 chunks are checked for new data to download to a server, 5 more chunks are checked for deletion from the server if there are sufficient replications and the chunk doesn't belong, and 10 chunks are checked to see whether they were corrupted. Also, if there is sufficient disparity between use of data directories, up to two data blocks will be moved to mitigate the difference.
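The way the activities take turns on one thread can be sketched as a simple loop. This is a minimal illustration, not the healing thread's actual code: `run_iteration` and the handler functions are hypothetical, the batch sizes are the figures quoted in the paragraph above, and treating each batch as one "operation" for the pause is an assumption.

```python
import time


def run_iteration(batch_sizes, handlers, pause_millis=0):
    """Run each activity once, in order, on the calling thread.

    batch_sizes: activity name -> number of items to handle in this turn.
    handlers: activity name -> callable taking the batch size.
    pause_millis: optional pause after each batch, in milliseconds.
    """
    for activity, size in batch_sizes.items():
        handlers[activity](size)
        if pause_millis > 0:
            time.sleep(pause_millis / 1000.0)


# Record what each turn would have processed instead of touching real chunks.
handled = []
batch_sizes = {"download": 50, "delete": 5, "heal": 10, "balance": 2}
handlers = {name: (lambda size, name=name: handled.append((name, size)))
            for name in batch_sizes}
run_iteration(batch_sizes, handlers)
```

Raising one batch size relative to the others gives that activity a larger share of each pass, which is how an administrator would shift the thread's focus in this model.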
So why would we want to change these values while the server is running?
It is not possible to predetermine every scenario, but runtime adjustments allow the network administrators to respond to non-ideal circumstances and protect data.