Latest revision as of 00:36, 9 December 2009

Cache Format

General Information

The old engine cache is made up of two types of files.

Data file

The data file, named main_file_cache.dat, holds the contents of every file in the cache. It is therefore large, typically around 10-20 megabytes.

Index file

There are several index files, each named main_file_cache.idx followed by a number (for example, main_file_cache.idx2). Each index file holds 'pointers' to where a file is located in the data file, and each index file represents one type of file.

Format

Index file format

The index file is made up of 6-byte blocks, one per file ID, which record where a file can be located in the data file. The format of a single block is as follows:

tribyte fileSize
tribyte initialDataBlockId
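
As a minimal sketch of decoding one such entry, the class below reads the two tribytes from the full contents of an index file. The class and method names are hypothetical, and big-endian byte order is an assumption; the text above does not state the byte order.

```java
/** Hypothetical holder for one 6-byte index entry; the names are not from the original page. */
final class IndexEntry {
    static final int ENTRY_SIZE = 6; // two tribytes

    final int fileSize;
    final int initialDataBlockId;

    private IndexEntry(int fileSize, int initialDataBlockId) {
        this.fileSize = fileSize;
        this.initialDataBlockId = initialDataBlockId;
    }

    /** Reads an unsigned 3-byte value. Big-endian order is an assumption. */
    static int readTribyte(byte[] buf, int pos) {
        return ((buf[pos] & 0xFF) << 16) | ((buf[pos + 1] & 0xFF) << 8) | (buf[pos + 2] & 0xFF);
    }

    /** Decodes the entry for fileId from the full contents of one index file. */
    static IndexEntry parse(byte[] indexFile, int fileId) {
        int pos = fileId * ENTRY_SIZE;
        if (fileId < 0 || pos + ENTRY_SIZE > indexFile.length) {
            throw new IllegalArgumentException("no index entry for file " + fileId);
        }
        return new IndexEntry(readTribyte(indexFile, pos), readTribyte(indexFile, pos + 3));
    }
}
```

Calling IndexEntry.parse(indexBytes, 17) would decode the six bytes starting at offset 102 of the index file.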

Data file format

The data file is made up of 520-byte blocks, each consisting of an 8-byte header followed by 512 bytes of data. The format of each of these blocks is as follows:

short nextFileId
short currentFilePartId
tribyte nextDataBlockId
byte nextFileTypeId
byte[512] blockData
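
The header layout can be sketched as a small parser. This is an illustration under assumptions, not the client's actual code: the class name is hypothetical, the field names are taken from the list above, and the values are assumed to be unsigned and big-endian.

```java
/** Hypothetical view of the 8-byte header that starts each 520-byte data block. */
final class DataBlockHeader {
    static final int HEADER_SIZE = 8;                      // short + short + tribyte + byte
    static final int DATA_SIZE = 512;                      // byte[512] blockData
    static final int BLOCK_SIZE = HEADER_SIZE + DATA_SIZE; // 520

    final int nextFileId;
    final int currentFilePartId;
    final int nextDataBlockId;
    final int nextFileTypeId;

    private DataBlockHeader(int nextFileId, int currentFilePartId,
                            int nextDataBlockId, int nextFileTypeId) {
        this.nextFileId = nextFileId;
        this.currentFilePartId = currentFilePartId;
        this.nextDataBlockId = nextDataBlockId;
        this.nextFileTypeId = nextFileTypeId;
    }

    /** Decodes the header from the first 8 bytes of a block (big-endian is an assumption). */
    static DataBlockHeader parse(byte[] block) {
        if (block.length < HEADER_SIZE) {
            throw new IllegalArgumentException("need at least " + HEADER_SIZE + " bytes");
        }
        int fileId = ((block[0] & 0xFF) << 8) | (block[1] & 0xFF);
        int partId = ((block[2] & 0xFF) << 8) | (block[3] & 0xFF);
        int nextBlock = ((block[4] & 0xFF) << 16) | ((block[5] & 0xFF) << 8) | (block[6] & 0xFF);
        int typeId = block[7] & 0xFF;
        return new DataBlockHeader(fileId, partId, nextBlock, typeId);
    }
}
```

The 512 data bytes of a block then sit at offsets 8 to 519 within that block.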

Explanation

An example will be used here as it is easier to follow.

Suppose the client wishes to fetch file ID 17 of file type 2.

First, it opens the main_file_cache.idx2 file and seeks to byte offset 17 * 6 = 102. It then reads two tribytes:

fileSize = 1200
initialDataBlockId = 4

The client now opens the main_file_cache.dat file and seeks to byte offset 4 * 520 = 2080. The values it reads will be:

nextFileId = 17
currentFilePartId = 0
nextDataBlockId = 5
nextFileTypeId = 2
blockData = ...

It reads the first 512 bytes of the file and knows that 688 bytes (1200 - 512) are left. Therefore, it has to read the next block, block 5:

nextFileId = 17
currentFilePartId = 1
nextDataBlockId = 6
nextFileTypeId = 2
blockData = ...

It reads the next 512 bytes of the file and now knows that 176 bytes (688 - 512) are left. So, for a final time, it reads the next block, block 6:

nextFileId = 18
currentFilePartId = 2
nextDataBlockId = 7
nextFileTypeId = 2
blockData = ...

It can ignore most of these values (the next ones are meaningless at this stage) and read the final 176 bytes. The whole 1200 byte file has now been read.
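
The walkthrough above can be sketched as a single routine. This is a hedged illustration, not the client's implementation: the class and method names are hypothetical, both files are assumed to be already loaded into byte arrays, and big-endian tribytes are an assumption. A real reader would more likely seek within the files on disk and might also check the other header fields.

```java
/** Hypothetical reader that follows the block chain described in the walkthrough. */
final class CacheFileReader {
    static final int INDEX_ENTRY_SIZE = 6;
    static final int HEADER_SIZE = 8;
    static final int DATA_SIZE = 512;
    static final int BLOCK_SIZE = HEADER_SIZE + DATA_SIZE; // 520

    /** Reads an unsigned 3-byte value. Big-endian order is an assumption. */
    static int readTribyte(byte[] buf, int pos) {
        return ((buf[pos] & 0xFF) << 16) | ((buf[pos + 1] & 0xFF) << 8) | (buf[pos + 2] & 0xFF);
    }

    /** Returns the contents of fileId, given one index file and the data file as byte arrays. */
    static byte[] read(byte[] indexFile, byte[] dataFile, int fileId) {
        int entryPos = fileId * INDEX_ENTRY_SIZE;           // e.g. 17 * 6 = 102
        if (fileId < 0 || entryPos + INDEX_ENTRY_SIZE > indexFile.length) {
            throw new IllegalArgumentException("no index entry for file " + fileId);
        }
        int fileSize = readTribyte(indexFile, entryPos);
        int blockId = readTribyte(indexFile, entryPos + 3);

        byte[] out = new byte[fileSize];
        int copied = 0;
        while (copied < fileSize) {
            long blockPos = (long) blockId * BLOCK_SIZE;    // e.g. 4 * 520 = 2080
            int chunk = Math.min(DATA_SIZE, fileSize - copied);
            if (blockPos + HEADER_SIZE + chunk > dataFile.length) {
                throw new IllegalStateException("block " + blockId + " lies outside the data file");
            }
            // Copy this block's share of the file, skipping the 8-byte header.
            System.arraycopy(dataFile, (int) blockPos + HEADER_SIZE, out, copied, chunk);
            copied += chunk;
            // nextDataBlockId sits after the two shorts, at header offset 4.
            blockId = readTribyte(dataFile, (int) blockPos + 4);
        }
        return out;
    }
}
```

For the 1200-byte example, the loop runs three times and copies 512, 512 and 176 bytes. The block ID read on the last pass is never used.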

Named Files

All the files in cache 0 have an archive-like format which allows named files (e.g. BADENC.TXT is a file which contains bad words in the wordenc archive).

Format

tribyte uncompressedSize
tribyte compressedSize

If the uncompressed and compressed sizes are equal, the whole file is not compressed but the individual entries are compressed using bzip2. If they are not equal, the entire file is compressed using bzip2 but the individual entries are not.

Also note that the magic ID at the start of the bzip2 entries is not included in the cache. If you use an existing API to read the files and want to add this back, you must prepend the four characters BZh1 before you decompress.

short fileCount

Each file entry has the format:

int nameHash
tribyte uncompressedSize
tribyte compressedSize

When you are looping through the files, you need to keep track of the file offset yourself. This pseudocode demonstrates how:

 // Entry data starts after the fileCount entry headers of 10 bytes each.
 int offset = buffer.getCurrentOffset() + fileCount * 10;
 for (int i = 0; i < fileCount; i++) {
    // read nameHash, uncompressedSize and compressedSize for this entry
    int thisFileOffset = offset;
    offset += thisFileCompressedSize;
 }

To get a named file by its name, you should first hash the name using this method:

 public static int hash(String name) {
    int hash = 0;
    name = name.toUpperCase();
    for (int j = 0; j < name.length(); j++) {
        hash = (hash * 61 + name.charAt(j)) - 32;
    }
    return hash;
 }

Then, loop through the file entries you loaded earlier to find a matching hash. Read the entry's compressed size in bytes starting at its offset. If the whole file is not compressed, you should decompress the individual entry.
