.soy

PySoy's data format

<!> This document is a draft and under active development

Concept & Purpose

There are many formats for 3D mesh data, textures, sounds, and so on. Some aim to become industry standards and some are used by only one modeling program or one engine, but none organizes data the way PySoy does internally, nor do they allow the kind of linking that makes PySoy development so easy.

In this format we want many different types of data and parameters to be combined in the way they are used. For example, a game character may have a base 3D mesh, several morph targets, vertex groups with an armature, the armature's bones (each with its own mass and connected by joints), low-level behavior for animation, one or more textures for the character's skin, voice samples and other sound effects, and possibly more data in the form of extension classes.

Many formats would provide these in different files. It's common, for example, to provide textures in separate PNG, BMP, or TIFF files which are referenced by filename from the mesh data. These formats, while useful in that they're supported by many editors, are for images, not textures. They do not support 3D textures, for example.

Another issue taken into account is downloading data from a network. Muxing different types of data together in a progressive manner would allow objects to be used before they are fully transferred. With larger objects, such as the animated character example above, it would make sense to prioritize the mesh data and a low-resolution texture along with the behavior and animation, then provide the higher-resolution texture and voice samples later in the file, so that the still-incomplete object can already be displayed (in the distance).

soy.File API

Objects of type soy.transports.File are created with a single argument, the path of the file to be handled. For example:

>>> tree = soy.transports.File('pinetree.soy')

Instances of soy.transports.File behave like Python dictionaries. Object names are the keys and the values are instances of those objects. For example:

>>> tree['needlecolor']
<Color #228b22>

New objects can be added to the file by assigning them to a name in the dictionary. For example:

>>> tree['barkcolor'] = soy.colors.BurlyWood()

Changes to a file's objects are saved when the .save() method is called.

Soy Stream Header

The first 3 bytes of the stream should read "soy" followed by a char identifying the major PySoy version. This can be used as magic to identify the file.

Following that is an unsigned 32-bit integer giving the number of objects. For each object there are two length-prefixed strings identifying the class name and object name respectively.

The major version of this format is 0. Decoding should not be attempted for an unsupported major version.

char[3] "s", "o", "y"
uchar   Major Version (0)
uint    Number of Objects, for each:
  uchar   Class Name length (0-255 in bytes)
  char[X] Class Name
  uchar   Object Name length (0-255 in bytes)
  char[X] Object Name
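
The stream header layout above can be sketched with Python's struct module. This is only an illustration of the draft layout, not PySoy's own reader: the little-endian byte order, the ASCII encoding of names, the function names, and the example class name "soy.colors.Color" are all assumptions that the draft does not state.

```python
import io
import struct

MAGIC = b"soy"
SUPPORTED_MAJOR = 0  # major version described in this draft


def write_stream_header(entries, major=SUPPORTED_MAJOR):
    """Build a stream header from (class_name, object_name) pairs."""
    # "<" = little-endian without padding (byte order is an assumption).
    parts = [MAGIC, struct.pack("<BI", major, len(entries))]
    for class_name, object_name in entries:
        for name in (class_name, object_name):
            raw = name.encode("ascii")  # encoding is an assumption
            # One length byte (0-255) followed by the name itself.
            parts.append(struct.pack("<B", len(raw)) + raw)
    return b"".join(parts)


def read_stream_header(stream):
    """Return (major_version, [(class_name, object_name), ...])."""
    if stream.read(3) != MAGIC:
        raise ValueError("not a .soy stream")
    (major,) = struct.unpack("<B", stream.read(1))
    if major != SUPPORTED_MAJOR:
        # The draft says unsupported major versions must not be decoded.
        raise ValueError("unsupported major version %d" % major)
    (count,) = struct.unpack("<I", stream.read(4))
    entries = []
    for _ in range(count):
        names = []
        for _field in ("class name", "object name"):
            (length,) = struct.unpack("<B", stream.read(1))
            names.append(stream.read(length).decode("ascii"))
        entries.append(tuple(names))
    return major, entries


header = write_stream_header([("soy.colors.Color", "needlecolor")])
major, entries = read_stream_header(io.BytesIO(header))
```

Because the object list sits at the front of the stream, a reader can list class and object names without decoding any object data.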

Concept & Purpose

Here are the goals for .soy:

  • A simple, fast data storage object for PySoy
    • 32-bit word aligned for fast int/float reading
    • Data arranged as it's used internally
    • Stored pre-optimized for immediate use
  • Multiple Objects/File
    • Allows bundling "complete" objects, ie, mesh w/ textures
    • Files can be quickly scanned for object types and names
  • Backwards/Forwards Compatible
    • New data types and parameters can be easily added
    • Objects can be partially decoded w/ new features ignored
  • 3rd Party Extendable
    • Cross-game object classes (special effects/etc) can be easily added
    • Game implementors may include data for their classes

Data Types

Char   8-bit number/id
Str#   String of specified length
UI16   16-bit unsigned "short" integer (check short length)
UI32   32-bit unsigned integer (check int length)
Float  32-bit float
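
As a rough illustration, the types above map onto Python struct format codes as follows. The "<" prefix selects little-endian byte order, which is an assumption since the draft does not specify one. It also selects struct's standard sizes, so the short and int lengths flagged for checking above do not depend on the platform. The names FORMATS and str_format are hypothetical.

```python
import struct

# Draft type name -> struct format code (little-endian is an assumption).
FORMATS = {
    "Char": "<B",   # 8-bit number/id
    "UI16": "<H",   # 16-bit unsigned integer
    "UI32": "<I",   # 32-bit unsigned integer
    "Float": "<f",  # 32-bit float
}


def str_format(length):
    """Format code for a Str# field occupying `length` bytes."""
    return "<%ds" % length


# With the "<" prefix these sizes are fixed by struct, not by the C compiler.
sizes = {name: struct.calcsize(fmt) for name, fmt in FORMATS.items()}
```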

Objects

Each object consists of a 64-byte header followed by zero or more data blocks. Each object must be immediately followed by the start of the next object header (identified by the "soy" magic) or by end-of-file (EOF). Any other condition must result in discarding the preceding object, as it indicates either file corruption or that a broken exporter/editor was used on the file. A reader may scan ahead to the next "soy" magic and attempt to continue decoding.

This format allows multiple objects to be combined into a single file by "cat file1.soy file2.soy file3.soy >all.soy".

Object Header

Each object begins with a 64-byte header. The 3-byte "soy" magic can be used both to identify the file type and to recover corrupted files. The version may be considered part of the magic, i.e., "soy\x01" for PySoy v1. Unknown versions should not be decoded, as the header format may change. The version of this format is "0".

Str3  "s", "o", "y"
Char  PySoy Version (Major)
UI32  Object Size (in bytes, not including 64-byte header)
Str36 Object Type (0-35 character string, null-padded, null-terminated)
Str20 Object Name (0-19 character string, null-padded, null-terminated)
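
A minimal sketch of packing and walking these 64-byte headers, assuming little-endian byte order and ASCII names (neither is stated in this draft). The helper names and the example type names "soy.Mesh" and "soy.Material" are hypothetical. It also shows why plain concatenation of objects yields a valid file.

```python
import struct

# Str3 magic, Char version, UI32 size, Str36 type, Str20 name = 64 bytes.
HEADER = struct.Struct("<3sBI36s20s")  # little-endian is an assumption
assert HEADER.size == 64


def pack_header(major, size, type_name, object_name):
    """Build a header; struct null-pads the two string fields."""
    if len(type_name) > 35 or len(object_name) > 19:
        raise ValueError("name too long for its null-terminated field")
    return HEADER.pack(b"soy", major, size,
                       type_name.encode("ascii"), object_name.encode("ascii"))


def unpack_header(raw):
    """Return (major, size, type_name, object_name) from 64 header bytes."""
    magic, major, size, type_name, object_name = HEADER.unpack(raw)
    if magic != b"soy":
        raise ValueError("missing 'soy' magic")
    if major != 0:
        raise ValueError("unknown version, header layout may differ")
    # Keep only the text before the first null byte.
    return (major, size,
            type_name.split(b"\0", 1)[0].decode("ascii"),
            object_name.split(b"\0", 1)[0].decode("ascii"))


def iter_objects(data):
    """Yield (header, payload) for each object in a byte string."""
    offset = 0
    while offset < len(data):
        header = unpack_header(data[offset:offset + HEADER.size])
        start = offset + HEADER.size
        end = start + header[1]  # size excludes the 64-byte header
        if end > len(data):
            raise ValueError("truncated object")
        yield header, data[start:end]
        offset = end


# Two single-object files, concatenated as with "cat".
first = pack_header(0, 3, "soy.Mesh", "tree") + b"abc"
second = pack_header(0, 0, "soy.Material", "bark")
objects = list(iter_objects(first + second))
```

This sketch raises on the first malformed header. A more tolerant reader could instead search forward for the next "soy" magic, as the Objects section allows.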

Version and Type

Standard PySoy types begin with the class name "soy.". Extended types, those provided by add-on modules or games, are labeled by the name of the class implementing the load and save functions expecting this data.

The load function is sent the Version as its first argument. This is useful when the data comes from an earlier PySoy version, since it potentially allows loading an older version of the same class.

Object Blocks

Each data block has its own 12-byte header. The block type is a label passed to the load function responsible for this type. Unknown block types of known object versions should be silently ignored.

UI32  Block Size
Str8  Block Type
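
The block layout and the "silently ignore unknown types" rule could be sketched as below. Several things here are assumptions rather than part of the draft: little-endian byte order, a Block Size that excludes the 12-byte block header (by analogy with Object Size), null-padded block type labels, and the helper names iter_blocks and load_blocks.

```python
import struct

BLOCK_HEADER = struct.Struct("<I8s")  # UI32 size + Str8 type = 12 bytes
assert BLOCK_HEADER.size == 12


def iter_blocks(payload):
    """Yield (block_type, body) for each block in an object's payload."""
    offset = 0
    while offset < len(payload):
        size, raw_type = BLOCK_HEADER.unpack_from(payload, offset)
        start = offset + BLOCK_HEADER.size
        # Assumption: size counts only the body, not the block header.
        yield raw_type.rstrip(b"\0").decode("ascii"), payload[start:start + size]
        offset = start + size


def load_blocks(payload, loaders):
    """Dispatch each block to its loader, skipping unknown block types."""
    result = {}
    for block_type, body in iter_blocks(payload):
        loader = loaders.get(block_type)
        if loader is None:
            continue  # unknown block type: silently ignored
        result[block_type] = loader(body)
    return result


# Hypothetical payload: one block a reader understands, one it does not.
payload = (BLOCK_HEADER.pack(3, b"known") + b"abc"
           + BLOCK_HEADER.pack(4, b"mystery") + b"\x01\x02\x03\x04")
loaded = load_blocks(payload, {"known": bytes.upper})
```

Skipping by size is what lets an older reader partially decode an object written with newer features.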

Data Types

Mesh Data

The IDs are divided into the following groups:

0 - 31 Attributes of Vertices
32 - 63 Representations of Vertex Data
64 - 95 Attributes of Faces (beside vertices)
96 + Faces and other top-level data
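
The grouping above amounts to a simple range check. A small sketch follows; the function name and the returned labels are illustrative only.

```python
def mesh_id_group(block_id):
    """Return the group a mesh data ID falls into, per the ranges above."""
    if block_id < 0:
        raise ValueError("IDs are unsigned")
    if block_id <= 31:
        return "vertex attribute"
    if block_id <= 63:
        return "vertex representation"
    if block_id <= 95:
        return "face attribute"
    return "face or top-level data"


# IDs used by the mesh blocks defined in this section.
groups = {name: mesh_id_group(i) for name, i in
          [("Positions", 0), ("Normals", 1), ("UV Coords", 2),
           ("Vertices", 32), ("Materials", 64), ("Faces", 96)]}
```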

Positions (ID: 0)

UInt  number_of
... (
Float x
Float y
Float z
)
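
A count-prefixed list of float triples such as the Positions block can be decoded in a few lines. This sketch assumes that UInt means the 32-bit UI32 type and that values are little-endian. The function name is hypothetical. The Normals block uses the same layout, so the same reader would apply to it.

```python
import struct


def read_vec3_list(body):
    """Decode 'UInt number_of' followed by that many (x, y, z) floats."""
    (count,) = struct.unpack_from("<I", body, 0)
    # All floats are read in one call, then regrouped three at a time.
    values = struct.unpack_from("<%df" % (count * 3), body, 4)
    return [values[i:i + 3] for i in range(0, len(values), 3)]


# Two positions, using values that 32-bit floats represent exactly.
body = struct.pack("<I6f", 2, 0.0, 1.0, 2.0, 0.5, -1.5, 3.25)
positions = read_vec3_list(body)
```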

Normals (ID: 1)

UInt number_of
... (
Float x
Float y
Float z
)

UV Coords (ID: 2)

UInt number_of
... (
Float u
Float v
)

Vertices (ID: 32)

UInt number_of
... (
UInt Position
UInt Normal
Map32 Bitmap for optional attributes
UInt Color
...
)

Materials (ID: 64)

UInt number_of (0 = None)
... (
Str20 Material Name
)

Faces (ID: 96)

UInt number_of
... (
UInt  vertex1
UInt  vertex2
UInt  vertex3
Map32 Bitmap for additional face attributes (materials, etc):
UInt  materialindex
UInt  uvcoord1
UInt  uvcoord2
UInt  uvcoord3
...
)

Entities Data

Mesh (ID: 1)

Str20 meshname

Matrix (ID: 2)

Floats m00 m10 m20 m30
Floats m01 m11 m21 m31
Floats m02 m12 m22 m32
Floats m03 m13 m23 m33
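
The matrix block is sixteen consecutive floats. A sketch that reads them and keeps the four groups in file order follows, assuming little-endian floats. The draft does not say whether the first index in "m10" is the row or the column, so this example makes no claim about that and only preserves the stored order.

```python
import struct

MATRIX = struct.Struct("<16f")  # 16 floats = 64 bytes


def read_matrix(body):
    """Return four lists of four floats, in the order stored in the file."""
    values = MATRIX.unpack_from(body, 0)
    # First group is m00 m10 m20 m30, second is m01 m11 m21 m31, and so on.
    return [list(values[i:i + 4]) for i in range(0, 16, 4)]


# Example payload holding the values 0.0 through 15.0.
body = MATRIX.pack(*[float(i) for i in range(16)])
groups = read_matrix(body)
```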

Float properties (ID: 256)

UInt number_of
... (
Str32 name
Float value
)
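
Property blocks are count-prefixed lists of fixed-size (name, value) entries. A sketch for the float variant follows, assuming little-endian values and null-padded ASCII names in the Str32 field (by analogy with the object header strings). The function name and the property names "mass" and "friction" are made up for the example.

```python
import struct

ENTRY = struct.Struct("<32sf")  # Str32 name + Float value = 36 bytes


def read_float_properties(body):
    """Decode 'UInt number_of' followed by (Str32 name, Float value) entries."""
    (count,) = struct.unpack_from("<I", body, 0)
    props = {}
    for i in range(count):
        raw_name, value = ENTRY.unpack_from(body, 4 + i * ENTRY.size)
        # Keep only the text before the first null byte.
        props[raw_name.split(b"\0", 1)[0].decode("ascii")] = value
    return props


body = (struct.pack("<I", 2)
        + ENTRY.pack(b"mass", 2.5)
        + ENTRY.pack(b"friction", 0.25))
props = read_float_properties(body)
```

The Timer properties block has the same shape. The Int and Bool variants differ only in storing a UInt value, so the entry format would become "<32sI" under the same assumptions.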

Int properties (ID: 257)

UInt number_of
... (
Str32 name
UInt value
)

Bool properties (ID: 258)

UInt number_of
... (
Str32 name
UInt value
)

Timer properties (ID: 259)

UInt number_of
... (
Str32 name
Float value
)

Level/Node Data

Entities (ID: 1)

UInt number_of
... (
Str20 objectname
)

Nodes (ID: 2)

UInt number_of
... (
Str20 objectname
)

Matrix (ID: 3)

Floats m00 m10 m20 m30
Floats m01 m11 m21 m31
Floats m02 m12 m22 m32
Floats m03 m13 m23 m33

Float properties (ID: 256)

UInt number_of
... (
Str32 name
Float value
)

Int properties (ID: 257)

UInt number_of
... (
Str32 name
UInt value
)

Bool properties (ID: 258)

UInt number_of
... (
Str32 name
UInt value
)

Timer properties (ID: 259)

UInt number_of
... (
Str32 name
Float value
)

Material Data

Attributes (ID: 0)

Float diffuse_r
Float diffuse_g
Float diffuse_b
Float specular_r
Float specular_g
Float specular_b
Float specularity (shininess)
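
The material attributes block is seven consecutive floats. One way to read them into a small record is sketched below, assuming little-endian floats. The Material tuple and the function name are illustrative and not part of PySoy.

```python
import struct
from collections import namedtuple

# Hypothetical container for the seven attribute values.
Material = namedtuple("Material", "diffuse specular specularity")

ATTRIBUTES = struct.Struct("<7f")  # 7 floats = 28 bytes


def read_material_attributes(body):
    """Decode diffuse RGB, specular RGB and specularity (shininess)."""
    dr, dg, db, sr, sg, sb, shininess = ATTRIBUTES.unpack_from(body, 0)
    return Material((dr, dg, db), (sr, sg, sb), shininess)


body = ATTRIBUTES.pack(0.5, 0.25, 0.125, 1.0, 1.0, 1.0, 32.0)
material = read_material_attributes(body)
```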

Texture (ID: 1)

UInt number_of
... (
UInt texture_type  (0:COL, 1:NOR, 2:CSP, 3:CMIR, 4:REF, 5:SPEC, 6:EMIT, 7:ALPHA, 8:HARD, 9:RAYMIR, 10:TRANSLU, 11:AMB, 12:DISP)
Map32 name_map
Str32 image_name_part1  (This is only a suggestion. This will allow for filenames with an upper limit of 255 characters.)
Str32 image_name_part2  (We could also just do a 256 char field. In this context, I'm not convinced that always reserving a 255)
Str32 image_name_part3  (character space for filenames is that bad.)
Str32 image_name_part4
Str32 image_name_part5
Str32 image_name_part6
Str32 image_name_part7
Str32 image_name_part8
)
