Optimization: SQL Indexes¶

Like any database-backed application, PGD relies on SQL indexes for performance. This section describes the indexes in use, along with some that did not help.

Please see the section on Memory Table Indexes for more information about the types of indexes used with memory tables.

Protein¶

Primary Key Index¶

The primary key index is used when specific proteins have been selected by Code (the primary key).

Resolution Index¶

The index on resolution is used in most cases because it filters large sets of proteins. The default query, with resolution <= 1.2, reduces the number of proteins from 16,000 to 2,500.

As the number of selected proteins nears the total number of proteins, MySQL switches to a full table scan. Even when indexes exist on other fields, it does not appear to use them.
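One way to check which index a planner picks for the resolution filter is to ask the database for its query plan. The following is a minimal sketch under stated assumptions, not PGD's actual schema: it uses Python's built-in sqlite3 module as a stand-in for MySQL (SQLite's planner makes its own choices, so only the technique carries over), and the table name `pgd_core_protein`, its columns, and the index name `idx_protein_resolution` are hypothetical. On MySQL the equivalent check is to prefix the query with `EXPLAIN`.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all table, column, and index
# names below are hypothetical and chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pgd_core_protein ("
    "id INTEGER PRIMARY KEY, code TEXT, resolution REAL)"
)
conn.execute(
    "CREATE INDEX idx_protein_resolution ON pgd_core_protein (resolution)"
)

# Populate the table with made-up resolution values.
rows = [(i, "P%05d" % i, 0.5 + (i % 300) / 100.0) for i in range(3000)]
conn.executemany("INSERT INTO pgd_core_protein VALUES (?, ?, ?)", rows)


def query_plan(sql):
    """Return the planner's textual description of each step for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan = query_plan("SELECT * FROM pgd_core_protein WHERE resolution <= 1.2")
uses_resolution_index = any("idx_protein_resolution" in step for step in plan)

for step in plan:
    print(step)
```

Running the same check with a much looser resolution cutoff is a simple way to see whether a given planner falls back to a full table scan.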

Failed Indexes¶

We also attempted to create indexes combining resolution with other fields. No noticeable performance improvement was detected; MySQL always opted for the individual Resolution Index.

Protein Joined to Residue¶

When joining a Residue with its Protein, an index on Residue.protein_id is used.
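The join index can be inspected in the same way as any other index, by looking at the query plan. This is a hedged sketch rather than PGD's real schema: SQLite (via Python's sqlite3) stands in for MySQL, and the `pgd_core_protein` table, the `aa` column, and the index name `idx_residue_protein` are assumptions made for illustration. Only `pgd_core_residue` and `protein_id` come from the text.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. pgd_core_protein, the aa column,
# and the index name are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pgd_core_protein (code TEXT PRIMARY KEY, resolution REAL);
    CREATE TABLE pgd_core_residue (
        id INTEGER PRIMARY KEY,
        protein_id TEXT REFERENCES pgd_core_protein (code),
        aa TEXT
    );
    CREATE INDEX idx_residue_protein ON pgd_core_residue (protein_id);
""")

# Join each residue to its protein, restricted to one (made-up) protein code.
join_sql = """
    SELECT r.id, r.aa
    FROM pgd_core_protein p
    INNER JOIN pgd_core_residue r ON (r.protein_id = p.code)
    WHERE p.code = '1ABC'
"""

# Each plan row ends with a human-readable description of one step.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + join_sql)]
join_uses_protein_id_index = any(
    "idx_residue_protein" in step for step in plan_steps
)

for step in plan_steps:
    print(step)
```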

Failed Indexes¶

We attempted to add additional fields to the protein_id index. The result was actually slower than the protein_id index alone.

Residue Joined to Residue¶

Residues are joined to Residues for the previous and next relationships using the Primary Key index on Residue.

Note on Join Direction for Previous and Next¶

Residues join from residue_0.next to residue_1.id:

SELECT * FROM pgd_core_residue r0 INNER JOIN pgd_core_residue r1 ON (r0.next = r1.id)

instead of from residue_0.id to residue_1.prev:

SELECT * FROM pgd_core_residue r0 INNER JOIN pgd_core_residue r1 ON (r0.id = r1.prev)

The latter appeared to be a faster query, but it is not possible with Django. Expressing the custom clause requires adding a where clause with queryset.extra(), but Django will also add the original clause.
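The two join directions select the same residue pairs whenever prev and next are consistent with each other; they differ only in which column the database matches on. The sketch below illustrates that equivalence on a tiny made-up chain of four residues. It assumes SQLite (via Python's sqlite3) in place of MySQL and a cut-down `pgd_core_residue` table with only the id, prev, and next columns, so it says nothing about relative speed.

```python
import sqlite3

# Assumption: a simplified pgd_core_residue with only id, prev, and next,
# held in SQLite rather than MySQL. The data is made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pgd_core_residue ("
    "id INTEGER PRIMARY KEY, prev INTEGER, next INTEGER)"
)

# A chain of four residues: 1 -> 2 -> 3 -> 4.
conn.executemany(
    "INSERT INTO pgd_core_residue VALUES (?, ?, ?)",
    [(1, None, 2), (2, 1, 3), (3, 2, 4), (4, 3, None)],
)

# Direction used by PGD: follow r0.next to r1's primary key.
pairs_by_next = conn.execute(
    "SELECT r0.id, r1.id FROM pgd_core_residue r0 "
    "INNER JOIN pgd_core_residue r1 ON (r0.next = r1.id) "
    "ORDER BY r0.id"
).fetchall()

# Alternative direction: match r0's primary key against r1.prev.
pairs_by_prev = conn.execute(
    "SELECT r0.id, r1.id FROM pgd_core_residue r0 "
    "INNER JOIN pgd_core_residue r1 ON (r0.id = r1.prev) "
    "ORDER BY r0.id"
).fetchall()

print(pairs_by_next)
print(pairs_by_prev)
```

Because the first form looks up r1 by its primary key, it can use the primary key index on Residue, whereas the second form would need its own index on prev to avoid scanning.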