Yu-Chin Juan, the author of FFM, has open-sourced libffm, a C++ implementation, on GitHub. Since my daily data processing is done in Python, I wanted a Python version of FFM. There are many related projects on GitHub, such as this one: A Python wrapper for LibFFM.
Installation of libffm in Windows+Anaconda environment
Installation of libffm-python package
The project is installed on Windows as follows.
- Download the project locally and unzip it.
- Install the mingw32 environment. conda install mingw32
- Add mingw32 path to environment variable PATH: C:\RBuildTools\3.5\mingw_32\bin
- Modify Python's compilation settings in D:\ProgramData\Anaconda3\Lib\distutils\distutils.cfg. If this file does not exist, create it. Add the following content:
|
[build]
compiler=mingw32
|
- Execute: python setup.py install in the project directory
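The distutils.cfg step above can also be scripted. The following is a minimal sketch, not part of libffm-python: the helper name write_distutils_cfg is hypothetical, and the path you pass in must match your own Anaconda installation.

```python
import configparser
from pathlib import Path


def write_distutils_cfg(cfg_path, compiler="mingw32"):
    """Create or update distutils.cfg so that [build] contains compiler=<compiler>.

    Hypothetical helper for illustration; existing settings in the file are kept.
    """
    cfg_path = Path(cfg_path)
    parser = configparser.ConfigParser()
    if cfg_path.exists():
        parser.read(cfg_path)  # keep whatever is already configured
    if not parser.has_section("build"):
        parser.add_section("build")
    parser.set("build", "compiler", compiler)
    cfg_path.parent.mkdir(parents=True, exist_ok=True)
    with cfg_path.open("w") as handle:
        # Write "compiler=mingw32" without spaces, as in the snippet above
        parser.write(handle, space_around_delimiters=False)
    return cfg_path
```

On the machine described here, the call would be write_distutils_cfg(r"D:\ProgramData\Anaconda3\Lib\distutils\distutils.cfg").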
However, importing the installed package fails with the following error.
|
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-2-244abf364e9b> in <module>
----> 1 import ffm
D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg\ffm\__init__.py in <module>
----> 1 from .ffm import FFMData, FFM, read_model
D:\ProgramData\Anaconda3\lib\site-packages\ffm-7e8621d-py3.6-win-amd64.egg\ffm\ffm.py in <module>
70 FFM_Problem_ptr = ctypes.POINTER(FFM_Problem)
71
---> 72 _lib = ctypes.cdll.LoadLibrary(get_lib_path())
73
74 _lib.ffm_convert_data.restype = FFM_Problem
D:\ProgramData\Anaconda3\lib\ctypes\__init__.py in LoadLibrary(self, name)
424
425 def LoadLibrary(self, name):
--> 426 return self._dlltype(name)
427
428 cdll = LibraryLoader(CDLL)
D:\ProgramData\Anaconda3\lib\ctypes\__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
346
347 if handle is None:
--> 348 self._handle = _dlopen(self._name, mode)
349 else:
350 self._handle = handle
OSError: [WinError 87] The parameter is incorrect.
|
The main reason is that the libffm.so shared library was never compiled during installation on Windows, so there is nothing for ctypes to load. In other words, the installation failed.
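One way to check this diagnosis is to look for the compiled library inside the installed package directory before importing it. The sketch below is a hypothetical helper (find_compiled_libs is not part of libffm-python). It assumes the library file name starts with libffm and ends in .so, .pyd or .dll.

```python
from pathlib import Path


def find_compiled_libs(package_dir, stem="libffm"):
    """Return compiled library files in package_dir whose name starts with stem.

    The stem and the suffix set are assumptions made for this illustration.
    """
    suffixes = {".so", ".pyd", ".dll"}
    package_dir = Path(package_dir)
    if not package_dir.is_dir():
        return []
    return sorted(
        path for path in package_dir.iterdir()
        if path.name.startswith(stem) and path.suffix in suffixes
    )
```

An empty result for the installed ffm package directory means the extension was never built, which matches the failure in LoadLibrary.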
Compilation of Libffm on Windows
Since the Python package did not work, I decided to compile the C++ code directly. According to the project README, the latest version of libffm that supports Windows is v1.21:
|
Building Windows Binaries
=========================
The Windows part is maintained by different maintainer, so it may not always support the latest version.
The latest version it supports is: v1.21
To build them via command-line tools of Visual C++, use the following steps:
1. Open a DOS command box (or Developer Command Prompt for Visual Studio) and go to LIBFFM directory. If environment
variables of VC++ have not been set, type
"C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64\vcvars64.bat"
You may have to modify the above command according which version of VC++ or
where it is installed.
2. Type
nmake -f Makefile.win clean all
|
Following the procedure above, the first error I ran into was that "nmake" could not be found:
|
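nmake : The term 'nmake' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the
spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1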
+ nmake -f Makefile.win clean all
+ ~~~~~
+ CategoryInfo : ObjectNotFound: (nmake:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
|
My first fix was to add the directory containing "nmake" to the PATH environment variable. The build still failed, however, this time mainly because an included standard header could not be found.
|
PS E:\Download\libffm-121> nmake -f Makefile.win clean all
Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation. All rights reserved.
erase /Q *.obj *.exe windows\.
rd windows
mkdir windows
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c ffm.cpp
ffm.cpp
ffm.cpp(21): warning C4068: unknown pragma
ffm.cpp(22): fatal error C1034: algorithm: no include path set
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin\cl.exe"' : return code '0x2'
Stop.
|
After searching online, I found that setting up the environment variables for VC++ is more involved than it looks: you need to set PATH, LIB and INCLUDE. The main reason is that VS2015 introduced ucrt, which requires the Windows 10 SDK, while uuid.lib has to be taken from the Windows 8.x SDK, so the configuration is quite troublesome.
- PATH C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin;C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE
- LIB C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\lib;C:\Program Files (x86)\Windows Kits\10\Lib\10.0.10240.0\ucrt\x86;C:\Program Files (x86)\Windows Kits\8.1\Lib\winv6.3\um\x86
- INCLUDE C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\include;C:\Program Files (x86)\Windows Kits\10\Include\10.0.10240.0\ucrt
Adjust these paths to match your own installation. With the variables set, running the build again succeeds, with only a few warnings:
|
PS E:\Download\libffm-121> nmake -f Makefile.win clean all
Microsoft (R) Program Maintenance Utility Version 14.00.24210.0
Copyright (C) Microsoft Corporation. All rights reserved.
erase /Q *.obj *.exe windows\.
rd windows
mkdir windows
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c ffm.cpp
ffm.cpp
ffm.cpp(21): warning C4068: unknown pragma
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp -c timer.cpp
timer.cpp
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp ffm-train.cpp ffm.obj timer.obj -Fewindows\ffm-train.exe
ffm-train.cpp
ffm-train.cpp(1): warning C4068: unknown pragma
cl.exe /nologo /O2 /EHsc /D "_CRT_SECURE_NO_DEPRECATE" /D "USEOMP" /D "USESSE" /openmp ffm-predict.cpp ffm.obj timer.obj -Fewindows\ffm-predict.exe
ffm-predict.cpp
|
After compilation, a new windows folder is created in the source directory, containing two executables:
- ffm-predict.exe
- ffm-train.exe
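The environment setup and build described above can also be driven from Python instead of editing the system-wide variables. This is a minimal sketch under stated assumptions: the helper names (build_msvc_env, missing_dirs, run_nmake, built_executables) are hypothetical, and the directory lists you pass in must match your own Visual Studio and Windows SDK installation.

```python
import os
import shutil
import subprocess

SEP = ";"  # Windows separator used by PATH, LIB and INCLUDE


def build_msvc_env(path_dirs, lib_dirs, include_dirs, base_env=None):
    """Return a copy of the environment with PATH extended and LIB/INCLUDE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PATH"] = SEP.join(list(path_dirs) + [env.get("PATH", "")]).strip(SEP)
    env["LIB"] = SEP.join(lib_dirs)
    env["INCLUDE"] = SEP.join(include_dirs)
    return env


def missing_dirs(*dir_lists):
    """List configured directories that do not exist on this machine."""
    return [d for dirs in dir_lists for d in dirs if not os.path.isdir(d)]


def run_nmake(source_dir, env):
    """Run 'nmake -f Makefile.win clean all' in source_dir with the given environment."""
    nmake = shutil.which("nmake", path=env["PATH"])
    if nmake is None:
        raise FileNotFoundError("nmake was not found in the configured PATH")
    return subprocess.call(
        [nmake, "-f", "Makefile.win", "clean", "all"], cwd=source_dir, env=env
    )


def built_executables(source_dir):
    """Return which of the two expected executables exist under <source_dir>/windows."""
    out_dir = os.path.join(source_dir, "windows")
    names = ("ffm-train.exe", "ffm-predict.exe")
    return [n for n in names if os.path.isfile(os.path.join(out_dir, n))]
```

Calling missing_dirs on the three lists before run_nmake catches a wrong SDK version number early, which is otherwise reported only as a missing header or library during compilation.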
Use of ffm-train.exe and ffm-predict.exe
The simplest way is to call the executables directly from the command line, as described in the project documentation.
|
Command Line Usage
==================
- `ffm-train'
usage: ffm-train [options] training_set_file [model_file]
options:
-l <lambda>: set regularization parameter (default 0.00002)
-k <factor>: set number of latent factors (default 4)
-t <iteration>: set number of iterations (default 15)
-r <eta>: set learning rate (default 0.2)
-s <nr_threads>: set number of threads (default 1)
-p <path>: set path to the validation set
--quiet: quiet model (no output)
--no-norm: disable instance-wise normalization
--auto-stop: stop at the iteration that achieves the best validation loss (must be used with -p)
By default we do instance-wise normalization. That is, we normalize the 2-norm of each instance to 1. You can use
`--no-norm' to disable this function.
A binary file `training_set_file.bin' will be generated to store the data in binary format.
Because FFM usually need early stopping for better test performance, we provide an option `--auto-stop' to stop at
the iteration that achieves the best validation loss. Note that you need to provide a validation set with `-p' when
you use this option.
- `ffm-predict'
usage: ffm-predict test_file model_file output_file
|
Alternatively, the executables can be run from Python by issuing the same commands:
|
import os
import subprocess

# Switch to the folder that contains the compiled executables
os.getcwd()
os.chdir(r'E:\Download\libffm-121\windows')
os.getcwd()

# Launch the executables without arguments; os.startfile behaves like
# the "start" command, so either call is enough
os.system("start ffm-train.exe")
os.startfile("ffm-train.exe")
os.system("start ffm-predict.exe")
os.startfile("ffm-predict.exe")

# Train a model with the default parameters
cmd = 'ffm-train bigdata.tr.txt model'
subprocess.call(cmd, shell=True)

# Use bigdata.te.txt as the validation set
cmd = 'ffm-train -p bigdata.te.txt bigdata.tr.txt model'
subprocess.call(cmd, shell=True)

# Use 5-fold cross-validation
cmd = 'ffm-train -v 5 bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# Train with --quiet so that no training information is printed
cmd = 'ffm-train --quiet bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# Predict
cmd = 'ffm-predict bigdata.te.txt model output.txt'
subprocess.call(cmd, shell=True)

# Disk-based training
cmd = 'ffm-train --no-rand --on-disk bigdata.tr.txt'
subprocess.call(cmd, shell=True)

# Use --auto-stop to stop training once the best validation loss is reached
cmd = 'ffm-train -p bigdata.te.txt -t 100 --auto-stop bigdata.tr.txt'
subprocess.call(cmd, shell=True)
|
The training files used in the sample code are available at
https://github.com/keyunluo/python-ffm/tree/master/example/libffm-format
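The subprocess calls above pass a single string with shell=True. As a variant, the command can be assembled as an argument list, which avoids quoting problems with paths that contain spaces. This is a minimal sketch: build_train_command is a hypothetical helper, and it only covers the options listed in the usage text quoted earlier.

```python
import subprocess


def build_train_command(train_file, model_file=None, *, exe="ffm-train",
                        lambda_=None, factors=None, iterations=None,
                        eta=None, threads=None, validation_file=None,
                        quiet=False, no_norm=False, auto_stop=False):
    """Build an ffm-train argument list from the documented options."""
    if auto_stop and validation_file is None:
        # The usage text states that --auto-stop must be used with -p
        raise ValueError("--auto-stop requires a validation file (-p)")
    cmd = [exe]
    valued_options = (
        ("-l", lambda_), ("-k", factors), ("-t", iterations),
        ("-r", eta), ("-s", threads), ("-p", validation_file),
    )
    for flag, value in valued_options:
        if value is not None:  # omitted options keep the ffm-train defaults
            cmd += [flag, str(value)]
    if quiet:
        cmd.append("--quiet")
    if no_norm:
        cmd.append("--no-norm")
    if auto_stop:
        cmd.append("--auto-stop")
    cmd.append(train_file)
    if model_file is not None:
        cmd.append(model_file)
    return cmd


def run_command(cmd):
    """Run the command without a shell and return its exit code."""
    return subprocess.call(cmd)
```

For example, run_command(build_train_command("bigdata.tr.txt", "model", validation_file="bigdata.te.txt", iterations=100, auto_stop=True)) corresponds to the last training call in the sample code.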
Since calling the executables this way is cumbersome, I found another open source project that wraps them in a scikit-learn style estimator. The wrapper code is:
|
from __future__ import print_function, absolute_import
import os, sys, subprocess, shlex, tempfile, time, sklearn.base, math
import numpy as np
import pandas as pd
# pandas_extensions and ExeEstimator are helper modules from the same
# open source project; they are not part of libffm
from pandas_extensions import *
from ExeEstimator import *


class LibFFMClassifier(ExeEstimator, sklearn.base.ClassifierMixin):
    '''
    options:
    -l <lambda>: set regularization parameter (default 0)
    -k <factor>: set number of latent factors (default 4)
    -t <iteration>: set number of iterations (default 15)
    -r <eta>: set learning rate (default 0.1)
    -s <nr_threads>: set number of threads (default 1)
    -p <path>: set path to the validation set
    --quiet: quiet model (no output)
    --norm: do instance-wise normalization
    --no-rand: disable random update
    `--norm' helps you to do instance-wise normalization. When it is enabled,
    you can simply assign `1' to `value' in the data.
    '''
    def __init__(self, columns, lambda_v=0, factor=4, iteration=15, eta=0.1,
                 nr_threads=1, quiet=False, normalize=None, no_rand=None):
        ExeEstimator.__init__(self)
        self.columns = columns.tolist() if hasattr(columns, 'tolist') else columns
        self.lambda_v = lambda_v
        self.factor = factor
        self.iteration = iteration
        self.eta = eta
        self.nr_threads = nr_threads
        self.quiet = quiet
        self.normalize = normalize
        self.no_rand = no_rand

    def fit(self, X, y=None):
        # X is either the path of a file already in libffm format or in-memory data
        if type(X) is str:
            train_file = X
        else:
            if not hasattr(X, 'values'):
                X = pd.DataFrame(X, columns=self.columns)
            train_file = self.save_reusable('_libffm_train', 'to_libffm', X, y)
        # self._model_file = self.save_tmp_file(X, '_libffm_model', True)
        self._model_file = self.tmpfile('_libffm_model')
        # Build the ffm-train command line from the estimator's parameters
        command = 'utils/lib/ffm-train.exe' + ' -l ' + repr(self.lambda_v) + \
            ' -k ' + repr(self.factor) + ' -t ' + repr(self.iteration) + \
            ' -r ' + repr(self.eta) + ' -s ' + repr(self.nr_threads)
        if self.quiet: command += ' --quiet'
        if self.normalize: command += ' --norm'
        if self.no_rand: command += ' --no-rand'
        command += ' ' + train_file
        command += ' ' + self._model_file
        running_process = self.make_subprocess(command)
        self.close_process(running_process)
        return self

    def predict(self, X):
        if type(X) is str:
            test_file = X
        else:
            if not hasattr(X, 'values'):
                X = pd.DataFrame(X, columns=self.columns)
            test_file = self.save_reusable('_libffm_test', 'to_libffm', X)
        output_file = self.tmpfile('_libffm_predictions')
        command = 'utils/lib/ffm-predict.exe ' + test_file + ' ' + self._model_file + ' ' + output_file
        running_process = self.make_subprocess(command)
        self.close_process(running_process)
        preds = list(self.read_predictions(output_file))
        return preds

    def predict_proba(self, X):
        # A list comprehension is used so that this also works on Python 3,
        # where map() returns an iterator instead of a list
        predictions = np.asarray([1 / (1 + math.exp(-p)) for p in self.predict(X)])
        return np.vstack([1 - predictions, predictions]).T
|
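The wrapper above relies on a read_predictions method inherited from ExeEstimator, which is not shown here. As a rough stand-in, the sketch below reads an ffm-predict output file under the assumption that it contains one floating-point value per line. load_predictions is a hypothetical name and not the project's own implementation.

```python
def load_predictions(path):
    """Read one float per non-empty line from an ffm-predict output file.

    Assumes a plain-text file with one numeric prediction per line.
    """
    values = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at the end
                values.append(float(line))
    return values
```

Calling load_predictions("output.txt") after the ffm-predict command from the earlier sample code would return the predictions as a Python list.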
In summary, libffm is cumbersome to use on Windows, both to compile and to call. If your environment permits, use it on Linux instead.
Installation of libffm in Linux+Anaconda environment
The installation of the libffm-python package in Anaconda on Linux also has problems. The specific error reported is as follows.
|
➜ libffm-python git:(master) python setup.py install
/home/qw/anaconda3/lib/python3.7/site-packages/setuptools/dist.py:481: UserWarning: The version specified ('7e8621d') is an invalid version, this may not work as expected with newer versions of setuptools, pip, and PyPI. Please see PEP 440 for more details.
"details." % self.metadata.version
running install
running bdist_egg
running egg_info
creating ffm.egg-info
writing ffm.egg-info/PKG-INFO
writing dependency_links to ffm.egg-info/dependency_links.txt
writing requirements to ffm.egg-info/requires.txt
writing top-level names to ffm.egg-info/top_level.txt
writing manifest file 'ffm.egg-info/SOURCES.txt'
reading manifest file 'ffm.egg-info/SOURCES.txt'
writing manifest file 'ffm.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/ffm
copying ffm/__init__.py -> build/lib.linux-x86_64-3.7/ffm
copying ffm/ffm.py -> build/lib.linux-x86_64-3.7/ffm
running build_ext
building 'ffm.libffm' extension
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /home/qw/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/qw/anaconda3/include/python3.7m -c ffm.cpp -o build/temp.linux-x86_64-3.7/ffm.o -Wall -O3 -std=c++0x -march=native -DUSESSE -DUSEOMP
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
ffm.cpp:578: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
578 | #pragma omp parallel for schedule(static) reduction(+: loss)
|
ffm.cpp:726: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
726 | #pragma omp parallel for schedule(static) reduction(+: loss)
|
gcc -pthread -B /home/qw/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/qw/anaconda3/include/python3.7m -c timer.cpp -o build/temp.linux-x86_64-3.7/timer.o -Wall -O3 -std=c++0x -march=native -DUSESSE -DUSEOMP
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
g++ -pthread -shared -B /home/qw/anaconda3/compiler_compat -L/home/qw/anaconda3/lib -Wl,-rpath=/home/qw/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/ffm.o build/temp.linux-x86_64-3.7/timer.o -o build/lib.linux-x86_64-3.7/ffm/libffm.cpython-37m-x86_64-linux-gnu.so -fopenmp
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
/home/qw/anaconda3/compiler_compat/ld: build/temp.linux-x86_64-3.7/ffm.o: unable to initialize decompress status for section .debug_info
build/temp.linux-x86_64-3.7/ffm.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1
|
At first I thought there was a problem with the libffm code, so I replaced it with the latest upstream version, but the error persisted. After checking again, I found that the code was fine and compiled normally outside Anaconda. The cause is that Anaconda ships its own linker, ld, in the ~/anaconda3/compiler_compat directory. The fix is simple: rename the ld in ~/anaconda3/compiler_compat and run the installation again.
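The rename can be wrapped so that the linker is put back after the installation finishes. This is a minimal sketch and goes one step further than the manual fix: restoring the original name afterwards is my own addition, and linker_set_aside is a hypothetical helper name.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def linker_set_aside(compat_dir, name="ld", suffix=".bak"):
    """Temporarily rename <compat_dir>/ld so the system linker is used instead.

    The file is renamed back when the with-block exits, even if it raises.
    """
    original = Path(compat_dir) / name
    backup = original.with_name(name + suffix)
    moved = original.exists()
    if moved:
        original.rename(backup)
    try:
        yield backup if moved else None
    finally:
        if moved:
            backup.rename(original)


# Example usage (paths are those from this article; adjust to your setup):
# import subprocess, sys
# with linker_set_aside(Path.home() / "anaconda3" / "compiler_compat"):
#     subprocess.check_call([sys.executable, "setup.py", "install"],
#                           cwd="libffm-python")
```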